Recently I mentioned a tool to analyze database queries, PQA (Practical Query Analysis). Originally this tool was designed to work with PostgreSQL and now has some support for MySQL. So I turned on query logging for my MySQL server that hosts my WordPress back end. Then I put together a little script to run PQA against the MySQL query log:
/home/joseph/bin/pqa.rb -file /var/db/mysql/query.log -top 5 -normalize -logtype mysql
Nothing fancy: told it where the log was, how many “top” queries I wanted, normalize the queries and that the this was for MySQL (instead of PostgreSQL). I don’t get a lot of traffic to my site, so I’ve let the log run for almost a week to get a sample size of more than 100,000 queries. Now that I’ve hit that here are the results:
######## Overall statistics
101332 queries (150 unique) parsed in 27.470525 seconds
######## Queries by type
SELECTs: 101191 (100%)
INSERTs: 35 (0%)
UPDATEs: 44 (0%)
DELETEs: 13 (0%)
######## Most frequent queries
16465 times: SELECT * FROM wp_users WHERE user_login = ”
14423 times: SELECT post_date, post_name FROM wp_posts WHERE ID = 0
5391 times: SELECT option_name, option_value FROM wp_options
5391 times: SELECT * FROM wp_users
5347 times: SELECT * FROM wp_categories WHERE 0=0
Before I get into the queries themselves, lets talk about what other info this provided. I’ve got 101,332 queries, of which there are only 150 unique queries. PQA took about 27 seconds to process all of this info. Far and away the most common type of query was SELECT, this shouldn’t come as any surprise. A few INSERTS (new blog entries, comments, trackbacks, etc) and a few UPDATES (editing blog entries) and a few DELETES. This isn’t quite normal because the DELETES were the result of a comment spam attack. Fortunately there were only 13 of them so this doesn’t really impact our results much.
Now let take a look at the top five most executed queries and see what we can learn.
#1 (16465/101332): SELECT * FROM wp_users WHERE user_login = ”
The wp_users table keeps information about the registered users of a WordPress install, in this case I’m the only one. What should strike you as odd is the WHERE clause in this query, it is looking for user details for a username that is empty. My gut feeling is that this query should never, ever, ever be run by WordPress. Unless your WordPress install allows for blank usernames then this query will always return an empty result set. Unfortunately it will have to look through the wp_users table first (hopefully it is looking at an index on user_login) to determine this. Being able to avoid more than 16% of the total queries should help with speed, especially on systems that are running several copies of WordPress.
#2 (14423/101332): SELECT post_date, post_name FROM wp_posts WHERE ID = 0
As you can probably guess, the wp_posts table stores WordPress blog entries. Unlike the #1 query this one looks like it might be reasonable, but once again that WHERE clause caught my eye and made me wonder. So I started looking through all of the rows in the wp_posts table for my WordPress install. None of them have an ID of 0. Just like the first query I suspect that this query should never need to be run because it will always return nothing. At this point I can’t say this with 100% certainty though because I don’t know enough about the WordPress internals. My gut feeling is again that this query could be done away with, removing another 14% of the total queries.
#3 (5391/101332): SELECT option_name, option_value FROM wp_options
This query doesn’t have a WHERE clause for me to pick on. The wp_options table is used to store all of the different options that can be set for a WordPress install. All this query does is fetch all of those options. I doubt this has any possibilities for removal or optimization.
#4 (5391/101332): SELECT * FROM wp_users
This should look familiar, the only difference between this query and #1 above is the lack of a WHERE clause. I’ve got mixed feelings about this query. On the one hand I’m suspect of most queries that simply do a SELECT *, especially without a WHERE clause to limit it. What holds me back here is that I don’t think this query could be changed without a lot of changes to the source code of WordPress. Unless you have a lot of users in your WordPress install there isn’t much (like none at all) optimization that can be done here.
#5 (5347/101332): SELECT * FROM wp_categories WHERE 0=0
A new table to look at, wp_categories. This is a small table that holds information about entry categories (big surprise eh?). Once again our friend the WHERE clause provides with a bad feeling. If you aren’t familiar with SQL, the clause 0=0 always evaluates to true. In the context of this query it is completely useless. Some of you might be saying, so what, how much of a penalty could something small be performance wise? I asked myself the same question, so I put together a simple benchmark that ran this query with and without the WHERE 0=0 clause. In my simple tests the addition of WHERE 0=0 on this query against my WordPress database added an additional 0.0001 seconds PER QUERY! So why am I getting so excited about this, after all in my case that means that I’m only loosing about 1 second every two weeks, give or take. Well you would be right, for me personally it won’t make a much of a difference, but for a system that hosts many WordPress installs it could add up very quickly. Also this is one of those small optimizations that can add up when thrown together with other little optimizations through out the system.
So what have we learned from the top five WordPress queries? Well, about 30% of the SELECTs could likely be done away with entirely (#1 and #2). Another %5 of the SELECTs can gain a slight speed improvement of about one ten-thousandths of a second (#5). I’ll definitely be using PQA to look at more queries on other projects in the future.
There are some things to keep in mind here. These are total queries run over about one week, not all of them are being run on the same page necessarily. It is entirely possible that each of these queries is being run without the other four being present for a given page. I don’t think this is likely, but certainly a possibility. Another data point that might be helpful is the number of page views turn the same time period. I decided it wasn’t worth worrying about that at this point, since my focus was simply of the queries themselves.
For those of you who are now looking at me thinking, hey ding-a-ling, WordPress is an open source project, why don’t you take this information and send it to the developers so that everyone can benefit from this. Well, I already did. Alright that isn’t really true, I sent them some similar reports about other queries. One example would be three queries with WHERE 1=1 in them. I even included a diff. You’ll notice that this was sent in back in June, more than 4 months ago and it is still open. I haven’t been able to get much attention to getting these sorts of changes into the WordPress source.
If you are interested in doing some additional research yourself into some of the SQL queries that are used in WordPress check out my entry on MySQL Queries In WordPress, which explains how to turn on displaying all the queries run at the bottom of each page.
UPDATE 10:40am 11 Dec 2004: It was suggested in a comment (see #4) that 1=1 is a code optimization that is faster than checking to see if variable is empty first. See comment #5 for my response. Bottom line: my simple tests show that 1=1 is always slower than if(empty($some_var)) and that 1=1 breaks simple scalability schemes (multiple front end web servers, one big db server).