From Jeremy’s MySQL wish list (emphasis mine):
e. Fix subquery optimization
Subqueries have been available in the same broken state for over 4 years now. Why are subqueries in an IN (…) clause still optimized in an incredibly stupid and slow way, such as to make them useless? We have customers tripping over this all the time. MySQL can check off “subqueries” on the to-do list, since they do in fact work. The SQL standard doesn’t say anything about not sucking.
Reading this, I had a mental image of the next SQL spec having a line in there about sub-queries that don’t suck
Greg’s Implementing a queue in SQL (Postgres version) is an interesting exercise. The goal was to implement a simple queue that can be managed via SQL. The example is a simple first in, first out (FIFO) queue, with a limit of 5 items.
Two methods are demonstrated to accomplish this, the first one makes use of the PostgreSQL Rules and is very short. The second one is a little bit longer and uses a PL/pgSQL function as a trigger. One advantage of the second method is that it lets you know how many rows in your queue were recycled. Both methods require only one table.
There’s also a queue in MySQL article as well.
Is it practical to implement FIFO queues in a DBMS? Maybe not, but it is a good example of thinking outside the box.
Greg is a very intelligent guy, even if he does use emacs
Four days after commenting on Oracle’s purchasing rumors of three companies, one of them has been confirmed. Sleepycat was purchased by Oracle today. The official announcement has virtually no details other than to confirm the transaction.
One can’t help but think that Oracle is giving MySQL every reason to just give up and be purchased by Oracle. As an outsider it sure looks like MySQL is the main target to either be bought or crushed by Oracle. One of the comments on Jeremy’s post about the Sleepycat purchase even suggests that MySQL was able to secure an additional $18.5 million in funding because of the potential return when/if Oracle buys MySQL.
We’ll have to wait and see if the rumors about JBoss and Zend turn out to true.
(Warning, I tend to use UUID instead of GUID. For purposes of this post consider them interchangeable.)
I’ve already mentioned the announcement of FeedLounge making the move from MySQL to PostgreSQL last month, but the discussion is still on going. If you haven’t yet, go read all of the comments on the announcement. This has to be some of the most productive and level headed (no flames yet) set of blog comments I’ve seen in a long time. For those of you who are coming late to this story let me bring you up to speed:
FeedLounge started with MySQL as their database, with the MyISAM table types. They ran into problems and so moved to the InnoDB table type in MySQL. Still not completely happy they did more research and decided to try out PostgreSQL for their database needs. Their tests showed that MySQL InnoDB database was 34GB, in PostgreSQL it was 9.6GB. Restores in MySQL took more than 24 hours, in PostgreSQL it was less than 5 hours. These numbers resulted in better performance and reduced time.
It has been more than two weeks since the initial announcement and there are still new comments being added. As of this morning there are
34 35 comments. Some of the comments have come from people who are very knowledgable about MySQL and the size of InnoDB indexes became the focus for why InnoDB was so much bigger than PostgreSQL. Using InnoDB with UUIDs as the primary key was identified as a problem, especially in the area of performance.
Things continued to get more interesting when Heikki Tuuri (the creator of InnoDB) left a comment confirming the size problems in InnoDB when using a UUID as a primary key. It seems that using a UUID in this way causes a couple of problems, the first being that InnoDB wants to keep things in index order, which is great for indexes on sequential numbers but bad for seemingly random data (PostgreSQL can do the same thing with cluster). The next problem is that secondary indexes in the same table get copies of the primary key index to making finding things faster. This, at least in part, was what caused the size difference.
Heikki’s solution (and others have mentioned this) is to use an auto_increment field as the primary key and create a unique secondary index on the UUID field. This would make the primary key index size much smaller and still allow InnoDB to do index lookups on the UUID field. That would seem to solve the problem, but there hasn’t been any feedback yet to confirm if it fixed all of the differences between InnoDB and PostgreSQL.
There is something about having to add an extra field to my table to make the database perform better that doesn’t seem quite right though. I don’t have anything concrete that I can point to showing that this isn’t a good idea, but it doesn’t feel right. For those who are running into these same types of problems though, it might be something to try, at least until you get a chance to try out PostgreSQL.
This could turn into an interesting case study, FeedLounge has switched databases, from MySQL to PostgreSQL. The two main factors they listed were database size (34Gig for MySQL/InnoDB vs. 9.6Gig for PostgreSQL) and restoration time (24+ hours to restore database in MySQL vs. less than 5 hours for PostgreSQL).
I’m surprised to here about the huge difference in database size between MySQL (w/InnoDB) and PostgreSQL. Perhaps the different indexing techniques are responsible for this, I’m not sure. For large databases the ability to cut your storage requirements by more than 1/3 is huge. My gut feeling though is that this would not be the norm though. More testing is definitely in order to make this comparison more meaningful.
The huge difference in database restoration time also strikes me as a bit high. Certainly another area where further tests would be interesting to see if this pattern holds true in general. Obviously being able to restore your database in less than 1/3 the time is a good thing.
The real kicker in all this though is the chance to make use of the additional features found in PostgreSQL (they point this out in their write up). I hope that Alex continues to write about how this transition progresses.
The chapter on SQL Injection from the book Guide to PHP Security by Ilia Alshanetsky is available in PDF format. This can be a rather fun topic so I downloaded a copy of the PDF and started reading. I’d read less two pages before I was ready to toss this chapter out the window.
The first example showed a very simple inject that terminated the original query with a ; and appended another query that deleted some data. The example used MySQL functions and the author noted the following after explaining the injection:
Fortunately, if you use MySQL, the mysql_query() function does not permit query stacking, or executing multiple queries in a single function call. If you try to stack queries, the call fails.
However, other PHP database extensions, such as SQLite and PostgreSQL, happily perform stacked queries, executing all of the queries provided in one string and creating a serious security problem.
The emphasis is mine. I couldn’t believe that the author was basically claiming that MySQL is more secure than SQLite and PostgreSQL because they allowed you to run multiple SQL statements in one string. Then something else came to mind, didn’t MySQL add this feature not too long ago? So I went hunting around and came up with the C API Handling of Multiple Query Execution in MySQL, which was introduced in version 4.1. Although still correct, the mysql_query() function in PHP doesn’t allow for multiple statements, the underlying C libraries do. I’d expect that at somepoint in the future PHP will catch up and support that feature for MySQL 4.1 and above.
But back to my first point, being able to run multiple queries in a single string is not a security threat. Not proplerly filtering data before using in an SQL statement is. If you allow unfiltered data to be injected into your SQL statement then you’ve got problems, even if you database doesn’t support multiple statements in a single string.
I hope that this isn’t typical of the rest of the book.
Jeremy talks about Oracle buying Innobase, the folks behind InnoDB for MySQL. I don’t really have too much to add, I suspect most people are at least a little bit nervous at what Oracle might do to InnoDB development. Even if Oracle doesn’t do anything (one way or the other), this will certainly make people wonder about the future of InnoDB.
What attraction does Innobase have for Oracle? I doubt that it is their technology, even with everything that InnoDB has brought to MySQL, Oracle’s database does much more. I can’t image that share holders would like the idea of Oracle funding a competitor out of kindness. It is possible that they are mostly interested in the people/talent at Innobase, but if those folks start working on Oracle projects where does that leave InnoDB development?