1

Sun Buys MySQL

Posted on January 16th, 2008 / 1 Comment »
Tags: , , ,

The big news today is that Sun purchased MySQL: Sun Press Kit, Sun Announcement, MySQL Announcement and Jonathan Schwartz’s Blog Announcement.

Let’s get the obvious issues/questions out of the way first. The purchase price is $1 billion, $800 million cash, $200 million in options. What is this going to mean for Sun’s PostgreSQL development? How will this impact the relationship between Sun and Oracle? Historically Sun has been one of the big iron Oracle platforms. Oracle owns a vital piece of MySQL, InnoDB. And of course, what will this mean to the MySQL community?

Now some of my own thoughts on this deal. This seems like something of a desperate move for both companies. In the case of MySQL, they need to find a way to stay intact as a company and bring in lots of cash. Guess all that talk of an IPO goes out the window now. So the list of potential buyers becomes pretty small. It would have to be to a company with lots of hard core tech but without a major database platform of their own, which rules out Oracle, IBM and Microsoft. Google and Yahoo might have smart folks who could make it better, but getting into the customer support business isn’t something that they’d want to do. So how many other big tech companies are left that would be willing or able to pay the kind of money that MySQL was looking for? Like I said, small list.

Then there is the Sun side of this deal. Recently Sun has been pushing PostgreSQL, but there isn’t one single company behind PostgreSQL, making it much more difficult to control in the manner that Sun prefers. In this respect MySQL fits perfectly, although it is Open Source (mostly), there really isn’t much of a developer community for it. Basically all development happens at MySQL the company. This would allow Sun to have the same massive grip over development that it has had for its other open source products. I think another factor is that Sun desperately wants to avoid becoming ignored in the world of big name web companies. They’ve developed some great stuff (Dtrace, ZFS) and even open sourced their OS (Solaris), but have still have failed to get their foot in the door in any big way for tech startups. So if you can’t convince them to like you, buy someone they already do. And better to do it now while you still have the money to do so.

Sun has just bought their way into becoming every tech company’s best friend. Maybe now they’ll change their stock ticker symbol from JAVA to MYSQL.

Mangus Hagander has a detailed how to for replicating from MS SQL Server to PostgreSQL. It certainly isn’t perfect, but for those willing to go through the work this would be a very interesting way to add PostgreSQL to a SQL Server shop. I don’t plan on using this right now, but I’m definitely going to file this away as something with potential.

1

Source Control for Databases

Posted on February 8th, 2006 / 1 Comment »
Tags: ,

Josh Berkus asks a great question, what options are there for source control management (SCM) of databases? He focuses on three specific issues:

  • Database Diffs are DDL
  • Some Data Gets Copied, Some Doesn’t
  • A Database SCM Tool Should Control Source and Update the Database in One Action

I’d love to be able to get a SQL based diff for database schemas for different versions of an application. A tool like that would be a huge step by itself. So what does it take to tokenize SQL syntax? Perhaps CPAN’s SQL::Statement would good enough. It seems to understand a fair bit of SQL syntax. Additional work would have to be done get a set of ALTER TABLE commands to update the schema.

Such a tool will also fall prey to the issue support multiple databases. The two obvious targets to start with would be PostgreSQL and MySQL.

(Warning, I tend to use UUID instead of GUID. For purposes of this post consider them interchangeable.)

I’ve already mentioned the announcement of FeedLounge making the move from MySQL to PostgreSQL last month, but the discussion is still on going. If you haven’t yet, go read all of the comments on the announcement. This has to be some of the most productive and level headed (no flames yet) set of blog comments I’ve seen in a long time. For those of you who are coming late to this story let me bring you up to speed:

FeedLounge started with MySQL as their database, with the MyISAM table types. They ran into problems and so moved to the InnoDB table type in MySQL. Still not completely happy they did more research and decided to try out PostgreSQL for their database needs. Their tests showed that MySQL InnoDB database was 34GB, in PostgreSQL it was 9.6GB. Restores in MySQL took more than 24 hours, in PostgreSQL it was less than 5 hours. These numbers resulted in better performance and reduced time.

It has been more than two weeks since the initial announcement and there are still new comments being added. As of this morning there are 34 35 comments. Some of the comments have come from people who are very knowledgable about MySQL and the size of InnoDB indexes became the focus for why InnoDB was so much bigger than PostgreSQL. Using InnoDB with UUIDs as the primary key was identified as a problem, especially in the area of performance.

Things continued to get more interesting when Heikki Tuuri (the creator of InnoDB) left a comment confirming the size problems in InnoDB when using a UUID as a primary key. It seems that using a UUID in this way causes a couple of problems, the first being that InnoDB wants to keep things in index order, which is great for indexes on sequential numbers but bad for seemingly random data (PostgreSQL can do the same thing with cluster). The next problem is that secondary indexes in the same table get copies of the primary key index to making finding things faster. This, at least in part, was what caused the size difference.

Heikki’s solution (and others have mentioned this) is to use an auto_increment field as the primary key and create a unique secondary index on the UUID field. This would make the primary key index size much smaller and still allow InnoDB to do index lookups on the UUID field. That would seem to solve the problem, but there hasn’t been any feedback yet to confirm if it fixed all of the differences between InnoDB and PostgreSQL.

There is something about having to add an extra field to my table to make the database perform better that doesn’t seem quite right though. I don’t have anything concrete that I can point to showing that this isn’t a good idea, but it doesn’t feel right. For those who are running into these same types of problems though, it might be something to try, at least until you get a chance to try out PostgreSQL.

This could turn into an interesting case study, FeedLounge has switched databases, from MySQL to PostgreSQL. The two main factors they listed were database size (34Gig for MySQL/InnoDB vs. 9.6Gig for PostgreSQL) and restoration time (24+ hours to restore database in MySQL vs. less than 5 hours for PostgreSQL).

I’m surprised to here about the huge difference in database size between MySQL (w/InnoDB) and PostgreSQL. Perhaps the different indexing techniques are responsible for this, I’m not sure. For large databases the ability to cut your storage requirements by more than 1/3 is huge. My gut feeling though is that this would not be the norm though. More testing is definitely in order to make this comparison more meaningful.

The huge difference in database restoration time also strikes me as a bit high. Certainly another area where further tests would be interesting to see if this pattern holds true in general. Obviously being able to restore your database in less than 1/3 the time is a good thing.

The real kicker in all this though is the chance to make use of the additional features found in PostgreSQL (they point this out in their write up). I hope that Alex continues to write about how this transition progresses.

0

PostgreSQL 8.1

Posted on November 8th, 2005 / No Comments »
Tags: ,

Good news just keeps coming. PostgreSQL 8.1 has been released. There are several changes in this new version, I suspect for many having pg_autovacuum tied in as part of the database server will be a big plus. Roles and Two Phase Commit are other nice features that will be very helpful. If you are into local GUI apps for database administration then having pgAdmin 1.4.0 included will be welcome, especially for Windows users. Overall this release seems like a solid step forward.

Now we just need PHP 5.1, Apache 2.1 to be released and we’ll have a completely updated stack when combined with FreeBSD 6.0 and PostgreSQL 8.1.

I’m working on a project that includes the need to store a fair amount of raw text. I’m using PostgreSQL to store this text, with the raw data being stored in a TEXT data type field. When it came time to decide which fields to index I included this field to see what would happen. It didn’t give me an error when creating the index, so I thought it would be fine. And it was, for awhile.

The first few dozen inserts went fine, the raw text for these weren’t very big. Later on the inserts start failing and PostgreSQL was logging the errors as:

postgres[6289]: [1-1] ERROR: index row size 5364 exceeds btree maximum, 2713

When inserting larger chunks of data the index update failed because it was too large. I removed the index on the raw text column and errors went away. So the lesson I learned here is that the B-tree index in PostgreSQL has a maximum size of 2713 bytes. Keep this in mind when you are going to index large VARCHAR or TEXT fields.

If you really need to index large TEXT fields then you’ll have to look at full text indexing with Tsearch2.

Ads