Skype has been PostgreSQL as their back end database and have made available their database projects as part of the Skype Developer Zone. For background how Skype has made us of PostgreSQL check out the whitepaper.
Skype has made the following projects available:
- SkyTools (made up of Londiste, PgQ and WalMgr)
- Enhancements to PL/Python that are part of PostgreSQL 8.2
Nice to see Skype giving back to the PostgreSQL community.
Greg’s Implementing a queue in SQL (Postgres version) is an interesting exercise. The goal was to implement a simple queue that can be managed via SQL. The example is a simple first in, first out (FIFO) queue, with a limit of 5 items.
Two methods are demonstrated to accomplish this, the first one makes use of the PostgreSQL Rules and is very short. The second one is a little bit longer and uses a PL/pgSQL function as a trigger. One advantage of the second method is that it lets you know how many rows in your queue were recycled. Both methods require only one table.
There’s also a queue in MySQL article as well.
Is it practical to implement FIFO queues in a DBMS? Maybe not, but it is a good example of thinking outside the box.
Mangus Hagander has a detailed how to for replicating from MS SQL Server to PostgreSQL. It certainly isn’t perfect, but for those willing to go through the work this would be a very interesting way to add PostgreSQL to a SQL Server shop. I don’t plan on using this right now, but I’m definitely going to file this away as something with potential.
There are a lot of open source licenses out there. For me open source license usually implies GPL, LGPL and BSD/MIT, although there are over 50 on the OSI‘s list. I tend to be partial towards the BSD/MIT licenses over (L)GPL, I’ve mentioned this before. For the purposes of this article thought, I’m going to focus on BSD licensed software. Don’t take this as a slam against the GPL, I just wanted to focus on the largest amount of flexibility and ease for developers.
It really is amazing that today you can develop an entire application or service using BSD licensed software up and down the stack. This particular license allows you to modify code and it is up to you if you want to share it. Although it is always recommended to participate in the community, this license doesn’t make it a requirement if you want to distribute modified code (this may be a good or a bad thing depending on your point of view). Thus all types of applications and services can be built upon BSD licensed code, from top to bottom. If you’ve ever tried to figure out which license you have to buy from a company that offers more types of licenses than cars on the road you know that not having to go through that maze can be a great time saver.
So here is what I’m thinking of when I talk about the BSD licensed application stack. At the lowest level we need an operating system. I prefer FreeBSD for this, but there is no reason why NetBSD, OpenBSD or DragonFly BSD couldn’t be used instead. Any of these will provide a complete operating system and depending on your wants or needs you may find one fits you better than the others.
Now that we have an OS (FreeBSD), we’ll need some place to store data that our application or service will be using. SQL databases have grown to fit this need quite well. Because we are focusing on BSD licensed open source software one database really stands above the rest, PostgreSQL. Not only is it a perfect fit for our criteria, it is a great piece database software. PostgreSQL supports many features that users of commercial databases have come to expect (Views, Functions, Schemas, etc).
The way to deploy applications and or services today is on the web. Here again we are fortunate because the most commonly used web server is open source and BSD licensed. The Apache web server is flexible (mod_rewrite anyone?) and powerful.
Finally we’ll need an a programming license to get things done. This one piece of the stack is probably the most difficult to pin down. My pick though would have to be PHP, whose license is close to the BSD license. It is also targeted at for web apps, but I’ve used it for command line applications as well.
The BSD Licensed Application STack (BLAST) is about software that does its job well and has a license that is easy to understand and gives you the ability to get distributed changes to yourself. Activity in the community is optional, but encouraged. For me this means FreeBSD, PostgreSQL, Apache and PHP.
Many of you reading this will be jumping up and down that this is just a rehash of L.A.M.P.. On one level this is true, Linux, Apache, MySQL and PHP/Perl (L.A.M.P.) do satisfy one part of BLAST, open source software that gets the job done. Unfortunately the licensing for some of these products is difficult to understand and in some cases the same license is interpreted in different ways (yes MySQL I’m looking at you).
The components of BLAST may change over time, (perhaps another language besides PHP?) but the intent and abilities will be the same. Good open source code with ability to do what you want with it.
(Warning, I tend to use UUID instead of GUID. For purposes of this post consider them interchangeable.)
I’ve already mentioned the announcement of FeedLounge making the move from MySQL to PostgreSQL last month, but the discussion is still on going. If you haven’t yet, go read all of the comments on the announcement. This has to be some of the most productive and level headed (no flames yet) set of blog comments I’ve seen in a long time. For those of you who are coming late to this story let me bring you up to speed:
FeedLounge started with MySQL as their database, with the MyISAM table types. They ran into problems and so moved to the InnoDB table type in MySQL. Still not completely happy they did more research and decided to try out PostgreSQL for their database needs. Their tests showed that MySQL InnoDB database was 34GB, in PostgreSQL it was 9.6GB. Restores in MySQL took more than 24 hours, in PostgreSQL it was less than 5 hours. These numbers resulted in better performance and reduced time.
It has been more than two weeks since the initial announcement and there are still new comments being added. As of this morning there are
34 35 comments. Some of the comments have come from people who are very knowledgable about MySQL and the size of InnoDB indexes became the focus for why InnoDB was so much bigger than PostgreSQL. Using InnoDB with UUIDs as the primary key was identified as a problem, especially in the area of performance.
Things continued to get more interesting when Heikki Tuuri (the creator of InnoDB) left a comment confirming the size problems in InnoDB when using a UUID as a primary key. It seems that using a UUID in this way causes a couple of problems, the first being that InnoDB wants to keep things in index order, which is great for indexes on sequential numbers but bad for seemingly random data (PostgreSQL can do the same thing with cluster). The next problem is that secondary indexes in the same table get copies of the primary key index to making finding things faster. This, at least in part, was what caused the size difference.
Heikki’s solution (and others have mentioned this) is to use an auto_increment field as the primary key and create a unique secondary index on the UUID field. This would make the primary key index size much smaller and still allow InnoDB to do index lookups on the UUID field. That would seem to solve the problem, but there hasn’t been any feedback yet to confirm if it fixed all of the differences between InnoDB and PostgreSQL.
There is something about having to add an extra field to my table to make the database perform better that doesn’t seem quite right though. I don’t have anything concrete that I can point to showing that this isn’t a good idea, but it doesn’t feel right. For those who are running into these same types of problems though, it might be something to try, at least until you get a chance to try out PostgreSQL.
This could turn into an interesting case study, FeedLounge has switched databases, from MySQL to PostgreSQL. The two main factors they listed were database size (34Gig for MySQL/InnoDB vs. 9.6Gig for PostgreSQL) and restoration time (24+ hours to restore database in MySQL vs. less than 5 hours for PostgreSQL).
I’m surprised to here about the huge difference in database size between MySQL (w/InnoDB) and PostgreSQL. Perhaps the different indexing techniques are responsible for this, I’m not sure. For large databases the ability to cut your storage requirements by more than 1/3 is huge. My gut feeling though is that this would not be the norm though. More testing is definitely in order to make this comparison more meaningful.
The huge difference in database restoration time also strikes me as a bit high. Certainly another area where further tests would be interesting to see if this pattern holds true in general. Obviously being able to restore your database in less than 1/3 the time is a good thing.
The real kicker in all this though is the chance to make use of the additional features found in PostgreSQL (they point this out in their write up). I hope that Alex continues to write about how this transition progresses.
Good news just keeps coming. PostgreSQL 8.1 has been released. There are several changes in this new version, I suspect for many having pg_autovacuum tied in as part of the database server will be a big plus. Roles and Two Phase Commit are other nice features that will be very helpful. If you are into local GUI apps for database administration then having pgAdmin 1.4.0 included will be welcome, especially for Windows users. Overall this release seems like a solid step forward.
The chapter on SQL Injection from the book Guide to PHP Security by Ilia Alshanetsky is available in PDF format. This can be a rather fun topic so I downloaded a copy of the PDF and started reading. I’d read less two pages before I was ready to toss this chapter out the window.
The first example showed a very simple inject that terminated the original query with a ; and appended another query that deleted some data. The example used MySQL functions and the author noted the following after explaining the injection:
Fortunately, if you use MySQL, the mysql_query() function does not permit query stacking, or executing multiple queries in a single function call. If you try to stack queries, the call fails.
However, other PHP database extensions, such as SQLite and PostgreSQL, happily perform stacked queries, executing all of the queries provided in one string and creating a serious security problem.
The emphasis is mine. I couldn’t believe that the author was basically claiming that MySQL is more secure than SQLite and PostgreSQL because they allowed you to run multiple SQL statements in one string. Then something else came to mind, didn’t MySQL add this feature not too long ago? So I went hunting around and came up with the C API Handling of Multiple Query Execution in MySQL, which was introduced in version 4.1. Although still correct, the mysql_query() function in PHP doesn’t allow for multiple statements, the underlying C libraries do. I’d expect that at somepoint in the future PHP will catch up and support that feature for MySQL 4.1 and above.
But back to my first point, being able to run multiple queries in a single string is not a security threat. Not proplerly filtering data before using in an SQL statement is. If you allow unfiltered data to be injected into your SQL statement then you’ve got problems, even if you database doesn’t support multiple statements in a single string.
I hope that this isn’t typical of the rest of the book.
While I was scanning through the Managing Hierarchical Data in MySQL article at MySQL.com I started to wonder if someone had come up with a way to manage hierarchical data using geometric types. PostgreSQL supports geometric types like point, line, box, circle and polygon and several geometric functions and operators.
If you started out with one box (the universe) and then created new child boxes as needed then you should be able to accurately describe a hierarchy. You’d have to be able to shrink and grow each box (and all the boxes above it) as needed to make room for more or less child boxes. It would also be nice to have an easy to way move a branch of the hierarchy (along with everything below it) to some other position in the hierarchy.
It would take some work to sit down and come up with the functions and triggers that would make managing something like this as simple as possible. Anyone know where something like this has already been implemented?