SQL Server 2000: Maximum Row Size 8060 Bytes

I’ve been reasonably happy with SQL Server 2000, especially after I got it working with PHP from FreeBSD. Today though, I ran into something that I was surprised to see in SQL Server. There is a maximum row size of 8060 bytes. I was disappointed that Microsoft’s current production database system would still have this kind of limitation.

According to Chris Hedgate this is sort of fixed in SQL Server 2005. You’ll be able to have longer row sizes if they contain variable length fields which don’t exceed the 8060 byte limit. So a fixed length field greater than 8060 bytes will still be an issue. This is a partial fix at best.

I’m going to invoke the Scobleizer (Robert Scoble) and ask that he point this out to the SQL Server development team. It’s 2005. If PostgreSQL can have a maximum row size of 1.6 TB and MySQL has a maximum of 65,534 bytes, then surely Microsoft can throw a few million at SQL Server and get it caught up to one or both of these competing products.

Hierarchical Data Using Geometric Types

While I was scanning through the Managing Hierarchical Data in MySQL article at MySQL.com I started to wonder if someone had come up with a way to manage hierarchical data using geometric types. PostgreSQL supports geometric types like point, line, box, circle and polygon and several geometric functions and operators.

If you started out with one box (the universe) and then created new child boxes as needed then you should be able to accurately describe a hierarchy. You’d have to be able to shrink and grow each box (and all the boxes above it) as needed to make room for more or fewer child boxes. It would also be nice to have an easy way to move a branch of the hierarchy (along with everything below it) to some other position in the hierarchy.

It would take some work to sit down and come up with the functions and triggers that would make managing something like this as simple as possible. Anyone know where something like this has already been implemented?
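Here is a rough sketch of the idea, in Python rather than PostgreSQL. Everything in it is made up for illustration: the Box class, the coordinates and the helper names are assumptions, not an existing implementation. It only covers reading the hierarchy (ancestors, parent, descendants) through box containment. Growing, shrinking and moving boxes, which is the hard part, is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; (x1, y1) is the lower left corner, (x2, y2) the upper right."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self):
        return (self.x2 - self.x1) * (self.y2 - self.y1)

    def contains(self, other):
        # A box never counts as its own ancestor.
        return (
            self != other
            and self.x1 <= other.x1
            and self.y1 <= other.y1
            and self.x2 >= other.x2
            and self.y2 >= other.y2
        )


def ancestors(node, boxes):
    """Every box that encloses node, outermost first."""
    found = [b for b in boxes if b.contains(node)]
    return sorted(found, key=Box.area, reverse=True)


def parent(node, boxes):
    """The smallest enclosing box, or None for the root."""
    chain = ancestors(node, boxes)
    return chain[-1] if chain else None


def descendants(node, boxes):
    """Every box that lies inside node."""
    return [b for b in boxes if node.contains(b)]


# Hypothetical sample hierarchy: universe > animals > (dogs, cats), universe > plants
universe = Box("universe", 0, 0, 100, 100)
animals = Box("animals", 0, 0, 50, 100)
plants = Box("plants", 50, 0, 100, 100)
dogs = Box("dogs", 0, 0, 25, 50)
cats = Box("cats", 25, 0, 50, 50)
boxes = [universe, animals, plants, dogs, cats]

print([b.name for b in ancestors(dogs, boxes)])
print(parent(cats, boxes).name)
```

In a database the same containment test would presumably be done with the box operators and an index instead of a Python loop, and triggers would take care of resizing parent boxes when children are added.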

Technorati, Expanding A Broken Service?

I wasn’t really surprised when I read Kottke’s farewell to Technorati last week. I’ve been on the edge of giving up too, but I’m still hoping that they’ll be able to turn things around. I get errors of one kind or another more than 50% of the time I try to use their site. Unfortunately it has only been getting worse.

So when I read today’s announcement about Technorati and Newsweek I was upset at first. It is great and all for Newsweek to be bringing in blog commentary on the articles; this is a big step for them. But I was ticked that Technorati, who can’t even get their site to work half of the time, is expanding their services to others instead of making the darn thing work in the first place.

Then it hit me: perhaps Technorati is getting a boatload of money to provide this service to Newsweek. I mean a lot of money, in multiple shipping containers. Hopefully this is the case, because they need some way to make their site work and I’m hoping that a metric ton of money will make that possible. Otherwise they are going to sink further down the drain and users will move to services that don’t return a page bragging about their site being broken because they are receiving too much traffic.

UPDATE 1:05pm 25 Aug 2005: One of the reasons I’ve continued to hold on to hope with Technorati is because Dave Sifry still takes the time to respond to posts like this (see the first comment). I’m hoping that someone with that much passion for their business will find some way to make things better.

PEAR Command Line Options

When PEAR first came out I wasn’t very impressed. Although it still has issues, some of the modules are quite handy (if you haven’t looked at HTML_QuickForm do so now). I’ve been working on a couple of projects where I want to have all of the code, including PEAR modules, together so that it can be used without having to worry about additional dependencies. The simple way to do this is to download go-pear, install it with php go-pear local and change the directories to install PEAR with your project code. Simple enough, but things didn’t work the way I expected out of the box. Here are some additional details that I learned the hard way.

When doing a local install PEAR will create a pear.conf file containing the paths you told it to use during install. In order to make use of this file, you have to call pear with the -c option, like this:

pear -c /path/to/pear.conf ...

Until I figured that out the pear command kept trying to make use of the system install of PEAR. I believe the default location for the pear.conf file is whatever you configure $prefix to be during the install. Assuming that the pear script is installed in $prefix/bin, calling pear with pear -c ../pear.conf seems to work just fine.

The next problem I ran into was trying to get Spreadsheet_Excel_Writer installed. This module isn’t available under the stable tag, only the alpha tag. I didn’t want to change the default tag to alpha, so that is where the -d option came in. The -d option allows you to override a default configuration, like so:

pear -d config_parameter=newvalue

To install a module under the alpha tag you would use pear -d preferred_state=alpha install module_name.

I was surprised when I was unable to find this information in the PEAR Manual; instead I had to hunt around Google picking up pieces here and there.
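If you end up scripting a local PEAR install, the two options combine naturally. The helper below is only a sketch: pear_command is a hypothetical function name, and it uses nothing beyond the -c and -d options described above. It builds the argument list and leaves running it (for example with subprocess.run) up to you.

```python
def pear_command(args, config_file=None, overrides=None):
    """Build a pear command line as a list of arguments.

    config_file -> passed with -c (path to the local pear.conf)
    overrides   -> dict of config_parameter: newvalue, each passed with -d
    """
    cmd = ["pear"]
    if config_file:
        cmd += ["-c", config_file]
    for key, value in (overrides or {}).items():
        cmd += ["-d", f"{key}={value}"]
    return cmd + list(args)


# Install an alpha module against the local config without changing the default state.
cmd = pear_command(
    ["install", "Spreadsheet_Excel_Writer"],
    config_file="../pear.conf",
    overrides={"preferred_state": "alpha"},
)
print(" ".join(cmd))
```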

Kottke: WebOS

Kottke posted some ideas on how the WebOS will develop. It’s an interesting read that looks at the development of a WebOS from the point of view of Google, Yahoo, Mozilla, Microsoft and Apple. No matter how the details play out, I’d expect all of the players involved to fight pretty hard for their share of the market. This will be most true of Microsoft, who has the most resources to fight with and has the most to lose.

Google Talk

So the rumors about Google offering an IM (Instant Message) service appear to be true. There are already instructions on how to connect to their Jabber server. All you need is a GMail account. I’m on Google Talk right now using iChat on Mac OS X.

Servername: talk.google.com
Username: yourusername@gmail.com
Password: yourgmailpassword.

I’m curious to see where they’ll take this. The obvious thing that Google could add to IM is the ability to include your logged conversations in search results.

UPDATE 9:00pm 23 Aug 2005: The official Google Talk site is now live.

UPDATE 8:45am 24 Aug 2005: Someone has already discovered an easter egg in Google Talk:

In the about box (right click the taskbar icon) there should be this
play 23 21 13 16 21 19 . 7 1 13 5
substitute letters for numbers and you get
“wumpus.game”
add this to to your freinds list minus the quotes and play an old irc game!

I haven’t been able to make this work yet; I’m still waiting for my invitation to be approved.
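The substitution in the quoted tip is the plain 1 = a, 2 = b, ... 26 = z mapping. A few lines of Python confirm the result (decode is just a throwaway helper name I picked):

```python
def decode(cipher):
    """Turn space-separated numbers into letters (1 = a ... 26 = z); keep anything else as-is."""
    out = []
    for token in cipher.split():
        if token.isdigit():
            out.append(chr(ord("a") + int(token) - 1))
        else:
            out.append(token)
    return "".join(out)


print(decode("23 21 13 16 21 19 . 7 1 13 5"))  # wumpus.game
```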

UPDATE 10:15am 24 Aug 2005: An announcement was posted to the Google Blog this morning.

PHP::Interpreter

Bricolage announced a new release today which adds support for PHP5 templating. They are able to do this via a new CPAN module, PHP::Interpreter. This module allows you to embed a PHP5 interpreter into Perl. On top of that you can reach back into Perl from PHP. Neat stuff.

It would be nice to see the same sort of thing for PHP: a Perl interpreter that could be embedded into PHP and also reach back into PHP for functions and objects. Wasn’t all of this one of the goals of Parrot, that with a common base, languages running on top of Parrot could make use of each other’s libraries?

Email Tags

I’m just going to come out and say it: I want the ability to tag my email. I know, I know it sounds like I’m jumping on the tagging bandwagon. So what, I want to be able to tag my email. I really only have one reason for this request, to make email search better.

More and more, searching for emails has become better at finding old information than the traditional hierarchy method. Sure, Gmail, Thunderbird and Mail on Mac OS X have popularized this idea, but it goes beyond that. With the desktop search features heating up, most of them supporting searching email stored on your system, we need to have better ways for search engines to rank our data. This is about finding the information you need as quickly and easily as possible.

Another point I want to make clear: I’m not interested in pushing email tagging as a replacement for hierarchical storage of email. If you want to go all the way and have one big archive folder with lots of tagged emails, fine. If you want to create a folder tree for your email, that is fine too. Tags should be used to enhance what we are doing, not necessarily replace it.

How about some nuts and bolts stuff, how should email tagging work?

Where should email tags be stored?

My first thought on this is in the headers. So far I still think this is the best place for them. When an email client adds or removes a tag the headers get changed accordingly and the message gets updated on the server (if they are using something like IMAP). I’m not sure what format would be best for these tags; for now I’d envision something as simple as CSV (comma separated values). This would allow for spaces, but probably not newlines. I think embedded newlines in the headers would break things.

We’d have to come up with a new header, perhaps x-tags?
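To make that a bit more concrete, here is a minimal sketch using Python’s standard email and csv modules. The X-Tags header name and the CSV format are just the guesses from above, not any kind of standard, and read_tags and write_tags are hypothetical helper names.

```python
import csv
import io
from email import message_from_string

TAG_HEADER = "X-Tags"  # assumed header name, not a standard


def read_tags(msg):
    """Return the list of tags stored in the message headers (empty if none)."""
    raw = msg.get(TAG_HEADER, "")
    if not raw.strip():
        return []
    row = next(csv.reader(io.StringIO(raw), skipinitialspace=True))
    return [tag.strip() for tag in row if tag.strip()]


def write_tags(msg, tags):
    """Replace the tag header with the given tags, stored as one CSV line."""
    for tag in tags:
        if "\n" in tag or "\r" in tag:
            raise ValueError("tags may not contain newlines")
    buf = io.StringIO()
    csv.writer(buf).writerow(tags)
    del msg[TAG_HEADER]  # does nothing if the header is missing
    msg[TAG_HEADER] = buf.getvalue().rstrip("\r\n")


msg = message_from_string("Subject: Row size limits\n\nSee the notes below.\n")
write_tags(msg, ["work", "sql server", "todo, later"])
print(msg[TAG_HEADER])
print(read_tags(msg))
```

Spaces survive untouched and a tag that contains a comma gets quoted by the csv module, so the only thing the sketch rejects outright is a newline.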

What email clients could do with tags.

Once we get tag support in email clients there are a couple of things I’d like to see. The obvious one is the ability to add, remove and edit tags for a given email. It should keep a running list of tags you’ve already used so that you can pick from a list. Yes, it could even generate a tag cloud if you like. A plus would be the ability to suggest tags based on content of the email and how you’ve tagged similar emails in the past. Maybe an autotag feature that looks at the content of the email for words and terms that match tags you’ve used in the past and selects those for you. From there you could then refine what tags you want to assign to the email. For bonus points I’d like Procmail to have some of these abilities too; that way your email server could do some basic tagging for you.

Email clients should provide some additional smart folders also. A smart folder called tags could have subfolders that correspond to each tag you’ve ever used. You should be able to sort these alphabetically and by frequency (most commonly used tags at the top or bottom). Bonus points for the ability to drag and drop these smart tag subfolders to do searches. So dragging the ‘work’ tag subfolder onto the ‘sql’ tag subfolder should automatically start a new tag search subfolder that contains only emails that have both of those tags. By dragging the ‘mysql’ tag subfolder onto it, the search should be refined even further.
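Under the hood, the drag and drop search is nothing more than a set intersection. A small sketch, with a made-up mailbox (message ids mapped to their tags) and a hypothetical tag_search helper:

```python
def tag_search(mailbox, *tags):
    """Return the ids of messages that carry every one of the given tags."""
    wanted = set(tags)
    return [msg_id for msg_id, msg_tags in mailbox.items() if wanted <= set(msg_tags)]


# Hypothetical mailbox: message id -> tags on that message
mailbox = {
    "msg-1": {"work", "sql"},
    "msg-2": {"work", "sql", "mysql"},
    "msg-3": {"work"},
    "msg-4": {"home", "mysql"},
}

print(tag_search(mailbox, "work", "sql"))           # 'work' dropped onto 'sql'
print(tag_search(mailbox, "work", "sql", "mysql"))  # then 'mysql' dropped onto the result
```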

When performing a free form search in email clients any tag matches should get increased importance. If I happen to use the word ‘oracle’ in a search and it is also one of my email tags, then emails with that tag should be ranked higher than emails without it.
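One simple way to get that behaviour would be to add a fixed bonus to a message’s score whenever a search term is also one of its tags. The sketch below assumes a toy scoring scheme (count of term occurrences in the body plus a tag bonus); the rank function, the tag_boost value and the sample messages are all invented for illustration, and a real client would plug this into whatever ranking it already has.

```python
def rank(messages, query, tag_boost=2.0):
    """Order message ids by score: word matches in the body plus a bonus per matching tag."""
    terms = query.lower().split()
    scored = []
    for msg in messages:
        words = msg["body"].lower().split()
        text_score = sum(words.count(term) for term in terms)
        tag_score = sum(tag_boost for term in terms if term in msg["tags"])
        score = text_score + tag_score
        if score > 0:
            scored.append((score, msg["id"]))
    scored.sort(key=lambda pair: -pair[0])  # highest score first, ties keep input order
    return [msg_id for _, msg_id in scored]


# Hypothetical messages; tags are assumed to be stored in lower case.
messages = [
    {"id": "a", "body": "oracle licensing renewal for oracle", "tags": {"finance"}},
    {"id": "b", "body": "notes on the oracle upgrade", "tags": {"oracle"}},
    {"id": "c", "body": "lunch on friday", "tags": {"home"}},
]

print(rank(messages, "oracle"))
```

Message "b" mentions the word only once but carries the tag, so it ends up ahead of "a", which mentions it twice without being tagged.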

How to make this happen.

I suppose someone should write an RFC on email tags for starters. That will outline the technical issues involved and allow vendors to start commenting and working on implementing these features. It would be really great to see someone like GMail support this because once they turn this feature on then everyone will have access to it. It also isn’t that far away from their label feature. Other good targets would include open source projects like Thunderbird and Squirrelmail.

It’s 2005, I hope that I don’t have to wait until 2010 before email tagging becomes widely available.

Colin Percival, New FreeBSD Security Officer

Last week Jacques Vidrine announced that Colin Percival is the new FreeBSD Security Officer. For more details about the Security Officer and the Security Team, check out the security information page at FreeBSD.org. You’ll find information on who the security officer and team are and their charter.

A big thank you to everyone who has spent time making FreeBSD more secure, I’ve certainly benefitted from your work.

O'Reilly Web Spam

Another in the line of folks who should know better, O’Reilly has web spam on some of their sites. This doesn’t appear to be quite as bad as the WordPress web spam because they aren’t using CSS to make the content invisible to visitors. The placement of these “ads” is rather out of the way though, at the bottom of the left hand column.

With a little bit of looking around I was able to find these ads on oreillynet.com, windowsdevcenter.com, macdevcenter.com, ondotnet.com, onjava.com, onlamp.com, perl.com and xml.com, in each case on the article pages as well.

The numbers involved don’t appear to be quite as bad as the WordPress incident either, with Google finding fewer than 600 pages with these ads for oreillynet.com. If the other sites have a similar number of pages with ads then all told it would be fewer than 5000 pages. A lot of these ads point to freehotelsearch.com, which seems to offer a legitimate service (I only looked up reservations, I didn’t actually place one).

I think one could argue that these ads aren’t completely wrong. The argument would come down to intent, are these links there in hopes that people will actually click on them, or are they more of an effort to trick search engines to increase their importance? They are links, so it is possible that someone might click on them, but they aren’t nearly as prominent as the rest of their ads. I’m leaning more towards the idea that these ads are there more to boost their search engine ranking than as traditional ads. Tim is going to have a tough time making this look legit.

UPDATE 8:30am 24 Aug 2005: Tim O’Reilly has posted an initial response to the complaints about the ads. The short version: while not completely wrong (and not nearly as bad as the WordPress spam) these types of ads aren’t good for the long term.