Link Dump

Battle for Wesnoth: Multi-platform open source turn based strategy game.

Windows “Longhorn” scheduled for 2006: Microsoft announces that the next version of Windows (called Longhorn at this point) won’t be released until sometime in 2006. They are scaling back on the features to be included in Longhorn in order to try and make the release in 2006. Why do companies even bother trying to predict product releases further than a year out?

Synergy: Sounds like networked KVM software. Works on Windows, Mac and Un*x. Cool.

GmailFS: This is so scary and wrong I don’t know what to say. Perhaps Google should just expose it via WebDAV? (via Jeremy)

FreeBSD’s CVS 3rd Party Import Instructions: If you have a project that uses CVS then this example is helpful for understanding how to import 3rd party code used by your project into CVS without having CVS screw it up.

Blog Entry References

I whipped up a quick WordPress plugin to add entry reference searching using Bloglines, Feedster, Technorati and Blogdigger. It is pretty basic, but does the job. I got the idea from Jeremy Zawodny’s Blog, where I saw it for the first time earlier today.

This is one of those obvious things that I kick myself over because I should have thought of it a long time ago.

Update 5:50 pm 27 Aug 2004: I modified the plugin a little bit so that I can use it to create links that search for references to this site instead of just a specific blog entry. Check out the ‘reference search’ area of the side section just above the calendar.

Update 12:10 pm 28 Aug 2004: I’ve removed Blogdigger from the reference search plugin because their search seems to only work occasionally.

WebDAV As The Common File Share Protocol

It just feels wrong that SMB has become the de facto universal file share protocol. For the most part this seems to work, with the ability to mount SMB shares on FreeBSD, Mac OS X and Linux (and probably many others). But it still feels plain wrong. With WebDAV support in the last few versions of Windows I’m very tempted to think that WebDAV should be heavily pushed as the new universal file share protocol. Both Linux (davfs) and Mac OS X support mounting WebDAV shares and I’d expect that it wouldn’t be too hard to have it supported in FreeBSD and other un*x systems. Serving WebDAV shares already works on all systems that support Apache 2 (and Apache 1.x with mod_dav) and IIS on Windows.

This would require some other changes though. Permissions would no longer live in the filesystem that the share resides on; they would be in .htaccess (or equivalent). I’m sure there are more efficient ways than HTTP to transfer data, but if performance was good enough then trading off efficiency for simplicity and ease would be worth it. Security of the file share is easily addressed with the addition of SSL on WebDAV shares. This would encrypt the data as well as the authentication for the shares. Being able to open up firewalls for remote share access would be easy, and the share could even be moved to another port besides 80 or 443. For that matter a firewall could implement port knocking and only open up the WebDAV share after the correct knock had been given.
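To make the protocol side a little more concrete: a WebDAV client lists a share by sending a PROPFIND request and reading back a 207 Multi-Status XML body. The Python sketch below only parses such a body. The sample response is hand-written and the paths in it are made up, so treat it as an illustration of the format rather than output from any particular server.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # WebDAV elements live in the "DAV:" XML namespace

# Hand-written sample of a 207 Multi-Status body; the paths are made up.
SAMPLE_MULTISTATUS = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/share/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/share/notes.txt</D:href>
    <D:propstat>
      <D:prop>
        <D:resourcetype/>
        <D:getcontentlength>1234</D:getcontentlength>
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
"""


def list_share(multistatus_xml):
    """Return (href, is_directory, size_or_None) for each listed resource."""
    entries = []
    root = ET.fromstring(multistatus_xml)
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        # A <collection/> inside <resourcetype> marks a directory.
        is_dir = response.find(".//" + DAV + "collection") is not None
        length = response.findtext(".//" + DAV + "getcontentlength")
        entries.append((href, is_dir, int(length) if length else None))
    return entries
```

Calling `list_share(SAMPLE_MULTISTATUS)` gives one tuple per resource, which is roughly the information a mounted share needs to show a directory listing.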

Samba has done a great job of bringing file share serving to non-Windows systems, but that doesn’t change the fact that they will always be chasing Microsoft in one form or another. An open protocol (like WebDAV) could bring to file shares the same sort of level playing ground that HTTP brought to web servers in the first place.

Gmail Invite Giveaway

I suppose there is no reason to let my Gmail invites just sit around doing nothing. I’ve currently got 6 5 4 3 2 1 0 invites; leave a comment on this entry and I’ll send you one. Make sure that you fill out the email field when posting the comment, otherwise I won’t know who to send it to.

Update 7:50 pm 26 Aug 2004: All of the Gmail invites are now gone. I guess Gmail invites are a good way to bring people out of the woodwork.

More Sacramento Temple Ground Breaking Coverage

The Sacramento Bee had another article covering the Sacramento Temple ground breaking. A lot of focus was given to President Hinckley’s sense of humor. I guess most people wouldn’t expect a 94-year-old man to have a pleasant sense of humor.

The picture is from the Bee’s article. If you look closely at the background, just right of center, you will see a young man with a white shirt and suit. He is a friend of mine, Brian Pebbles. Fun to see a picture of him and President Hinckley in the paper.

Apache Module Idea mod_ping

While trying to get my thoughts down on feed searching a thought came to me about one of the problems with traditional web search engines: figuring out when pages are updated. For dynamic content like feeds this problem has been addressed by using XML-RPC to ping sites to let them know there is new information available. So why couldn’t this be used for static pages that are updated via FTP, WebDAV or FrontPage? Adding pings to those protocols isn’t realistic, but the web server would be able to determine when content changes pretty easily. Enter the idea of an Apache module, mod_ping. This module would have a small database (dbm perhaps?) that simply stores the time stamps for the files being served out by Apache. If the time stamp on the file matches the most recent one in the database for that file then mod_ping would do nothing. If the time stamp on the file was newer than the one in the database then it would ping a configured list of services (like Google and Yahoo) letting them know that the page has been updated and that they should come along and reindex it when they get the chance.
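The time stamp check itself is simple enough to sketch outside of Apache. The Python below is only an illustration of the idea, not a module design: the function name `changed_files` is hypothetical, and treating a file with no recorded time stamp as "changed" is an assumption the description above does not settle.

```python
import dbm
import os


def changed_files(paths, db_path):
    """Return the paths whose mtime is newer than the one stored in the dbm file.

    The stored time stamp is updated as a side effect, so a second call with
    no file changes returns an empty list. Files that have never been seen
    are reported as changed (an assumption made for this sketch).
    """
    changed = []
    with dbm.open(db_path, "c") as db:  # "c" creates the database if missing
        for path in paths:
            mtime = os.stat(path).st_mtime_ns
            key = path.encode("utf-8")
            recorded = db.get(key)
            if recorded is None or mtime > int(recorded):
                changed.append(path)
                db[key] = str(mtime).encode("ascii")
    return changed
```

A real module would run this check when a file is served and hand the returned paths to whatever code sends the pings.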

Configuration of such a beast would have to be pretty flexible. Once enabled in Apache the configuration of mod_ping could be done in individual .htaccess files, perhaps even using regexes so that you could easily exclude or include certain types of files or directories. It would also need configuration directives to specify what URLs to ping.
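The include and exclude rules could boil down to something like the following Python sketch. The function name and the example patterns are hypothetical, and letting an exclude pattern win over an include pattern is just one reasonable choice.

```python
import re


def should_ping(path, include, exclude):
    """True if path matches some include pattern and no exclude pattern."""
    if any(re.search(pattern, path) for pattern in exclude):
        return False  # excludes take priority in this sketch
    return any(re.search(pattern, path) for pattern in include)


# Hypothetical rules: ping for HTML pages, but never for /private/.
INCLUDE = [r"\.html?$"]
EXCLUDE = [r"^/private/"]
```

With these rules `should_ping("/index.html", INCLUDE, EXCLUDE)` is true, while `/private/index.html` and `/logo.png` are both skipped.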

There are two hurdles to mod_ping right now. The first is that it only exists in my head. The second is that Google and Yahoo would have to be convinced to support ping requests. Perhaps it should be a for-pay service with the benefit to customers being that their pages are being indexed in a timely manner. For that matter this could be done with blogs too, pinging Google and Yahoo when a brand new page comes into being because of a new post. It’s all about getting web pages indexed faster so that searches are more meaningful.

Why Hasn't Anyone Figured Out How To Do Feed Searches?

For searching the web in general most people go to Google or perhaps Yahoo. For the last couple of weeks I’ve been interested in just searching through feeds (RSS and Atom) for information. Say I wanted to track what people are saying about PostgreSQL. This can’t really be done with the traditional search engines (Google, Yahoo, etc.) because they base their results on popularity (in one form or another). This doesn’t help me because I’m interested in what people are saying right now, not who has said the most popular thing. So I started using the feed search sites to see how they stacked up. The results were extremely disappointing.

Technorati
These guys have probably been around as long as anybody in the feed search area. They not only allow simple searches, but if you put in a URL they’ll give you a list of all the recent links to that URL. This feature is handy, but is extremely limited because once a link to a URL is no longer “current” then it drops off the list. There doesn’t appear to be any way to get a list of all the pages that have ever linked to a given URL.

The regular term search provides a similar results page from feeds that are considered “current”. Once again there doesn’t appear to be a way to get results further back than “current”. After using this for a while to find out what people are saying about PostgreSQL I found that their database appeared to be updated in a rather jittery way. There were some occasions where I was able to find an entry about PostgreSQL via other sources that was more recent than all of the “current” results from Technorati. I’m not talking about a difference of a couple of hours, but between one and two days. Technorati’s search also has a problem that is common among feed search sites: lots of duplicate entries. When almost 50% of the search results are duplicates the search becomes almost useless.

Technorati supports being pinged for feed updates, which is supported by Ping-o-Matic. There seem to be problems here also; in some cases it has taken days for some of my entries to show up in search results. On rare occasions some of my entries never made it into search results. There are several points where this failure could have happened, but the result was the same: feed entries that should have been in search results but weren’t.
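For reference, an update ping is usually a tiny XML-RPC call. The Python sketch below builds the request body for the commonly used `weblogUpdates.ping` convention (blog name plus blog URL) without sending anything. The helper name `build_ping` is hypothetical, and whether a given service accepts exactly this call is an assumption to check against that service.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")
```

Actually sending the ping would be a matter of pointing `xmlrpc.client.ServerProxy` at the service’s endpoint and calling `weblogUpdates.ping(name, url)` on it.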

They have some additional for-pay features and also show advertising on their search results page. These look like their only two forms of revenue; hopefully they are enough.

Bloglines
Although Bloglines’ primary service is as a feed aggregator they also have the ability to “Search All Blogs”. The Bloglines search has more options than Technorati, allowing options like: all of these words, exact phrase, at least one of these words, without these words, sort by popularity or date. The search can be limited to all blogs, only your subscriptions or everything except your subscriptions. The search looks up not just entries that have the search terms in them but also blog titles and descriptions that contain those terms.
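Those four word options map onto a fairly small amount of logic. This Python sketch is a guess at the behavior from the option names alone, not a description of how Bloglines implements it, and the function and parameter names are hypothetical.

```python
import re


def matches(text, all_words=(), phrase="", any_words=(), without=()):
    """Apply the four word options to one entry's text (case-insensitive)."""
    lowered = text.lower()
    words = set(re.findall(r"\w+", lowered))
    if any(w.lower() not in words for w in all_words):
        return False  # "all of these words"
    if phrase and phrase.lower() not in lowered:
        return False  # "exact phrase"
    if any_words and not any(w.lower() in words for w in any_words):
        return False  # "at least one of these words"
    return not any(w.lower() in words for w in without)  # "without these words"
```

For example, searching for all of “postgresql” and “windows” without “mysql” would call `matches(text, all_words=["postgresql", "windows"], without=["mysql"])` on each entry.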

Because Bloglines is already keeping all of the feed entries around for their aggregator accounts, their search isn’t limited to Technorati’s idea of “current” pages. Although they don’t have Technorati’s ability to look up entries that link to a given URL in their search, they do keep track of references per entry in their aggregator. This is a nice trade off for the aggregator, but it makes their search a little lacking, especially if you use this feature in Technorati a lot.

Perhaps because their primary focus is feed aggregating, they seem to be more up to date than Technorati. Unfortunately their search results are chock full of duplicates. I suspect this also stems from their aggregator focus. Bloglines doesn’t appear to support being pinged when a feed is updated. At the very least they aren’t listed on Ping-o-Matic.

Bloglines doesn’t display any advertisements, but they do have some for-pay services. Just like with Technorati, I have to wonder if this will be enough to keep them going.

Feedster
In a superficial way Feedster has a similar style to their pages as Google. The search feature looks to be pretty basic, although it does support some additional filtering: limit to an RSS URL, limit to OPML URL and exclude RSS URL. Sorting of the search results can be done by relevance (popularity) or by date. There is another feature that is supposed to take a URL and find all of the feeds that it provides. This search works, but the links it provides to “All Posts” and “All Links” don’t appear to work.

Their search results page also has a similar style to Google. Unfortunately their usability is pretty poor. No matter how often I set the option to search by date the results page indicates that it is still searching by relevance. Another strange thing is that as you click on next to go through the search results, the number of results on each page seems to vary. Sometimes there will be 10 results listed then other times there will only be 4 results shown on a page. Not a huge problem, but it makes the site feel a bit funky.

The search results seem to be about as fresh as Bloglines, but the number of duplicates doesn’t appear to be as high. This makes their results probably the best out of the three, but without the flexibility of Bloglines and the link search of Technorati. Add in the odd usability feel and you end up with something that is probably the best out of the three for results, moderate for power and poor for the feel of the site.

I don’t know if they offer any for-pay services, but they do show advertising in a similar style to Google. Hopefully following a model that has already been successful will work well for them to generate revenue.

Update 10:40 pm 24 Aug 2004: Scott Johnson of Feedster left a comment pointing to the Feedster Help Section. It looks like there are a lot more powerful search term features in there that didn’t jump out at me. I’d still like to see the duplicates reduced. I’ve tried to stick to talking about features, but I still think Feedster just feels funny. Considering that my aesthetic design skills are pretty poor you may want to take that with a grain of salt and try it out for yourself.

Waypath (Added: 12:15 pm 24 Aug 2004)
This one was pointed out to me by Mark. I’d come across this just briefly in the past but didn’t play with it much. Now that I’ve started writing up some of my thoughts I think I can look at Waypath as it compares to the other three. The superficial look makes me think that if Feedster is using the Google “style” then Waypath is trying to go the Yahoo “style” with their new Topic Streams feature. This reminds me a lot of Yahoo’s origins as a categorized set of links. This feature is still a beta so I’d expect it to change with time.

Waypath looks like the only feed search site to understand the basic set of search term possibilities via their advanced search features (things like AND, OR, wildcards, single terms and phrases). Bloglines has this to some extent, but Waypath looks more complete. They also support finding entries that relate and link to a specific entry. This is kind of a combination of the Bloglines reference system and the Technorati URL link search. You can also filter out or limit searches at the weblog level. I’d like to see them have a search syntax for this, not just icons once you get a set of search results. Those icons need to be more distinct also; it would be easy to mistake one for the other.

One thing that I should have included in my other reviews was the use of other “interesting” features, one specifically: bookmarklets. Suffice it to say that if your feed search site isn’t making use of this then it should be. Waypath has a couple of nice bookmarklets and I believe Bloglines also has some. Another feature that is probably more gee-whiz than anything else is their Buzz Maker. Give it a couple of terms and it graphs them using entries for the last 45 days. They were even smart enough to provide HTML you can cut and paste to use these graphs in your site. Waypath also makes some plugins for different blogging systems. If I get the time I’d also like to try out their XML-RPC services.

After playing with all of these little toys I get the sense that these guys might “get it” more than the other three, at least in terms of searching feeds. Unfortunately all is not perfect. Their search results are severely lacking, there just aren’t enough of them. This is probably because they aren’t indexing as many feeds as the other sites. I also didn’t see a way to ping them for updates (and they aren’t listed at Ping-o-Matic even if they do).

I didn’t see anything that indicated there were for-pay services, but there are some ads along the side of the search results. Who knows if this is enough to bring in sufficient revenue though. Overall these guys have some cool features, but if they don’t start indexing a lot more feeds all of those features won’t be very useful.

PubSub (Added: 10:00 pm 24 Aug 2004)
Another comment pointed out PubSub as another possibility. Their approach is different from others on this list. Instead of searching through existing feed entries you create a watch list that is used to scan feed entries as they come in. For certain applications this is great, like my example of keeping up with what people are saying about PostgreSQL. This narrow focus gives them certain advantages, but heavily limits their audience. I suspect that everyone else on this list should look at what PubSub does and integrate that feature as one component of what they provide.
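The watch list approach inverts the usual search: the queries are stored and each new entry is checked against them as it arrives. A minimal Python sketch of that matching step follows. The function name, the dictionary layout and the rule that every term must appear are all assumptions for illustration and say nothing about how PubSub actually works.

```python
import re


def match_watch_lists(entry_text, watch_lists):
    """Return the names of the watch lists whose every term appears in the entry."""
    words = set(re.findall(r"\w+", entry_text.lower()))
    return [name for name, terms in watch_lists.items()
            if all(term.lower() in words for term in terms)]


# Hypothetical stored watch lists, keyed by a subscriber-chosen name.
WATCH_LISTS = {
    "pg": ["PostgreSQL"],
    "dav": ["WebDAV", "Apache"],
}
```

Each incoming entry is run through `match_watch_lists` once, and the entry is appended to the feed of every watch list that is returned.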

They support pings and are listed on Ping-o-Matic. I couldn’t find any information on for-pay services. I haven’t seen advertising yet, although I just signed up for a watch list feed for “PostgreSQL” so perhaps they advertise in the watch list RSS feed? I’m assuming they have some sort of revenue model, otherwise they may not be very long for this network. Hmmm, combine their narrow focus with their possibly minimal revenue and they may be the most likely to be acquired.

Blogdigger (Added: 10:25 pm 24 Aug 2004)
One more feed search site suggested in the comments. I’d never heard of these guys so I’m only just getting a feel for their site. Their search appears to be ok compared to the others with one big difference: the number of duplicates seems to be greatly reduced. They probably need to be indexing more sites to fill out their search results a little better, but what they do have seems to be well taken care of. They use meta feeds in several places like feeds for a search and link search (which didn’t seem to do much when I tried it). There are two beta features that look promising, Blogdigger Groups and Blogdigger Media.

The groups feature allows you to create a blog made up of entries from other blogs. This is great for subject matter blogs. The media feature provides feeds to track the latest .torrent, .wav, .mp3, .mov and .avi links that are in feed entries. I like the idea of these very dynamic meta feeds; it has the potential to make tracking interesting tidbits that much easier.
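Picking media links out of entries can be approximated with a single regular expression. The Python sketch below is a rough guess at the idea and not how Blogdigger does it: it only looks at double-quoted `href` attributes and ignores links with query strings after the file extension.

```python
import re

# Matches href="..." values that end in one of the media extensions.
MEDIA_LINK = re.compile(
    r'href="([^"]+\.(?:torrent|wav|mp3|mov|avi))"', re.IGNORECASE)


def media_links(entry_html):
    """Return the media URLs linked from one entry's HTML, in order."""
    return MEDIA_LINK.findall(entry_html)
```

A media meta feed would then be the newest results of `media_links` across all indexed entries.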

Blogdigger is using Google’s AdSense to advertise on their site. I didn’t see any for-pay services listed.

Conclusion
All of the big three feed search sites fall short of their potential. One problem that they all have in common is duplicate search results. As the number of entries increases, being able to deal with duplicates is going to become a bigger and bigger problem. Someone needs to solve this, preferably sooner rather than later. When the number of duplicates becomes too high it makes the search results almost useless. Most of the search features are pretty simple on these sites. While having more advanced features will undoubtedly require more intelligent and powerful systems I think the site that can integrate these the best will get a huge leap over the others as long as the other problems (like duplicates) don’t overpower them. Look at Google’s Advanced Search page and think about what sort of features feeds make possible.
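The simplest form of duplicate handling is to fingerprint each entry and keep only the first one seen with a given fingerprint. The Python sketch below does that with a hash of the normalized title and body. The `title` and `body` keys are hypothetical field names, and exact-match hashing like this will miss near duplicates such as excerpts or edited reposts.

```python
import hashlib
import re


def fingerprint(entry):
    """Hash of the entry's title and body with case and punctuation removed."""
    text = (entry["title"] + " " + entry["body"]).lower()
    normalized = " ".join(re.findall(r"\w+", text))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(entries):
    """Keep the first entry for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = fingerprint(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

The same entry showing up through two different feed URLs then collapses to a single search result.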

For me personally the biggest feature disappointment was the lack of cool advanced features involving dates. Virtually every feed entry has a date associated with it, which makes searching by date a possibility. No one seems to be doing this yet though, probably because of the additional power that would be required to do this. Maybe no one has ever looked at the Google Groups Advanced Search page, where you can limit your search to a certain time frame. We should be able to do this with feed entries.
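Limiting results to a time frame is not much code once the dates are parsed. This Python sketch assumes RSS 2.0 style entries whose `pubDate` is an RFC 822 date with an explicit time zone. The dictionary layout and function name are hypothetical, and Atom feeds would need a different date parser.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def entries_between(entries, start, end):
    """Keep entries whose pubDate falls within [start, end] (aware datetimes)."""
    return [entry for entry in entries
            if start <= parsedate_to_datetime(entry["pubDate"]) <= end]


# Example window: the last third of August 2004, in UTC.
WINDOW_START = datetime(2004, 8, 20, tzinfo=timezone.utc)
WINDOW_END = datetime(2004, 8, 31, tzinfo=timezone.utc)
```

The expensive part for a real search site is presumably indexing by date at scale, not the comparison itself.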

Another looming question is why haven’t Google or Yahoo come up with a way to integrate feed searches into their web searches in a meaningful way? Maybe they are just waiting to let smaller companies research this and then buy one of them. I guess I’m just disappointed that people who already know so much about searching on the web haven’t applied that knowledge to this current problem.

Update 3:03 pm 24 Aug 2004: Fixed the spelling of Technorati (see Dave’s comment).

Finished Reading 1984

The other day I finished reading 1984 by George Orwell, aka Eric Arthur Blair. This book pretty much only has two levels: depressing and disturbing. In an exaggerated way this is the definitive book about the government taking over the lives of its people. Although this was written as a futuristic book, the technology mentioned in the book plays out as kind of a set of minor recurring characters. I was a bit disappointed with how some of the story progressed because it was so predictable, but the story as a whole is still a good read.

The most interesting development from this book is that Sarah has started reading it. There aren’t a lot of fiction books that Sarah and I have read in common so it should be fun to compare our impressions of the book after she is done with it.