As commercial search engines such as Google, Yahoo, Ask Jeeves, and MSN Search extend their reach among web surfers and grow stronger by the moment, the “non-commercial” crowd has responded by trying to build an alternative. Open source activists have assembled Nutch, a search engine technology that may offer an alternative to the mainstream search engines.
According to the Nutch project, it provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias (or at least their bias is public). All existing major search engines have proprietary ranking formulas and will not explain why a given page ranks as it does. Additionally, some search engines decide which sites to index based on payments rather than on the merits of the sites themselves. Nutch, on the other hand, has nothing to hide and no motive to bias its results or its crawler in any way other than to give each user the best results possible.
Over the past week, three open source search projects have drawn the attention of the search community: two are built on Nutch, and one is still in the idea and development stage.
MozDex
This month MozDex, an open source search engine built entirely from open source technologies, has been tweaking and refining its search results during beta testing. MozDex is currently in a “deep crawl” and plans to complete full indexing within the coming weeks.
“Mozdex.com offers one of the first OPEN search systems based on publicly available software, APIs and algorithms,” said Byron Miller, President of Small Productions. “There is no secrecy in understanding the results or their ranking, offering the first public insight into an open index.”
Objects Search
Objects Search has launched a clustering search engine based on the open source Nutch technology (www.nutch.org). Its Clustering Engine is a system for clustering textual data: it automatically sorts search results, on the fly, into hierarchical clusters.
Search results clustering attempts to overcome information overload, which arises because most search engines answer keyword-based queries with endless lists of matching documents. Even when exceptional ranking algorithms are used, relevance sorting inevitably equates quality with some notion of popularity among what can be found on the Web.
One approach is to group search results automatically into thematic categories, called clusters. Provided the cluster descriptions accurately summarize the documents they contain, the user spends much less time following irrelevant links.
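To make the idea concrete, here is a minimal sketch of grouping result snippets under shared terms. It is not the Objects Search algorithm, which the announcement does not describe. It is a simple greedy, flat (single-level) grouping that assumes each result is a (title, snippet) pair and that the query's own words make poor cluster labels. The names `cluster_results`, `tokenize`, and `STOPWORDS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    words = (w.strip(".,:;!?()\"'").lower() for w in text.split())
    return {w for w in words if w.isalpha() and w not in STOPWORDS}


def cluster_results(query, results, max_clusters=3, min_size=2):
    """Greedily group (title, snippet) results into clusters labelled by a term.

    Query terms are excluded as labels, since every result contains them.
    Returns {label: [titles]}, plus an "Other" bucket for unclaimed results.
    """
    query_terms = tokenize(query)
    # Map each candidate label term to the indices of results containing it.
    postings = defaultdict(set)
    for i, (title, snippet) in enumerate(results):
        for term in tokenize(title + " " + snippet) - query_terms:
            postings[term].add(i)

    clusters, assigned = {}, set()
    for _ in range(max_clusters):
        # Pick the term covering the most not-yet-assigned results
        # (ties go to the alphabetically first term).
        best_term, best_docs = None, set()
        for term, docs in sorted(postings.items()):
            remaining = docs - assigned
            if len(remaining) > len(best_docs):
                best_term, best_docs = term, remaining
        if best_term is None or len(best_docs) < min_size:
            break
        clusters[best_term] = [results[i][0] for i in sorted(best_docs)]
        assigned |= best_docs
        del postings[best_term]

    leftovers = [results[i][0] for i in range(len(results)) if i not in assigned]
    if leftovers:
        clusters["Other"] = leftovers
    return clusters


results = [
    ("Jaguar XK review", "The new jaguar car tested on the road"),
    ("Jaguar dealers", "Find a jaguar car dealer near you"),
    ("Jaguar habitat", "The jaguar cat lives in the rainforest"),
    ("Big cat facts", "Facts about the jaguar and other cat species"),
]
print(cluster_results("jaguar", results))
# {'car': ['Jaguar XK review', 'Jaguar dealers'], 'cat': ['Jaguar habitat', 'Big cat facts']}
```

For the ambiguous query “jaguar”, the car pages and the animal pages fall into separate groups, which is the time saving the paragraph above describes. A production clustering engine would build labels from phrases and could nest clusters hierarchically; this sketch does neither.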
OpenIndex
According to Research Buzz, OpenIndex is not quite an open search engine project but rather an index (as its simple name states), or a community-built search engine. The project says it does not (as yet) have the hardware to power a huge web index, so OpenIndex is open to ideas from users who join its community.
OpenIndex puts forth the idea of a decentralized search index powered by many computers: “Although we wouldn’t likely have large computers available, we could have many small ones, contributed by interested volunteers, and distributed across the community – even across the globe. Perhaps it’s the only way to have a publicly owned and operated index – it certainly seems appropriate.
A distributed system of servers would apportion all of the tasks of running an index among them. This would create a massive system of computers running in parallel, doing tasks as they are required. Costs would be distributed among the servers.”
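The source describes this only as an idea, so the following is an illustrative sketch of one common way to share an index among many small machines. It is not OpenIndex’s design. It assumes a document-partitioned layout: each document is assigned to one volunteer node by a stable hash of its URL, each node keeps an inverted index for its own slice, and a query is sent to every node in parallel with the hits merged. The classes `VolunteerNode` and `DistributedIndex` are hypothetical, and the “nodes” here are objects in a single process, not real servers.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class VolunteerNode:
    """Hypothetical volunteer server holding one slice of the index."""

    def __init__(self, name):
        self.name = name
        self.inverted = defaultdict(set)  # term -> URLs stored on this node
        self.urls = set()                 # documents this node is responsible for

    def add(self, url, text):
        self.urls.add(url)
        for term in text.lower().split():
            self.inverted[term].add(url)

    def search(self, terms):
        """Return URLs on this node that contain every query term."""
        postings = [self.inverted.get(t, set()) for t in terms]
        return set.intersection(*postings) if postings else set()


class DistributedIndex:
    def __init__(self, node_names):
        self.nodes = [VolunteerNode(n) for n in node_names]

    def _node_for(self, url):
        # A stable hash means the same URL always lands on the same node,
        # spreading storage (and its cost) across the volunteers.
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:4], "big") % len(self.nodes)]

    def add_document(self, url, text):
        self._node_for(url).add(url, text)

    def search(self, query):
        terms = query.lower().split()
        hits = set()
        # Fan the query out to all nodes concurrently, then merge their hits.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            for node_hits in pool.map(lambda node: node.search(terms), self.nodes):
                hits |= node_hits
        return sorted(hits)

    def load(self):
        """Number of documents each node currently stores."""
        return {node.name: len(node.urls) for node in self.nodes}


idx = DistributedIndex(["alpha", "beta", "gamma"])
idx.add_document("http://example.org/a", "open source search engine")
idx.add_document("http://example.org/b", "open index community")
idx.add_document("http://example.org/c", "search engine clustering")
print(idx.search("search engine"))
# ['http://example.org/a', 'http://example.org/c']
```

Partitioning by document keeps each volunteer’s share small and independent, at the price of contacting every node for every query. Partitioning by term instead would route each query to fewer machines but make updates and load balancing harder.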