Jimmy Wales, the founder of Wikipedia, once said, “Search is part of the fundamental infrastructure of the Internet. And, it is currently broken.” And while he’s dabbling with the concept of an open-source search engine, Dr. Riza C. Berkan of Hakia.com believes the fix is in the semantics.
The CEO and founder of the meaning-based search engine believes that the future of search lies in understanding what the user wants rather than fishing out results with keyword matches.
Semantic engines are one of the many alternative search options now debuting on the web. While many of the new entrants focus on improved UIs or on clustering and grouping data from existing engines, semantic search is perhaps the most daunting technological approach to the problem.
Hakia is built on an ever-expanding database of concept matches that, it says, will help make better sense of long, complex queries and even question-based queries that present-day systems don’t address adequately.
What runs semantics?
Given the deluge of information on the web that a semantic engine has to make sense of, Hakia’s technology backbone seems to have the recipe to scale up. So, without further ado, here are the facts:
Hakia’s semantic search is essentially built around three evolving technologies:
- OntoSem (sense repository)
- QDEX (Query indexing technique)
- SemanticRank algorithm
- OntoSem is Hakia’s repository of concept relations: in other words, a linguistic database where words are categorized into the various “senses” they convey.
- QDEX is Hakia’s replacement for the inverted index that most engines use to store web content. QDEX extracts all possible queries relating to the content (leveraging OntoSem for meaning), and these become the gateways to the original document. This process greatly reduces the data set that the indexer has to deal with while querying data on the fly, an advantage when you consider the wide swath of data the engine would have to search if it used an inverted index.
- Finally, the SemanticRank algorithm independently ranks content on the basis of further sentence-level analysis. The credibility and age of the content are also used to determine relevancy.
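To make the contrast between an inverted index and a query-style index concrete, here is a toy sketch in Python. It is not Hakia’s implementation: the SENSES table is a hypothetical stand-in for a sense repository like OntoSem, "candidate queries" are simplified to pairs of concepts drawn from a single sentence, and every function name is invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for a sense repository: maps surface words to a
# shared concept label. A real ontology would be far richer.
SENSES = {"car": "vehicle", "automobile": "vehicle", "fix": "repair", "mend": "repair"}
STOPWORDS = {"a", "an", "the", "to", "how", "is", "of", "your", "my"}


def concepts(text):
    """Lowercase, drop stopwords, and map each word to its concept label."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [SENSES.get(w, w) for w in words if w and w not in STOPWORDS]


def build_inverted_index(docs):
    """Classic inverted index: each term points to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in concepts(text):
            index[term].add(doc_id)
    return index


def build_query_index(docs):
    """Query-style index: each candidate query (here, a concept pair drawn
    from one sentence) points to the documents that could answer it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sentence in text.split("."):
            unique = sorted(set(concepts(sentence)))
            for pair in combinations(unique, 2):
                index[pair].add(doc_id)
    return index


def lookup(query_index, query):
    """Reduce the user query to concept pairs and fetch matching documents."""
    unique = sorted(set(concepts(query)))
    hits = set()
    for pair in combinations(unique, 2):
        hits |= query_index.get(pair, set())
    return hits


docs = {
    1: "How to mend an automobile engine.",
    2: "The engine of growth is trade.",
}
query_index = build_query_index(docs)
print(lookup(query_index, "fix my car"))
```

In this toy setup, the query "fix my car" shares no literal words with document 1, yet it resolves to the same concept pair and retrieves that document. A plain term lookup on "engine" in the inverted index would return both documents, leaving the engine to sort out which sense was meant.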
Hakia analyzes the content itself, irrespective of links or click-throughs among documents (the company is opposed to statistical models for determining relevance).
With purely content-based indexing and searching, there is no need to monitor user activity via cookies or JavaScript, which allays privacy concerns. In fact, no data is saved on the user’s system without explicit user permission.
A Google beater?
With the increased focus on building the next generation of the web on semantic standards, where data can be seamlessly interpreted by machines, semantic search engines will be all the more relevant.
Hakia’s engineers themselves claim that their model mimics human learning (accelerated exponentially). However, as far as trumping Google (SEOMarketTips) is concerned, the folks at the Googleplex have been hiring machine learning and natural language processing experts for a long time. Hence it can’t be said that Google is oblivious to the potential of semantic search.
Also, Hakia is emerging at a time when the Internet is increasingly transitioning from a “textual” web to a more interactive experience (video and audio). Hence the question of semantics expands to making sense of content as a whole. And this is where Hakia has to capitalize.
To conclude, I would go with what Dr. Berkan commented in a post at ReadWriteWeb: “Semantic search is definitely an antidote for poor relevancy; but only time will tell how well this can be done.”