Semantics-based search is the theme of my articles, and in this one I’ll focus on the Textonomy suite of products from Crystal Semantics, now acquired by the e-advertising firm ad pepper media.
The company was co-founded by Professor David Crystal, a world authority in linguistics. At the heart of the Textonomy product is a “sense-engine”, the result of more than eight years of linguistic research, which evaluates the semantics and stylistics involved in word usage. Semantics is the common ingredient of all meaning-based search engines; stylistics, by contrast, concerns the localized usage of words, that is, their contextual distinctiveness.
Disambiguating the Textonomy search scheme
The origins of the linguistic project date back to the need to classify data for the Cambridge Encyclopedia. The requirement later expanded to incorporate data from several other encyclopedias, and the resulting large database served as a taxonomical repository that married dictionary definitions with encyclopedic classifications. The result: an engine that computes the contextual meaning of words by relating dictionary words to encyclopedic categories.
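To make the idea of relating dictionary words to encyclopedic categories more concrete, here is a minimal Python sketch of one simple way it could work: each dictionary sense of a word is tied to a category, and the sense whose category best matches the surrounding words wins. The `SENSES` and `CATEGORY_CUES` tables and the `disambiguate` function are hypothetical toy examples, not Crystal Semantics' data or algorithm; a real sense-engine would draw on a far larger taxonomy and richer linguistic analysis.

```python
from __future__ import annotations

# Hypothetical toy data (not Textonomy's): each dictionary sense of a word
# is linked to an encyclopedic category.
SENSES = {
    "bank": {
        "financial institution": "Finance",
        "side of a river": "Geography",
    },
}

# Hypothetical cue words that signal each encyclopedic category.
CATEGORY_CUES = {
    "Finance": {"loan", "deposit", "interest", "account", "money"},
    "Geography": {"river", "water", "shore", "fishing", "stream"},
}


def disambiguate(word: str, context: list[str]) -> str | None:
    """Return the sense of `word` whose linked category shares the most cue
    words with the surrounding context, or None if the word is unknown."""
    senses = SENSES.get(word.lower())
    if not senses:
        return None
    context_set = {w.lower() for w in context}
    # Score each sense by how many of its category's cues appear in the context.
    scores = {
        sense: len(CATEGORY_CUES[category] & context_set)
        for sense, category in senses.items()
    }
    # max() keeps the first-listed sense when scores tie.
    return max(scores, key=scores.get)


print(disambiguate("bank", "she opened an account at the bank to deposit money".split()))
print(disambiguate("bank", "we went fishing on the river bank".split()))
```

The same surface word resolves to different senses purely because the surrounding words point to different encyclopedic categories, which is the essence of the dictionary-to-encyclopedia link described above.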
At the core of the engine are three components:
- A page analyzer that analyzes HTML content and extracts data to be sent to a “black box for categorization”.
- A black box that matches the page text against the categories in the taxonomy (up to 2,500 categories) and ranks those categories according to how the words are used.
- A reporting interface that presents the results in a user-defined or XML format, which the client can use to place ads or generate results as required.
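The three-stage architecture can be illustrated with a minimal Python sketch that uses only the standard library. It assumes a tiny hypothetical `TAXONOMY` of three categories and scores them by simple word counts; the names `analyze_page`, `categorize` and `report_xml` are illustrative only. Textonomy's actual black box is proprietary and evaluates semantics and stylistics rather than raw keyword counts, so treat this as a sketch of the data flow, not of the real engine.

```python
from __future__ import annotations

import re
from collections import Counter
from html.parser import HTMLParser
from xml.etree import ElementTree as ET

# Hypothetical miniature taxonomy: category -> indicator words.
# (The real Textonomy taxonomy reportedly has up to 2,500 categories.)
TAXONOMY = {
    "Travel": {"flight", "hotel", "beach", "holiday", "resort"},
    "Finance": {"loan", "mortgage", "interest", "bank", "savings"},
    "Sport": {"football", "match", "goal", "league", "team"},
}


class TextExtractor(HTMLParser):
    """Page analyzer: collects text content, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth of open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def analyze_page(html: str) -> list[str]:
    """Extract lowercase word tokens from an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())


def categorize(words: list[str]) -> list[tuple[str, int]]:
    """'Black box': count occurrences of each category's words and rank
    the matching categories by that count (ties broken alphabetically)."""
    counts = Counter(words)
    scores = {cat: sum(counts[w] for w in cues) for cat, cues in TAXONOMY.items()}
    ranked = [(cat, s) for cat, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


def report_xml(ranked: list[tuple[str, int]]) -> str:
    """Reporting interface: serialize the ranked categories as XML."""
    root = ET.Element("categories")
    for rank, (cat, score) in enumerate(ranked, start=1):
        ET.SubElement(root, "category", name=cat, rank=str(rank), score=str(score))
    return ET.tostring(root, encoding="unicode")


page = (
    "<html><body><h1>Cheap flight and hotel deals</h1>"
    "<p>Book a beach holiday. Compare loan rates too.</p>"
    "<script>var flight = 1;</script></body></html>"
)
print(report_xml(categorize(analyze_page(page))))
```

For this sample page the sketch ranks Travel first (four matching words) and Finance second (one), while the word inside the `<script>` tag is ignored. The resulting XML is the kind of output an ad server could consume to pick a relevant ad.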
The company is headquartered in England, and its products are available to advertising companies looking to improve the relevance of their ad placement. Recently the company has released technologies that operate on both the server and the client side, effectively addressing both ends of the ad delivery spectrum.
Linguistic search needs clear differentiators
As mentioned in my previous articles, search-oriented companies increasingly need to focus on integrating non-text-based meaning engines as more and more online content becomes media-rich.
A number of firms already leverage linguistic techniques for targeted advertising, but a major differentiator would be a firm that targets not just meaning in text, but meaning in media as a whole.