Archive for October, 2007

USPTO App: Simplifying the Long Tail of Search Queries (0070100795)


« 5 October 2007 | No Replies »

United States Patent Application: 0070100795

The second (and perhaps the most valuable) of three major inventions I made and patented while at Yahoo! Research.

All three are based on a form of collaborative filtering of search result URLs, using modified search engine similarity metrics and the search engine's own inverted index techniques for performance. This solves the tail query rewrite problem: mapping extremely rare “tail” search queries onto more common (and, more importantly, bid-on) search queries. The search engine then displays the ads associated with those keywords. Experiments showed very high levels of coverage.

A simple explanation: most rewrite systems have used either orthographic (spell-checking) solutions or thesaurus-based solutions (e.g. auto = car = automobile), or in practice a statistical mix of the two. This is fairly effective, but it fails horribly on some reasonably simple problems. For example, a user types “Marius De Vries”. Say what? (My very reaction, in fact.) However, if you look at the URL result set and compare it to the result sets of the top N million search phrases, it very quickly becomes obvious that this guy was the Musical Director of Moulin Rouge, and that the best search term rewrite is “Moulin Rouge” (which suggests displaying ads for the CD or the movie).
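To make the idea concrete, here is a minimal Python sketch of matching a tail query to a common query by the overlap of their result URLs. It is only an illustration under stated assumptions, not the patented method: the function names, the cosine-style score over URL sets, and the example.com URLs are all hypothetical, and the actual similarity metrics are not described in this post.

```python
import math
from collections import defaultdict


def build_inverted_index(query_results):
    """Map each result URL to the set of common queries that return it."""
    index = defaultdict(set)
    for query, urls in query_results.items():
        for url in urls:
            index[url].add(query)
    return index


def rewrite_tail_query(tail_urls, query_results, index):
    """Return (best_query, score) for a tail query's result URLs, or None.

    The score is a cosine-style overlap between URL sets (an assumption
    made for this sketch). The inverted index means we only score queries
    that share at least one URL with the tail query.
    """
    tail_set = set(tail_urls)
    shared_counts = defaultdict(int)
    for url in tail_set:
        for query in index.get(url, ()):
            shared_counts[query] += 1

    best = None
    for query, shared in shared_counts.items():
        size = len(set(query_results[query]))
        score = shared / math.sqrt(len(tail_set) * size)
        if best is None or score > best[1]:
            best = (query, score)
    return best


# Hypothetical result sets for two common, bid-on queries.
common_queries = {
    "moulin rouge": [
        "example.com/moulin-rouge-film",
        "example.com/moulin-rouge-soundtrack",
        "example.com/baz-luhrmann",
    ],
    "car insurance": [
        "example.com/insurance-quotes",
        "example.com/car-insurance",
    ],
}
# Hypothetical results for the tail query "Marius De Vries".
tail_results = [
    "example.com/moulin-rouge-soundtrack",
    "example.com/moulin-rouge-film",
    "example.com/marius-de-vries-bio",
]

index = build_inverted_index(common_queries)
best_rewrite = rewrite_tail_query(tail_results, common_queries, index)
print(best_rewrite)
```

With this toy data the tail query shares two of its three result URLs with “moulin rouge” and none with “car insurance”, so the former is picked as the rewrite.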

A relatively recent paper from Google tries to prove that this approach has a theoretical flaw. However, the proof relies on the total number of concepts in the world approaching infinity, which does not appear to be a reasonable assumption. Certainly a large number of concepts exist, running into the billions (e.g. each individual person), but from a practical point of view the amount of “discourse” in a search engine is relatively finite, and our technique provides a way to map tens of thousands of different ways of talking about a thing onto the small set of terms that are MOST used for talking about it.



USPTO App: Content Based Advertising Using Collaborative Filtering (0070100813)


« 5 October 2007 | No Replies »

United States Patent Application: 0070100813

The first of three major inventions I made and patented while at Yahoo! Research. This was the original start of the series, and was triggered by internal discussions with Deepa Joshi.

All three are based on a form of collaborative filtering of search results, using modified search engine similarity metrics and the search engine's own inverted index techniques for performance. This one deals with assigning keywords to new webpages in order to provide content-based advertising. One way to assign meaning to a page is to count the search referral keywords that bring visitors to it. This works well if the page is popular and in the index. If it isn't, this technique allows keywords to be assigned based on the keywords that similar pages have accumulated.
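As a rough illustration of borrowing keywords from similar pages, the Python sketch below scores each referral keyword of known pages by how similar those pages are to the new one. Everything here is an assumption for the sake of the example: the post does not say how page similarity is measured, so Jaccard overlap of page terms is swapped in, and the names `jaccard`, `suggest_keywords`, and the sample data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two term collections (assumed metric)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_keywords(new_page_terms, known_pages, top_k=3):
    """Rank keywords for a new page using referral keywords of similar pages.

    known_pages is a list of (page_terms, referral_keyword_counts) pairs.
    Each keyword's share of a page's referrals is weighted by that page's
    similarity to the new page, then summed across pages.
    """
    scores = Counter()
    for terms, referral_counts in known_pages:
        similarity = jaccard(new_page_terms, terms)
        total = sum(referral_counts.values())
        if similarity == 0 or total == 0:
            continue
        for keyword, count in referral_counts.items():
            scores[keyword] += similarity * count / total
    return [keyword for keyword, _ in scores.most_common(top_k)]


# Hypothetical indexed pages with their observed search referral keywords.
known_pages = [
    ({"hybrid", "car", "mileage", "review"},
     {"hybrid car review": 8, "best mpg cars": 2}),
    ({"pasta", "recipe", "tomato"},
     {"pasta recipe": 5}),
]
# A new, unindexed page with no referral history of its own.
new_page = {"hybrid", "car", "battery", "review"}

suggested = suggest_keywords(new_page, known_pages)
print(suggested)
```

In this toy data the new page overlaps only with the car review page, so it inherits that page's keywords in order of referral share, and the recipe page contributes nothing.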



USPTO App: Pricing New Search Terms Using Collaborative Filtering (0070129997)


« 5 October 2007 | No Replies »

United States Patent Application: 0070129997

The last of three major inventions I made and patented while at Yahoo! Research.

All three are based on a form of collaborative filtering of search result URLs, using modified search engine similarity metrics and the search engine's own inverted index techniques for performance. This one deals with assigning a bid price to an unknown term.
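The post does not describe how the price is actually derived, so the following Python sketch is purely an assumption: it uses a standard collaborative filtering style prediction, a similarity-weighted average of the bids on known terms, with similarity taken as Jaccard overlap of result URLs. The function name `estimate_bid`, the URLs, and the prices are all hypothetical.

```python
def estimate_bid(new_term_urls, known_terms):
    """Estimate a bid for an unpriced term from similar priced terms.

    known_terms maps term -> (result_urls, bid_price). The estimate is a
    similarity-weighted average of known bids (an assumed scheme, not
    necessarily the patented one). Returns None if no known term shares
    any result URL with the new term.
    """
    new_set = set(new_term_urls)
    weighted_sum = 0.0
    weight_total = 0.0
    for urls, bid in known_terms.values():
        url_set = set(urls)
        union = new_set | url_set
        similarity = len(new_set & url_set) / len(union) if union else 0.0
        weighted_sum += similarity * bid
        weight_total += similarity
    return weighted_sum / weight_total if weight_total else None


# Hypothetical priced terms with their result URLs and current bids.
known_terms = {
    "moulin rouge dvd": (["example.com/a", "example.com/b"], 1.00),
    "moulin rouge soundtrack": (["example.com/b", "example.com/c"], 3.00),
}

estimated = estimate_bid(["example.com/b"], known_terms)
print(estimated)
```

Here the new term is equally similar to both priced terms, so the estimate lands midway between their bids.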