March 17, 2004, 12:00 AM

Beyond easily searchable documents lies the invisible web, says Overture

Though search engine indices may draw on as many as 5 billion documents, still more web content remains “invisible” when web crawlers miss deep or dynamic content. Now Overture and Yahoo are out to reveal that invisible content.

Most search engine indices contain about 4 billion to 5 billion documents, but counting documents from the so-called “invisible web” raises the number of documents that could potentially be included in search indices considerably, Chris Bolte, vice president of strategic alliances at Overture Services Inc., tells Internet Retailer. This month Overture and its parent, Yahoo Inc., launched a new service that aims to surface the web’s invisible content. Bolte says deep content on academic or government sites is one example of material that web crawlers have trouble discovering and indexing, which makes it “invisible,” but parts of commercial sites face the same challenge.

Catalogers, for example, may have dynamic content. “A cataloger would want all its products and the most current pricing for those products to go out to the broadest audience,” Bolte says. However, web crawlers face technical limitations in discovering and indexing dynamic content: if a search engine’s algorithm changes or a product changes, a listing may simply be dropped. For a search engine to stay current with dynamic content, “You have to build a feed between the content provider and the search engine to transmit that information into the search engine on a regular basis,” Bolte says.

Beyond the limitations dynamic content poses for web crawlers, web content may also be “invisible” if it sits on a complex site or is proprietary. Overture’s and Yahoo’s new Content Acquisition Program, which includes a cost-per-click component for commercial sites, uses content submission guidelines and technology to make it easier for Yahoo’s crawlers to find such deep content. “We’re essentially telling content providers exactly how to deliver information to us and establish a relationship so we can get access to proprietary, dynamic and complex content, and then ensure that it happens on a continuing basis,” Bolte says.

