March 17, 2004, 12:00 AM

Beyond easily searchable documents lies the invisible web, says Overture

Search engine indices may draw on as many as 5 billion documents, but still more web content remains “invisible” when web crawlers miss deep or dynamic pages. Now Overture and Yahoo are out to reveal that invisible content.

Most search engine indices contain about 4 to 5 billion documents, but adding documents from the so-called “invisible web” raises the number available for potential inclusion in search indices considerably, Chris Bolte, vice president of strategic alliances at Overture Services Inc., tells Internet Retailer. This month Overture and its parent, Yahoo Inc., launched a new service that aims to surface the web’s invisible content. Bolte cites deep content on academic or government sites as one example of material that web crawlers have trouble discovering and indexing, and that is therefore “invisible.” But he says areas of commercial sites face the same challenge.

Catalogers, for example, might have dynamic content. “A cataloger would want all its products and the most current pricing for those products to go out to the broadest audience,” Bolte says. However, web crawlers face technical limitations in discovering and indexing dynamic content. If the algorithm changes or the product changes, a listing may simply be dropped. For a search engine to stay updated on dynamic content, “You have to build a feed between the content provider and the search engine to transmit that information into the search engine on a regular basis,” Bolte says.

Beyond the limitations that dynamic content poses for web crawlers, web content may also be “invisible” if it sits on a complex site or is proprietary. Overture’s and Yahoo’s new Content Acquisition Program, which includes a cost-per-click component for commercial sites, aims to make it easier for Yahoo crawlers to find such deep content by providing content submission guidelines and technology. “We’re essentially telling content providers exactly how to deliver information to us and establish a relationship so we can get access to proprietary, dynamic and complex content, and then ensure that it happens on a continuing basis,” Bolte says.
