Currently, Google Inc., Yahoo Inc. and other top search companies voluntarily respect a Web site’s wishes as declared in a text file known as “robots.txt,” which a search engine’s indexing software, called a crawler, knows to look for on a site.
The formal rules allow a site to block indexing of individual Web pages, specific directories or the entire site, though some search engines have added their own commands.
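As a rough illustration of how a well-behaved crawler might honor those rules, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, the `ExampleBot` user-agent name and the `is_allowed` helper are all hypothetical, chosen only to show blocking of a directory, a single page and an entire site. It is not how any particular search engine implements its crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one directory and one individual page.
BLOCK_SOME = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/page.html
"""

# Hypothetical robots.txt that blocks the entire site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/drafts/page.html"):
        url = "http://example.com" + path
        print(path, is_allowed(BLOCK_SOME, "ExampleBot", url))
```

Because compliance is voluntary, nothing in the file itself enforces the rules; the check only has an effect if the crawler chooses to run it before fetching a page.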
The new proposal, to be unveiled Thursday by a consortium of publishers at the global headquarters of The Associated Press, seeks to have those extra commands — and more — apply across the board. Sites, for instance, could try to limit how long search engines may retain copies in their indexes, or tell the crawler not to follow any of the links that appear within a Web page.
The current system doesn’t give sites “enough flexibility to express our terms and conditions on access and use of content,” said Angela Mills Wade, executive director of the European Publishers Council, one of the organizations behind the proposal. “That is not surprising. It was invented in the 1990s and things move on.”
[…] News publishers complained that Google was posting their news summaries, headlines and photos without permission. Google claimed that “fair use” provisions of copyright laws applied, though it eventually settled a lawsuit with Agence France-Presse and agreed to pay the AP without a lawsuit being filed. Financial terms haven’t been disclosed.
Wade said ACAP could thwart future legal battles and make Web sites more comfortable about putting more material online, including scholarly journals and other items requiring subscriptions.
[…] Like the current robots.txt, ACAP’s use would be voluntary, so search engines ultimately would have to agree to recognize the new commands. Search engines also could ignore them and leave it to courts to rule on any disputes over fair use.