Website publishers are accusing artificial intelligence (AI) startup Anthropic of “aggressively” scraping data from their sites.
Such actions could violate publishers’ terms of service, the Financial Times (FT) reported Friday (July 26), citing interviews with those website owners.
Data scraping is the automated process of pulling information from websites or other digital sources, often without the express permission of the content owners. Companies that generate content have a vested interest in safeguarding that material to maintain revenues.
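In practice, a scraper is often little more than a script that fetches pages and extracts their text for reuse or model training. The minimal sketch below, which points at the generic example.com placeholder site rather than any publisher named in this article, illustrates the basic pattern; it is not a description of how any particular AI company collects data.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TextExtractor(HTMLParser):
    """Collects the visible text chunks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical target; a real crawler would loop over many URLs at scale.
url = "https://example.com/"
request = Request(url, headers={"User-Agent": "example-crawler/1.0"})

with urlopen(request, timeout=10) as response:
    html = response.read().decode("utf-8", errors="replace")

parser = TextExtractor()
parser.feed(html)
print("\n".join(parser.chunks))
```

Scaled up to millions of requests, the same fetch-and-parse loop is what generates the crawler traffic publishers are complaining about.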
As the FT noted, Anthropic was founded by former OpenAI researchers who wanted to develop “responsible” AI systems.
But Matt Barrie, CEO of Freelancer.com, accused the startup of being “the most aggressive scraper by far” of his freelance work portal, which gets millions of visits per day.
The site got 3.5 million visits from an Anthropic-linked web “crawler” in the span of four hours, according to data shared with the FT. That makes Anthropic “probably about five times the volume of the number two” AI crawler, Barrie said.
“We had to block them because they don’t obey the rules of the internet,” Barrie said. “This is egregious scraping [which] makes the site slower for everyone operating on it and ultimately affects our revenue.”
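The “rules of the internet” Barrie refers to most likely include conventions such as the Robots Exclusion Protocol, the robots.txt file that tells crawlers which pages they may fetch. The sketch below uses Python’s standard urllib.robotparser to check whether a given crawler is permitted to fetch a URL; the user-agent string is an illustrative placeholder, not a confirmed identifier for Anthropic’s crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative values; swap in the site and crawler user agent of interest.
site = "https://www.freelancer.com"
user_agent = "ExampleAIBot"  # hypothetical crawler name
target_url = f"{site}/jobs"

robots = RobotFileParser()
robots.set_url(f"{site}/robots.txt")
robots.read()  # fetches and parses the site's robots.txt rules

if robots.can_fetch(user_agent, target_url):
    print(f"{user_agent} may fetch {target_url}")
else:
    print(f"{user_agent} is disallowed; a compliant crawler would skip this URL")
```

A crawler that ignores these directives, or that requests pages far faster than the site can serve them, is the kind of behavior publishers describe as scraping that “makes the site slower for everyone.”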
According to the report, other web publishers say Anthropic is swarming their sites and ignoring their requests to cease collecting their content to train its models.
PYMNTS has contacted Anthropic for comment but has not yet received a reply. The company told the FT it was looking into the Freelancer.com case and that it endeavored not to be “intrusive or disruptive.”
The dispute is happening at a moment when, as PYMNTS wrote earlier this month, “businesses are grappling with the unauthorized harvesting of their online content, prompting new defensive measures that could reshape the digital landscape.”
For example, the web infrastructure company Cloudflare recently unveiled a new tool against content scraping that could derail major AI companies’ training operations.
“The software is designed to prevent automated data collection and has the potential to reshape how AI models are developed and trained,” that report said.
And as companies scramble to safeguard their digital assets, industry experts predict a spike in demand for similar protective measures, potentially bringing about a new market for anti-AI scraping services.
When a business’s “information is scraped, especially in near real time, it can be summarized and posted by an AI over which they have no control, which in turn deprives the content creator of getting its own clicks — and the attendant revenue,” HP Newquist, executive director of The Relayer Group and author of “The Brain Makers,” told PYMNTS.