Readers across Britain meet sudden bot checks as news sites fight scraping. Real people get mislabelled, blocked, and baffled daily.
You load a page, a stark warning appears, and your morning read stalls. Publishers say they must clamp down, while legitimate visitors wonder why a site thinks their taps and swipes look robotic. The stand-off now touches millions, and it is reshaping how we all reach the news.
What triggered that warning on your screen
News Group Newspapers, the publisher behind titles including The Sun, bars automated access to its content. That ban covers scraping, text or data mining and any use for AI, machine learning or large language models. The rules sit in its terms and conditions, and its detection system can flag patterns that resemble scripts running at inhuman speed. That is how an ordinary reader can suddenly see a barricade.
Automated access and text/data mining are prohibited, including for AI and LLMs. Commercial users must seek permission.
Occasionally a detection tool misreads normal browsing. Quick scrolling on a mobile, a privacy extension that strips browser hints, or a VPN exit shared by hundreds of users can all look suspicious. The company recognises that genuine users get misidentified, and it asks blocked readers to contact customer support at [email protected]. Those seeking licensed, commercial access should write to [email protected].
The bigger picture: bots outnumber people on parts of the web
Industry analysts estimate that automated systems now account for roughly half of global web traffic. One widely cited figure sits near 49%, though it comes from bot-mitigation vendors’ own traffic reports, so treat it as an estimate rather than an independently audited count. That surge includes benign crawlers and “bad bots” that hoover up content. Publishers fear copycat sites, server strain, and unlicensed AI training that undercuts their journalism. Readers feel the collateral damage when systems err on the side of caution.
- Revenue risk: mass scraping repackages reporting without paying for it.
- Server resilience: automated hits at scale degrade performance for real readers.
- Legal exposure: unapproved reuse can breach copyright and contracts.
- Trust: AI models can regurgitate outdated or warped snippets without context.
Verification walls are not designed to punish readers; they exist to keep journalism sustainable and usable.
How publishers decide what looks “automated”
Signals your browser sends
Sites assess user-agent strings, how quickly scripts load and run, cookie support, and subtle fingerprints such as canvas rendering. Headless browsers, blocked JavaScript, or missing headers raise suspicion. A household router shared by dozens of devices can also blur the picture.
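For technically minded readers, here is a minimal, purely illustrative sketch of how a server-side filter might weigh those header signals. The header names are standard HTTP; the marker strings and scores are invented for the example and do not describe any publisher’s real system.

```python
# Illustrative only: a toy score based on the headers a visitor sends.
# Thresholds and marker strings are invented for this example.
HEADLESS_MARKERS = ("headlesschrome", "phantomjs", "python-requests", "curl")

def suspicion_score(headers: dict) -> int:
    """Higher score = request looks less like an ordinary person browsing."""
    score = 0
    ua = headers.get("User-Agent", "").lower()

    if not ua:
        score += 3                      # no user-agent at all is a strong signal
    elif any(marker in ua for marker in HEADLESS_MARKERS):
        score += 2                      # known automation or headless strings

    if "Accept-Language" not in headers:
        score += 1                      # real browsers almost always send this
    if "Cookie" not in headers:
        score += 1                      # cookies disabled or stripped

    return score

# A bare scripted request versus a typical browser request.
print(suspicion_score({"User-Agent": "python-requests/2.31"}))        # 4: flagged
print(suspicion_score({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-GB,en;q=0.9",
    "Cookie": "session=abc123",
}))                                                                   # 0: passes
```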
Behavioural clues
Tools watch how you scroll, dwell and click. Ten links opened in a second? That rarely matches human rhythm. Repeating the exact same pattern across multiple pages screams automation.
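A behavioural check can be sketched in the same spirit: count how many page requests arrive inside a short window and flag bursts no human could produce. Again, the ten-per-second threshold below is made up for illustration, not taken from any real filter.

```python
# Illustrative only: a sliding-window rate check with an invented threshold.
from collections import deque
import time

class RateWatcher:
    def __init__(self, max_hits=10, window_seconds=1.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = deque()

    def looks_automated(self, now=None):
        """Record one page request; return True if the recent rate looks inhuman."""
        now = time.monotonic() if now is None else now
        self.hits.append(now)
        # Forget requests that have fallen out of the window.
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) > self.max_hits

watcher = RateWatcher()
# Eleven article requests inside a tenth of a second would trip the check.
print(any(watcher.looks_automated(now=i * 0.01) for i in range(11)))  # True
```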
Network fingerprints
Datacentre IP ranges, anonymous proxies, and VPN endpoints get more scrutiny. If thousands of visits arrive from the same endpoint within minutes, some will trip the wire.
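Network checks are often the simplest of the three: compare the visitor’s IP address against published lists of datacentre and proxy ranges. The sketch below uses reserved documentation addresses rather than any real provider’s blocks.

```python
# Illustrative only: test whether a visitor's IP sits in a known datacentre
# range. The CIDR blocks are RFC 5737 documentation ranges, not real providers.
import ipaddress

DATACENTRE_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def from_datacentre(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTRE_RANGES)

print(from_datacentre("203.0.113.42"))  # True: shared exit node, gets more scrutiny
print(from_datacentre("192.0.2.10"))    # False: not in the example list
```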
| Trigger | What it means | What to try |
|---|---|---|
| Blocked JavaScript or cookies | Site cannot run checks to verify you | Enable scripts and first‑party cookies |
| VPN or proxy | Traffic appears to come from a shared or risky IP | Disable VPN, or switch to a residential endpoint |
| Rapid-fire clicks | Behaviour resembles automation | Slow down, let pages load fully |
| Privacy extensions | Missing signals confuse fraud filters | Allow the site, reduce aggressive blocking on news pages |
| Multiple tabs per second | Scraper pattern suspected | Open fewer tabs, use on-site search instead |
Your next steps if you are blocked
- Refresh the page and complete any visual or audio test carefully.
- Turn off VPNs or anonymous proxies and retry on your normal connection.
- Allow essential cookies and enable JavaScript in your browser settings.
- Disable heavy privacy or scraping extensions for the site, then reload.
- Update your browser. Old versions often trigger stricter checks.
- Try a different device on the same network to isolate the cause.
- Contact support at [email protected] if the issue persists.
Stuck for work or research purposes? Request licensed access at [email protected] to avoid automated blocks.
Why the AI rush makes this more urgent
Generative AI models learn at scale, and newsrooms argue that unlicensed training siphons value from human reporting. Contracts and website terms now set clear boundaries: no automated collection, no text/data mining for models, and no resale. Some publishers pursue formal licences with tech firms. Others restrict bot access at the edge, using fingerprints and rate limits.
Robots.txt alone lacks teeth. Enforcement now leans on terms of service, technical barriers and, when needed, legal action. That approach lets publishers protect archives while still welcoming real readers and approved partners.
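To see why robots.txt has no teeth of its own, note that the file is only ever consulted by the crawler itself. The sketch below uses Python’s standard robotparser with made-up rules and bot names; a scraper that simply never asks is untouched by the file.

```python
# Illustrative only: robots.txt is an honour system. Well-behaved crawlers
# consult it before fetching; nothing forces an ill-behaved one to do so.
from urllib import robotparser

EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: ApprovedPartnerBot
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# A compliant crawler asks before every fetch and respects the answer...
print(parser.can_fetch("SomeScraperBot", "https://news.example.com/article/1"))      # False
print(parser.can_fetch("ApprovedPartnerBot", "https://news.example.com/article/1"))  # True

# ...but the check happens entirely on the crawler's side, which is why
# publishers also lean on terms of service, server-side blocks and rate limits.
```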
The privacy question
Verification systems increasingly rely on behavioural signals. That raises fair concerns about data use. Responsible sites outline the legal basis, typically legitimate interests, and define short retention windows. You can ask what is collected, request corrections, or object to profiling under UK data law. Transparency pages should explain the tests that run and how to contact a data protection lead.
A quick checklist to avoid being mistaken for a bot
- Keep your browser up to date with the latest security patches.
- Permit first‑party cookies and the site’s core scripts.
- Limit rapid tab sprees; use in‑page navigation where possible.
- Sign in if the publisher offers accounts; identity reduces friction.
- Avoid datacentre VPN endpoints; choose a residential route when privacy matters.
- Allowlist reputable news domains in the privacy tools you trust.
What the message really tells you
The wording is direct because the stakes are high. It warns that automated access is not allowed. It specifies that text and data mining, including for AI, machine learning and LLMs, is off limits. It points you to two clear contacts: [email protected] for genuine readers facing trouble, and [email protected] for commercial requests. Those details cut through the confusion and show where the gate opens.
Practical extras for readers and researchers
Set up a small test: visit a news homepage on Wi‑Fi with a clean browser profile, then repeat on a VPN using the same device. If one path triggers a challenge, you have found the culprit. Adjust extensions one by one until the warning disappears. That method saves time when deadlines loom.
For academic teams, consider a licensed content feed or an on-site reading room with stable IPs and human authentication. That route avoids blacklisting and preserves reliable access during peak hours. Publishers often accommodate such arrangements when approached early and transparently.
Small tweaks keep you reading: allow core scripts, pace your browsing, and use the official channels when you need help.


