A sudden screen asks you to prove you’re human. You hesitate. The page stalls. Somewhere, a newsroom’s servers tense.
Readers across the UK are meeting verification pages triggered by anything from odd click patterns to VPNs and overzealous browser extensions. Behind the friction sits a harder fight: publishers blocking automated scraping and text or data mining by AI systems without a paid licence.
Why you are seeing a bot check
News Group Newspapers Limited, publisher of titles including the Sun, has tightened defences against non‑human traffic. The system looks at patterns that suggest automation: dozens of rapid requests, blocked cookies, disabled JavaScript, or network addresses linked to bulk crawlers. Genuine users can trip these alarms too, especially on public Wi‑Fi or when privacy tools mask identity.
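News Group Newspapers has not published the exact rules its defences apply, but the broad shape of this kind of heuristic is well known. The sketch below is purely illustrative: the thresholds, field names and IP ranges are assumptions made for the example, not the publisher’s real configuration.

```python
from dataclasses import dataclass

# Illustrative thresholds only -- real systems tune these constantly and
# weigh far more signals than this sketch does.
MAX_REQUESTS_PER_MINUTE = 30
DATACENTER_PREFIXES = ("192.0.2.", "198.51.100.")  # placeholder ranges (RFC 5737 documentation addresses)

@dataclass
class Visit:
    requests_last_minute: int
    cookies_enabled: bool
    javascript_ran: bool
    ip_address: str

def suspicion_score(visit: Visit) -> int:
    """Add a point for each signal that suggests automation."""
    score = 0
    if visit.requests_last_minute > MAX_REQUESTS_PER_MINUTE:
        score += 1  # dozens of rapid requests
    if not visit.cookies_enabled:
        score += 1  # blocked cookies
    if not visit.javascript_ran:
        score += 1  # disabled JavaScript
    if visit.ip_address.startswith(DATACENTER_PREFIXES):
        score += 1  # network ranges associated with bulk crawlers
    return score

# A reader on shared Wi-Fi with strict privacy settings can rack up points
# without being a bot, which is why genuine users sometimes see the check.
print(suspicion_score(Visit(4, True, True, "203.0.113.7")))  # 0 -- passes quietly
```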
Automated access, collection and text or data mining of publisher content sits behind a licence wall. AI training, machine learning and LLM harvesting fall under that rule.
Sometimes the check is brief. You pass a challenge, then continue reading. On other occasions, the page explains why access was halted and points to a route back in for human readers.
What News Group Newspapers says
The company sets clear lines: no bots, scrapers or text/data mining tools may lift content from its services. That includes systems built for AI, machine‑learning models or large language models. The policy sits within the terms and conditions that govern use of the sites.
Genuine readers who are wrongly flagged can contact customer support at [email protected]. For paid, commercial use of content, or for crawling, email [email protected] to request a licence.
The message is blunt because the stakes are real. Publishers argue that unlicensed scraping erodes revenue, undermines subscriptions and siphons value into products that may never credit or compensate the original reporting.
How to get back in
Quick fixes for readers
- Wait about 60 seconds before refreshing the page, so you do not trip the rate limit again.
- Enable JavaScript and cookies. Many verification tools need them to prove you are real.
- Disable aggressive ad‑blocking, fingerprinting or script‑blocking extensions for the site.
- Turn off VPN or proxy temporarily, or switch to a UK endpoint with a fresh IP.
- Close background tab refreshers and news aggregators that hit pages in parallel.
- Restart the browser in a clean profile. If the issue persists, try another device or network.
- If you still hit the block, write to [email protected] with the time, IP, and a screenshot.
Common triggers and remedies
| Likely cause | What it looks like | What to try |
|---|---|---|
| High request rate | Pages load, then stall with repeated checks | Slow down, close auto‑refresh, reload after a pause |
| Blocked scripts | Blank verification widget or error text | Allow JavaScript and third‑party scripts for the site |
| VPN/proxy ranges | Immediate “are you human?” splash screen | Switch to a residential UK exit or turn VPN off |
| Privacy extensions | Challenge loops, missing cookies | Whitelist the domain, clear cookies, sign back in |
| Headless or automated browser | Access denied, reference ID shown | Use a regular browser; seek permission for automation |
The bigger picture: AI scraping and the law
Across Europe and the UK, the debate over text and data mining has moved from academic circles to boardrooms. The EU’s copyright rules permit certain data‑mining uses, but allow publishers to opt out through terms and machine‑readable signals. The UK has not created a blanket exception for commercial AI training. Media groups have seized that space, setting explicit bans and offering licences for paid, controlled access.
Technical measures back up the paperwork. Robots.txt blocks, challenge pages, and anomaly detection systems now sit on most major news sites. AI firms that ignore those signals risk more than reputational harm: breaching terms can trigger contract claims, and circumventing protective measures can raise serious legal and ethical issues. That is why many AI crawlers now advertise dedicated user‑agents and promise to respect opt‑outs, though publishers continue to test those assurances.
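For developers, respecting those signals can start with something as simple as Python’s standard-library robots.txt parser. The sketch below is illustrative only: the user‑agent string and URLs are placeholders, not real endpoints.

```python
from urllib.robotparser import RobotFileParser

# Illustrative values: substitute your own bot name, contact address and a
# site you are actually permitted to crawl.
USER_AGENT = "ExampleResearchBot/1.0 (+mailto:[email protected])"
SITE = "https://www.example.com"

robots = RobotFileParser()
robots.set_url(f"{SITE}/robots.txt")
robots.read()  # fetch and parse the live robots.txt file

article = f"{SITE}/news/sample-article"
if robots.can_fetch(USER_AGENT, article):
    delay = robots.crawl_delay(USER_AGENT) or 10  # honour any stated crawl delay
    print(f"Allowed for this user-agent; wait at least {delay}s between requests")
else:
    print("Disallowed -- do not fetch it, and do not route around the block")
```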
Publishers’ tools and the privacy balance
Verification technology has become subtler. Tools such as invisible challenges measure mouse movement, timing and device consistency rather than asking you to click on traffic lights. The aim is to spare human readers while blocking high‑volume scraping. This balance remains delicate. People with accessibility needs, older devices or strict privacy setups can suffer more friction. Teams are tuning thresholds so that real readers pass with minimal fuss, while bots face a hard stop.
What this means for you and your wallet
For everyday readers, the message is simple: you can keep reading, but your browser must look like a person is driving it. Cookies and scripts help prove that. For publishers, the sums are not abstract. Rights owners see major value in their archives, live reporting, and exclusive investigations. Unlicensed harvesting can drain that value into services that never pay for the work. As licensing deals expand, more AI projects will sit on formal contracts with rate limits, attribution terms and fees.
Human readers are welcome. Bots need permission, an identifiable user‑agent, and a paid licence where required.
If you run automated systems
Developers who need content for legitimate products have options that avoid blocks and disputes. Build automation that respects the rules, and contact the publisher before you code a crawler. A short email can save months of rework. The checklist below sets out the basics; a brief code sketch follows it.
- Read the site’s terms and conditions. Note any ban on automated access and data mining.
- Check robots.txt and published AI opt‑out signals. Do not circumvent them.
- Use a clear, verifiable user‑agent and an email contact in your requests.
- Throttle requests tightly. Think in pages per minute, not hundreds per second.
- Cache responsibly. Avoid scraping near real‑time content that changes rapidly.
- Seek a written licence for commercial use via [email protected].
- Prefer official feeds or APIs when offered. They reduce breakage and legal risk.
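To make that checklist concrete, here is a minimal sketch of a compliant fetcher, assuming you already hold written permission. The bot name, contact address and URLs are placeholders invented for the example.

```python
import time
import urllib.request

# Placeholders: use your own bot name, a reachable contact address, and only
# URLs your written licence actually covers.
USER_AGENT = "ExampleLicensedBot/1.0 (+mailto:[email protected])"
PAGES = ["https://www.example.com/licensed/section/page-1"]
SECONDS_BETWEEN_REQUESTS = 10  # think pages per minute, not hundreds per second

cache: dict[str, bytes] = {}

def fetch(url: str) -> bytes:
    """Fetch a page at most once per run, identify ourselves, and throttle hard."""
    if url in cache:  # cache responsibly rather than re-hitting the same page
        return cache[url]
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    cache[url] = body
    time.sleep(SECONDS_BETWEEN_REQUESTS)  # tight, explicit rate limit
    return body

for page in PAGES:
    fetch(page)
```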
Practical scenarios and tips
Testing a newsroom site on a locked‑down laptop? Start by re‑enabling JavaScript for a single session and retrying without a VPN. If a hotel network shares the same IP among many guests, the system may treat it as suspicious. Tether to mobile data for a clean IP, then switch back once you’re through. If you see a reference ID on the error page, take a screenshot. Support staff can use it to trace the block and clear it faster.
Running research that needs snippets for non‑commercial analysis? Contact the publisher first. Many will grant time‑boxed access for research, provided safeguards exist. If your project grows into a product, move to a commercial licence. That shift keeps data quality stable and protects you from sudden blocks that can derail launches.
For site owners, consider humane verification. Options such as low‑friction challenges, rate‑based gating and device‑reputation checks can stop bulk scraping without punishing readers. Pair that with clear messaging that explains why a check appeared and where genuine users can get help. Transparency reduces frustration and cuts support tickets.
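As one illustration of rate‑based gating, a per‑client token bucket lets human‑paced browsing through while slowing bulk scrapers to a crawl. The numbers in this sketch are arbitrary examples, not recommended production values; a real deployment would tune them against its own traffic.

```python
import time
from collections import defaultdict

# Arbitrary example numbers, not recommended production values.
CAPACITY = 20            # short burst a genuine reader might plausibly need
REFILL_PER_SECOND = 0.5  # sustained pace of roughly one page every two seconds

buckets = defaultdict(lambda: {"tokens": float(CAPACITY), "last": time.monotonic()})

def allow_request(client_id: str) -> bool:
    """Return True to serve the page, False to show a low-friction challenge."""
    bucket = buckets[client_id]
    now = time.monotonic()
    # Refill tokens for the time elapsed since this client's last request.
    bucket["tokens"] = min(CAPACITY, bucket["tokens"] + (now - bucket["last"]) * REFILL_PER_SECOND)
    bucket["last"] = now
    if bucket["tokens"] >= 1:
        bucket["tokens"] -= 1
        return True   # human-paced browsing sails through
    return False      # sustained machine-speed traffic meets the challenge
```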
For readers who rely on privacy tools, create a site‑specific profile. Allow cookies and scripts only for trusted news domains. This compromise preserves protections elsewhere while keeping journalism usable. If you pay for a subscription, add your account to the whitelist so session cookies stick. Most blocks vanish once the browser can prove continuity between clicks.


