#web-scraping

Tech industry

The TechBeat: Downside Liquidity: A Hypothesis on Short Pools for EVM (7/22/2025) | HackerNoon

Tech industry

The TechBeat: Welcome to the Museum of AI Hallucinations (7/20/2025) | HackerNoon

from Marcin Wanago Blog - JavaScript, both frontend and backend

The HackerNoon Newsletter: Outsmarting Akamai's Bot Detection with JA3Proxy (7/19/2025) | HackerNoon

The Machine Economy represents not just process optimization but a profound shift in the underlying forces that drive economics, as machines take more control over economic functions.

Tech industry

#data-extraction

Web frameworks

Web Scraping with Playwright

5 months ago

Scala

Scala Web Scraping: Step-by-Step Tutorial 2025

from Marcin Wanago Blog - JavaScript, both frontend and backend

Artificial intelligence

What Does Your AI Agent Need to Conquer the Web? | HackerNoon

from Raymondcamden

6 days ago

Web development

Extracting Data from Web Pages with AgentQL and BoxLang

#large-language-models

Web development

Scrape Smarter, Not Harder: Let MCP and AI Write Your Next Scraper for You | HackerNoon

Business intelligence

Teaching Your AI to Read: A Guide to Scraping, RAG, and Smart Data Insights | HackerNoon

Kasada Anti-Bot Bypass Techniques: Save Money with These Open-Source Solutions | HackerNoon

The easiest way to detect whether a website is using Kasada is to check it with Wappalyzer, which has a browser extension you can use while visiting a website to detect its tech stack.

E-Commerce

#internet-security

Privacy technologies

Paid proxy servers vs free proxies: Is paying for a proxy service worth it?

Privacy technologies

Anubis: Fighting off the hordes of LLM bot crawlers

This open-source bot blocker shields your site from pesky AI scrapers - here's how

F5 reports that over half of all web visits originate from data scrapers like OpenAI and Google, raising concerns about the impact of AI on online resources.

Privacy technologies

#cloudflare

from Ars Technica

Cloudflare turns AI against itself with endless maze of irrelevant facts

Cloudflare introduces 'AI Labyrinth' to combat unauthorized AI web scraping by serving fake content that wastes scraper resources.

9 months ago

Cloudflare reins in AI scraper bots with new Audit panel

Cloudflare enhances AI bot defense for customers, enabling analytics on web scrapers to improve control over unwelcome content.

Cloudflare is luring web-scraping bots into an 'AI Labyrinth'

Cloudflare's new AI Labyrinth tool confuses web scrapers with decoy pages instead of blocking them.

JavaScript

Bypassing JavaScript Challenges for Effective Web Scraping | HackerNoon

Privacy technologies

Cloudflare will now block AI crawlers by default

from WIRED

Privacy professionals

Cloudflare Is Blocking AI Crawlers by Default

#data-collection

3 years ago

Behind the Scenes of Using Web Scraping and AI in Investigative Journalism | HackerNoon

Web scraping is essential for journalists to extract public information and hold authorities accountable.

Privacy professionals

Web Scraping in 2025: Staying on Track with New Rules | HackerNoon

Artificial intelligence

AI and Proxies: Are They Connected? | HackerNoon

Privacy professionals

Scraping Proxies: Why They're a Game-Changer for Modern Web Scraping

E-Commerce

Antidetect Browser + Automation: A Safe Setup for Web Scraping and Botting

4 weeks ago

This proxy provider I tested is the best for web scraping - and it's not IPRoyal or MarsProxies

Oxylabs offers a significantly larger pool of residential proxy machines, boasting over 175 million proxies compared to competitors who have fewer than 1 million.

Marketing tech

#nodejs

Node JS

How to Export Your Scraped Data to Json, CSV, or a Database (node.js)

Reddit sues Anthropic for scraping its users' content without consent

Reddit sues Anthropic for breaching user privacy by scraping content without consent, amid increasing legal challenges to AI content usage.

#bots

from Nature

Artificial intelligence

Web-scraping AI bots cause disruption for scientific databases and journals

Marketing tech

Bots now generate majority web traffic

from Speckyboy Design Magazine

How to Combat AI Bot Traffic on Your Website - Speckyboy

AI tools significantly aid web development but raise concerns over copyright and resources.

Blocking AI bots from scraping websites poses challenges for developers.

How to Build a No-Limits Stock Market Scraper with Python | HackerNoon

Building a custom web scraping solution allows for unrestricted access to financial data without the limitations of traditional APIs.

E-Commerce

from Entrepreneur

How Web Data Helps You Stay Ahead of the Competition | Entrepreneur

Ecommerce businesses need to leverage public web data for better decision-making across industries.

Bootstrapping

#ai

Wikimedia Foundation bemoans AI bot bandwidth burden

Web-scraping bots are straining Wikimedia's resources, increasing bandwidth usage by 50% since January 2024, heavily impacting project sustainability.

Women in technology

The HackerNoon Newsletter: TechWomen is Back Online! (3/28/2025) | HackerNoon

OMG science

The HackerNoon Newsletter: How The Internet Will Pay You (4/6/2025) | HackerNoon

Artificial intelligence

Wikimedia is dealing with a 50 percent increase in bandwidth due to AI crawlers

Cryptocurrency

The TechBeat: Bybit's $1.5 Billion Hack Proves Crypto's Biggest Flaw Isn't the Blockchain (4/7/2025) | HackerNoon

Cryptocurrency

The TechBeat: Swift init(), Once and for All (4/5/2025) | HackerNoon

Cryptocurrency

The TechBeat: Your Next Tech Job? Vibe Coding (4/3/2025) | HackerNoon

A Guide on How to Legally Web Scrape EU Data | HackerNoon

The Markup emphasizes the importance of web scraping for data journalism while navigating legal risks, especially in the EU.

Privacy technologies

from Ars Technica

AI bots strain Wikimedia as bandwidth surges 50%

AI crawlers are circumventing established rules, creating challenges for content platforms.

Wikimedia is focusing on a systemic initiative to address scraping issues and protect its infrastructure.

Privacy technologies

Brave wants court to endorse scraping of News Corp content

Brave's legal action seeks to protect its AI summaries from potential copyright claims by News Corp.

Marketing tech

from Forbes

New Data Shows Just How Badly OpenAI And Perplexity Are Screwing Over Publishers

AI-powered search engines are sending significantly less referral traffic to news sites compared to traditional search engines.

Web frameworks

from InfoWorld

Get started with async in Python

Asynchronous programming in Python allows handling multiple tasks efficiently, reducing waiting times.