What is bot traffic and how to stop it

Bot traffic is any non-human traffic to a website or app. The term often carries a negative connotation, but bot traffic is not inherently good or bad. It depends entirely on the purpose of the bots.

Some bots are essential for useful services such as search engines and digital assistants (e.g. Siri, Alexa). Most companies welcome these bots on their sites.

Other bots are malicious, such as those used for credential stuffing, data scraping, and DDoS attacks. Even some of the more benign “bad” bots, such as unauthorized web crawlers, can disrupt site analytics and generate click fraud.

Using Tracking Rules, you can exclude traffic from certain sources by filtering on IP addresses or domains.

How can bot traffic be identified?

Marfeel helps detect bot traffic by surfacing analytics anomalies. The following patterns are the hallmarks of bot activity:

Sawtooth wave pattern: Bots are often scheduled to run at a fixed frequency, producing a distinctive repeating wave in traffic charts.
Spike from an unexpected traffic source: A sudden surge in users from one particular source, especially Direct traffic, often indicates bot activity. Understanding how traffic attribution classifies sources helps distinguish legitimate visits from bots.
Spike from an unexpected location: A sudden surge in users from a region unlikely to have many speakers of the site’s native language can signal bot traffic.
Abnormally high pageviews: A sudden, unprecedented spike in pageviews without a clear editorial cause likely means bots are clicking through the site.
Unusual session duration: Session duration should remain relatively steady. An unexplained increase may indicate bots browsing at an unusually slow rate. An unexpected drop could mean bots are clicking through pages much faster than a human user would.
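
As a rough illustration of the "sudden spike" patterns above, the following sketch flags days whose pageview count far exceeds a recent baseline. It is not how Marfeel detects anomalies; the function name find_spikes, the 7-day window, and the 3x threshold are all assumptions chosen for the example, and the numbers are made up.

```python
from statistics import median


def find_spikes(counts, window=7, factor=3.0):
    """Return the indexes whose value exceeds `factor` times the
    median of the preceding `window` values."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = median(counts[i - window:i])
        # Skip empty baselines to avoid flagging the first real traffic.
        if baseline > 0 and counts[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily pageviews: a steady week, one abnormal day, then normal again.
daily_pageviews = [1000, 1100, 950, 1050, 1020, 980, 1010, 4800, 1030]
print(find_spikes(daily_pageviews))  # [7]
```

Using the median rather than the mean keeps a single abnormal day from inflating the baseline for the days that follow. A flagged day is only a prompt to investigate, since a spike can also have a legitimate editorial cause.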

How does bot traffic hurt analytics?

Unauthorized bot traffic distorts analytics metrics: page views, bounce rate, session duration, geolocation of users, and conversions. These deviations create significant frustration for site owners because it becomes very difficult to measure real site performance. Efforts to improve the site, such as A/B testing and conversion rate optimization, are undermined by the statistical noise bots generate.

How to filter bot traffic from Marfeel

Marfeel supports discarding bot and synthetic traffic generated by well-behaved ("good citizen") bots that identify themselves in their user agent.
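
To show what identification via user agent can look like, here is a minimal sketch that matches common bot tokens in a User-Agent string. The token list and the function name is_self_identified_bot are assumptions for illustration only; they do not reflect the rules Marfeel applies.

```python
import re

# Hypothetical token list for illustration; not Marfeel's actual rules.
BOT_UA_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_self_identified_bot(user_agent):
    """Return True if the User-Agent string contains a common bot token."""
    return bool(BOT_UA_PATTERN.search(user_agent or ""))


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
print(is_self_identified_bot(googlebot))  # True
print(is_self_identified_bot(chrome))     # False
```

This approach only catches bots that announce themselves. A bot that sends a browser-like user agent passes the check, which is why the IP-based validation described next is still needed.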

How to validate the source of an IP

When identifying the source of an IP address, you can determine whether it belongs to an ISP or a hosting/cloud provider (where bots are typically hosted) by using:

Command line tools like whois and host
Equivalent web tools such as DomainTools and IPLocation

Check the Organization field. Names like AWS, Hetzner, or OVH indicate that the IP is likely assigned to a server, which suggests a bot. If the organization is an ISP like AT&T, O2, Vodafone, or Orange, the requests are more likely to come from real users.
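
The check above can be sketched as a small helper that reads the organization line from whois output and compares it against keyword lists. This is a rough heuristic under stated assumptions: the field labels (Organization, OrgName, org-name) vary between registries, the keyword lists only mirror the examples above (plus "amazon", since AWS ranges are usually registered under an Amazon entity), and the sample text is hand-written rather than real whois output.

```python
import re

# Illustrative keyword lists based on the examples above; extend as needed.
HOSTING_HINTS = ("amazon", "aws", "hetzner", "ovh")
ISP_HINTS = ("at&t", "o2", "vodafone", "orange")

# Field labels differ between registries; these three are assumed here.
ORG_FIELD = re.compile(
    r"^(?:Organization|OrgName|org-name):[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def classify_whois(whois_text):
    """Return 'hosting', 'isp', or 'unknown' based on the organization line."""
    match = ORG_FIELD.search(whois_text)
    if not match:
        return "unknown"
    org = match.group(1).strip().lower()
    if any(hint in org for hint in HOSTING_HINTS):
        return "hosting"
    if any(hint in org for hint in ISP_HINTS):
        return "isp"
    return "unknown"


# Hand-written sample in the style of whois output, not a real lookup.
sample = "NetName:        EXAMPLE-NET\nOrganization:   Hetzner Online GmbH\n"
print(classify_whois(sample))  # hosting
```

The text passed to classify_whois would come from running whois against the IP address. Substring matching is deliberately crude, so treat an "unknown" or surprising result as a reason to read the full whois record by hand.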

Once you have identified the offending IPs, blacklist them using one of the solutions below.

How can websites manage bot traffic?

Several tools and strategies help mitigate abusive bot traffic:

robots.txt: The first step is including a robots.txt file. This file provides instructions for bots crawling the page and can be configured to prevent bots from visiting or interacting with a webpage. Only good bots will respect robots.txt; it will not stop malicious bots.
Bot management solutions: The most effective way to stop bad bot traffic is with a bot management solution that uses behavioral analysis to block malicious bots before they reach a website. Options include Cloudflare Bot Management, Datadome Bot Protection, Fastly Bot Management, and AWS WAF.
Rate limiting: As an alternative, a rate-limiting solution can detect and throttle excessive traffic from a single IP address, though it still misses some malicious bot traffic, such as bots that spread requests across many IP addresses.
CAPTCHA with bot detection: Deploying CAPTCHA challenges with bot detection mechanisms differentiates human users from bots. Options include Google reCAPTCHA and Datadome CAPTCHA.
Web Application Firewalls (WAFs): WAFs inspect incoming traffic and filter out suspicious requests. They block known bot IP addresses and patterns, reducing bot-related incidents. WAF functionality is available through CDN providers (Cloudflare WAF, Fastly WAF) or as an external module for web servers.
Traffic pattern monitoring: Regularly monitoring traffic patterns helps identify unusual spikes or suspicious activity. A Marfeel Operator can analyze a site’s traffic and identify suspicious network requests, providing a list of IP addresses to block with a WAF. This is a labor-intensive process and still only stops a portion of malicious bot traffic.
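
To make the rate limiting item above concrete, here is a minimal sketch of a per-IP sliding-window limiter. It is an illustration, not the implementation used by any of the products named above; the class name, the limits, and the IP addresses (taken from the documentation-only range) are assumptions. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per IP within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowRateLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.7", t) for t in (0, 1, 2, 3, 11)]
print(results)  # [True, True, True, False, True]
```

The fourth request is rejected because three requests already fall inside the 10-second window, and the fifth is accepted once the earliest ones have expired. Because the limit is keyed on a single IP address, a bot that rotates addresses is not caught, which matches the limitation noted above.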

How can bot traffic be identified in analytics?

Bot traffic shows distinctive analytics anomalies: sawtooth wave patterns from scheduled requests, sudden spikes from a single traffic source (especially Direct), unexpected traffic from regions unlikely to speak the site’s language, abnormally high pageviews, and unusually high or low session durations.

How does bot traffic hurt website analytics?

Unauthorized bot traffic distorts key metrics including page views, bounce rate, session duration, geolocation data, and conversions. This statistical noise makes it difficult to measure real site performance and undermines efforts like A/B testing and conversion rate optimization.

What are the best ways to stop bot traffic?

Effective strategies include configuring a robots.txt file, deploying a bot management solution (Cloudflare, Datadome, Fastly, or AWS WAF), applying rate limiting by IP address, implementing CAPTCHA with bot detection, using a Web Application Firewall, and monitoring traffic patterns to identify and block offending IPs.