How to Block the AhrefsBot Spam Spider, with the AhrefsBot Crawler IP List

In the previous article "How to Set Up Nginx Hotlink Protection in Baota Panel and LNMP Environments", I mentioned that one of Naibabiji's sites had exceeded its traffic limit. At first I assumed image hotlinking was to blame. The logs told a different story: the AhrefsBot spam bot was crawling the site like crazy, making over 6,000 requests in less than a day. Damn it. I immediately looked into how to block AhrefsBot.

What is AhrefsBot

AhrefsBot is a spider operated from overseas. For a site like this one, it offers no benefit and only consumes bandwidth and server resources.

Simply put, AhrefsBot is the crawler of a marketing (SEO) website, and its job is to collect your site's link information. For users in China, the tool is of no use.

For a detailed introduction, you can check the English explanation on their official website: https://ahrefs.com/robot

AhrefsBot IP Ranges

After analyzing a day's worth of website logs, guess how many different AhrefsBot spider IPs were crawling the site data?

There were actually 561 IPs, and that's just from less than a day's log records.
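As a rough illustration of that kind of log analysis, here is a minimal Python sketch. It assumes the access log uses the common "combined" format (client IP first, user agent as the last quoted field). The sample lines are made up for the example, and `count_bot_hits` is a hypothetical helper name, not something from the original analysis.

```python
import re
from collections import Counter

# Assumption: "combined" log format, i.e. the client IP is the first field
# and the user agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"\s*$')


def count_bot_hits(lines, bot_name="AhrefsBot"):
    """Return a Counter mapping client IP -> number of requests by bot_name."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and bot_name.lower() in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits


# Made-up sample lines, for illustration only.
AHREFS_UA = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
sample = [
    f'54.36.148.10 - - [01/Jan/2020:00:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "{AHREFS_UA}"',
    f'54.36.148.10 - - [01/Jan/2020:00:00:02 +0800] "GET /a HTTP/1.1" 200 512 "-" "{AHREFS_UA}"',
    f'195.154.122.7 - - [01/Jan/2020:00:00:03 +0800] "GET /b HTTP/1.1" 200 512 "-" "{AHREFS_UA}"',
    '203.0.113.5 - - [01/Jan/2020:00:00:04 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = count_bot_hits(sample)
print(len(hits), "unique IPs,", sum(hits.values()), "requests")
```

To run it against a real log, pass an open file object instead of `sample`, since the function only needs an iterable of lines.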

The officially announced AhrefsBot crawler IP ranges are as follows:

54.36.148.0/24
54.36.149.0/24
54.36.150.0/24

195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24
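To check whether an IP seen in your own logs falls inside one of the ranges above, a small sketch using Python's standard `ipaddress` module could look like this. The function name `is_ahrefsbot_ip` and the test addresses are illustrative assumptions.

```python
import ipaddress

# The seven /24 ranges listed above.
AHREFSBOT_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "54.36.148.0/24",
        "54.36.149.0/24",
        "54.36.150.0/24",
        "195.154.122.0/24",
        "195.154.123.0/24",
        "195.154.126.0/24",
        "195.154.127.0/24",
    )
]


def is_ahrefsbot_ip(address):
    """Return True if address lies inside any of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in AHREFSBOT_RANGES)


print(is_ahrefsbot_ip("54.36.149.77"))  # True: inside 54.36.149.0/24
print(is_ahrefsbot_ip("54.36.151.1"))   # False: 54.36.151.0/24 is not listed
```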

Alright, since it's this insane, let's start figuring out how to block AhrefsBot's crawling.

Directly Block AhrefsBot IP Ranges

The site that AhrefsBot was crawling is hosted on Alibaba Cloud. The Alibaba Cloud console provides security groups, so blocking AhrefsBot's IP ranges there is the simplest, bluntest, and most immediately effective method.

Log in to the Alibaba Cloud console, open your server list, click the server's security group, and configure the security group rules.

[Image: blocking the AhrefsBot spider]

Configure the rule as shown in the image above, and add every one of the following IP ranges. (Naibabiji simply blocked the entire 54.36.*.* and 195.154.*.* ranges, which is far broader than the list requires.)

54.36.148.0/24
54.36.149.0/24
54.36.150.0/24

195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24
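If you would rather enter fewer security group rules, adjacent /24 ranges can be merged into larger CIDR blocks. The sketch below uses `ipaddress.collapse_addresses` from the Python standard library. It also prints how many addresses the seven listed ranges cover compared with the two whole /16 ranges, to show how much extra a blanket block takes in. The variable names are illustrative.

```python
import ipaddress

listed = [
    "54.36.148.0/24",
    "54.36.149.0/24",
    "54.36.150.0/24",
    "195.154.122.0/24",
    "195.154.123.0/24",
    "195.154.126.0/24",
    "195.154.127.0/24",
]

networks = [ipaddress.ip_network(cidr) for cidr in listed]

# Merge neighbouring networks into the smallest equivalent set of CIDR blocks.
collapsed = list(ipaddress.collapse_addresses(networks))
for network in collapsed:
    print(network)
# 54.36.148.0/23
# 54.36.150.0/24
# 195.154.122.0/23
# 195.154.126.0/23

# Compare the listed ranges with blocking 54.36.*.* and 195.154.*.* outright.
listed_total = sum(network.num_addresses for network in networks)
blanket_total = sum(
    ipaddress.ip_network(cidr).num_addresses
    for cidr in ("54.36.0.0/16", "195.154.0.0/16")
)
print(listed_total, "addresses listed vs", blanket_total, "in the two /16 blocks")
```

The four merged blocks cover exactly the same addresses as the seven original entries, so nothing extra is blocked.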

Block using robots.txt

Generally, any spider or crawler that honors robots rules can be blocked with robots.txt. Ahrefs states that AhrefsBot follows these rules. In practice, though, if the rule was not in place from the start, you have no way of knowing when the spider will next fetch your robots.txt and pick up the new rule.

So the blunter approach of blocking IPs takes effect faster. If you still want to add the robots.txt rule, it is as follows:

User-agent: AhrefsBot
Disallow: /
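Before relying on the rule, you can sanity-check how a standards-following parser reads it. The sketch below feeds the two lines above to Python's built-in `urllib.robotparser`. This only shows how the rule is interpreted. It says nothing about when AhrefsBot will actually re-read the file. The path `/any/page` and the second user agent are arbitrary examples.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: AhrefsBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# AhrefsBot is denied everywhere; agents without a matching rule are allowed.
print(parser.can_fetch("AhrefsBot", "/any/page"))  # False
print(parser.can_fetch("Googlebot", "/any/page"))  # True
```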

Block using Apache or Nginx

This method is covered in the earlier article: Methods to Block Specific Bots and Crawlers from Accessing WordPress Websites

If using Nginx, you can also add the following code snippet to your virtual host configuration file to block AhrefsBot.

if ($http_user_agent ~* AhrefsBot) {
   return 403;
}
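The `~*` operator in the snippet above performs a case-insensitive regular expression match on the User-Agent header. To get a feel for which user agent strings it would catch, here is a small Python sketch that mirrors the same match. It is only an approximation of nginx's behavior for this simple pattern, and the sample strings and the `would_be_blocked` helper are illustrative assumptions.

```python
import re

# Mirrors nginx's "~* AhrefsBot": a case-insensitive regex search.
BLOCK_PATTERN = re.compile(r"AhrefsBot", re.IGNORECASE)


def would_be_blocked(user_agent):
    """Return True if the nginx rule above would answer this agent with 403."""
    return BLOCK_PATTERN.search(user_agent) is not None


samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "mozilla/5.0 (compatible; ahrefsbot/7.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]

for user_agent in samples:
    print(would_be_blocked(user_agent), user_agent)
```

Because the match relies only on the User-Agent header, it stops working if a crawler changes or hides that string, which is one reason the IP-based block is the more forceful option.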
