In the previous article "How to Set Up Nginx Hotlink Protection in Baota Panel and LNMP Environments", I mentioned that Naibabiji's site had exceeded its traffic limit. At first I assumed the cause was image hotlinking, but after analyzing the logs I found that the AhrefsBot crawler was hammering the site, making over 6,000 requests in less than a day. Damn it. I immediately looked into how to block AhrefsBot.
What is AhrefsBot
AhrefsBot is the web crawler of Ahrefs, an overseas SEO and marketing analytics service. It crawls your site to analyze its link data, but for your own website it offers no benefit beyond consuming resources. For domestic users this tool is useless.
For a detailed introduction, you can check the English explanation on their official website: https://ahrefs.com/robot
AhrefsBot IP Ranges
After analyzing a day's worth of website logs, guess how many distinct AhrefsBot IPs were crawling the site?

The answer: 561 IPs, and that was from less than a full day of log records.
The officially announced AhrefsBot crawler IP ranges are as follows:
54.36.148.0/24
54.36.149.0/24
54.36.150.0/24
195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24
Alright, given how aggressive it is, let's figure out how to block AhrefsBot's crawling.
Directly Block AhrefsBot IP Ranges
The site being crawled by AhrefsBot runs on Alibaba Cloud, whose console provides security groups, so blocking AhrefsBot's IP ranges there is the simplest, most brute-force, and immediately effective method.
Go to the Alibaba Cloud console, open your server list, click the server's security group, and configure the security group rules.

Configure as shown in the image above and add all of these IP ranges. (Naibabiji went further and blocked everything in the 54.36.*.* and 195.154.*.* ranges.)
54.36.148.0/24
54.36.149.0/24
54.36.150.0/24
195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24
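Before adding security group rules, you may want to check how many of the IPs in your own logs actually fall inside the published ranges. A minimal sketch using Python's standard `ipaddress` module (the function name `is_ahrefs_ip` is just illustrative):

```python
import ipaddress

# Officially announced AhrefsBot CIDR ranges, as listed above
AHREFS_RANGES = [
    "54.36.148.0/24", "54.36.149.0/24", "54.36.150.0/24",
    "195.154.122.0/24", "195.154.123.0/24",
    "195.154.126.0/24", "195.154.127.0/24",
]
NETWORKS = [ipaddress.ip_network(r) for r in AHREFS_RANGES]

def is_ahrefs_ip(ip: str) -> bool:
    """Return True if the IP falls inside a published AhrefsBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)

print(is_ahrefs_ip("54.36.148.17"))  # inside 54.36.148.0/24
print(is_ahrefs_ip("8.8.8.8"))       # not an AhrefsBot range
```

Feeding the IP column of your access log through this function tells you how much of the traffic the security group rules above would actually stop.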
Block using robots.txt
In general, any spider or crawler that honors the robots protocol can be blocked via robots.txt. AhrefsBot officially claims to follow it, but if the rule wasn't in place from the start, there's no telling when the bot will next re-fetch your robots.txt and update its crawling behavior.
So the blunter approach of blocking IPs directly takes effect faster. If you still want to add the rule, it is as follows:
User-agent: AhrefsBot
Disallow: /
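You can verify that the two-line rule above does what you expect using Python's standard `urllib.robotparser`, which implements the same matching logic a well-behaved crawler applies:

```python
from urllib.robotparser import RobotFileParser

# Feed the robots.txt rule above directly to the stdlib parser
rules = [
    "User-agent: AhrefsBot",
    "Disallow: /",
]
parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("AhrefsBot", "/any/page"))  # AhrefsBot is denied everything
print(parser.can_fetch("Googlebot", "/any/page"))  # other bots are unaffected
```

This confirms the rule blocks only AhrefsBot and leaves every other crawler alone.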
Block using Apache or Nginx
This method is covered in a previous article: Methods to Block Specific Bots and Crawlers from Accessing WordPress Websites.
If using Nginx, you can also add the following code snippet to your virtual host configuration file to block AhrefsBot.
if ($http_user_agent ~* AhrefsBot) {
return 403;
}
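Nginx's `~*` operator performs a case-insensitive regex match against the `User-Agent` header. As a sketch, the same check can be reproduced in Python to see which requests the snippet above would answer with a 403 (the `status_for` helper is hypothetical, for illustration only):

```python
import re

# Case-insensitive match, mirroring nginx's `~*` operator
BLOCK_PATTERN = re.compile(r"AhrefsBot", re.IGNORECASE)

def status_for(user_agent: str) -> int:
    """Return the HTTP status the nginx rule above would produce."""
    return 403 if BLOCK_PATTERN.search(user_agent) else 200

print(status_for("Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"))  # 403
print(status_for("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"))  # 200
```

Because the match is case-insensitive and a substring search, variations like `ahrefsbot/6.1` are caught as well.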
