🚀 Is building a website too difficult? Let me guide you step by step—Learn about the 「Naibabiji WordPress Website Building Coaching Service」 →

How to Block the Spam Spider AhrefsBot (Including AhrefsBot Crawler IP List)

As mentioned in the previous article „How to Set Up Nginx Hotlink Protection in Baota Panel and LNMP Environments", one of Naiba's sites exceeded its traffic limit. Initially, it was thought to be caused by image hotlinking. After analyzing the logs, it was discovered that the spam robot AhrefsBot was crawling the site frantically, making over 6000 requests in less than a day. I immediately researched how to block AhrefsBot.

What is AhrefsBot?

AhrefsBot is a foreign search engine spider. However, for your website, it offers no benefits other than wasting resources.

Simply put, AhrefsBot is a crawling spider for marketing websites, responsible for analyzing your site's link information. This tool is useless for domestic users.

For a detailed introduction, you can check the English explanation on their official website. https://ahrefs.com/robot

AhrefsBot IP Ranges

Naiba analyzed a day's worth of website logs. Guess how many different AhrefsBot spider IPs came to crawl the website data?

There were actually 561 IPs, and that's from less than a day's log records.

The officially announced IP ranges for the AhrefsBot crawler are as follows:

54.36.148.0/24
54.36.149.0/24
54.36.150.0/24

195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24

Alright, since it's this extreme, let's start figuring out how to block AhrefsBot's crawling.

Directly Block AhrefsBot IP Ranges

The server of the site crawled by the AhrefsBot spider uses Alibaba Cloud. The Alibaba Cloud backend has security groups available, so directly blocking AhrefsBot's IP ranges is the simplest, most brute-force method with immediate effect.

Go to the Alibaba Cloud backend, enter your server list, click on the server's security group, and configure the security group rules.

禁止AhrefsBot蜘蛛

Configure according to the method in the image above, just add all these IP ranges. (Naiba directly blocked all IPs from 54.36.*.* and 195.154.*.*)

54.36.148.0/24
54.36.149.0/24
54.36.150.0/24

195.154.122.0/24
195.154.123.0/24
195.154.126.0/24
195.154.127.0/24

Block Using robots.txt

Generally speaking, any spider or crawler that follows robots rules can be blocked using robots.txt. AhrefsBot officially states it follows this rule, but in reality, if you didn't add this rule from the start, you wouldn't know when its spider would recrawl your robots.txt file to update the crawling rules.

So, more violently, directly blocking IPs is faster. If you want to add it, the rule is as follows:

User-agent: AhrefsBot
Disallow: /

Block Using Apache or Nginx

This method refers to a previous article:How to Block Specific Bots and Crawlers from Accessing a WordPress Website

If using nginx, you can also add the following code snippet to your virtual host configuration file to block AhrefsBot.

if ($http_user_agent ~* AhrefsBot) {
   return 403;
}

 

🚀 Still feeling confused after reading the tutorial? Let me guide you step-by-step.

「Naibabiji WordPress Website Building Coaching Service」—From choosing a domain and buying hosting, to installing a Theme and publishing content, I「ll coach you through every step, helping you avoid detours and reach your goal directly.

👉 Learn about the Website Building Coaching Service
🔒

Comments are closed

The comment function for this article is closed. If you have any questions, please feel free to contact us through other channels.

×
二维码

Scan QR Code to Follow

AI Website Building Assistant

🤖
Hello! I am the Naibabiji AI Assistant. How can I help you?
Quick Consultation: