🚀 Is building a website too difficult? Let me guide you step by step—Learn about the 「Naibabiji WordPress Website Building Coaching Service」 →

What is robots.txt? Correct WordPress robots.txt Writing Methods and Generation Tools

New website owners may not know what the robots.txt file is for, let alone how robots.txt relates to website SEO. Today, Naibabiji shares how to write a robots.txt file correctly to boost your site's SEO.

What is robots.txt

robots.txt, also known as the robots protocol, is a widely observed convention in the international internet community. It is a text file placed in the root directory of your website that tells search engines which pages may be crawled and which may not. It can block larger files on the site, such as images, music, and videos, to save server bandwidth; it can block dead links, making it easier for search engines to crawl your content; and it can point to your sitemap to guide spiders through your pages.

How to create a robots.txt file

You only need a text editor, such as Notepad, to create a text file named robots.txt, then upload it to the website root directory. You can also use a robots generation tool to generate it online.

How to write robots.txt rules

Simply creating a robots.txt file is not enough; the essence lies in writing rules suited to your own website. robots.txt supports the following rules:
User-agent: * Here, * is a wildcard that matches all search engine crawlers
Disallow: /admin/ Blocks crawling of everything under the admin directory
Disallow: /require/ Blocks crawling of everything under the require directory
Disallow: /ABC/ Blocks crawling of everything under the ABC directory
Disallow: /cgi-bin/*.htm Blocks access to all URLs under /cgi-bin/ ending in ".htm" (including subdirectories)
Disallow: /*?* Blocks access to every URL on the site that contains a question mark (?)
Disallow: /.jpg$ Blocks crawling of all .jpg images on the site
Disallow: /ab/adc.html Blocks crawling of the file adc.html under the ab folder
Allow: /cgi-bin/ Allows crawling of everything under the cgi-bin directory
Allow: /tmp Allows crawling of the entire tmp directory
Allow: .htm$ Allows access only to URLs ending in ".htm"
Allow: .gif$ Allows crawling of pages and .gif images
Sitemap: (sitemap URL) Tells crawlers where the sitemap is located
It is recommended to use the webmaster tool's robots generation tool to write rules, as it is simpler and clearer: robots generation tool.
Naiba's Tip: A Disallow: line with nothing after it (no path) means the entire site may be crawled.

Recommended robots.txt rules for WordPress

After WordPress is installed, it creates a virtual robots.txt by default (meaning you cannot see the file in the website directory, but you can access it at your-domain/robots.txt). The default rules are as follows:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
This rule means that all search engines are prohibited from crawling the contents of the wp-admin folder, but are allowed to crawl the /wp-admin/admin-ajax.php file. However, for website SEO and security, Naiba suggests improving the rules further. Below are the current robots.txt rules of Naibabiji.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /?s=*
Allow: /wp-admin/admin-ajax.php

User-agent: YandexBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: YaK
Disallow: /

Sitemap: https://blog.naibabiji.com/sitemap_index.xml
The above rules add the following two lines to the default rules:
Disallow: /wp-content/plugins/ 
Disallow: /?s=*
Disallow: /wp-content/plugins/ blocks crawling of the /wp-content/plugins/ folder, which is the WordPress plugin directory. Keeping it out of search engines avoids privacy risks; for example, some plugins have information-leak bugs whose output could otherwise be crawled and indexed.

Disallow: /?s=* blocks crawling of search result pages, which prevents others from abusing them to boost rankings. URLs of the form /?s=* are the default search result pages of a WordPress site, as shown below. [Screenshot: example search result URL] This is a loophole Naiba recently discovered being exploited by gray-hat SEO schemes: the vast majority of WordPress themes build the search page title as a "keyword + site title" combination, which gives Baidu a chance to crawl and index such pages. One of Naiba's sites was unfortunately exploited this way. [Screenshot: cached page snapshot]

The remaining rules block specific search engine bots outright and add a sitemap link; see Several Methods for WordPress to Generate Sitemaps_Recommended Sitemap Plugins.
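To see these recommended rules in action, here is a hedged sketch using Python's stdlib parser (example.com is a placeholder). Two caveats: this parser evaluates rules in file order, so the Allow line is placed before the Disallow it carves an exception out of (real engines such as Google instead pick the longest matching rule), and it does not understand the * wildcard, so the search-page rule is written as the prefix /?s=, which matches the same URLs:

```python
from urllib import robotparser

# Approximation of the recommended rules, adapted for the stdlib
# parser: Allow placed first (first match wins) and /?s=* written
# as the prefix /?s= (no wildcard support).
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /?s=

User-agent: YandexBot
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://example.com/?s=some+keyword"))          # False
print(rp.can_fetch("YandexBot", "https://example.com/any-post/"))        # False
```

The last call shows the per-bot group working: YandexBot is denied everywhere while ordinary crawlers still reach normal posts.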

How to check if robots.txt is effective

After creating and writing the robots.txt rules, you can use Baidu Webmaster's robots detection tool to check whether they take effect: Baidu robots detection. However, Baidu's tool does not support HTTPS websites; you can use Aizhan's tool instead: Aizhan robots detection. Related articles: Limit search engine crawl rates like Bing to reduce server load
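Besides the online tools above, you can also check a robots.txt from Python. A small sketch (the helper name robots_allows and all URLs are illustrative; the live-site variant in the comments needs network access):

```python
from urllib import robotparser

def robots_allows(robots_txt: str, page_url: str, agent: str = "*") -> bool:
    """Evaluate robots.txt text and report whether `agent` may crawl `page_url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, page_url)

# To check a deployed site instead (requires network access):
# rp = robotparser.RobotFileParser()
# rp.set_url("https://blog.naibabiji.com/robots.txt")
# rp.read()  # downloads and parses the live file
# rp.can_fetch("Baiduspider", "https://blog.naibabiji.com/?s=test")
```

This complements the online checkers: it uses the same file your server actually serves, so a rule that passes here but fails in Baidu's tool points to a syntax feature the tool interprets differently.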


