What is robots.txt
robots.txt, also known as the robots protocol, is a widely accepted ethical standard in the international internet community. robots.txt is a plain-text file located in the root directory of your website that tells search engines which pages may be crawled and which may not. It can block larger files on the website, such as images, music, and videos, to save server bandwidth; it can block dead links on the site, making it easier for search engines to crawl the site's content; and it can declare sitemap links to guide spiders as they crawl your pages.

How to create a robots.txt file
You only need a text editor, such as Notepad, to create a text file named robots.txt, then upload this file to the website root directory to complete the creation. You can also use a robots generation tool to generate it online.

How to write robots.txt rules
Simply creating a robots.txt file is not enough; the essence lies in writing rules that suit your own website. robots.txt supports the following rules:

User-agent: *             the * here is a wildcard matching every kind of search engine crawler
Disallow: /admin/         forbids crawling anything under the admin directory
Disallow: /require/       forbids crawling anything under the require directory
Disallow: /ABC/           forbids crawling anything under the ABC directory
Disallow: /cgi-bin/*.htm  forbids access to any URL under /cgi-bin/ (including subdirectories) ending in ".htm"
Disallow: /*?*            forbids access to any URL on the site containing a question mark (?)
Disallow: /.jpg$          forbids crawling any .jpg image on the site
Disallow: /ab/adc.html    forbids crawling the adc.html file inside the ab folder
Allow: /cgi-bin/          allows crawling everything under the cgi-bin directory
Allow: /tmp               allows crawling the entire tmp directory
Allow: .htm$              allows access only to URLs ending in ".htm"
Allow: .gif$              allows crawling pages and gif-format images
Sitemap: sitemap URL      tells crawlers where the sitemap for the site is

It is recommended to use the webmaster tool's robots generation tool to write rules, as it is simpler and clearer. robots generation tool
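Before deploying a rule set like the one above, you can sanity-check it locally. The sketch below (not part of the original article; the bot name and URLs are made up for illustration) uses Python's standard-library parser. Note that urllib.robotparser implements the original robots.txt standard and does not understand the "*" and "$" wildcard extensions, so only plain path-prefix rules are exercised here.

```python
# Sanity-check robots.txt rules with Python's standard-library parser.
# Caveat: urllib.robotparser does NOT support the "*" / "$" wildcard
# extensions, so only plain path-prefix rules are tested below.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /require/
Allow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Any crawler name falls under the "User-agent: *" group.
print(parser.can_fetch("TestBot", "https://example.com/admin/login.php"))  # False
print(parser.can_fetch("TestBot", "https://example.com/cgi-bin/info"))     # True
print(parser.can_fetch("TestBot", "https://example.com/index.html"))       # True
```

Paths not matched by any rule are allowed by default, which is why the last check returns True.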
Naiba's Tip: Note that if a Disallow: line has nothing after it (an empty Disallow:), it means crawling of the entire site is allowed.
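This tip can be verified with the same standard-library parser (an illustrative check, not from the article): an empty Disallow: permits everything, while Disallow: / blocks everything.

```python
# Contrast an empty "Disallow:" (allow all) with "Disallow: /" (block all).
from urllib.robotparser import RobotFileParser

permissive = RobotFileParser()
permissive.parse("""\
User-agent: *
Disallow:
""".splitlines())

restrictive = RobotFileParser()
restrictive.parse("""\
User-agent: *
Disallow: /
""".splitlines())

# Empty Disallow: -> every URL is crawlable.
print(permissive.can_fetch("TestBot", "https://example.com/any/page.html"))   # True
# Disallow: / -> nothing is crawlable.
print(restrictive.can_fetch("TestBot", "https://example.com/any/page.html"))  # False
```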
Recommended robots.txt rules for WordPress
After WordPress is installed, it creates a virtual robots.txt rule file by default (meaning you cannot see it in the website directory, but you can access it at "your-domain/robots.txt"). The default rules are as follows:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

This rule forbids all search engines from crawling the contents of the wp-admin folder, but allows them to crawl the /wp-admin/admin-ajax.php file. However, for website SEO and security considerations, Naiba suggests improving the rules further. Below are the current robots.txt rules of Naibabiji:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/plugins/
Disallow: /?s=*
Allow: /wp-admin/admin-ajax.php

User-agent: YandexBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: YaK
Disallow: /

Sitemap: https://blog.naibabiji.com/sitemap_index.xml

The above rules add the following two lines to the default rules:
Disallow: /wp-content/plugins/
Disallow: /?s=*

These disallow crawling of the /wp-content/plugins/ folder and of URLs matching /?s=*. /wp-content/plugins/ is the WordPress plugin directory; keeping it out of crawlers' reach avoids privacy risks (for example, some plugins have privacy-leaking bugs whose output could be indexed by search engines). Disallowing the search result pages prevents others from abusing them to boost rankings: /?s=* matches the default search results page of a WordPress website, and this is also a loophole Naiba recently discovered being exploited by gray-hat SEO projects, as shown in the figure below:
Basically, the vast majority of WordPress theme search page titles take the form "keyword + website title". But this creates a problem: Baidu has a chance to crawl and index such pages. For example, Naiba has a site that was unfortunately exploited this way.
The remaining rules disallow crawling by specific search engine bots and add a sitemap address link; see Several Methods for WordPress to Generate Sitemaps_Recommended Sitemap Plugins.
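The per-bot blocks can also be checked locally. This sketch (illustrative, not from the article) verifies that YandexBot is refused everywhere while other crawlers follow the general rules. Two caveats: Python's parser applies the first matching rule (unlike Google, which prefers the most specific match), so Allow is listed before Disallow here; and the wildcard line /?s=* is omitted because this parser does not support wildcards.

```python
# Check per-user-agent blocking in a WordPress-style robots.txt.
# Allow is placed before Disallow because urllib.robotparser uses
# first-match semantics; the /?s=* wildcard rule is left out since
# this parser does not understand wildcards.
from urllib.robotparser import RobotFileParser

wp_rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-content/plugins/

User-agent: YandexBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(wp_rules.splitlines())

# Ordinary crawlers may fetch admin-ajax.php but nothing else in /wp-admin/.
print(rp.can_fetch("Googlebot", "https://blog.naibabiji.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("Googlebot", "https://blog.naibabiji.com/wp-admin/options.php"))     # False
# YandexBot is blocked from the entire site.
print(rp.can_fetch("YandexBot", "https://blog.naibabiji.com/"))                         # False
```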