How the Baidu spider reacts to common HTTP status codes

 

There is a semi-official group on Baidu Tieba where Lee, a member of Baidu's technical staff, answers SEO questions from webmaster representatives.

 

In a recent post, Lee explained how the Baidu spider treats different HTTP status codes and offered some suggestions on when to use them.

 

 

HTTP Codes

 

404 Not Found

If a URL returns a 404 status code, Baidu assumes the page is no longer available and will normally remove it from the search results. If the Baidu spider discovers the URL again, it will not crawl the page.

 

503 Service Unavailable

If a URL returns a 503 status code, Baidu assumes the website is temporarily unavailable, typically because the site has been shut down for a short time or its bandwidth is limited. The Baidu spider will not remove the URL from its database and will visit again later.

 

If the page responds normally on the spider's next visit, the spider will crawl the site as usual. If the page still returns 503, the Baidu spider will come back a few more times. However, if the page keeps returning the 503 status code, Baidu will treat it as a broken link and remove it from its search results.
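The 503 behaviour described above can be pictured with a small toy model. This is only an illustration of the described logic, not Baidu's actual implementation: the function name `index_decision`, the returned labels, and the `max_retries` threshold are all assumptions, since Lee's post does not say how many repeat visits count as "a few times".

```python
def index_decision(status_codes, max_retries=3):
    """Toy model of the 503 handling described in the post.

    status_codes: status codes seen on consecutive spider visits.
    max_retries:  hypothetical number of repeated 503s tolerated
                  before the URL is treated as a broken link.
    """
    failures = 0
    for code in status_codes:
        if code == 200:
            # Page is back to normal: crawl it as usual.
            return "crawl"
        if code == 503:
            failures += 1
            if failures > max_retries:
                # Persistent 503s: treated as a broken link.
                return "remove"
        else:
            # Other status codes are outside this sketch.
            return "other"
    # Only a few 503s seen so far: keep the URL and visit again later.
    return "retry_later"
```

In this sketch, a single 503 followed by a normal response leads to a normal crawl, while an unbroken run of 503s longer than the assumed threshold leads to removal.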

 

403 Forbidden

If a URL returns a 403 status code, Baidu assumes that access to the page is forbidden for the time being. If the URL is new to the Baidu spider, Baidu will not crawl it and will come back later to check.

 

If it’s an existing URL, Baidu will not remove it immediately and will come back later to check. However, if the Baidu spider keeps getting the 403 status code, it will eventually remove the URL from its search results.

 

301 Moved Permanently

If a 301 status code is returned, Baidu treats the page as having been redirected to the new URL. Using a 301 is recommended when migrating a website, changing a domain or updating a website. Lee also admits that Baidu takes a bit longer to process 301s.
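As a rough illustration of a domain change, the following minimal WSGI sketch answers every request on the old domain with a 301 pointing at the same path on the new one. The host name `www.new-example.com` and the function name `redirect_app` are hypothetical placeholders, and a real migration would usually be configured in the web server rather than in application code.

```python
from urllib.parse import urlunsplit

# Hypothetical new domain; replace with the real target host.
NEW_HOST = "www.new-example.com"


def redirect_app(environ, start_response):
    """Minimal WSGI app: permanently redirect each old URL to the new host."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")
    # Keep the path and query string so each old URL maps to its new equivalent.
    location = urlunsplit(("https", NEW_HOST, path, query, ""))
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Length", "0")],
    )
    return [b""]
```

Preserving the path and query string means each old page points to its direct counterpart instead of sending every visitor to the new home page.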

 

Lee also made the following suggestions on choosing the right HTTP status code for different situations:

 

1. When the website is temporarily down and people cannot open its pages, use a 503 instead of a 404. The 503 tells the Baidu spider that the site is temporarily down and that it should come back later.

 

2. When spider crawling puts pressure on your bandwidth, again use a 503 rather than a 404 so that the spider comes back at another time.

 

3. Sometimes you want Baidu to index only pages that are complete and content that has been reviewed. In that case, you can return a 403 status code for new pages or content until they have been completed and reviewed.

 

4. When changing domains or migrating websites, use 301s to redirect each old URL to its new URL.
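The four suggestions above can be summarised as a small decision function. This is a sketch under stated assumptions: the function name `choose_status`, its flags, and the order in which the conditions are checked are illustrative choices, not something prescribed in Lee's post.

```python
from http import HTTPStatus


def choose_status(site_down=False, crawl_overload=False,
                  moved_to=None, under_review=False, exists=True):
    """Pick a status code following the suggestions above.

    All parameter names are hypothetical. Returns (status, headers).
    """
    if site_down or crawl_overload:
        # Suggestions 1 and 2: temporary problem, ask the spider to come back.
        return HTTPStatus.SERVICE_UNAVAILABLE, {}
    if moved_to is not None:
        # Suggestion 4: the page has a permanent new address.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": moved_to}
    if under_review:
        # Suggestion 3: content is not yet complete or reviewed.
        return HTTPStatus.FORBIDDEN, {}
    if not exists:
        # The page is really gone, so a 404 is appropriate.
        return HTTPStatus.NOT_FOUND, {}
    return HTTPStatus.OK, {}
```

For example, `choose_status(site_down=True)` yields a 503 even if the page would otherwise be a redirect, which reflects the assumption that a temporary outage should take priority over every other state.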