A Perspective from Q&A with Baidu Engineer – Lee

Baidu made a major algorithm change in November 2010 to fight content farms. Many Baidu SEOs found themselves suffering, as their old tricks of building up multiple websites and selling offsite links to their clients’ websites no longer worked.


Since then, the talk has been that link building is no longer that important, and that Baidu cares more about the freshness and originality of content, with user experience being the key.



Baidu Link Building Q&A

So the question is: how much value can link building still bring to Baidu optimization? We looked at some of the Q&A between webmasters and a Baidu engineer, Lee, to seek the answer.


Does Baidu spider crawl URLs that don’t exist?

Lee: Baidu spider crawls URLs that exist on the internet. If Baidu spider crawls a large number of URLs that don’t exist on your website, there might be two reasons. 1. Certain pages of your site use incorrect URLs to link to other pages. 2. Other websites link to your website with the wrong URL.

The Take Out: Baidu follows links on other websites, as well as your own internal links, to discover your pages.
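To see which non-existent URLs Baidu spider is requesting, one option is to scan the server access log for 404 responses served to it. The following is a minimal sketch, assuming an Apache/Nginx "combined" log format and assuming the spider identifies itself with "Baiduspider" in its user agent; the function name `baidu_404_paths` and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Pulls the request path, status code and user agent out of a
# combined-format access log line (assumed format, adjust as needed).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def baidu_404_paths(log_lines):
    """Count the paths Baidu spider requested that returned 404."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        is_baidu = "baiduspider" in match["agent"].lower()
        if is_baidu and match["status"] == "404":
            counts[match["path"]] += 1
    return counts

# Hypothetical log lines, for illustration only.
BAIDU_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
GOOGLE_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample_log = [
    f'203.0.113.5 - - [10/Dec/2010:10:00:00 +0800] "GET /old-page.html HTTP/1.1" 404 512 "-" "{BAIDU_UA}"',
    f'203.0.113.5 - - [10/Dec/2010:10:00:05 +0800] "GET /index.html HTTP/1.1" 200 2048 "-" "{BAIDU_UA}"',
    f'203.0.113.9 - - [10/Dec/2010:10:01:00 +0800] "GET /old-page.html HTTP/1.1" 404 512 "-" "{GOOGLE_UA}"',
    f'203.0.113.5 - - [10/Dec/2010:10:02:00 +0800] "GET /old-page.html HTTP/1.1" 404 512 "-" "{BAIDU_UA}"',
]

for path, hits in baidu_404_paths(sample_log).most_common():
    print(path, hits)
```

Paths that show up repeatedly are candidates for either fixing the internal link that points to them or asking the linking site to correct its URL, which are the two causes Lee describes.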



I already set robots.txt to disallow Baidu spider from visiting my site, so why are my pages still displayed in the search engine result pages (“SERPs”) of Baidu?

Lee: If other websites link to pages that you have disallowed Baidu spider from crawling, those pages will still be displayed in the SERPs of Baidu. But the real content of those pages won’t be crawled, indexed or shown by Baidu, because the content displayed in the SERPs of Baidu is only the description of your pages given by other websites.

The Take Out: Baidu crawls other website links to find your page URLs and displays your pages with the description appearing on other sites.
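For readers who want to confirm what their robots.txt actually tells Baidu spider, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `can_crawl` helper and the example.com URL are hypothetical; the user-agent token "Baiduspider" is the one assumed for Baidu's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only Baidu's spider from the whole site.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""

def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

page = "https://www.example.com/products.html"
baidu_allowed = can_crawl(ROBOTS_TXT, "Baiduspider", page)
other_allowed = can_crawl(ROBOTS_TXT, "Googlebot", page)
print(baidu_allowed, other_allowed)  # False True
```

Note that this only checks whether crawling is permitted. As Lee explains, a disallowed URL can still be listed in the SERPs when other sites link to it, with the description taken from those sites.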



Why is my page title in the SERP of Baidu different from the actual page title?

Lee: The main reason could be that Baidu spider fails to extract the title tag. This can happen because of web design issues, for example sites built with Flash or Ajax, or because of a robots.txt disallow (Baidu won’t crawl the pages but will keep the URL). In those cases the system scrapes text from other places and uses that text as the title.

The Take Out: Baidu crawls other website links to find your page URL and displays your page titles using text from other sites.
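A quick way to check for the design issue Lee mentions is to look at the raw HTML the server returns, without running any scripts, and see whether a title is present. The sketch below uses Python's standard `html.parser`; it is only an approximation of what a crawler might see, not Baidu's actual extraction logic, and the `TitleFinder` class, `static_title` function and sample pages are illustrative.

```python
from html.parser import HTMLParser

class TitleFinder(HTMLParser):
    """Collects the text inside <title> from static HTML."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)

def static_title(html):
    """Return the title found in the raw HTML, or '' if there is none."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return "".join(finder.title_parts).strip()

# Hypothetical pages: one with a static title, one that sets it via script.
static_page = "<html><head><title> Acme Tools </title></head><body></body></html>"
scripted_page = (
    "<html><head><script>document.title = 'Acme Tools';</script></head>"
    "<body></body></html>"
)

print(repr(static_title(static_page)))    # 'Acme Tools'
print(repr(static_title(scripted_page)))  # ''
```

An empty result for a page that shows a title in the browser suggests the title is being set on the client side, which is one case where a spider may fall back on text from elsewhere.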



In Summary

From the above Q&A, we can see that a robots.txt disallow alone does not keep your URLs out of Baidu’s results: the spider won’t crawl the content, but it still learns about the pages from other sites. Webmasters used to Google can simply submit a sitemap.xml file to Google Webmaster Tools. Baidu, by contrast, tends to depend on a wide variety of offsite links, as demonstrated above, to identify new pages or websites that have emerged online and add them to its index.


Hence Google best practice is not enough to increase your website’s footprint on Baidu. This is especially true for multinational company websites, which tend to focus on Google only and have multi-language content, which is itself a barrier for Baidu spider. So link building really does continue to matter for companies looking to improve their presence on the major Chinese search engine, Baidu.