Website owners often have questions about Baidu and its limits on content and character length. Here is a typical one:
“I have a webpage on which the content is more than 3,000 lines long and includes both Chinese and English characters. The Baidu snapshot and the text simulated through the crawl diagnostic tool both show as incomplete. Will this affect my website's performance on Baidu? If so, how big is the impact?”
We broke this broad issue down into four practical Q&As, and confirmed the answers with Baidu engineers, to help you better understand Baidu's content and character limits.
Q1: Does Baidu limit the length of webpage content?
A1: There is no limit to the length of the webpage content.
There is, however, a limit on the length of the source code. If the source code is too long, only the code that appears before the cut-off will be included.
At the time of publication, we could not find any official documentation on the exact limit of the source code length. So, for now, just bear in mind that the simpler the source code, the better.
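Because only the code before the cut-off is included, it can help to know how far into the source your key content begins. The following is a minimal sketch, not a Baidu tool: the function name `content_offset` and the sample page are hypothetical, and since the exact cut-off is undocumented, the script only reports a position rather than a pass or fail.

```python
def content_offset(source: bytes, marker: str, encoding: str = "utf-8"):
    """Return (byte offset where marker first appears, total source size).

    The offset is None if the marker is not found in the source.
    """
    needle = marker.encode(encoding)
    pos = source.find(needle)
    return (pos if pos != -1 else None), len(source)


# Hypothetical page: a large inline stylesheet pushes the body text deeper
# into the source code.
page = (
    "<html><head><style>" + "a{}" * 1000 + "</style></head>"
    "<body><p>百度 Baidu key paragraph</p></body></html>"
).encode("utf-8")

offset, total = content_offset(page, "百度 Baidu key paragraph")
print(f"key text starts at byte {offset} of {total}")
```

If the offset is large relative to the total, moving inline scripts and styles into external files is one way to bring the important text closer to the top of the source.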
Q2: If the snapshot shows the page is incomplete, does that mean Baidu did not index the page properly?
A2: Not necessarily.
Snapshot generation is affected by many factors, and a snapshot can display incompletely for many reasons. An incomplete snapshot alone does not mean the page was indexed incorrectly.
Q3: If the crawl diagnostic tool shows the page is incomplete, does that mean Baidu did not index the page properly?
A3: Not necessarily.
The tool only shows the first 200KB of the page source code. When Baidu engineers designed the tool, they researched common webpage content sizes. Generally speaking, showing just the first 100KB would have been enough, but they doubled that to be safe.
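To estimate whether a page would be cut off in the diagnostic tool, you can compare the size of its source code against the 200KB figure. The sketch below makes two assumptions that the source does not confirm: that the limit is counted in bytes and that 1KB means 1,024 bytes. The function name `visible_in_diagnostic` and the sample page are hypothetical.

```python
# Assumption: the limit is measured in bytes, with 1 KB = 1024 bytes.
DIAGNOSTIC_LIMIT_BYTES = 200 * 1024


def visible_in_diagnostic(source: bytes, limit: int = DIAGNOSTIC_LIMIT_BYTES):
    """Return the part of the source within the limit and the bytes cut off."""
    shown = source[:limit]
    hidden_bytes = max(len(source) - limit, 0)
    return shown, hidden_bytes


# Hypothetical long page mixing Chinese and English text.
sample = ("<p>中文 and English</p>\n" * 12000).encode("utf-8")

shown, hidden = visible_in_diagnostic(sample)
print(f"total {len(sample)} bytes, shown {len(shown)}, cut off {hidden}")
```

Note that in UTF-8 a common Chinese character takes three bytes while an ASCII letter takes one, so a page with a lot of Chinese text reaches a byte-based limit sooner than its character count suggests.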
Q4: Does Baidu require any special characters on the webpage?
A4: There is no such requirement. Baidu does not have any stipulations for special characters on the page.