What is an XML sitemap? A fundamental SEO tool!
Before starting any SEO project, there is one thing we always need to check first: Is my website getting indexed properly?
If your webpages aren’t getting indexed, or even crawled by search engines, it is pointless to do optimization.
We all know that such technical issues can cause headaches (we really know!), but not all technical SEO implementation is that hard to understand or execute.
An XML sitemap is one of those seemingly technical but actually simple SEO implementations; it is an easy-to-use tool that can solve lots of problems with crawlability and indexation.
Here, we give you insights and tips on how to best implement your XML sitemap.
So what exactly is an XML sitemap?
XML (Extensible Markup Language) is a structured, machine-readable format that search engine crawlers can parse easily. An XML sitemap is a file in this format that lists the webpages on your website. By including the URLs that you want Google to index, you can improve your website’s crawlability.
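To make the format concrete, here is a minimal sketch in Python that builds a small sitemap with the standard library. The function name `build_sitemap` and the example.com URLs are assumptions for illustration only; the `urlset`, `url`, and `loc` elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a UTF-8 encoded sitemap (bytes) listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages we would like Google to index.
sitemap_bytes = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/",
])
print(sitemap_bytes.decode("utf-8"))
```

Each `url` entry can also carry optional tags such as `lastmod`; they are left out here to keep the sketch short.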
In addition to the standard XML sitemap, which lists all the website pages you’d like Google to crawl and index, there are several other commonly used sitemaps:
- Image Sitemap: This includes all images on your site, assisting Google in locating them easily.
- Video Sitemap: This sitemap aids Google in crawling, indexing, and understanding the video content present on your site.
- News Sitemap: This allows the site owner to control which content is submitted to Google News and helps Google quickly find all the news content on your site.
So how does an XML sitemap help search engines crawl your website content faster and more easily? As the image below illustrates, with an XML sitemap Google can find page 9 immediately. Without one, Google has to pass through 5 other pages to reach it.
Image caption: Sitemaps can facilitate Google’s crawling of your website
Alt txt: On the left is an illustration of a sitemap with 9 pages. On the right is the structure of a website with 9 pages and 6 levels.
And that is why the XML sitemap is important for SEO. If a search engine cannot crawl and index your webpage, how can it send traffic to you?
So, when doing SEO, a website’s crawlability should always be the first priority.
How do I check if I have a sitemap?
Determining whether a website has a sitemap can be crucial for SEO efforts. Here are four different methods you can utilize to find out:
One common way to verify the presence of a sitemap is to manually check a few standard locations. Simply enter the site’s URL into your browser and try a few variations of the path. The most frequently used variation is:
You’ll then be taken to a standard sitemap listing all the URLs that you want Google to crawl and index. It also shows each URL’s last modified date, change frequency, and priority.
Image caption: An example of a sitemap
In instances where your site hosts more than one sitemap, they are typically organized into an index. The URL for this sitemap index usually takes the following form:
Image caption: An example of a sitemap index
Clicking each sitemap within the index will guide you to individual sitemaps, where you can view detailed information regarding the URLs.
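The manual check can be sketched as a small Python helper that lists the locations worth trying for a given site. The two paths are assumptions based on common conventions, not a guarantee of where any particular site keeps its sitemap, and `candidate_sitemap_urls` is a hypothetical name.

```python
from urllib.parse import urljoin

# Assumed common locations; a site may publish its sitemap elsewhere.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(site_url):
    """Return the URLs to try in the browser for the given site."""
    return [urljoin(site_url, path) for path in COMMON_SITEMAP_PATHS]


for url in candidate_sitemap_urls("https://www.example.com"):
    print(url)
```

The helper only builds the URLs; opening each one in the browser (or fetching it with a script) tells you whether a sitemap actually exists there.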
If the above URL variations aren’t successful, here are some alternatives you can explore:
Use Search Operators
Alternatively, we can leverage Google search operators to discover the sitemap. Search operators are distinct phrases you can input into the search bar to return more precise results. In this case, we particularly use the “site”, “filetype”, and “inurl” search operators as below:
Image caption: An example of using site operators to find the sitemap
These two commands should help you in discovering all sitemaps related to the website, provided they exist and have been indexed by Google.
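As a rough illustration, the two searches can be generated for any domain with a few lines of Python. The exact query wording is an assumption about how the “site”, “filetype”, and “inurl” operators are typically combined, and `sitemap_search_queries` is a hypothetical helper name.

```python
def sitemap_search_queries(domain):
    """Return two Google queries that may surface a site's indexed sitemaps."""
    return [
        f"site:{domain} filetype:xml",   # XML files indexed for this domain
        f"site:{domain} inurl:sitemap",  # indexed URLs containing "sitemap"
    ]


for query in sitemap_search_queries("example.com"):
    print(query)
```

Paste each query into the Google search bar; results appear only if the sitemap exists and Google has indexed it.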
Check the Robots.txt File
A robots.txt file tells search engine crawlers which URLs on your site they can access and which ones are off-limits. If the site follows best practices, the location of its sitemap should be listed in the robots.txt file. You can access the robots.txt file by entering the following into your browser:
Image caption: An example of using robots.txt file to find the sitemap
Next, search for “Sitemap” within the robots.txt file. You should then be able to identify the location of your website’s sitemap.
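The same lookup can be done programmatically. The sketch below scans the text of a robots.txt file for `Sitemap:` lines; the sample file contents and the function name `sitemaps_from_robots` are assumptions for illustration.

```python
def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Strip comments, which start with "#" in robots.txt.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt contents.
sample = """\
User-agent: *
Disallow: /cart/
Sitemap: https://www.example.com/sitemap.xml
"""
print(sitemaps_from_robots(sample))
```

If you prefer not to parse the file yourself, Python’s standard library offers `urllib.robotparser.RobotFileParser`, whose `site_maps()` method (Python 3.8+) returns the same information after the file has been read.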
Check Google Search Console
Finally, if you have access to Google Search Console (GSC), you have another way to locate the sitemap. After logging in to GSC, look for “Sitemaps” under the “Indexing” section. There, you’ll see a section titled “Submitted sitemaps.” If a sitemap has been submitted previously, you should find it in this area.
Image caption: Find the sitemap under the Indexing section in Google Search Console
Image caption: Find the sitemap in Submitted sitemaps in Google Search Console
Do I need an XML sitemap?
If you still can’t locate a sitemap after trying all of the methods above, your site likely doesn’t have one. So you might be wondering: is an XML sitemap a must if we want our websites to be crawled and indexed by Google? Well, technically, the answer is no. According to Google, if a website’s pages are properly linked, its web crawlers should be able to discover most of the site.
However, it never hurts to make Google’s job easier. Google particularly recommends an XML sitemap for websites that are new, large, or have pages that are not well linked, so that it does not overlook any of them.
Including a URL in your XML sitemap doesn’t guarantee that Google will crawl the page, because the sitemap only works as a suggestion to Google. Still, submitting the sitemap can increase the chance that your webpages get crawled. So again, there’s no harm in making Google’s job easier!
Tip for creating an SEO XML sitemap: only include URLs that you want Google to index
The name “XML sitemap” may give you the impression that a sitemap should be a roadmap of the whole website. But that is wrong: you don’t need to include every page of your website in the sitemap. Instead, you should only include webpages that you want Google to crawl and index.
When Google’s web crawler visits a website, it has a crawl budget that limits the number of webpages it will crawl. You don’t want the crawler to waste that budget on webpages you don’t need indexed. Instead, you want it to focus on your optimized webpages.
Including URLs in the XML sitemap indicates to Google that those webpages are more important than pages that are not included. It is a signal to Google to prioritize crawling those webpages ahead of others. As a result, the sitemap can help your website use its crawl budget efficiently.
So what kind of webpages should be excluded from the XML sitemap? Here are our recommendations:
- Pages requiring log-in
- Utility pages (e.g., review forms and wish lists)
- Duplicate pages
- Paginated pages
- URLs with parameters or session IDs
- Any URLs that are redirected
- Any URLs that no longer work (broken URLs)
- Pages disallowed by robots.txt
- Pages with noindex
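Several of the exclusion rules above can be applied automatically when a sitemap is generated. The following is a minimal sketch, assuming you already have crawl data for each page; the `Page` record and all of its fields are hypothetical, and rules that need human judgment (utility or paginated pages) are left out.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int = 200                # HTTP status code returned by the URL
    noindex: bool = False            # page carries a noindex directive
    requires_login: bool = False     # page sits behind a log-in
    disallowed: bool = False         # URL is blocked by robots.txt
    canonical: Optional[str] = None  # canonical URL, if the page declares one


def belongs_in_sitemap(page):
    """Return True if the page passes the exclusion rules sketched here."""
    if page.status != 200:
        return False  # redirected (3xx) or broken (4xx/5xx) URL
    if page.noindex or page.requires_login or page.disallowed:
        return False
    if urlsplit(page.url).query:
        return False  # URL with parameters or a session ID
    if page.canonical and page.canonical != page.url:
        return False  # duplicate of another page
    return True


pages = [
    Page("https://www.example.com/"),
    Page("https://www.example.com/old-page", status=301),
    Page("https://www.example.com/shop?sessionid=123"),
]
print([p.url for p in pages if belongs_in_sitemap(p)])
```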
Other best practices for sitemaps
Besides the above-mentioned rule, you should also follow these best practices when creating sitemaps:
- Break up large sitemaps: Individual sitemaps are limited to 50MB (uncompressed) or 50,000 URLs. If your website is larger than that, you’ll need to divide your sitemap into multiple smaller ones. You can then create a sitemap index that lists all the individual sitemap files and submit this index file to Google.
- Use UTF-8 encoding: Google stipulates that all sitemap files must be UTF-8 encoded.
- Use absolute URLs in the sitemap: Google will crawl your URLs exactly as listed in the sitemap, so you should use fully qualified, absolute URLs. For example, if your website is https://www.example.com, use https://www.example.com/blog/, not simply /blog/.
- Use the priority tags sparingly: Within your sitemap, you can designate the priority of each URL with a number from 0.0 to 1.0. The tag is meant as a hint about which URLs should be crawled first, but Google’s documentation states that it ignores priority values. Ultimately, Google will crawl your site according to its own set of rules.
- Align your sitemap with your robots.txt file: Your sitemap should work together with the robots.txt file. If you’ve disallowed a page via robots.txt, exclude it from your sitemap as well; otherwise you send mixed signals to search engines. Keeping the two files consistent makes it clearer which pages you want crawled and helps search engines like Google index your site more efficiently.
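The “break up large sitemaps” practice can be sketched in Python as follows: split the URL list into chunks of at most 50,000 and build an index file that points to each chunk’s sitemap. The file names (`sitemap-1.xml` and so on) and function names are assumptions, and the sketch checks only the URL count, not the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a UTF-8 encoded sitemap index (bytes) listing the given sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap_el = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap_el, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 indexable pages.
all_urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(all_urls)

# One absolute URL per individual sitemap file (hypothetical naming scheme).
sitemap_files = [
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_bytes = build_sitemap_index(sitemap_files)
```

Each chunk would then be written out as its own sitemap file, and only the index file needs to be submitted to Google.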
How do I submit my XML sitemap to Google?
Once you have created your XML sitemap and uploaded it to your domain (your URL will end with “/sitemap.xml”), you can submit the sitemap to Google via Google Search Console.
Once you log in to Google Search Console, you can find Sitemaps in the left-hand menu, under Indexing.
Image caption: Submit a sitemap under the Indexing section in Google Search Console
After clicking Sitemaps, enter your website’s sitemap URL in the Add a new sitemap field and submit it to Google.
And it’s done! It may take a few days for Google to process your submission and crawl your website with the assistance of an XML sitemap. You can check the status in Sitemaps.