SEO Relevant Pages in XML Sitemaps

Now, let’s say Google crawls those 475 pages and algorithmically decides that 175 are “A” grade, 200 are “B+,” and 100 are “B” or “B-.” That’s a strong average grade and probably indicates a quality website worth sending users to. Contrast that with submitting all 1,000 pages via the XML sitemap. Now Google looks at the 1,000 pages you say are SEO-relevant content and sees that over 50 percent are “D” or “F” pages.
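To make the arithmetic concrete, here is a rough sketch of the two scenarios. The grade-point values, the choice to count “B” and “B-” together as “B,” and the choice to count all 525 remaining pages as “D” are illustrative assumptions, not anything Google publishes.

```python
# Hypothetical grade points; the article only gives letter grades.
GRADE_POINTS = {"A": 4.0, "B+": 3.3, "B": 3.0, "D": 1.0}


def average_grade(page_counts):
    """Return the page-weighted average grade for a {grade: page_count} mapping."""
    total_pages = sum(page_counts.values())
    total_points = sum(GRADE_POINTS[grade] * n for grade, n in page_counts.items())
    return total_points / total_pages


# The 475 curated pages from the example above ("B" and "B-" counted as "B").
curated = {"A": 175, "B+": 200, "B": 100}

# Assumption: the other 525 pages are all counted as "D" (the generous case).
everything = {**curated, "D": 525}

low_quality_share = everything["D"] / sum(everything.values())

print(round(average_grade(curated), 2))
print(round(average_grade(everything), 2))
print(low_quality_share)
```

Under these assumptions the curated set averages close to 3.5, while the full set drops to around 2.2, with just over half of the submitted pages in the low-quality bucket.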

Your average grade isn’t looking so good anymore, and that may harm your organic sessions. But remember, Google will use your XML sitemap only as a clue to what’s important on your site. Just because a page is not in your XML sitemap doesn’t necessarily mean that Google won’t index it. When it comes to SEO, overall site quality is a key factor. To assess the quality of your site, turn to the sitemap-related reporting in Google Search Console (GSC).

Key takeaway: Manage crawl budget by limiting XML sitemap URLs to SEO-relevant pages, and invest time in reducing the number of low-quality pages on your website.
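As a minimal sketch of that takeaway, the snippet below builds a sitemap from only the pages flagged as SEO-relevant. The page records, the `seo_relevant` flag, and the example.com URLs are hypothetical; how you decide relevance depends on your own site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML containing only pages flagged as SEO-relevant."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not page["seo_relevant"]:
            continue  # utility pages stay out of the sitemap
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page inventory; the flag is something you maintain yourself.
pages = [
    {"loc": "https://example.com/products/widget", "seo_relevant": True},
    {"loc": "https://example.com/cart", "seo_relevant": False},
    {"loc": "https://example.com/guides/sitemaps", "seo_relevant": True},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Leaving a URL out of the sitemap does not block it from being indexed; it only stops you from vouching for it.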

The sitemaps section in the new Google Search Console is not as data-rich as what was previously offered. Its primary use now is to confirm your sitemap index has been successfully submitted.

Different types of SEO pages

If you have chosen to use descriptive naming conventions rather than numeric ones for your sitemaps, you can also get a feel for the number of different types of SEO pages that have been “discovered,” that is, all URLs found by Google via sitemaps and other methods such as following links. In the new GSC, the more valuable area for SEOs regarding sitemaps is the Index Coverage report. The report defaults to “All known pages.” Here you can:

Address any “Error” or “Valid with warnings” issues. These often stem from conflicting robots directives. Once solved, be sure to validate your fix via the Coverage report.

Look at indexation trends. Most sites continually add valuable content, so “Valid” pages (that is, those indexed by Google) should steadily increase. Understand the cause of any dramatic changes.

Select “Valid” and look in the details for the type “Indexed, not submitted in sitemap.” These are pages where you and Google disagree on their value. For example, you may not have submitted your privacy policy URL, but Google has indexed the page. In such cases, no action is needed. What you do need to look out for are indexed URLs that stem from poor pagination handling, poor parameter handling, duplicate content, or pages being accidentally left out of sitemaps.
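If you export the indexed URLs and have your sitemap URLs to hand, the comparison can be sketched as a set difference. The URL lists are made up, and the `/page/` path check and query-string check are only rough heuristics for pagination and parameter issues, assumed here for illustration.

```python
from urllib.parse import urlsplit


def indexed_not_submitted(indexed_urls, sitemap_urls):
    """Return indexed URLs that are missing from the sitemap, sorted for review."""
    return sorted(set(indexed_urls) - set(sitemap_urls))


def looks_like_parameter_or_pagination(url):
    """Heuristic (assumption): a query string or a /page/ path segment is suspicious."""
    parts = urlsplit(url)
    return bool(parts.query) or "/page/" in parts.path


# Hypothetical exports: indexed URLs from the report, and your sitemap URLs.
indexed = [
    "https://example.com/guides/sitemaps",
    "https://example.com/privacy-policy",
    "https://example.com/blog/page/7",
    "https://example.com/products?sort=price",
]
submitted = ["https://example.com/guides/sitemaps"]

unsubmitted = indexed_not_submitted(indexed, submitted)
suspicious = [url for url in unsubmitted if looks_like_parameter_or_pagination(url)]

print(unsubmitted)
print(suspicious)
```

URLs such as the privacy policy end up in the first list but not the second, which matches the case where no action is needed.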

Exclusion of sitemap URLs

Afterward, limit the report to the SEO-relevant URLs you have included in your sitemap by changing the drop-down to “All submitted pages.” Then check the details of all “Excluded” pages. Reasons for exclusion of sitemap URLs can be put into four action groups:

1. Quick wins: For duplicate content, canonicals, robots directives, 40X HTTP status codes, redirects, or legal exclusions, put the appropriate fix in place.

2. Investigate page: For both “Submitted URL dropped” and “Crawl anomaly” exclusions, investigate further using the Fetch as Google tool.

3. Improve page: For “Crawled – currently not indexed” pages, review the content and internal links of the page (or page type, as generally it will be many URLs of a similar breed). Chances are, it’s suffering from thin content, unoriginal content, or being orphaned.

4. Improve domain: For “Discovered – currently not indexed” pages, Google notes that the typical reason for exclusion is that they “tried to crawl the URL but the site was overloaded.” Don’t be fooled. It’s more likely that Google decided “it’s not worth the effort” to crawl due to poor internal linking or low content quality seen from the domain. If you see a large number of these exclusions, review the SEO value of the pages (or page types) you have submitted via sitemaps. Focus on optimizing crawl budget, and review your information architecture, including parameters, from both a link and content perspective.
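The four action groups above can be sketched as a simple lookup for triaging an exported list of exclusion reasons. The exact-match reasons are the ones quoted in this article; the keyword list for quick wins and the “Unclassified” fallback are assumptions for illustration, not an official list of report labels.

```python
# Reasons quoted in the article, mapped to their action groups.
ACTION_GROUPS = {
    "submitted url dropped": "Investigate page",
    "crawl anomaly": "Investigate page",
    "crawled – currently not indexed": "Improve page",
    "discovered – currently not indexed": "Improve domain",
}

# Assumption: keywords hinting at the "quick win" causes listed in group 1.
QUICK_WIN_KEYWORDS = ("duplicate", "canonical", "robots", "noindex", "404", "redirect", "legal")


def action_group(reason):
    """Map an exclusion reason string to one of the four action groups."""
    key = reason.strip().lower()
    if key in ACTION_GROUPS:
        return ACTION_GROUPS[key]
    if any(word in key for word in QUICK_WIN_KEYWORDS):
        return "Quick wins"
    return "Unclassified"  # review by hand


# Hypothetical sample of reasons taken from an export.
for reason in ["Crawl anomaly", "Page with redirect", "Discovered – currently not indexed"]:
    print(reason, "->", action_group(reason))
```

Counting URLs per group then gives a rough sense of whether the bulk of the work is quick fixes or domain-level improvement.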
