Now, let’s say Google crawls those 475 pages and algorithmically decides that 175 are “A” grade, 200 are “B+,” and 100 are “B” or “B-.” That’s a strong average grade and probably signals a quality website worth sending users to. Contrast that against submitting all 1,000 pages via the XML sitemap. Now, Google looks at the 1,000 pages you say are SEO-relevant content and sees that over 50 percent are “D” or “F” pages.
Your average grade isn’t looking so good anymore, and that may harm your organic sessions. But remember: Google uses your XML sitemap only as a clue to what’s important on your site. Just because a page isn’t in your XML sitemap doesn’t necessarily mean Google won’t index it. When it comes to SEO, overall site quality is a key factor, and to assess the quality of your site, turn to the sitemap-related reporting in Google Search Console (GSC).

Key Takeaway

Manage crawl budget by limiting XML sitemap URLs to SEO-relevant pages, and invest time in reducing the number of low-quality pages on your website.
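The takeaway above can be sketched programmatically. The snippet below builds an XML sitemap from a crawl inventory, including only pages flagged as SEO-relevant. This is a minimal sketch: the `pages` list, its relevance flag, and the example URLs are hypothetical stand-ins for your own site data, not part of any standard tooling.

```python
from xml.etree import ElementTree as ET

# Hypothetical crawl inventory: (URL, is the page SEO-relevant?)
pages = [
    ("https://example.com/", True),
    ("https://example.com/category/widgets", True),
    ("https://example.com/cart", False),              # low-value utility page
    ("https://example.com/search?q=widgets", False),  # thin internal search result
]

def build_sitemap(pages):
    """Return sitemap XML containing only the SEO-relevant URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url, seo_relevant in pages:
        if not seo_relevant:
            continue  # keep low-quality pages out of the sitemap
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

print(build_sitemap(pages))
```

The filter is the whole point: the low-quality pages still exist on the site, but they are no longer presented to Google as content you consider SEO-relevant.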
The sitemaps section in the new Google Search Console is not as data-rich as what was previously offered. Its primary use now is to confirm that your sitemap index has been successfully submitted.

Different types of SEO pages

If you have chosen descriptive naming conventions rather than numeric ones, you can also get a feel for the number of different types of SEO pages that have been “discovered” – aka all URLs found by Google via sitemaps and other methods, such as following links.

In the new GSC, the more valuable area for SEOs regarding sitemaps is the Index Coverage report. The report defaults to “All known pages.” Here you can address any “Error” or “Valid with warnings” issues, which often stem from conflicting robots directives. Once solved, be sure to validate your fix via the Coverage report.
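One way to apply descriptive naming is to split the sitemap index by page type, so GSC’s per-sitemap counts show at a glance how many URLs of each type have been discovered. Below is a minimal sketch that generates such an index; the file names (`sitemap-products.xml` and so on) are hypothetical examples, not required names.

```python
from xml.etree import ElementTree as ET

# Hypothetical per-page-type sitemaps; descriptive names make
# GSC's per-sitemap "discovered" counts readable at a glance.
sitemap_files = [
    "https://example.com/sitemap-products.xml",
    "https://example.com/sitemap-categories.xml",
    "https://example.com/sitemap-articles.xml",
]

def build_sitemap_index(files):
    """Return a sitemap index referencing one sitemap per page type."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    index = ET.Element("sitemapindex", xmlns=ns)
    for url in files:
        loc = ET.SubElement(ET.SubElement(index, "sitemap"), "loc")
        loc.text = url
    return ET.tostring(index, encoding="unicode")

print(build_sitemap_index(sitemap_files))
```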
Exclusion of sitemap URLs
Afterward, limit the report to the SEO-relevant URLs you have included in your sitemap by changing the drop-down to “All submitted pages.” Then check the details of all “Excluded” pages. Reasons for exclusion of sitemap URLs can be put into four action groups:
1. Quick wins: For exclusions due to duplicate content, canonicals, robots directives, 4XX HTTP status codes, redirects, or legal issues, put the appropriate fix in place.
2. Investigate page: For both “Submitted URL dropped” and “Crawl anomaly” exclusions, investigate further using the URL Inspection tool.
3. Improve page: For “Crawled – currently not indexed” pages, review the content and internal links of the page (or page type, as generally it will be many URLs of a similar breed). Chances are, it’s suffering from thin content, unoriginal content, or being orphaned.
4. Improve domain: For “Discovered – currently not indexed” pages, Google notes the typical reason for exclusion is that it “tried to crawl the URL but the site was overloaded.” Don’t be fooled. It’s more likely that Google decided it’s not worth the effort to crawl due to poor internal linking or low content quality seen across the domain. If you see a large number of these exclusions, review the SEO value of the pages (or page types) you have submitted via sitemaps. Focus on optimizing crawl budget, and review your information architecture, including parameters, from both a link and content perspective.
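The four action groups above can be expressed as a simple lookup, which is handy when triaging an exported Coverage report. The grouping follows the list above; the reason strings are examples of GSC’s labels, and any report rows with other labels fall through to manual review.

```python
# Map GSC "Excluded" reasons to the four action groups described above.
# The reason strings below are example Coverage-report labels.
ACTION_GROUPS = {
    "Duplicate without user-selected canonical": "1. Quick wins",
    "Excluded by 'noindex' tag": "1. Quick wins",
    "Not found (404)": "1. Quick wins",
    "Page with redirect": "1. Quick wins",
    "Submitted URL dropped": "2. Investigate page",
    "Crawl anomaly": "2. Investigate page",
    "Crawled - currently not indexed": "3. Improve page",
    "Discovered - currently not indexed": "4. Improve domain",
}

def triage(reason):
    """Return the action group for a Coverage-report exclusion reason."""
    return ACTION_GROUPS.get(reason, "Review manually")

print(triage("Crawled - currently not indexed"))  # -> 3. Improve page
```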