Each record consists of a directive field (allow, disallow, host, or user-agent), a colon, and a value. Spaces are optional but recommended for better readability. You can place comments anywhere in the file by marking them with the # symbol. Google's crawlers ignore everything between the # symbol and the next newline. The general format is: <field>:<value><#comment (optional)>. Whitespace at the beginning and the end of a line is ignored. Letter case does not matter for the <field> element. Letter case might matter for the <value> element, depending on the <field> element.
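The record format described above can be sketched as a small parser. This is a minimal illustration, not how any crawler actually implements it; the names `Record` and `parse_record` are made up for this example.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    field: str              # directive name, lowercased (case does not matter)
    value: str              # kept as written (case may matter, e.g. URL paths)
    comment: Optional[str]  # text after "#", or None if there is no comment


def parse_record(line: str) -> Optional[Record]:
    """Split one robots.txt line into field, value, and optional comment."""
    # Everything from "#" to the end of the line is a comment.
    body, hash_found, comment = line.partition("#")
    comment_text = comment.strip() if hash_found else None

    # The first colon separates the field from the value.
    field, colon_found, value = body.partition(":")
    if not colon_found:
        return None  # blank line, comment-only line, or malformed record

    # Leading and trailing whitespace is ignored.
    return Record(field.strip().lower(), value.strip(), comment_text)


print(parse_record("  Disallow: /Private/  # keep crawlers out "))
print(parse_record("USER-AGENT:Googlebot"))
print(parse_record("# only a comment"))
```

The field is lowercased because its case does not matter, while the value is left untouched because paths such as /Private/ and /private/ may point to different pages.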
What to Hide with Robots.txt
Obviously, you do not want to show search engines your private technical pages, customers’ data, and duplicate content. Robots.txt files can be used to exclude certain directories, categories, and pages from search. To that end, use the “disallow” directive. Here are some pages you should hide using a robots.txt file:
- Pages with duplicate content
- Pagination pages
- On-site search pages
- Dynamic product and service pages
- Account pages
- Admin pages
- Shopping cart
- Chats
- Thank-you pages
How to Use Robots.txt
Robots.txt files are pretty flexible and can be used in many ways. However, their main benefit is that they enable SEO experts to “allow” or “disallow” multiple pages at once without having to access the code page by page. Here is an example of how I instruct Googlebot to avoid crawling and indexing all pages related to user accounts, cart, and multiple dynamic pages that are generated when users look for products in the search bar or sort them by price, and so on.
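A sketch of what such a rule group might look like, generated from a list of path prefixes and checked with Python's standard `urllib.robotparser`. The prefixes `/account/`, `/cart/`, and `/search`, the example.com URLs, and the helper `build_group` are assumptions for illustration, not the rules from the original example.

```python
from urllib.robotparser import RobotFileParser


def build_group(user_agent: str, disallowed_prefixes: list[str]) -> str:
    """Build one robots.txt group that disallows several path prefixes at once."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {prefix}" for prefix in disallowed_prefixes]
    return "\n".join(lines) + "\n"


# Hypothetical prefixes for account pages, the cart, and on-site search results.
robots_txt = build_group("Googlebot", ["/account/", "/cart/", "/search"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A search results page sorted by price falls under the "/search" prefix.
print(parser.can_fetch("Googlebot", "https://example.com/search?q=shoes&sort=price"))
# A regular product page matches no rule, so it stays crawlable.
print(parser.can_fetch("Googlebot", "https://example.com/products/shoes"))
```

Note that `urllib.robotparser` matches plain path prefixes only, so this sketch avoids wildcard patterns.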
For example, you can block all search crawlers from content, like this:

    User-agent: *
    Disallow: /

Or hide your site’s directory structure and specific categories, like this:

    User-agent: *
    Disallow: /no-index/

It’s also useful for excluding multiple pages from search. Just collect the URLs you want to hide from search crawlers. Then, add the “disallow” directive to your robots.txt, list the URLs, and, voila, the pages are no longer visible to Google.
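Before publishing rules like these, it can help to check what they actually block. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allowed` and the example.com URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

BLOCK_DIR = """\
User-agent: *
Disallow: /no-index/
"""


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules let `agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" blocks every path on the site.
print(allowed(BLOCK_ALL, "https://example.com/any/page.html"))
# "Disallow: /no-index/" blocks only that directory.
print(allowed(BLOCK_DIR, "https://example.com/no-index/report.html"))
print(allowed(BLOCK_DIR, "https://example.com/blog/"))
```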
A further advantage, though, is that a robots.txt file allows you to prioritize certain pages, categories, and even bits of CSS and JS code. Have a look at the example below:
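A minimal sketch of such a file, checked with Python's standard `urllib.robotparser`. Every path in it (`/wp-admin/`, `/wp-includes/`, `/wp-content/`, `/blog/`, `/category/drafts/`) is assumed for illustration and is not taken from the original example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical WordPress-style rules; every path here is an assumption.
# Allow lines come first because urllib.robotparser applies the first
# matching rule, while Google applies the most specific (longest) match.
# With this ordering both readings give the same answers.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-content/
Allow: /wp-includes/js/
Allow: /wp-includes/css/
Allow: /blog/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /category/drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path: str) -> bool:
    """Return True if the rules above let a crawler fetch `path`."""
    return parser.can_fetch("Googlebot", "https://example.com" + path)


for path in [
    "/wp-content/uploads/logo.png",
    "/wp-includes/js/jquery.js",
    "/wp-includes/css/style.css",
    "/blog/my-post/",
    "/wp-admin/options.php",
    "/wp-includes/class-wp.php",
    "/category/drafts/idea/",
]:
    print(path, "->", "allowed" if crawlable(path) else "blocked")
```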
We have disallowed WordPress pages and specific categories, but wp-content files, JS plugins, CSS styles, and blogs are allowed. This approach helps ensure that spiders crawl and index useful code and categories first. One more important thing: A robots.txt file is one of the places where you can point search engines to your sitemap.xml file.
The sitemap reference should be placed after the User-agent, Disallow, Allow, and Host directives. You can also add your robots.txt file manually to Google Search Console and, in case you target Bing, Bing Webmaster Tools. Even though robots.txt structure and settings are pretty straightforward, the way the file is set up can make or break your SEO campaign. Be careful with settings: You can easily “disallow” your entire site by mistake and then wait for traffic and customers to no avail.
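Both points, the sitemap reference and the risk of blocking the whole site by accident, can be checked before a file goes live. The following is a sketch using Python's standard `urllib.robotparser` (`site_maps()` requires Python 3.8 or later). The file contents, the example.com URLs, and the helper `blocked_urls` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: rules first, sitemap reference last.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/

Sitemap: https://www.example.com/sitemap.xml
"""


def blocked_urls(robots_txt: str, must_be_crawlable: list[str]) -> list[str]:
    """Return the URLs from `must_be_crawlable` that the rules block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in must_be_crawlable if not parser.can_fetch("*", url)]


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print(parser.site_maps())  # sitemap URLs declared in the file

important = ["https://www.example.com/", "https://www.example.com/products/"]

# The intended file leaves the important pages crawlable.
print(blocked_urls(ROBOTS_TXT, important))

# A stray "Disallow: /" blocks all of them.
print(blocked_urls("User-agent: *\nDisallow: /\n", important))
```

Running a check like this against a short list of pages that must stay crawlable is a cheap way to catch an accidental site-wide “disallow” before it costs traffic.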