Understanding your Shopify robots.txt file
The robots.txt file falls under the technical SEO jargon that confuses a lot of people, so let's demystify it.
So what exactly is a robots.txt file and how does it work?
The robots.txt file is a plain text file, read by web robots (e.g. Google's website crawler), that lists rules about which pages on a website should or shouldn't be crawled. Essentially, these are rules that bots will attempt to follow when crawling your site as a way to discover all of your URLs.
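To give you a rough sense of the format, here is a simplified, hypothetical example (not your actual Shopify file, and example-store.com is just a placeholder):
User-agent: *
Disallow: /admin
Sitemap: https://example-store.com/sitemap.xml
Read aloud, it says: all bots, please don't crawl URLs that start with /admin, and here is where my sitemap lives.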
The good news is that Shopify creates a default robots.txt file for you right out of the box. The default robots file from Shopify is good as it is and you don't need to make any changes to it.
Seriously, ignore the SEO guru who told you to edit this file.
Although Shopify began letting merchants edit the robots.txt file in 2021, I strongly advise against it. If you don't know what you're doing, you could hurt your SEO, and that damage could take months to repair. One typo or wrong rule could tell Google to completely remove your site from its search results.
What does it all mean?
Perhaps the easiest way to demystify the robots.txt lines is to talk about each piece so you can better understand what it all means.
The following is meant to help you understand the robots.txt protocol. Please don't make any changes to your robots file. No action is required from you because Shopify already does this for you.
A dictionary of sorts
- * = a wildcard that means all or any
- User-agent: * = all user agents (aka bots)
- Disallow: = URLs that match this rule should not be crawled automatically
- *text* = any URL that contains the word "text", regardless of what's before or after it (replace "text" with any word, it's just an example)
- %2b or %2B = the URL-encoded version of the + symbol
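Putting those pieces together, a made-up rule like this one (just an illustration, not one of Shopify's actual rules) would tell every bot to skip any URL containing the word "sale":
User-agent: *
Disallow: /*sale*
That would cover URLs such as /collections/summer-sale or /pages/sale-terms, but not /collections/shirts.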
Shopify-specific examples
Disallow: /admin
We don't want the backend of your Shopify site to be crawled. Neither Google nor your customers can access your backend because it's password-protected. Asking Google to crawl this would provide no value and would waste crawling efforts.
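Because robots.txt rules match by URL prefix, this single line also covers everything underneath it. For example, hypothetical backend URLs like /admin/orders or /admin/products are included without needing their own rules.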
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
We don't want any theme previews to be crawled because these aren't your live theme. Imagine if your Black Friday deals leaked in September because of this.
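For example, a preview link like the one below (the theme ID is made up and your-store.com is a placeholder) would be skipped by compliant crawlers:
https://your-store.com/?preview_theme_id=123456789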
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /carts
Disallow: /account
Disallow: /policies/
Disallow: /search
You may think you want the above pages crawled, but they won't provide any value to customers searching for your products. Since each of these pages is specific to an individual customer, Google's crawler won't find anything useful at these URLs. So when we talk about "wasting crawl budget", the above URLs would do just that.
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: */collections/*filter*&*filter*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Any collection or blog URL with "sort_by", "+", or "filter" in it should not be crawled. These parameters are often used for faceted navigation, which allows the customer to filter the results based on specific criteria.
This does not mean your collection pages or blog posts won't be crawled. These lines only refer to pages that use query strings (URL parameters) to show a subset of results based on the customer's input.
For example, customers can choose to filter products to only show blue and green colors. That URL would look something like /?filter.v.color=blue&filter.v.color=green.
Crawling filtered collections and blog posts is redundant. Google already crawls your unfiltered pages. If they also crawled filtered pages, you'd run into duplicate content issues. Then you risk Google only indexing filtered results instead of all blog posts or all products in that collection.
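To make that concrete with a hypothetical collection: under the rules above, /collections/shirts would still be crawled, but /collections/shirts?sort_by=price-ascending and /collections/shirts?filter.v.color=blue&filter.v.color=green would not, because they match the sort_by and filter patterns.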
Other examples
User-agent: AhrefsBot
Crawl-delay: 10
User-agent: AhrefsSiteAudit
Crawl-delay: 10
User-agent: MJ12bot
Crawl-delay: 10
User-agent: Pinterest
Crawl-delay: 1
You'll see other user agents besides Google; however, these entries also include a crawl delay. Note that these delays are specific to these bots and do not impact Google's rate of crawling.
The crawl delay is for your protection so that these crawlers don't overload your site with requests; a value like Crawl-delay: 10 asks that bot to wait roughly 10 seconds between requests. Shopify adds these rules automatically so you don't have to do anything.
What does the robots.txt file do?
The robots.txt file is a directive to search engines that says, "please don't waste time crawling these pages because they aren't helpful." Unless a URL is specifically disallowed in your robots.txt file, it is allowed to be crawled. For example, nothing in Shopify's default file disallows your product pages, so they are crawled by default.
The robots.txt file is NOT to keep your pages out of search results. That's the noindex directive.
In other words,
- stop crawling these pages = robots.txt
- don't show these pages in search results = noindex tag
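If you do need to keep a specific page out of search results, the standard form of that noindex tag is a robots meta tag in the page's HTML head, shown here as a generic example (on Shopify you'd typically add it through your theme code or an SEO app rather than editing pages by hand):
<meta name="robots" content="noindex">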
So in its most simplistic form, the robots.txt file is used to manage crawler traffic to your site from bots.
Pages may still show in search results even if disallowed in your robots.txt file.
As mentioned, the robots.txt file tells search engines like Google not to crawl the page. However, they may still discover the URL through other means besides crawling, such as links from other websites. So if a URL with non-public information is discovered that way, it's possible the page will still show up in search results even though it was never crawled.
This is the number one reason why blocking a page you don't want in search results doesn't work with the robots.txt file. If you don't want the contents of the page to be made public, it's always best to use the noindex tag instead.
By including these rules in the robots.txt file, Shopify ensures that search engines focus on crawling and indexing the important pages of a store while avoiding sensitive or irrelevant pages.
Although JSON-LD for SEO doesn't do anything with your robots.txt file or indexing in general, I'm often asked about the robots.txt file.
If you have any further questions on your robots.txt file, I recommend reaching out to Shopify Support. They are best equipped to answer specific questions about your site.
As you can see, the robots.txt file is really important to your store. Shopify has already tuned it for 99% of stores. Unless you really, truly know what you're doing, you should not be changing your robots.txt file at all.
JSON-LD for SEO
Get more organic search traffic from Google without having to fight for better rankings by utilizing search enhancements called Rich Results.