robots.txt - Disallow folder but allow files within folder

I seem to have a conflict between my sitemap.xml and my robots.txt.

All the images on my site are stored in the folder /pubstore. When Google crawls that folder it finds nothing, because I am not serving a listing of the files in that folder.

This in turn generates hundreds of 404 errors in Google Search Console.

What I decided to do is block Google from crawling the folder by adding:

Disallow: /pubstore/

What happens now is that files within that folder, or in a subdirectory of that folder, are blocked for Google, and thus Google is not indexing my images.

As an example scenario:

I have a page that uses the image /pubstore/12345/image.jpg

Google doesn't fetch it because /pubstore is blocked.

The end result I want is for the actual files to be crawlable, but not the folder or its subdirectories.

Allow:

/pubstore/file.jpg
/pubstore/1234/file.jpg
/pubstore/1234/543/file.jpg
/pubstore/1234/543/132/file.jpg

Disallow:

/pubstore/
/pubstore/1234/
/pubstore/1234/543/
/pubstore/1234/543/132/

How can this be achieved?

Comments


  • Gary

    If you don’t link to /pubstore/ and /pubstore/folder/ on your site, there is typically no reason to care about 404s for them. It’s the correct response for such URLs (as there is no content).

    If you still want to use robots.txt to prevent any crawling of these, you have to use Allow, which is not part of the original robots.txt specification but is supported by Google.

    For example:

    User-agent: Googlebot
    Disallow: /pubstore/
    Allow: /pubstore/*.jpg$
    Allow: /pubstore/*.JPG$
    

    Or in case you want to allow many different file types, maybe just:

    User-agent: Googlebot
    Disallow: /pubstore/
    Allow: /pubstore/*.
    

    This would allow all URLs whose path starts with /pubstore/, followed by any string, followed by a ., followed by any string.
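
    To check which paths these rules let through, here is a rough Python sketch of the matching behaviour Google documents: * matches any run of characters, a trailing $ anchors the end of the path, the longest matching pattern wins, and a tie goes to Allow. The names pattern_to_regex, is_allowed, RULES_JPG and RULES_ANY_EXT are made up for this illustration. It is a simplified model, not Googlebot's actual implementation, and it ignores details such as percent-encoding and User-agent grouping.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs, where directive is
    "allow" or "disallow". The longest matching pattern wins; on a tie
    the allow rule wins. A path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a prefix rule.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            # Tuple comparison: longer pattern first, then True > False.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The first rule set from the answer: only .jpg / .JPG files are allowed.
RULES_JPG = [
    ("disallow", "/pubstore/"),
    ("allow", "/pubstore/*.jpg$"),
    ("allow", "/pubstore/*.JPG$"),
]

# The second rule set: anything under /pubstore/ that contains a dot.
RULES_ANY_EXT = [
    ("disallow", "/pubstore/"),
    ("allow", "/pubstore/*."),
]

if __name__ == "__main__":
    for path in ["/pubstore/12345/image.jpg", "/pubstore/12345/", "/pubstore/"]:
        print(path, is_allowed(path, RULES_JPG))
```

    One caveat with the second rule set: because it only looks for a dot somewhere after /pubstore/, a directory whose name contains a dot (say /pubstore/v1.2/) would be allowed too.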
