Robots.txt Generator

Create a properly formatted robots.txt file to control search engine crawlers. Block AI bots, prevent duplicate content, and manage Google indexing.

TL;DR: A robots.txt file tells search engine crawlers which pages to access on your site. Get it wrong and you might accidentally block Google from indexing your best content. This free robots.txt generator creates a properly formatted file in seconds, with options to block AI bots and preserve your crawl budget.

What Is Robots.txt and Why Does It Matter?

Every day, search engine crawlers visit your website. They follow links, read content, and decide what to add to Google's index. But here's the thing: you have control over where they go.

Robots.txt is a plain text file that sits in your website's root directory (example.com/robots.txt). It's part of the Robots Exclusion Protocol, and every major search engine respects it. When Googlebot arrives at your site, the first thing it does is check your robots.txt for instructions.

Without a robots.txt file, crawlers access everything. That's fine for simple sites. But if you have admin pages, staging environments, duplicate content, or limited server resources, you need a robots.txt file to guide crawlers to what matters.
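For reference, a minimal robots.txt that blocks nothing (an empty Disallow value means "no restrictions") while declaring a sitemap looks like this, with example.com as a placeholder domain:

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```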

How to Use This Robots.txt Generator

This robots.txt builder creates a properly formatted file based on your selections. No syntax errors, no guessing.

  1. Select your platform (WordPress, Shopify, custom) for platform-specific recommendations.
  2. Choose what to block from the common options: admin areas, search results, staging content.
  3. Toggle AI bot blocking if you want to prevent AI crawlers from training on your content.
  4. Add your sitemap URL so search engines can find all your pages.
  5. Copy the generated code and save it as robots.txt in your root directory.

After uploading, check the robots.txt report in Google Search Console (under Settings) to confirm Google fetched and parsed your file without errors before they affect your Google indexing.
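Before uploading, you can also sanity-check the generated rules locally with Python's standard-library parser. A minimal sketch; the robots.txt content and URLs are placeholders you'd swap for your own:

```python
from urllib import robotparser

# Placeholder content: paste your generated robots.txt here.
generated = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(generated.splitlines())

# List a few URLs that must stay crawlable and fail loudly if any is blocked.
for url in ["https://example.com/", "https://example.com/blog/"]:
    assert rp.can_fetch("Googlebot", url), f"Accidentally blocked: {url}"

print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```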

Robots.txt Syntax: The Complete Reference

Understanding robots.txt syntax prevents costly mistakes. Here's every directive you need to know:

Robots.txt Directives

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks access to a path | Disallow: /admin/ |
| Allow | Permits access (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (not supported by Google) | Crawl-delay: 10 |

Pattern Matching in Robots.txt

Robots.txt supports wildcards and pattern matching:

  • * (asterisk) matches any sequence of characters. Disallow: /*.pdf blocks all PDF files.
  • $ (dollar sign) matches the end of a URL. Disallow: /*.php$ blocks /page.php but not /page.php?id=1, because the query string means the URL no longer ends in .php.
  • Paths are case-sensitive. /Admin/ and /admin/ are different.
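These matching rules are easy to check mechanically. The sketch below translates a robots.txt path pattern into a regular expression using the semantics described above (* matches any sequence, a trailing $ anchors the end); the function name is my own:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored regex."""
    anchored = pattern.endswith("$")        # trailing $ pins the URL end
    core = pattern[:-1] if anchored else pattern
    body = re.escape(core).replace(r"\*", ".*")  # restore * wildcards
    return re.compile("^" + body + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*.php$")
print(bool(rule.match("/page.php")))       # True: ends in .php, blocked
print(bool(rule.match("/page.php?id=1")))  # False: query string survives
```

Patterns without $ behave as prefix matches, so robots_pattern_to_regex("/*.pdf") matches any URL containing a path that leads to .pdf, mirroring how crawlers treat plain paths.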

Robots.txt for WordPress: Best Practices

WordPress sites have specific directories that should typically be blocked. Here's a recommended robots.txt for WordPress:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /trackback/
Disallow: /feed/
Disallow: /?s=
Disallow: /search/

Sitemap: https://yourdomain.com/sitemap.xml

Important: The Allow: /wp-admin/admin-ajax.php line is critical. Many WordPress themes and plugins use admin-ajax.php for frontend functionality. Blocking it can break your site's features.

If you're using an SEO plugin like Yoast or Rank Math, they create a virtual robots.txt that you can edit in the dashboard. Check your plugin settings before creating a separate file.
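You can verify the admin-ajax.php exception with Python's standard-library parser. One caveat: urllib.robotparser applies rules in file order rather than Google's longest-match precedence, so the Allow line comes first in this toy sketch (placeholder domain):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /wp-admin/admin-ajax.php",  # listed first: urllib matches in order
    "Disallow: /wp-admin/",
])

print(rp.can_fetch("*", "https://example.com/wp-admin/"))                # False
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
```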

How to Block AI Bots with Robots.txt

AI companies crawl websites to train their models. If you don't want your content used for AI training, you can block these crawlers. Here are the major AI bots and their user-agents:

AI Bot User-Agents

| Company | Bot Name | User-Agent |
| --- | --- | --- |
| OpenAI | GPTBot | GPTBot |
| OpenAI | ChatGPT User | ChatGPT-User |
| Google | Google Extended | Google-Extended |
| Anthropic | Claude | anthropic-ai |
| Anthropic | ClaudeBot | ClaudeBot |
| Common Crawl | CCBot | CCBot |
| Meta | Meta AI | FacebookBot |

To block all AI bots, add this to your robots.txt:

# Block AI Training Bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

Note: Blocking Google-Extended stops your content from being used in Gemini (formerly Bard) training but doesn't affect regular Google Search indexing. Googlebot (search indexing) and Google-Extended (AI training) are controlled independently.
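If you maintain the bot list in code (for example, in a deploy script), generating these stanzas takes a few lines. A minimal sketch; the list mirrors the table above and the function name is my own:

```python
AI_BOTS = ["GPTBot", "ChatGPT-User", "Google-Extended",
           "anthropic-ai", "ClaudeBot", "CCBot"]

def ai_block_rules(bots=AI_BOTS) -> str:
    """Return robots.txt stanzas that disallow each bot site-wide."""
    stanzas = "\n\n".join(f"User-agent: {bot}\nDisallow: /" for bot in bots)
    return "# Block AI Training Bots\n" + stanzas + "\n"

print(ai_block_rules())
```

Appending the result to your existing robots.txt keeps the AI-blocking section in one place, so adding or removing a bot is a one-line change.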

Robots.txt vs Noindex: When to Use Each

This confuses a lot of people. Robots.txt and meta robots noindex do different things:

| Method | What It Does | Best For |
| --- | --- | --- |
| Robots.txt Disallow | Prevents crawling, but the URL can still appear in the index | Saving crawl budget, blocking admin areas |
| Meta robots noindex | Allows crawling but prevents indexing | Keeping pages out of search results entirely |

Critical mistake: If you block a page with robots.txt AND add noindex, Google can't crawl the page to see the noindex tag. The page can stay in the index, showing up in Search Console as "Indexed, though blocked by robots.txt."

Rule of thumb: Use robots.txt for things you don't care if they appear in search (like /wp-admin/). Use noindex for content you want crawled but not indexed (like thank-you pages or internal search results).

Common Robots.txt Mistakes That Hurt Google Indexing

I've seen these mistakes on hundreds of sites. Each one can silently wreck your search visibility:

  • Blocking CSS and JavaScript — Google needs these to render your pages. Blocking them can tank your rankings.
  • Using Disallow: / on production — This blocks your entire site. Great for staging, disastrous if left on production.
  • Wrong file location — Robots.txt must be at the root: example.com/robots.txt, not example.com/pages/robots.txt.
  • Case sensitivity errors — Disallow: /Admin/ doesn't block /admin/. Check your actual URLs.
  • Blocking important pages accidentally — A broad rule like Disallow: /p blocks /products/, /pricing/, /privacy-policy/, and everything else starting with /p.
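The last mistake is easy to demonstrate with Python's standard-library parser, which uses the same prefix matching for plain paths that search engines do (placeholder domain):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /p"])  # one broad two-character rule

for path in ["/products/", "/pricing/", "/privacy-policy/", "/about/"]:
    print(path, rp.can_fetch("*", f"https://example.com{path}"))
# Only /about/ remains crawlable; everything starting with /p is blocked.
```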

How to Test Your Robots.txt File

Before uploading your robots.txt, validate it. Here's how:

  1. Google Search Console — Open the robots.txt report (Settings → robots.txt) to see the version Google last fetched and any parsing errors. (Google retired the standalone robots.txt Tester in late 2023; the report replaces it.)
  2. Check the live file — Visit yourdomain.com/robots.txt in your browser. Make sure it loads as plain text, not HTML.
  3. Test specific URLs — Run important pages through Search Console's URL Inspection tool and check the "Crawl allowed?" field to verify they're not accidentally blocked.

Frequently Asked Questions

Where do I put my robots.txt file?

Place it in your website's root directory so it's accessible at yourdomain.com/robots.txt. On most web hosts, this is the public_html or www folder. Each subdomain needs its own robots.txt file (blog.example.com/robots.txt is separate from example.com/robots.txt).

Is robots.txt necessary for my website?

Not strictly required. Without one, search engines crawl everything they can access. But if you have admin areas, duplicate content, or want to block AI crawlers, you need a robots.txt file. For any site with more than a handful of pages, having one is best practice.

Does robots.txt affect my SEO rankings?

Indirectly, yes. Robots.txt doesn't directly affect rankings, but it controls what gets crawled and indexed. Blocking important pages hurts rankings. Blocking low-value pages can help Google focus on what matters. Think of robots.txt as guiding search engine crawlers to your best content.

How do I find my WordPress robots.txt?

WordPress creates a virtual robots.txt by default. Visit yourdomain.com/robots.txt to see it. If you're using Yoast SEO or Rank Math, edit it in the plugin settings under Tools → File Editor (Yoast) or General Settings → Edit robots.txt (Rank Math). For a physical file, create robots.txt in your WordPress root folder via FTP.

Can robots.txt completely block a page from Google?

No. Robots.txt prevents crawling, but Google can still index the URL if it's linked from other sites. Because Google can't read the page, the result typically appears with little or no description. To fully remove a page from Google, use a meta robots noindex tag instead, and make sure the page isn't blocked in robots.txt so Google can see the tag.

How often does Google check robots.txt?

Google caches your robots.txt file and generally refreshes that cache within 24 hours, though timing can vary. After making changes, you can expedite the process in Google Search Console: open the robots.txt report under Settings and request a recrawl.

What's the difference between robots.txt and meta robots?

Robots.txt is a file that controls crawling at the URL level before the page is accessed. Meta robots is an HTML tag on individual pages that controls indexing after the page is crawled. Use robots.txt for broad crawl management. Use meta robots noindex for specific pages you want crawled but not indexed.

Take Control of Your Site's Crawling

Your robots.txt file is one of the first things search engines see. A properly configured file helps Google focus on your important pages, protects sensitive areas, and can block AI crawlers from using your content without permission.

Use the robots.txt generator above to create your file, test it in Google Search Console, and upload it to your root directory. Small file, big impact on how search engines interact with your site.

Related Free SEO Tools

Your robots.txt is just one piece of technical SEO. These tools help with other essentials:

  • XML Sitemap Builder — Create a sitemap to help search engines discover all your pages. Reference it in your robots.txt.
  • Bulk HTTP Status Checker — Verify that pages you're blocking actually exist and return the right status codes.
  • On-Page SEO Analyzer — Check your pages for meta robots tags and other on-page SEO issues.