A typical WordPress robots.txt looks like the following:
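User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php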
This robots.txt tells all robots (the asterisk * is a wildcard that matches any crawler) not to crawl the /wp-admin/ folder (hence the Disallow directive), except for the /wp-admin/admin-ajax.php file within that folder (hence the Allow directive).
In general, as long as you don’t disallow any sections of your site, crawlers will assume the entire site may be crawled:
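User-agent: *
# An empty Disallow value means nothing is off-limits
Disallow: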
As you can see, no directories or files are disallowed, so the entire website may be crawled. Be careful, though: the following robots.txt would block all crawlers from your entire site:
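User-agent: *
# A single slash disallows the whole site
Disallow: /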
You can add several lines of User-agents and their directives (Allow, Disallow, Crawl-delay) to instruct different robots differently:
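User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Googlebot-specific rules
User-agent: Googlebot
Disallow: /example/
Crawl-delay: 20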
This case is exactly like the first example, except that Googlebot is not allowed to crawl the /example/ folder and any of its sub-folders. It also asks Googlebot to wait 20 seconds between page requests; note that Crawl-delay is specified in seconds, and that Google actually ignores this directive, while other search engines such as Bing respect it.
You can also tell search engines where to find your XML sitemap in your robots.txt file:
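User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Replace the placeholder with your actual sitemap URL
Sitemap: https://www.example.com/sitemap.xml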
This, however, is not strictly necessary as long as you’ve already added your sitemaps to Google Search Console and the other search engines’ respective webmaster tools.
To sum up, these are the most important robots.txt directives:
User-agent is used to identify a particular crawler or group of crawlers.
Disallow tells these crawlers which folders or files they are not allowed to crawl.
Allow, in turn, marks exceptions within a disallowed folder that crawlers are still permitted to crawl. Although it is not part of the original standard and might not be understood by every robot, most major crawlers (Googlebot among them) understand and follow it.
Sitemap lets search engines know where to find your XML sitemaps.