Pattern matching
Yes, Googlebot interprets some pattern matching. This is an extension of the standard, so not all bots may follow it.
Matching a sequence of characters using *
You can use an asterisk (*) to match a sequence of characters. For instance, to block access to all subdirectories that begin with private, you could use the following entry:
User-agent: Googlebot
Disallow: /private*/
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
To block access to all URLs containing the word "private", you could use:
User-agent: *
Disallow: /*private*
Matching the end characters of the URL using $
You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
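The two wildcards above map naturally onto regular expressions: * behaves like ".*" and a trailing $ anchors the match at the end of the path. As a rough sketch (the helper name is hypothetical, not part of any robots.txt library), the rules can be checked like this:

```python
import re

def pattern_to_regex(pattern):
    # Translate a robots.txt path pattern to a regex:
    # '*' matches any run of characters, a trailing '$' anchors the end.
    # Illustrative only; real crawlers implement their own matching.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    rx = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.compile(rx + ("$" if anchored else ""))

# Disallow: /*private*  -- blocks any path containing "private"
contains_private = pattern_to_regex("/*private*")
print(bool(contains_private.match("/docs/private/report.html")))  # True
print(bool(contains_private.match("/docs/public/report.html")))   # False

# Disallow: /*.asp$  -- blocks only paths that end in .asp
ends_asp = pattern_to_regex("/*.asp$")
print(bool(ends_asp.match("/reports/q1.asp")))       # True: ends with .asp
print(bool(ends_asp.match("/reports/q1.asp?id=7")))  # False: '?' follows .asp
```

Without the trailing $, the same pattern would also match /reports/q1.asp?id=7, since robots.txt patterns are prefix matches by default.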
You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain one to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:
User-agent: *
Allow: /*?$
Disallow: /*?
The Disallow: /*? line will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
The Allow: /*?$ line will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
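When both rules match a URL, Googlebot applies the most specific (longest) matching pattern, with Allow winning ties. A minimal sketch of that precedence, under those assumptions (helper names are hypothetical, and the length-based tie-break is an approximation of Googlebot's behavior):

```python
import re

def to_regex(pattern):
    # '*' -> '.*', a trailing '$' anchors the end of the path.
    # Illustrative translation; not a real robots.txt API.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    rx = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.compile(rx + ("$" if anchored else ""))

RULES = [("Allow", "/*?$"), ("Disallow", "/*?")]

def allowed(path):
    # Collect every rule whose pattern matches, then let the longest
    # pattern win; Allow beats Disallow on equal length.
    matches = [(kind, pat) for kind, pat in RULES if to_regex(pat).match(path)]
    if not matches:
        return True  # no rule applies: crawling is permitted
    kind, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "Allow"))
    return kind == "Allow"

print(allowed("/page?"))               # True: ends in '?', Allow /*?$ wins
print(allowed("/page?sessionid=123"))  # False: only Disallow /*? matches
print(allowed("/page.html"))           # True: neither rule matches
```

This reproduces the intent of the entry above: session-ID URLs are blocked, but the bare ?-terminated version of the page stays crawlable.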
Source:
http://www.google.com/support/webmas...n&answer=40367