Pattern matching
Yes, Googlebot interprets some pattern matching. This is an extension of the standard, so not all bots may follow it.
Matching a sequence of characters using *
You can use an asterisk (*) to match a sequence of characters. For instance, to block access to all subdirectories that begin with private, you could use the following entry:
User-agent: Googlebot
Disallow: /private*/
To block access to all URLs that include a question mark (?), you could use the following entry:
User-agent: *
Disallow: /*?
To block access to all URLs containing the word "private", you could use:
User-agent: *
Disallow: /*private*
Matching the end characters of the URL using $
You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:
User-agent: Googlebot
Disallow: /*.asp$
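The two wildcards above map naturally onto regular expressions: * behaves like ".*" and a trailing $ anchors the match at the end of the path. As a rough sketch (the helper name is hypothetical, not part of any robots.txt library), the rules can be checked like this:

```python
import re

def pattern_to_regex(pattern):
    # Translate a robots.txt path pattern to a regex:
    # '*' matches any run of characters, a trailing '$' anchors the end.
    # Illustrative only; real crawlers implement their own matching.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    rx = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.compile(rx + ("$" if anchored else ""))

# Disallow: /*private*  -- blocks any path containing "private"
contains_private = pattern_to_regex("/*private*")
print(bool(contains_private.match("/docs/private/report.html")))  # True
print(bool(contains_private.match("/docs/public/report.html")))   # False

# Disallow: /*.asp$  -- blocks only paths that end in .asp
ends_asp = pattern_to_regex("/*.asp$")
print(bool(ends_asp.match("/reports/q1.asp")))       # True: ends with .asp
print(bool(ends_asp.match("/reports/q1.asp?id=7")))  # False: '?' follows .asp
```

Without the trailing $, the same pattern would also match /reports/q1.asp?id=7, since robots.txt patterns are prefix matches by default.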
You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain one to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:
User-agent: *
Allow: /*?$
Disallow: /*?
The Disallow: /*? line will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string).
The Allow: /*?$ line will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).
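When both rules match a URL, Googlebot applies the most specific (longest) matching pattern, with Allow winning ties. A minimal sketch of that precedence, under those assumptions (helper names are hypothetical, and the length-based tie-break is an approximation of Googlebot's behavior):

```python
import re

def to_regex(pattern):
    # '*' -> '.*', a trailing '$' anchors the end of the path.
    # Illustrative translation; not a real robots.txt API.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    rx = "".join(".*" if c == "*" else re.escape(c) for c in body)
    return re.compile(rx + ("$" if anchored else ""))

RULES = [("Allow", "/*?$"), ("Disallow", "/*?")]

def allowed(path):
    # Collect every rule whose pattern matches, then let the longest
    # pattern win; Allow beats Disallow on equal length.
    matches = [(kind, pat) for kind, pat in RULES if to_regex(pat).match(path)]
    if not matches:
        return True  # no rule applies: crawling is permitted
    kind, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "Allow"))
    return kind == "Allow"

print(allowed("/page?"))               # True: ends in '?', Allow /*?$ wins
print(allowed("/page?sessionid=123"))  # False: only Disallow /*? matches
print(allowed("/page.html"))           # True: neither rule matches
```

This reproduces the intent of the entry above: session-ID URLs are blocked, but the bare ?-terminated version of the page stays crawlable.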
Source:
http://www.google.com/support/webmas...n&answer=40367