All beginner questions — All beginner questions should be posted in this forum; if a thread turns into a quality discussion of interest to everyone, we will move it to the appropriate forum. We ask the "experts" not to belittle beginners: if they want to help, we will all be grateful; if not, they can simply skip this forum.
28. 06. 2009. | #1 |
professional
Qualified
Join date: 19.05.2007
Posts: 123
Thanks: 13
Thanked 3 times in 3 posts
robots.txt - preventing indexing of certain links
Hi.
I'm not sure where to post this, and it's a fairly beginner question. In robots.txt I put:

Code:
User-agent: *
Disallow: /posalji-email-*

Where am I going wrong? Thanks and regards.
28. 06. 2009. | #2 |
Super Moderator
Knowledge base
Join date: 02.10.2006
Location: Niš
Posts: 1.618
Thanks: 263
Thanked 275 times in 104 posts
I don't think search engines recognize * in Disallow.
In short - you won't be able to do this with robots.txt. Put nofollow on the links to those pages, and remove the already indexed pages through Google Webmaster Tools.

Last edited by Peca: 28. 06. 2009. at 16:18.
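The claim that strict, original-spec parsers treat Disallow values as literal path prefixes (no wildcard expansion) can be checked with Python's standard urllib.robotparser. A quick sketch - the example URL is made up:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Feed the rules directly instead of fetching a live robots.txt
rp.parse([
    "User-agent: *",
    "Disallow: /posalji-email-*",
])

# A spec-strict parser reads "/posalji-email-*" as a literal prefix
# (asterisk included), so a real URL like this is NOT considered blocked:
print(rp.can_fetch("*", "http://example.com/posalji-email-123"))
```

This prints True (fetching allowed), i.e. the wildcard rule does not block the URL for a parser that follows the original standard - which is exactly why the rule seemed to do nothing for some crawlers.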
28. 06. 2009. | #3 |
professional
Qualified
Join date: 19.05.2007
Posts: 123
Thanks: 13
Thanked 3 times in 3 posts
I put rel="nofollow" on those links, but it seems Google has still recognized them as disallowed: under "URL restricted by robots.txt" in Google Webmaster Tools it says about 120 links are blocked, while only about 10 of them have actually been indexed. Those indexed links also appear in the Restricted by robots.txt list.
Now, I can't remember 100%, but I think I added these rules to robots.txt afterwards, so Google managed to index those ~10 links within a day. Thanks for the help. To be honest, I had never used Google Webmaster Tools before. Regards.
28. 06. 2009. | #4 |
Super Moderator
Knowledge base
Join date: 02.10.2006
Location: Niš
Posts: 1.618
Thanks: 263
Thanked 275 times in 104 posts
28. 06. 2009. | #5 |
member
Certified
Pattern matching

Yes, Googlebot interprets some pattern matching. This is an extension of the standard, so not all bots may follow it.

Matching a sequence of characters using *

You can use an asterisk (*) to match a sequence of characters. For instance, to block access to all subdirectories that begin with private, you could use the following entry:

Code:
User-agent: Googlebot
Disallow: /private*/

To block access to all URLs that include a question mark (?), you could use the following entry:

Code:
User-agent: *
Disallow: /*?

To block access to all URLs containing the word "private", you could use:

Code:
User-agent: *
Disallow: /*private*

Matching the end characters of the URL using $

You can use the $ character to specify matching the end of the URL. For instance, to block any URLs that end with .asp, you could use the following entry:

Code:
User-agent: Googlebot
Disallow: /*.asp$

You can use this pattern matching in combination with the Allow directive. For instance, if a ? indicates a session ID, you may want to exclude all URLs that contain them to ensure Googlebot doesn't crawl duplicate pages. But URLs that end with a ? may be the version of the page that you do want included. For this situation, you can set your robots.txt file as follows:

Code:
User-agent: *
Allow: /*?$
Disallow: /*?

The Disallow: /*? line will block any URL that includes a ? (more specifically, it will block any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string). The Allow: /*?$ line will allow any URL that ends in a ? (more specifically, it will allow any URL that begins with your domain name, followed by a string, followed by a ?, with no characters after the ?).

Source: http://www.google.com/support/webmas...n&answer=40367
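For anyone who wants to experiment with these rules locally, here is a minimal sketch of the matching logic described above. google_rule_matches is a made-up helper, not part of any official library, and real crawlers may differ in edge cases:

```python
import re

def google_rule_matches(rule: str, path: str) -> bool:
    """Check whether a Googlebot-style Disallow/Allow pattern matches a URL path.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the path. Otherwise the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape regex metacharacters, then turn the escaped '*' back into '.*'
    pattern = re.escape(rule).replace(r"\*", ".*")
    pattern = "^" + pattern + ("$" if anchored else "")
    return re.search(pattern, path) is not None

print(google_rule_matches("/*.asp$", "/reports/annual.asp"))  # True
print(google_rule_matches("/*?$", "/page.php?"))              # True (ends in ?)
print(google_rule_matches("/*?$", "/page.php?id=3"))          # False (chars after ?)
print(google_rule_matches("/*?", "/page.php?id=3"))           # True (contains ?)
```

The last two calls illustrate the Allow: /*?$ versus Disallow: /*? combination from the quoted article: the anchored rule only matches URLs that end in the question mark.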
2 members thank Marko Medojevic for this post:
28. 06. 2009. | #6 |
Super Moderator
Knowledge base
Join date: 02.10.2006
Location: Niš
Posts: 1.618
Thanks: 263
Thanked 275 times in 104 posts
this is exactly what I needed
tnx.
14. 05. 2010. | #7 |
professional
Qualified
Join date: 19.05.2007
Posts: 123
Thanks: 13
Thanked 3 times in 3 posts
Does anyone know why Google indexes pages that have <meta name="robots" content="noindex" /> inside the head tags?
The format of those URLs is: http://domen.com/forum/viewtopic.php...t=0&view=print

In robots.txt I also put:

Code:
Disallow: /forum/*&start=0&view=print
Disallow: /forum/*view=print

but no luck - they don't show up under 'Restricted by robots.txt' in Google Webmaster Central. Is there any trick for sending bulk removal requests in Google Webmaster Central, e.g. to remove all links that contain 'print', or does it have to be done one by one?

EDIT: When I send a removal request for one of those links, it does get removed; normally, for a link to be removable it has to be either 404, noindex, or restricted by robots.txt.

Last edited by mb_sa: 14. 05. 2010. at 21:59.
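A likely explanation, worth verifying for the specific setup: a crawler can only obey a noindex meta tag on pages it is allowed to fetch. If the same URLs are also blocked in robots.txt, the pages are never crawled, the tag is never seen, and the URLs can remain indexed from inbound links alone. So while waiting for noindex to take effect, the robots.txt block usually has to be lifted. A hypothetical fragment:

```
# robots.txt - sketch, not the poster's actual file
User-agent: *
# Temporarily removed (commented out) so the crawler can fetch the
# print pages and see their <meta name="robots" content="noindex" /> tag:
# Disallow: /forum/*view=print
```

Once the pages have dropped out of the index, the Disallow rule can be restored to save crawl budget.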
Similar threads
Thread | Thread starter | Forum | Replies | Last post
robots.txt | GaVrA | (X)HTML, JavaScript, DHTML, XML, CSS | 4 | 14. 11. 2008. 19:34
Drupal robots.txt not working as it should | BluesRocker | Marketing and SEO | 2 | 11. 08. 2008. 23:18
Awstats Robots/Spiders visitors statistics | novi | All beginner questions | 4 | 28. 01. 2008. 12:12
robots-nocontent tag | Eniac | Marketing and SEO | 0 | 03. 05. 2007. 11:33
Google indexing a forum as a php nuke module | bukovski | Marketing and SEO | 4 | 11. 11. 2006. 08:47