vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   vBulletin 3.0 Full Releases (https://vborg.vbsupport.ru/forumdisplay.php?f=33)
-   -   Stop Spammers with rel=nofollow in URLs! (https://vborg.vbsupport.ru/showthread.php?t=74703)

kall 01-20-2005 04:43 PM

Quote:

Originally Posted by neocorteqz
thanks.

one small question.

How do we verify it's working?

View the source of any page with a posted URL or a signature.

CTRL-F for nofollow. :)
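
For example, a converted link should show up in the source looking roughly like this (the URL is just a placeholder; the exact attributes depend on your templates):

Code:

<a href="http://www.example.com/" rel="nofollow" target="_blank">http://www.example.com/</a>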

neocorteqz 01-20-2005 06:14 PM

Quote:

Originally Posted by kall
View the source of any page with a posted URL or a signature.

CTRL-F for nofollow. :)

thanks.

yoyoyoyo 01-20-2005 07:30 PM

ADDON "HACK":You can also add the "no index and no follow" rule to each page header as well as the URL.

Go to your Admin Control Panel, open the Style Manager, choose to edit the headinclude template, and look for:

PHP Code:

<meta http-equiv="Content-Type" content="text/html; charset=$stylevar[charset]/> 

and add the following AFTER:
PHP Code:

<meta name="robots" content="no index, no follow" /> 

if you want to have the page indexed, but still have the no follow rule stay in effect use this instead:
PHP Code:

<meta name="robots" content="no follow" /> 


Erwin 01-20-2005 08:39 PM

Interesting. :)

Dean C 01-20-2005 08:43 PM

Not to criticise your modification, but I'd say this is a poor way of implementing it. As soon as you put nofollow on the links, it'll:

Quote:

1. NOT follow through to that page.
2. NOT count the link in calculating PageRank link popularity scores.
3. NOT count the anchor text in determining what terms the page being linked to is relevant for.

Also, your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only use for this would be on blog comment pages. Just because a spambot sees rel="nofollow" on your links does not mean it won't spam you anyway.

kall 01-20-2005 09:01 PM

Quote:

Originally Posted by Dean C
Not to criticise your modification, but I'd say this is a poor way of implementing it. As soon as you put nofollow on the links, it'll:



Also, your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only use for this would be on blog comment pages. Just because a spambot sees rel="nofollow" on your links does not mean it won't spam you anyway.

*seethes at criticism* :)

But seriously... it's not the spambots that are 'targeted' by this hack, it's the spammers who send them out.

The theory goes that if people implement this idea, there will be no reason for the spammers to send out the bots in the first place. At the very least, it removes the advantage of having PageRank from the sites they spam passed on to their own sites.

How would you go about implementing this?

Regarding the addon: I don't know why he is suggesting to have noindex in the header of each page... not something I would do myself.

yoyoyoyo 01-20-2005 09:21 PM

Quote:

Originally Posted by Dean C
Also, your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only use for this would be on blog comment pages. Just because a spambot sees rel="nofollow" on your links does not mean it won't spam you anyway.

Quote:

Originally Posted by kall
Regarding the addon: I don't know why he is suggesting to have noindex in the header of each page...not something I would do myself.

Not to hijack the thread, but there are many good reasons not to be indexed; it is up to each person to decide whether they want to be or not. That is why I gave the alternative of only including the "nofollow" meta tag instead of having both noindex and nofollow. It seems to me that doing one without the other (including the nofollow on the URLs, but not in the meta tag) is only half of the solution.

You can also tell the spider to ignore only specific parts of your site in a few different ways. One way is to use a "robots.txt" file. The robots.txt is a TEXT file (not HTML!) which has a section for each robot to be controlled. Each section has a user-agent line which names the robot to be controlled and has a list of "disallows" and "allows". Each disallow will prevent any address that starts with the disallowed string from being accessed. Similarly, each allow will permit any address that starts with the allowed string from being accessed. The (dis)allows are scanned in order, with the last match encountered determining whether an address is allowed to be used or not. If there are no matches at all then the address will be used.

Using a robots.txt file is easy. If your site is located at:
http://domain.com/mysite/index.html
you will need to be able to create a file located here:
http://domain.com/robots.txt

Here's an example:

Code:

user-agent: FreeFind
  disallow: /mysite/test/
  disallow: /mysite/cgi-bin/post.cgi?action=reply
  disallow: /a

In this example the following addresses would be ignored by the spider:

Code:

http://domain.com/mysite/test/index.html
  http://domain.com/mysite/cgi-bin/post.cgi?action=reply&id=1
  http://domain.com/mysite/cgi-bin/post.cgi?action=replytome
  http://domain.com/abc.html

and the following ones would be allowed:

Code:

http://domain.com/mysite/test.html
  http://domain.com/mysite/cgi-bin/post.cgi?action=edit
  http://domain.com/mysite/cgi-bin/post.cgi
  http://domain.com/bbc.html

It is also possible to use an "allow" in addition to disallows. For example:

Code:

user-agent: FreeFind
  disallow: /cgi-bin/
  allow: /cgi-bin/Ultimate.cgi
  allow: /cgi-bin/forumdisplay.cgi

This robots.txt file prevents the spider from accessing every cgi-bin address except Ultimate.cgi and forumdisplay.cgi.

Using allows can often simplify your robots.txt file.

Here's another example which shows a robots.txt with two sections in it. One for "all" robots, and one for the FreeFind spider:

Code:

user-agent: *
  disallow: /cgi-bin/

  user-agent: FreeFind
  disallow:

In this example all robots except the FreeFind spider will be prevented from accessing files in the cgi-bin directory. FreeFind will be able to access all files (a disallow with nothing after it means "allow everything").

Examples:

To prevent FreeFind from indexing your site at all:

Code:

user-agent: FreeFind
disallow: /

To prevent FreeFind from indexing common FrontPage image map junk:

Code:

user-agent: FreeFind
disallow: /_vti_bin/shtml.exe/

To prevent FreeFind from indexing a test directory and a private file:

Code:

user-agent: FreeFind
disallow: /test/
disallow: private.html

To let FreeFind index everything but prevent other robots from accessing certain files:

Code:

user-agent: *
disallow: /cgi-bin/
disallow: this.html
disallow: and.html
disallow: that.html

user-agent: FreeFind
disallow:

Here are some more examples:

The exclusion:
http://mysite.com/ignore.html
prevents that file from being included in the index.

The exclusion:
http://mysite.com/archive/*
prevents everything in the "archive" directory from being included in the index.

The exclusion:
/archive/*
prevents everything in any "archive" directory from being included in the index regardless of the site it's on.

The exclusion:
http://mysite.com/*.txt
prevents files on "mysite.com" that end with the extension ".txt" from being included in the index.

The exclusion:
*.txt
prevents all files that end with the extension ".txt" from being included in the index regardless of what site they're on.

The exclusion:
http://mysite.com/alphaindex/?.html
prevents a file like "http://mysite.com/alphaindex/a.html" from being indexed, but would allow a file "http://mysite.com/alphaindex/aardvark.html" to be indexed.

The exclusion:
http://mysite.com/alphaindex/?.html index=no follow=yes
prevents a file like "http://mysite.com/alphaindex/a.html" from being added to the index but would allow the spider to find and follow the links in that page.

The exclusion:
http://mysite.com/endwiththis.html index=yes follow=no
allows that file to be added to the index but prevents the spider from following any of the links in that file.

yoyoyoyo 01-20-2005 10:37 PM

Quote:

Originally Posted by Natch
If you don't mind my saying: "no kidding" or "so?"

You could say that about a lot of the mods here, but each little addon or mod is like a lesson in the workings of PHP and vBulletin, so I find it interesting. I am sorry if it bothers you, but hopefully some other people will find it interesting or helpful.
Quote:

Originally Posted by Natch
I can't think of a reason why I would want to block Legit spiders (those that respect robots.txt restrictions), and a spambot spider is likely as not to ignore meta tags and robots.txt anyway.

Well, maybe you can't, but obviously others can and desire that function, thus the "rules". I did not invent the meta tags. You are correct that bots don't always play by the rules, but some do, and those are the ones this hack was addressing.

Actually... you know what, forget about the "noindex" option, forget I even mentioned it. This hack was about the "nofollow", so if you are planning on implementing the first hack I suggest adding the meta tag to the header, and I apologize for trying to toss in more info than was needed.

Princeton 01-21-2005 12:14 AM

The "rel" attribute has been around since HTML 3.2.

It's getting a lot of attention these days because of the "junk" pages that are being indexed by the major search engines, most notably caused by individuals who comment-spam a blog, wiki, or forum site. The search engines are looking for a way to conserve resources (use them where they count) and to prevent indexing of sites with no relevant content.

So they are asking the community to start using the rel="nofollow" attribute to help them stop, or at the very least slow, the "spamming".

When an individual spams a site they leave links on the post hoping that the search engines will "follow" the link back to their site. When they (the spammers) do this they are hoping to increase their "popularity" with search engines.

The rel="nofollow" does not prevent the search engines from indexing your pages. Nor, does it prevent the other site from being indexed when search engines do it directly.

It will simply tell the search engines not to follow a link that was posted on your page (thread/post) and that was NOT created by you.
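
To illustrate, here is a rough before-and-after (the URL and link text are just placeholders):

Code:

<!-- an ordinary posted link: engines follow it and pass "popularity" along -->
<a href="http://www.example.com/">check out my site</a>

<!-- the same link rewritten by this hack: engines are asked not to follow it -->
<a href="http://www.example.com/" rel="nofollow">check out my site</a>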

ABOUT THE HACK
If you are worried about your PAGERANK, then use this.

If you want to prevent spammers from posting in your forum, then this hack will not help. Spammers will continue doing what they do... the best route is to remove the post and ban the user. Most will not even know you are using rel="nofollow" and some will not even understand it.

SOME CONTROLS ARE NEEDED
I think there should be some controls.

For example, converting all posted links with rel="nofollow" also punishes those who are loyal to the site.

Why not help your loyal members with their site "popularity"? Do not convert links posted by loyal users. Allow the search engines to follow these links. Some sites can even list this as a membership benefit. -- just throwing ideas

Anyway, what I'm trying to get to is that the ADMIN should have some control over what links get rel="nofollow".

As it is now, all "in-house" links are tagged with rel="nofollow" which may hurt your "popularity".

kall 01-21-2005 01:07 AM

Quote:

Originally Posted by princeton
SOME CONTROLS ARE NEEDED
I think there should be some controls.

For example, converting all posted links with rel="nofollow" also punishes those who are loyal to the site.

Why not help your loyal members with their site "popularity"? Do not convert links posted by loyal users. Allow the search engines to follow these links. Some sites can even list this as a membership benefit. -- just throwing ideas

Anyway, what I'm trying to get to is that the ADMIN should have some control over what links get rel="nofollow".

As it is now, all "in-house" links are tagged with rel="nofollow" which may hurt your "popularity".

Alrighty then, try this:

PHP Code:

    if ($type == 'url')
    {
        global $bbuserinfo;

        if (is_member_of($bbuserinfo, 6))
        {
            // standard URL hyperlink
            return "<a href=\"$rightlink\" target=\"_blank\">$text</a>";
        }
        else
        {
            return "<a href=\"$rightlink\" rel=\"nofollow\" target=\"_blank\">$text</a>";
        }
    }
    else

This will make it so anyone who is an admin (group 6 - change this to whatever you want) will not have their links tagged with the nofollow attribute.

The syntax for multiple groups escapes me at present, but if someone can remind me, I will change it.
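
One possible way, as a rough sketch assuming the standard vBulletin 3 is_member_of($userinfo, $usergroupid) helper (the group IDs 5 and 7 below are only placeholders), would be to chain the checks:

PHP Code:

    // hypothetical example: exempt admins (6), super mods (5) and mods (7) - adjust the IDs to suit
    if (is_member_of($bbuserinfo, 6) OR is_member_of($bbuserinfo, 5) OR is_member_of($bbuserinfo, 7))
    {
        // standard URL hyperlink, no nofollow
        return "<a href=\"$rightlink\" target=\"_blank\">$text</a>";
    }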


All times are GMT. The time now is 12:45 PM.

Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2025, vBulletin Solutions Inc.
