The Archive of Official vBulletin Modifications Site.

Stop Spammers with rel=nofollow in URLs! · **Released**: 01-19-2005

In the first cooperative move for nearly ten years, the major search engines have unveiled a new indexing command for web authors that they all recognize. They hope it will help reduce the link and comment spam that plagues many web sites by removing the point of doing it in the first place.

The new "nofollow" attribute for links originated as an idea at Google in late 2004, and MSN and Yahoo, as well as major blogging vendors, have jumped on board.

The Nofollow Attribute

The new attribute is called "nofollow", with rel="nofollow" being the format inserted within an anchor tag.
When added to any link, it serves as a flag telling the search engines that the link has not been explicitly approved by the site owner, so they should "not follow" it, i.e. not pass on any of the PageRank of the referring page (on your site) through it.

For example, this is how the HTML markup for an ordinary link might look:

<a href="http://www.somedomain.com/page.html">My forums are the best lol lol lol click here!!</a>

This is how the link would look after the nofollow attribute has been added:

<a href="http://www.somedomain.com/page.html" rel="nofollow">My forums are the best lol lol lol click here!!</a>

This would also be acceptable, as the order of attributes within the anchor tag makes no difference:

<a rel="nofollow" href="http://www.site.com/page.html">Visit My Page</a>

Once added, the search engines supporting the attribute will understand that the link has not been approved in some way by the site owner.

Think of it as a way to flag to them, "I didn't post this link -- someone else did."
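
As a minimal sketch of the idea (not part of the hack itself), the helper below builds an anchor tag and adds rel="nofollow" only when the link is not trusted. The function name build_link and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: build an <a> tag, adding rel="nofollow" for untrusted links.
function build_link(string $url, string $text, bool $trusted): string
{
    // Escape both values so user-supplied input cannot break out of the markup.
    $href  = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $label = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Only links the site owner vouches for are left without the attribute.
    $rel = $trusted ? '' : ' rel="nofollow"';

    return "<a href=\"$href\"$rel>$label</a>";
}

echo build_link('http://www.somedomain.com/page.html', 'Visit My Page', false), "\n";
echo build_link('http://www.site.com/page.html', 'Visit My Page', true), "\n";
```

The first call produces a link carrying rel="nofollow"; the second produces an ordinary link.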

Quote:

Originally Posted by Alkatraz

If Google sees nofollow as part of a link, it will:

1. NOT follow through to that page.
2. NOT count the link in calculating PageRank link popularity scores.
3. NOT count the anchor text in determining what terms the page being linked to is relevant for.

The site that is being linked to will gain nothing from the link, so the whole point of doing it in the first place is removed.

WHAT WILL THIS DO, IN ESSENCE?

This will affect URLs in posts as well as signatures (anything that goes through the bbcodeparse function, as far as I can tell), and it works retroactively: it affected all existing posts and signatures, or it did for me anyway.

Update:

Thanks to Michael Morris and natez0rz for pointing out that using the $post global would be a much better idea.

To change the conditional number of posts, alter

PHP Code:

OR $post['posts'] > 50)

to whatever you like.

It should work with all vB 3.0.x versions, but was tested on 3.0.6.

File to modify: 1

1/ Open your includes/functions_bbcodeparse.php file

Find:

PHP Code:

    if ($type == 'url')
    {
        // standard URL hyperlink
        return "<a href=\"$rightlink\" target=\"_blank\">$text</a>";
    }
    else
    {
        // email hyperlink (mailto:)

Replace with:

PHP Code:

    if ($type == 'url')
    {
        global $post;

        if (is_member_of($post, 6)     // Admins are exempt
            OR is_member_of($post, 5)  // Super Mods are exempt
            OR is_member_of($post, 7)  // Mods are exempt
            OR $post['posts'] > 50)    // People with over 50 posts are exempt
        {
            // standard URL hyperlink
            return "<a href=\"$rightlink\" target=\"_blank\">$text</a>";
        }
        else
        {
            // URL hyperlink flagged so search engines give it no credit
            return "<a href=\"$rightlink\" rel=\"nofollow\" target=\"_blank\">$text</a>";
        }
    }
    else
    {
        // email hyperlink (mailto:)

2/ Save and Upload.

3/ Relax, safe in the knowledge that spammers linking from your site are doing so for no reason whatsoever.

4/ Edit: exclude staff usergroups and members with over 50 posts.

kall · #12 01-20-2005, 04:43 PM

Quote:

Originally Posted by neocorteqz

thanks.

one small question.

How do we verify it's working?

View the source of any page with a posted URL or a signature.

CTRL-F for nofollow.
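
If you would rather count than eyeball, here is a rough sketch that tallies links with and without rel="nofollow" in a chunk of saved page source. The function name count_nofollow_links is made up, and the regular expressions are only good enough for a quick check; they are not a full HTML parser.

```php
<?php
// Count <a> tags with and without a nofollow token in their rel attribute.
function count_nofollow_links(string $html): array
{
    $counts = ['nofollow' => 0, 'followed' => 0];

    // Grab every opening <a ...> tag.
    preg_match_all('/<a\b[^>]*>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        // rel may hold several tokens, e.g. rel="external nofollow".
        if (preg_match('/\brel\s*=\s*["\'][^"\']*\bnofollow\b[^"\']*["\']/i', $tag)) {
            $counts['nofollow']++;
        } else {
            $counts['followed']++;
        }
    }

    return $counts;
}

$html = '<p><a href="http://www.somedomain.com/page.html" rel="nofollow" target="_blank">new member</a> '
      . '<a href="http://www.site.com/page.html" target="_blank">long-time member</a></p>';

print_r(count_nofollow_links($html));
```

On a thread page you would expect the "followed" count to cover only your own navigation links plus links posted by exempt members.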

neocorteqz · #13 01-20-2005, 06:14 PM

Quote:

Originally Posted by kall

View the source of any page with a posted URL or a signature.

CTRL-F for nofollow.

thanks.

yoyoyoyo · #14 01-20-2005, 07:30 PM

ADDON "HACK": You can also add the "noindex, nofollow" rule to each page header as well as the URL.

Go to your admin control panel, open the style manager, choose to edit the headinclude template and look for:

PHP Code:

<meta http-equiv="Content-Type" content="text/html; charset=$stylevar[charset]" />

and add the following AFTER:

PHP Code:

<meta name="robots" content="noindex, nofollow" />

If you want to have the page indexed, but still have the nofollow rule stay in effect, use this instead:

PHP Code:

<meta name="robots" content="nofollow" />
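
As a small sketch of how the two choices combine, the hypothetical helper below builds the robots meta tag from two yes/no settings. Note that the directive names are single words (noindex, nofollow). The function name robots_meta is an assumption for illustration and is not part of vBulletin.

```php
<?php
// Hypothetical helper: build a robots meta tag from two yes/no choices.
function robots_meta(bool $index, bool $follow): string
{
    $directives = [
        $index ? 'index' : 'noindex',
        $follow ? 'follow' : 'nofollow',
    ];

    return '<meta name="robots" content="' . implode(', ', $directives) . '" />';
}

echo robots_meta(false, false), "\n"; // keep the page out of the index and ignore its links
echo robots_meta(true, false), "\n";  // index the page but do not follow its links
```

Keep in mind that a page-level nofollow applies to every link on the page, including your own internal links.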

Erwin · #15 01-20-2005, 08:39 PM

Interesting.

Dean C · #16 01-20-2005, 08:43 PM

Not to criticise your modification but I'd say this was a poor way of implementing this. As soon as you put nofollow on the links it'll:

Quote:

1. NOT follow through to that page.
2. NOT count the link in calculating PageRank link popularity scores.
3. NOT count the anchor text in determining what terms the page being linked to is relevant for.

Also your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only usage for this will be on blog comment pages. Just because a spambot sees your link having rel="nofollow" inside of it will not mean it won't spam the email.

kall · #17 01-20-2005, 09:01 PM

Quote:

Originally Posted by Dean C

Not to criticise your modification but I'd say this was a poor way of implementing this. As soon as you put nofollow on the links it'll:

Also your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only usage for this will be on blog comment pages. Just because a spambot sees your link having rel="nofollow" inside of it will not mean it won't spam the email.

*seethes at criticism*

But seriously...it's not spambots that are 'targeted' by this hack, it's the Spammers that send them out.

The theory goes that if people were to implement this idea, there would be no reason for the Spammers to send out the bots in the first place. At least, it removes the advantage of having PR from the sites they are spamming be added to their site.

How would you go about implementing this?

Regarding the addon: I don't know why he is suggesting to have noindex in the header of each page...not something I would do myself.

yoyoyoyo · #18 01-20-2005, 09:21 PM

Quote:

Originally Posted by Dean C

Also your addon will mean Google will not try to index the page. Maybe I'm missing something here, but why on earth would you not want the search engines to index your page? The only usage for this will be on blog comment pages. Just because a spambot sees your link having rel="nofollow" inside of it will not mean it won't spam the email.

Quote:

Originally Posted by kall

Regarding the addon: I don't know why he is suggesting to have noindex in the header of each page...not something I would do myself.

Not to hijack the thread, but there are many good reasons not to be indexed, and it is up to each person to decide whether they want to be or not. That is why I gave the alternative of including only the "nofollow" meta tag instead of having both noindex and nofollow. It seems to me that doing one without the other (including nofollow in the URL, but not in the meta) is only half of the solution.

You can also tell the spider to ignore only specific parts of your site in a few different ways. One way is to use a "robots.txt" file. The robots.txt is a TEXT file (not HTML!) which has a section for each robot to be controlled. Each section has a user-agent line, which names the robot to be controlled, followed by a list of "disallows" and "allows". Each disallow prevents any address that starts with the disallowed string from being accessed. Similarly, each allow permits any address that starts with the allowed string to be accessed. The (dis)allows are scanned in order, with the last match encountered determining whether an address may be used. If there are no matches at all, the address will be used. Crawlers differ in how they resolve conflicting rules, so check the documentation of the robot you are targeting.

Using a robots.txt file is easy. If your site is located at:
http://domain.com/mysite/index.html
you will need to be able to create a file located here:
http://domain.com/robots.txt

Here's an example:

Code:

user-agent: FreeFind
disallow: /mysite/test/
disallow: /mysite/cgi-bin/post.cgi?action=reply
disallow: /a

In this example the following addresses would be ignored by the spider:

Code:

http://domain.com/mysite/test/index.html
http://domain.com/mysite/cgi-bin/post.cgi?action=reply&id=1
http://domain.com/mysite/cgi-bin/post.cgi?action=replytome
http://domain.com/abc.html

and the following ones would be allowed:

Code:

http://domain.com/mysite/test.html
http://domain.com/mysite/cgi-bin/post.cgi?action=edit
http://domain.com/mysite/cgi-bin/post.cgi
http://domain.com/bbc.html

It is also possible to use an "allow" in addition to disallows. For example:

Code:

user-agent: FreeFind
disallow: /cgi-bin/
allow: /cgi-bin/Ultimate.cgi
allow: /cgi-bin/forumdisplay.cgi

This robots.txt file prevents the spider from accessing any cgi-bin address except Ultimate.cgi and forumdisplay.cgi.

Using allows can often simplify your robots.txt file.
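
The scanning rule described above (prefix match, last match wins, no match means allowed) can be sketched in a few lines. This is only an illustration of that description, not the code of any real spider; the function name is_allowed and the [type, prefix] rule format are assumptions made for the sketch.

```php
<?php
// Decide whether a path may be crawled, scanning the rules in order so that
// the last matching rule wins. $rules is a list of [type, prefix] pairs.
function is_allowed(string $path, array $rules): bool
{
    $allowed = true; // no match at all means the address may be used

    foreach ($rules as [$type, $prefix]) {
        if ($prefix === '') {
            continue; // an empty rule matches nothing
        }
        // A rule matches when the path starts with the rule's string.
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            $allowed = ($type === 'allow');
        }
    }

    return $allowed;
}

$rules = [
    ['disallow', '/cgi-bin/'],
    ['allow', '/cgi-bin/Ultimate.cgi'],
    ['allow', '/cgi-bin/forumdisplay.cgi'],
];

var_dump(is_allowed('/cgi-bin/Ultimate.cgi', $rules)); // bool(true)
var_dump(is_allowed('/cgi-bin/post.cgi', $rules));     // bool(false)
var_dump(is_allowed('/index.html', $rules));           // bool(true)
```

Because the last match wins in this sketch, the broad disallow has to come before the narrower allows, exactly as in the example file.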

Here's another example which shows a robots.txt with two sections in it. One for "all" robots, and one for the FreeFind spider:

Code:

user-agent: *
disallow: /cgi-bin/

user-agent: FreeFind
disallow:

In this example all robots except the FreeFind spider will be prevented from accessing files in the cgi-bin directory. FreeFind will be able to access all files (a disallow with nothing after it means "allow everything").
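
To illustrate how a robot might pick its section, here is a simplified sketch: it uses the section that names the robot if there is one, and otherwise falls back to the "*" section. The function name rules_for and the robot name SomeOtherBot are hypothetical, and the parsing deliberately ignores comments and other fields.

```php
<?php
// Return the allow/disallow lines that apply to one robot as [field, value] pairs.
function rules_for(string $robotsTxt, string $robot): array
{
    $sections = [];
    $current  = null;

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ':') === false) {
            continue; // skip blank lines and anything that is not "field: value"
        }

        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            // A user-agent line starts a new section.
            $current = strtolower($value);
            $sections[$current] = [];
        } elseif ($current !== null && ($field === 'allow' || $field === 'disallow')) {
            $sections[$current][] = [$field, $value];
        }
    }

    // Prefer the robot's own section, then the catch-all, then no rules at all.
    return $sections[strtolower($robot)] ?? $sections['*'] ?? [];
}

$robotsTxt = "user-agent: *\ndisallow: /cgi-bin/\n\nuser-agent: FreeFind\ndisallow:\n";

print_r(rules_for($robotsTxt, 'FreeFind'));     // the empty disallow: everything is allowed
print_r(rules_for($robotsTxt, 'SomeOtherBot')); // falls back to the "*" section
```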

Examples:

To prevent FreeFind from indexing your site at all:

Code:

user-agent: FreeFind
disallow: /

To prevent FreeFind from indexing common FrontPage image map junk:

Code:

user-agent: FreeFind
disallow: /_vti_bin/shtml.exe/

To prevent FreeFind from indexing a test directory and a private file:

Code:

user-agent: FreeFind
disallow: /test/
disallow: /private.html

To let FreeFind index everything but prevent other robots from accessing certain files:

Code:

user-agent: *
disallow: /cgi-bin/
disallow: /this.html
disallow: /and.html
disallow: /that.html

user-agent: FreeFind
disallow:

Here are some more examples. These are wildcard URL exclusions rather than robots.txt lines:

The exclusion:
http://mysite.com/ignore.html
prevents that file from being included in the index.

The exclusion:
http://mysite.com/archive/*
prevents everything in the "archive" directory from being included in the index.

The exclusion:
/archive/*
prevents everything in any "archive" directory from being included in the index regardless of the site it's on.

The exclusion:
http://mysite.com/*.txt
prevents files on "mysite.com" that end with the extension ".txt" from being included in the index.

The exclusion:
*.txt
prevents all files that end with the extension ".txt" from being included in the index regardless of what site they're on.

The exclusion:
http://mysite.com/alphaindex/?.html
prevents a file like "http://mysite.com/alphaindex/a.html" from being indexed, but would allow a file "http://mysite.com/alphaindex/aardvark.html" to be indexed.

The exclusion:
http://mysite.com/alphaindex/?.html index=no follow=yes
prevents a file like "http://mysite.com/alphaindex/a.html" from being added to the index but would allow the spider to find and follow the links in that page.

The exclusion:
http://mysite.com/endwiththis.html index=yes follow=no
allows that file to be added to the index but prevents the spider from following any of the links in that file.
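
The wildcard part of these exclusions can be sketched as a pattern-to-regex conversion, where "*" matches any run of characters and "?" matches exactly one. This is a guess at the behaviour from the examples above, not the spider's real code. In particular, it is an assumption of this sketch that a pattern without "://" may match on any site, and the index= and follow= flags are not handled. The function name exclusion_matches is made up.

```php
<?php
// Check a URL against a wildcard exclusion pattern.
function exclusion_matches(string $pattern, string $url): bool
{
    // Escape everything, then turn the two wildcards back into regex syntax.
    $regex = preg_quote($pattern, '#');
    $regex = str_replace(['\*', '\?'], ['.*', '.'], $regex);

    // Assumption: a pattern with no scheme applies regardless of site.
    if (strpos($pattern, '://') === false) {
        $regex = '.*' . $regex;
    }

    return preg_match('#^' . $regex . '$#', $url) === 1;
}

var_dump(exclusion_matches('http://mysite.com/alphaindex/?.html', 'http://mysite.com/alphaindex/a.html'));        // bool(true)
var_dump(exclusion_matches('http://mysite.com/alphaindex/?.html', 'http://mysite.com/alphaindex/aardvark.html')); // bool(false)
var_dump(exclusion_matches('*.txt', 'http://mysite.com/notes.txt'));                                              // bool(true)
```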

yoyoyoyo · #19 01-20-2005, 10:37 PM

Quote:

Originally Posted by Natch

If you don't mind my saying: "no kidding" or "so?"

You could say that about a lot of the mods here, but each little addon or mod is like a lesson in the workings of PHP and vBulletin, so I find it interesting. I am sorry if it bothers you, but hopefully some other people will find it interesting or helpful.

Quote:

Originally Posted by Natch

I can't think of a reason why I would want to block Legit spiders (those that respect robots.txt restrictions), and a spambot spider is likely as not to ignore meta tags and robots.txt anyway.

Well, maybe you can't, but obviously others can and desire that function, thus the "rules." I did not invent the meta tags. You are correct that bots don't always play by the rules, but some do, and these are the ones that this hack was addressing.

Actually....ya know what - forget about the "noindex" option... forget I even mentioned it - this hack was about "nofollow", so if you are planning on implementing the first hack I suggest adding the meta in the header, and I apologize for trying to toss in more info than was needed.

Princeton · #20 01-21-2005, 12:14 AM

The "rel" attribute has been around since HTML 3.2.

It's getting a lot of attention these days because of the "junk" pages that are being indexed by the major search engines, most notably those caused by individuals who comment on (spam) a blog, wiki, or forum site. The search engines are looking for a way to conserve resources (use them where it counts) and prevent indexing of sites with no relevant content.

So they are asking the community to start using the rel="nofollow" attribute to help them stop, or at the very least slow, the "spamming".

When an individual spams a site they leave links on the post hoping that the search engines will "follow" the link back to their site. When they (the spammers) do this they are hoping to increase their "popularity" with search engines.

The rel="nofollow" attribute does not prevent the search engines from indexing your pages. Nor does it prevent the other site from being indexed when search engines visit it directly.

It will simply tell the search engine not to follow the link that was posted on your page (thread/post) -- that was NOT created by you.

ABOUT THE HACK
If you are worried about your PAGERANK then use this.

If you want to prevent spammers from posting in your forum then this hack will not help. Spammers will continue doing what they do ... the best route is to remove the post and ban the user. Most will not even know you are using rel="nofollow" and some will not even understand it.

SOME CONTROLS ARE NEEDED
I think there should be some controls.

For example, converting all posted links with rel="nofollow" also punishes those who are loyal to the site.

Why not help your loyal members with their site "popularity"? Do not convert links posted by loyal users. Allow the search engines to follow these links. Some sites can even list this as a membership benefit. -- just throwing ideas

Anyway, what I'm trying to get to is that the ADMIN should have some control over what links get rel="nofollow".

As it is now, all "in-house" links are tagged with rel="nofollow" which may hurt your "popularity".

kall · #21 01-21-2005, 01:07 AM

Quote:

Originally Posted by princeton

SOME CONTROLS ARE NEEDED
I think there should be some controls.

For example, converting all posted links with rel="nofollow" also punishes those who are loyal to the site.

Why not help your loyal members with their site "popularity"? Do not convert links posted by loyal users. Allow the search engines to follow these links. Some sites can even list this as a membership benefit. -- just throwing ideas

Anyway, what I'm trying to get to is that the ADMIN should have some control over what links get rel="nofollow".

As it is now, all "in-house" links are tagged with rel="nofollow" which may hurt your "popularity".

Alrighty then, try this:

PHP Code:

    if ($type == 'url')
    {
        // note: $bbuserinfo is the logged-in user viewing the page;
        // the first post was later updated to use $post instead
        global $bbuserinfo;

        if (is_member_of($bbuserinfo, 6))
        {
            // standard URL hyperlink
            return "<a href=\"$rightlink\" target=\"_blank\">$text</a>";
        }
        else
        {
            return "<a href=\"$rightlink\" rel=\"nofollow\" target=\"_blank\">$text</a>";
        }
    }
    else

This will make it so anyone who is an admin (group 6 - change this to whatever you want) will not have their links tagged with the nofollow attribute.

The syntax for multiple groups escapes me at present, but if someone can remind me, I will change it.
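
One way to handle several groups is to keep the exempt group IDs in an array and loop over them. The sketch below runs on its own, so it uses a stand-in for vBulletin's is_member_of(). The stand-in, the function names and the shape of the user array (usergroupid, comma-separated membergroupids, posts) are assumptions for illustration only.

```php
<?php
// Stand-in for vBulletin's is_member_of(), so the sketch is self-contained.
function is_member_of_stub(array $user, int $groupId): bool
{
    // Secondary groups are assumed to be a comma-separated list such as "5,7".
    $secondary = array_filter(array_map('trim', explode(',', $user['membergroupids'] ?? '')), 'strlen');

    return (int) $user['usergroupid'] === $groupId
        || in_array((string) $groupId, $secondary, true);
}

// True if links posted by this user should carry rel="nofollow".
function needs_nofollow(array $user, array $exemptGroups, int $minPosts): bool
{
    foreach ($exemptGroups as $groupId) {
        if (is_member_of_stub($user, $groupId)) {
            return false; // members of an exempt group keep normal links
        }
    }

    // Members with more than $minPosts posts are exempt as well.
    return (int) $user['posts'] <= $minPosts;
}

$exemptGroups = [5, 6, 7]; // change these IDs to suit your board

$newbie = ['usergroupid' => 2, 'membergroupids' => '', 'posts' => 3];
$admin  = ['usergroupid' => 6, 'membergroupids' => '', 'posts' => 0];

var_dump(needs_nofollow($newbie, $exemptGroups, 50)); // bool(true)
var_dump(needs_nofollow($admin, $exemptGroups, 50));  // bool(false)
```

Inside the hack itself you would call the real is_member_of() with the poster's data in place of the stub.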
