Mini Mods - Keyword weight based spam detector


HuangA
07-07-2008, 10:00 PM
I coded this one because I constantly had to moderate and/or delete those lengthy, lame cell phone ads on vBulletin.com's forums and my own forums. You know, buy iphone, ipod touch, nokia blah blah blah sony ericsson blah blah blah etc. etc. etc. email us, we're a legitimate business in a country you've never heard of, blah blah blah spam.

While Akismet does work on filtering them out, sometimes they still leak through.

I know there are two other keyword-based tools that automatically add posts to the moderation queue (one from SirAdrian (https://vborg.vbsupport.ru/showthread.php?t=131568) and one from tweakmonkey (https://vborg.vbsupport.ru/showthread.php?t=129390)), but they don't work too well for me, because I run an iPhone / iPod Touch site and I can't have those keywords flagged as spam simply for appearing. So, here's what I did for mine...

What does this product do?

Adds 1 vBulletin Options setting group, with 4 settings
Allows you to define a list of keywords with associated score
Allows you to set a threshold for automatic moderation
Allows you to set a threshold for automatic rejection
Allows you to set a post count limit for posts to be scanned
Adds 1 plugin which gets run at newpost_process
Adds 1 plugin which gets run at editpost_update_process


How does it work?
1) You configure your keyword list, and score weight. For example, I use this list:
Nokia|0.5
iPhone|0.5
iPod Touch|0.5
Order|0.5
HTC|0.5
Samsung|0.5
Sony Ericsson|0.5
hotmail|0.5
$|0.5
usd|0.5
url|0.3
email|0.5
The list basically means that each time the plugin sees "Nokia", the post gets a score of 0.5; each "$" adds another 0.5, and so on. All the scores are tallied into a total, which is compared against the thresholds below.
2) You configure your moderation score, for example, I use 50.
3) You configure your rejection score, for example, I use 100.
4) You configure your exemption post count, for example, I use 5.

When a new post is being created (this could be a thread, or a reply, doesn't matter, they both trigger newpost_process hook), the plugin will count how many times each keyword appears, and total the score. If it is higher than or equal to the moderation score, it will tuck the post into moderation queue. If it is higher than or equal to the rejection score, a standard vBulletin error message is shown to the user.
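
The scoring described above can be sketched roughly as follows. This is only an illustration of the idea, not the mod's actual source: the function names (parse_keyword_weights, score_message, classify_post) are made up for this example, and matching is assumed to be case-insensitive substring counting.

```php
<?php
// Illustration only: function names are hypothetical, not taken from the mod.

// Parse "keyword|weight" lines into a lowercase keyword => weight map.
function parse_keyword_weights(string $list): array
{
    $weights = [];
    foreach (preg_split('/\R/', $list) as $line) {
        $parts = explode('|', trim($line));
        // Skip blank or malformed lines.
        if (count($parts) === 2 && $parts[0] !== '' && is_numeric($parts[1])) {
            $weights[strtolower($parts[0])] = (float) $parts[1];
        }
    }
    return $weights;
}

// Total score: occurrences of each keyword times its weight.
function score_message(string $message, array $weights): float
{
    $text = strtolower($message);
    $total = 0.0;
    foreach ($weights as $keyword => $weight) {
        $total += substr_count($text, (string) $keyword) * $weight;
    }
    return $total;
}

// Rejection is checked first because it is the stricter threshold.
function classify_post(float $score, float $moderateAt, float $rejectAt): string
{
    if ($score >= $rejectAt) {
        return 'reject';
    }
    if ($score >= $moderateAt) {
        return 'moderate';
    }
    return 'accept';
}

// Example: two "Nokia" hits and one "$" hit at 0.5 each.
$weights = parse_keyword_weights(implode("\n", ['Nokia|0.5', 'iPhone|0.5', '$|0.5']));
$score = score_message('Buy Nokia and nokia phones for $100', $weights);
echo classify_post($score, 50, 100), "\n";
```

With weights of 0.5 and a moderation score of 50, a post needs a lot of keyword hits before anything happens, which is the point: a keyword appearing once or twice on an on-topic forum does not trip the filter.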

How much overhead does this add? Realistically, not much... depending on the number of keywords used, I'd say most likely under 0.05 seconds of CPU time for each post. If you are really that worried, you can set your exemption post count to something lower, so fewer posts are scanned. The default is 5 right now.

This has been tested on 3.7.0 Beta 5 and 3.7.2. I see no reason why it would not work on the 3.6.x series, too.

Change log
0.0.0 => 0.1.0

Changed error message to use vBulletin error message screen instead of die()
Added option for omitting after certain post count (default 5)
Added default values to options
Fixed options not appearing after product import (I forgot to export them for 0.0.0)
Added scanning for editing post (AJAX doesn't seem to give error... I'll work on that for 0.1.1 later)

HuangA
07-08-2008, 08:19 PM
<Reserving second post in thread, in case if I ever need to extend beyond the first post>

KURTZ
07-08-2008, 10:20 PM
Interesting, Andy... but just one question: does it run on the latest vB?

youradhere4222
07-08-2008, 10:25 PM
This is fantastic! I've installed all of the keyword-moderation hacks but I've been having problems with effectiveness. Is there any way you could set a post count threshold for checking keywords? Also, does this work for edited posts as well?

HuangA
07-08-2008, 10:44 PM
KURTZ wrote: "Interesting, Andy... but just one question: does it run on the latest vB?"
I see no reason why it would not work with it. Though, I don't have a test forum to install it on. I'll try to work out a test forum tonight.

youradhere4222 wrote: "This is fantastic! I've installed all of the keyword-moderation hacks but I've been having problems with effectiveness. Is there any way you could set a post count threshold for checking keywords? Also, does this work for edited posts as well?"
It doesn't work for edited posts yet. So in theory they can make a post with 10 characters first, and then edit it. I am planning to add that in a later version to stop that workaround.

Q-v-n-s-Q
07-08-2008, 11:21 PM
Reserving, thank you

HuangA
07-09-2008, 04:26 AM
Apologies to the first person to install... If you got 0.0.0 instead of 0.1.0, please upgrade... it is probably best to remove 0.0.0 and then install 0.1.0, because I changed the plugin name (for differentiation) and added the missing options (I forgot to export them in the first build and didn't notice).

Aside from that, I did the post count thing so it only scans a user's first few posts (the number is configurable), and made it use the error message screen instead of the boring die() screen, as requested.

So in summary:
KURTZ: Yes, it works for 3.7.2 :)
youradhere4222: Yes, it works for edit now (please install 0.1.0) :)

cheat-master30
07-15-2008, 10:15 AM
I think I might try this, because it might block some annoying spamming that I've seen without causing the disruption of censoring it.

youradhere4222
07-22-2008, 06:55 PM
This works great!

This is somewhat of a long-shot suggestion, but in addition to having posts automatically rejected could we have users automatically banned for a pre-defined period if they hit a certain number of keywords? Also, to ensure that the ban was accurate, could a PM be sent (or even better a thread posted in a "staff forum" - like reported PM's and infractions) saying that xxx has been banned for xx days for posting the following message [ quote ] nokia, ipod, etc. [ /quote ]

Thanks!

HuangA
07-23-2008, 12:29 PM
Personally, I don't want to do that on my forum because of the possibility of false positives when I'm not around, and I could potentially ban someone who is genuinely interested in my forum before they even make their first post. But I can see the usefulness of that in some other forums, so I can certainly look into coding that some time this weekend or whenever I have time... no guarantee as to when I can push that out, though.

youradhere4222
07-24-2008, 12:56 PM
I agree, but let's say you have a competing site: competingsite.com

If they were frequently spamming you, you could enter the keyword and other variations to automatically ban anyone who uses it. It could also be used to auto-ban those who use racial slurs or use words you prohibit in the rules.

HuangA
07-24-2008, 05:20 PM
Yes, there are certainly benefits to it. In the case you describe, though, I'd still take additional precautions. I have had people come to my site, and the first thing they said was something like:
I just found this site from google, comparing to <competitor site>, this is way better and easier to use. Thank you for making this possible!!
If you do add a competitor's site to your keyword list, I'd recommend giving it some flexibility (i.e., allow two occurrences in a post before it triggers moderation, and three or so before it triggers rejection).

As mentioned, I'll look into coding an auto ban level during the weekend coming up, and update this again :)

PS: I'm considering a further "profile" system where we can create different sets of keywords/weights, so we can target spam better; but one problem I can see is if we add too many sets of profiles, the math required will probably take more CPU time... Any opinions on this, anyone?

HuangA
07-29-2008, 09:11 AM
Sorry, just reporting in that I had a very busy weekend, so I did not get around to working on this. I will try to set some time aside this weekend for it.

veenuisthebest
10-17-2008, 07:25 AM
hello Andy..

This is one of the best spam-preventing mods I have seen so far, and it works perfectly on my 3.7.3 PL1 board. I wonder why it has so few installs.

I think people like to stay away from mods that have a BETA tag to them. I hope you remove that BETA soon please :)

Thank you

HuangA
10-18-2008, 04:57 AM
Thanks for the feedback, and sorry to everyone, as I have not had a chance to update this because of development work... I have something similar (and hopefully even better) in the works... stay tuned :)

Chadi
12-18-2008, 03:15 AM
This is not working for me at all in 3.7.4

Nokia|1.0
iPhone|1.0
iPod Touch|1.0
Order|1.0
HTC|1.0
Samsung|1.0
Sony Ericsson|1.0
hotmail|1.0
$|1.0
usd|1.0
url|0.3
email|1.0
Created a regular member called test (zero post count) and attempted to post based on the keywords. I posted Nokia about 10 times, and the post went through fine.

Moderate Threshold Score is set to 5
Reject Threshold Score is set to 0
Spam Scanning Post Threshold is set to 5

veenuisthebest
12-18-2008, 06:27 AM
Not sure why it is not working for you. Works more than great for me on 3.7.4.

Try setting Moderate Threshold Score to 50, as 5 is too low. And try posting a real spam post; search for one.

tekguru
03-25-2009, 06:41 PM
Does this work okay on 3.8.1?

HuangA
03-26-2009, 04:43 AM
The hooks for that haven't changed, so I think you should be okay. Though, I honestly don't recall the highest version number I've defined for this. If it tells you that your version is not compatible, you can try editing the XML to get around the limitation, and then install it. In the worst case, where it doesn't work, just uninstall it ;)

Farstate
05-15-2009, 02:58 PM
Great mod - works fine on 3.8.1 PL1

TimberFloorAu
09-08-2009, 09:22 PM
Installed on 3.8.4

Just so I can get my head round all this: if we set a weight of 1.0 for, say, the word viagra, and that is used 48 times within a new post by a member who has fewer than 5 posts, and our threshold score is, say, 50... then their post gets POSTED?

HuangA
09-09-2009, 06:00 AM
Assuming moderation threshold score set at 50, reject threshold set to 100 and it scans only people with < 5 posts...

User 1 with post count of 50:
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
Post appears.

User 2 with post count of 2:
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra
Post appears.

User 3 with post count of 4:
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
Post goes to moderation queue.

User 4 with post count of 1:
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
viagra viagra viagra viagra viagra viagra viagra viagra viagra viagra
Cialis. Prescription. ... oh, and watch replicas!
Post doesn't even get posted.
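
For anyone who wants to check the arithmetic, the four cases above can be reproduced with a small sketch. The function name post_outcome and its hard-coded settings are assumptions made for this example (weight 1.0 for "viagra", moderation at 50, rejection at 100, scanning only users with fewer than 5 posts); it is not the mod's code.

```php
<?php
// Hypothetical sketch of the settings in this example: the keyword "viagra"
// weighs 1.0, moderation starts at 50, rejection at 100, and users with
// 5 or more posts are not scanned.
function post_outcome(int $userPosts, string $message): string
{
    $scanBelowPosts = 5;
    $moderateAt = 50;
    $rejectAt = 100;

    if ($userPosts >= $scanBelowPosts) {
        return 'appears'; // exempt from scanning
    }
    $score = substr_count(strtolower($message), 'viagra') * 1.0;
    if ($score >= $rejectAt) {
        return 'rejected';
    }
    if ($score >= $moderateAt) {
        return 'moderated';
    }
    return 'appears';
}

// [user post count, number of times "viagra" appears in the post]
$cases = [
    'User 1' => [50, 50],
    'User 2' => [2, 48],
    'User 3' => [4, 50],
    'User 4' => [1, 100],
];
foreach ($cases as $user => [$posts, $count]) {
    echo $user, ': ', post_outcome($posts, str_repeat('viagra ', $count)), "\n";
}
```

User 2 slips through only because 48 x 1.0 = 48 is just under the moderation score of 50.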


Of course, you should have more than one word, and they'd run into the limit a lot sooner. You can also have things like $, http://, url=, etc. on your list. You can also adjust the threshold limits. You can also adjust the keyword weight (e.g., viagra = 50 points right away).

Long story short, you control the magic.

Nokia cell phones anyone? :p

TimberFloorAu
09-09-2009, 07:32 AM
Cheers Andy ! Good one.

HansiB
09-15-2009, 10:17 AM
Where do i put/find the keyword and threshold settings?

HuangA
09-16-2009, 09:17 AM
You can enter your keywords and weights in Admin CP > vBulletin Options > Keyword Weight Anti-spam.
You will need to craft your own set of filters for the spam that you are getting, though. The default set really only targets one commonly recurring ad from a certain cell phone seller...

washingtonboise
05-07-2010, 08:09 AM
Very impressed with this mod, found that negative values also work.
e.g. if you have a forum about selling cialis and don't want people posting about viagra, you can give 'cialis' a negative value so that if someone happens to use both in the same post (maybe they're comparing the differences and that makes it NOW an on-topic post), chances are the system will allow the conversation unless they say viagra multiple times.
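
A tiny sketch of how a negative weight offsets a positive one. The weights and the weighted_score helper are invented for this example, and it assumes the score is a plain sum of occurrences times weight, as described in the first post.

```php
<?php
// Hypothetical weights for a forum where "cialis" is on topic.
$weights = ['viagra' => 1.0, 'cialis' => -0.5];

// Sum of (occurrences * weight); negative weights pull the total down.
function weighted_score(string $message, array $weights): float
{
    $text = strtolower($message);
    $total = 0.0;
    foreach ($weights as $keyword => $weight) {
        $total += substr_count($text, $keyword) * $weight;
    }
    return $total;
}

// A comparison post: one "viagra" is cancelled out by two "cialis".
echo weighted_score('How does Cialis compare to Viagra? I find cialis gentler.', $weights), "\n";
// A post that only repeats the unwanted keyword keeps its full score.
echo weighted_score('viagra viagra viagra', $weights), "\n";
```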

InsaneNutter
05-09-2010, 07:03 PM
Works perfectly to stop bots spamming warez / trojans too.

For example hotfile.com is now worth 1 point on our forum, so when a bot spams links to warez hosted on hotfile.com the post is instantly rejected :)

Surprised this mod has not gained more attention; I can see this being a great asset to our forum!

Keep up the great work.

maxc0der
06-07-2010, 07:21 AM
Impressive... Just wanted to say thanks.

It hasn't been updated for a while but I hope we can see a 4.0 version.

EtaiWix
06-22-2010, 05:05 PM
I'm so sorry for the noob question, but how do I install it?

I downloaded the XML file but I'm at a loss for what to do... :(

Thanks for the help!

EtaiWix
06-22-2010, 05:52 PM
Nevermind, I found it... THANKS!!!!

This is GREAT!!

Now if only someone who knows some coding can help me with my other problem... :(

robk6364
07-08-2010, 08:34 PM
This is excellent, thanks a million times!

EtaiWix
08-16-2010, 07:13 AM
Slight problem: I see it's not working for 'partial' words, i.e. if I blocked 'nike' and someone spams 'nikefun', then his post goes through...

Can you add that it will block any instance of it even if it's part of another word?

ArchAngelz
12-24-2011, 05:22 AM
Would something like this work for vb4?

chrisrouse
10-04-2013, 12:56 PM
Does this work with vb4? This would be perfect for the issue we're having with people advertising handbags and Uggs.

jaslon
12-15-2015, 10:12 AM
I just installed this on vBulletin 3.8.9. It seems to work well. I found one bug though which I think affects all vBulletin versions. If you set the Reject threshold to 0 to disable rejections the moderation will also be disabled, so if you do not want to use reject you should instead set the reject threshold to something very high. I can see in the source code that this is a bug. It wouldn't be very difficult to correct it, but just using a very high reject threshold also works.
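
The offending lines are not quoted here, so the following is only a guess at the shape of the problem and of a fix. One way such a bug can arise is a moderation test of the form "total >= moderate AND total < reject", which can never be true for a non-negative score when the reject threshold is 0. The decide function below is a hypothetical helper that treats 0 as "disabled" for each threshold independently.

```php
<?php
// Hypothetical decision helper: a threshold of 0 means "disabled",
// and each threshold is checked independently of the other.
function decide(float $total, float $moderateAt, float $rejectAt): string
{
    if ($rejectAt > 0 && $total >= $rejectAt) {
        return 'reject';
    }
    if ($moderateAt > 0 && $total >= $moderateAt) {
        return 'moderate';
    }
    return 'accept';
}

// The suspected failure mode: with the reject threshold at 0,
// this combined condition is false for any non-negative total.
$total = 10.0;
$moderateAt = 5.0;
$rejectAt = 0.0;
var_dump($total >= $moderateAt && $total < $rejectAt);

// The independent checks still moderate the post.
echo decide($total, $moderateAt, $rejectAt), "\n";
```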

jaslon
12-21-2015, 02:19 PM
I have discovered an annoying bug in the edit part of this script. Here is an example of when the bug occurs:

1. A spammer first posts a message without any spam keywords so that a new thread and a new post is created

2. The spammer then returns and edits his message and adds spam keywords to the post

3. The filter then auto-moderates the post using the following code linked to the "editpost_update_process" hook:
$dataman->set('visible', 0);
$edit['visible'] = 0;

4. If the edit has been done within the time-limit when no "Edited by..." is displayed and the old version of the post is not saved then this will result in a visible thread that contains no posts visible for normal users. There is a post there, but the spam keyword filter has unapproved it so it is not visible for normal users.

I think that I have found a solution to this. I have disabled the "editpost_update_process" hook used in this mod and I have instead written my own routine for the "editpost_update_complete" hook. My code only works with moderation, since I only use moderation and the php code for this new hook looks like this:


// Only scan users below the configured post-count threshold.
if ($vbulletin->userinfo['posts'] < $vbulletin->options['kwas_antispam_posts']) {
    $scan = strtolower($edit['message']);
    // One "keyword|weight" entry per line.
    $keywords = explode("\r\n", strtolower($vbulletin->options['kwas_keyword_weights']));
    $total = 0;
    foreach ($keywords as $keyword) {
        $keyword = explode("|", $keyword);
        // Skip blank or malformed lines; substr_count() cannot take an empty needle.
        if (count($keyword) < 2 || $keyword[0] === '') {
            continue;
        }
        $total += substr_count($scan, $keyword[0]) * $keyword[1];
    }
    // Moderate only when moderation is enabled and the score lies between the two thresholds.
    if (($total >= $vbulletin->options['kwas_moderate_threshold']) && ($total < $vbulletin->options['kwas_reject_threshold']) && $vbulletin->options['kwas_moderate_threshold']) {
        require_once(DIR . '/includes/functions_databuild.php');
        unapprove_post($postinfo['postid'], ($foruminfo['countposts'] AND !$post['skippostcount']), true, $postinfo, $threadinfo, false);
    }
}


This more "high-level" solution, using the built-in unapprove routines in vBulletin, also unapproves threads, recalculates post counts, etc., so it seems to be a better solution.