Spam Buster / Killer to Merge


Michael Morris
02-21-2005, 10:57 PM
UK Jimbo and I have decided to merge our projects. We'll be using this thread for development purposes. Any comments or input on what you'd like to see in antispam software would be appreciated.

Michael Morris
02-21-2005, 11:00 PM
################################################
######## SPAM KILLER !! ########################
################################################

// settings
$systemuserid = 2;
$systemusername = 'Messageboard Golem';
$banusergroup = 8;
$reportforumid = 4; // Live site is 114

eval('$blacklist = "' . fetch_template('mtblacklist') . '";');
$spamlist = explode("\n", $blacklist);

foreach ($spamlist as $spam)
{
// Chop off comment text at the end of some lines as necessary
if (strstr($spam, '#'))
{
$spam = substr($spam, 0, strpos($spam, '#'));
}
$spam = trim($spam);

if (strlen($spam) != 0)
{
if (eregi(trim($spam), $newpost['message']))
{
$report_foruminfo = fetch_foruminfo($reportforumid);
$reportpost = array(
'username' => $systemusername,
'userid' => $systemuserid,
'title' => 'Spam Alert: ' . $newpost['title'],
'emailupdate' => 9999
);

eval('$reportpost[message] = "' . fetch_template('spam_alert') . '";');
build_new_post('thread', $report_foruminfo, array(), 0, $reportpost, $errors);

// Now ban them

// check to see if there is already a ban record for this user in the userban table
if ($check = $DB_site->query_first("SELECT userid, liftdate FROM " . TABLE_PREFIX . "userban WHERE userid = $bbuserinfo[userid]"))
{
// there is already a record - just update this record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "userban SET
adminid = $systemuserid,
bandate = " . TIMENOW . ",
liftdate = 0
WHERE userid = $bbuserinfo[userid]
");
}
else
{
// insert a record into the userban table
$DB_site->query("
INSERT INTO " . TABLE_PREFIX . "userban
(userid, usergroupid, displaygroupid, customtitle, usertitle, adminid, bandate, liftdate)
VALUES
($bbuserinfo[userid], $bbuserinfo[usergroupid], $bbuserinfo[displaygroupid], $bbuserinfo[customtitle], '" . addslashes($bbuserinfo['usertitle']) . "', $systemuserid, " . TIMENOW . ", 0)
");
}

// update the user record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "user SET
$bantitlesql
usergroupid = $banusergroup,
displaygroupid = $banusergroup
WHERE userid = $bbuserinfo[userid]
");

eval(print_standard_error('error_nospam'));
}
}
}

nexialys
02-21-2005, 11:12 PM
Genius. Merging may be the best way to avoid duplicates, and it will help concentrate on the problem... (two heads are always better than one!)

Erwin
02-22-2005, 03:16 AM
Moved from Beta hacks to this forum as this is not a hack release but a hack collaboration discussion. :)

Michael Morris
02-22-2005, 04:01 AM
K Erwin. Thanks.

oldengine
02-24-2005, 02:09 AM
OK guys, fine, but remember the K.I.S.S. method and don't make it so complicated that it takes a programming degree to install and maintain it.

The prime need is to keep JOEBLOW from registering and posting "visit our website for our new promotional marketing campaign where you can earn thousands." Click here: www.joeblowsspamsite.com

Keep a running list of joeblowsspamsite and his favorite keywords and phrases so that they never see the light of day. I'm sure you know the way, and we'll be looking for your posts.

Michael Morris
02-27-2005, 05:10 AM
Ok, several changes here.

1) More commentary code.
2) Blacklist is now called from a file instead of a template
3) File is included from PHPINCLUDE_START

This is currently the PHPINCLUDE_START section

if (defined('GET_EDIT_TEMPLATES'))
{
require('./antispam.php');
}

Later we can include a post variable, but for testing purposes that's just another thing that can go wrong. The GET_EDIT_TEMPLATES constant is defined true by any script we would conceivably want to watch for spam: not only vBulletin's built-in scripts but also hacks which have inputs, such as vblinks.

The other file is antispam.php itself, for the moment. If anything isn't clear from the comments, let me know.

<?php
/*======================================================================*\
|| #################################################################### ||
|| # VbSpamicide # ||
|| # Developed by UK Jimbo & Michael Morris # ||
|| # Alpha Day 1 # ||
|| # ---------------------------------------------------------------- # ||
|| # ---------------- VBULLETIN IS NOT FREE SOFTWARE ---------------- # ||
|| # http://www.vbulletin.com | http://www.vbulletin.com/license.html # ||
|| #################################################################### ||
\*======================================================================*/

// This script gets called by PHPINCLUDE_START, so check to see if the
// DB is initialized before running.

if (!is_object($DB_site))
{
echo 'You can\'t access this file directly';
exit;
}

// Define our settings. Later we'll use variables from the $vboptions group
// to perform these assignments. For now let's tag them into the $vboptions
// array manually.
$vboptions['systemuserid'] = 2; // This is the user id for the system auto poster.
$vboptions['systemusername'] = 'Messageboard Golem'; // This is the name of the auto poster.
$vboptions['usetachy'] = false; // If set true the system adds the user to Tachy Goes
// To Coventry. Tachy needs to be hardened for this to be
// effective. For now we'll concentrate on banning.

$vboptions['spambangroup'] = 8; // The usergroup spammers go to.
$vboptions['reportforum'] = 4; // For now use a forum for spam reports. Later make this
// an option.

// Transfer the $_POST data to the variables we want to work with as necessary. This code is identical
// to newthread.php, newreply.php and editpost.php
if (isset($_POST['WYSIWYG_HTML']))
{
require_once('./includes/functions_wysiwyg.php');
$spamcheck['message'] = convert_wysiwyg_html_to_bbcode($_POST['WYSIWYG_HTML'], $foruminfo['allowhtml']);
}
else
{
$spamcheck['message'] = &$_POST['message'];
}

// Grab the blacklist text file.
$blacklist = file_get_contents('blacklist.txt');

// Explode it into an array broken down line by lines.
$spamlist = explode("\n", $blacklist);

// Now grab the blacklist template. This template will include user-defined URLs separate from
// the master blacklist.
eval('$blacklist = "' . fetch_template('mtblacklist') . '";');

// Explode it as well.
$localspamlist = explode("\n", $blacklist);

// Merge the lists.
$spamlist = array_merge($spamlist, $localspamlist);

// Use a foreach loop to iterate over the spamlist.
foreach ($spamlist as $spam)
{
// Chop off comment text at the end of some lines as necessary
if (strstr($spam, '#'))
{
$spam = substr($spam, 0, strpos($spam, '#'));
}
$spam = trim($spam);

// Check if the line is now blank because of the above operation, and if so, skip it.
if (strlen($spam) != 0)
{
// Now use a regular expression to check the message for known URLs from the blacklist.
if (eregi(trim($spam), $spamcheck['message']))
{
// Ok, true. For now we will go ahead and report the post in a designated forum.
// Later we will choose from a number of branch actions.

// Grab the forum info for the report post forum
$report_foruminfo = fetch_foruminfo($vboptions['reportforum']);

// Create a report post array.
$reportpost = array(
'username' => $vboptions['systemusername'],
'userid' => $vboptions['systemuserid'],
'title' => 'Spam Alert: ' . trim(htmlspecialchars_uni($_POST['subject'])),
'emailupdate' => 9999
);

// This template isn't cached. It's used so rarely; does it need to be?
eval('$reportpost[message] = "' . fetch_template('spam_alert') . '";');

// Call the library containing function build new post
require_once('./includes/functions_newpost.php');

// Call build new post and make the report.
build_new_post('thread', $report_foruminfo, array(), 0, $reportpost, $errors);

// Now begin the banning procedure.

// check to see if there is already a ban record for this user in the userban table
if ($check = $DB_site->query_first("SELECT userid, liftdate FROM " . TABLE_PREFIX . "userban WHERE userid = $bbuserinfo[userid]"))
{
// there is already a record - just update this record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "userban SET
adminid = $vboptions[systemuserid],
bandate = " . TIMENOW . ",
liftdate = 0
WHERE userid = $bbuserinfo[userid]
");
}
else
{
// insert a record into the userban table
$DB_site->query("
INSERT INTO " . TABLE_PREFIX . "userban
(userid, usergroupid, displaygroupid, customtitle, usertitle, adminid, bandate, liftdate)
VALUES
($bbuserinfo[userid], $bbuserinfo[usergroupid], $bbuserinfo[displaygroupid], $bbuserinfo[customtitle], '" . addslashes($bbuserinfo['usertitle']) . "', $vboptions[systemuserid], " . TIMENOW . ", 0)
");
}

// update the user record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "user SET
$bantitlesql
usergroupid = $vboptions[spambangroup],
displaygroupid = $vboptions[spambangroup]
WHERE userid = $bbuserinfo[userid]
");

// Now parse some global templates which haven't been called yet (we arrive here from
// PHPINCLUDE_START).

eval('$timezone = "' . fetch_template('timezone') . '";');
eval('$gobutton = "' . fetch_template('gobutton') . '";');
eval('$spacer_open = "' . fetch_template('spacer_open') . '";');
eval('$spacer_close = "' . fetch_template('spacer_close') . '";');

// parse headinclude, header & footer
eval('$headinclude = "' . fetch_template('headinclude') . '";');
eval('$header = "' . fetch_template('header') . '";');
eval('$footer = "' . fetch_template('footer') . '";');

// Inform the user that they've been spam banned.
eval(print_standard_error('error_nospam'));
}
}
}

?>

EDIT: There is now a local list above. Moving on to work on getting the system to "learn" bad URLs.

UK Jimbo
02-28-2005, 09:14 PM
Hi Michael,

I couldn't get hold of you on MSN Messenger so I've penned down some thoughts/questions here:

New code
I like the style, the comments, and the fact that it's called from PHPINCLUDE_START. Good work :)

Blacklists
What do you think the best way of storing the blacklist(s) and making it editable is? I wonder if a phrase would be a good plan (even if it was managed through a custom part of the admincp rather than the phrase manager).

If we wanted to store them as files and be able to write to them via the web interface the files would have to be world or web user writable.

It'd be easy to set up a scheduled task to pull down the latest copy of the MT blacklist.

Multiple fields
What are your thoughts on breaking down the fields passed into the "spam engine"? I'm thinking along the lines of the way the second version of spamBuster was able to have rules relating to the body text or the subject. Username might be another field worth matching against: lots of spammers seem to use the recipe [username][number], like robby34. Perhaps something to worry about later.

Are you happy with me going ahead and writing a lower level library that does the spam processing and leaving some of the vBulletin integration (admincp code) to you?

I've made some changes to the file handling and character substitution code:

<?php
/*======================================================================*\
|| #################################################################### ||
|| # VbSpamicide # ||
|| # Developed by James Cohen & Michael Morris # ||
|| # Alpha Day 2 # ||
|| # ---------------------------------------------------------------- # ||
|| # ---------------- VBULLETIN IS NOT FREE SOFTWARE ---------------- # ||
|| # http://www.vbulletin.com | http://www.vbulletin.com/license.html # ||
|| #################################################################### ||
\*======================================================================*/

// This script gets called by PHPINCLUDE_START, so check to see if the
// DB is initialized before running.

if (!is_object($DB_site))
{
echo 'You can\'t access this file directly';
exit;
}

// Define our settings. Later we'll use variables from the $vboptions group
// to perform these assignments. For now let's tag them into the $vboptions
// array manually.
$vboptions['systemuserid'] = 2; // This is the user id for the system auto poster.
$vboptions['systemusername'] = 'Messageboard Golem'; // This is the name of the auto poster.
$vboptions['usetachy'] = false; // If set true the system adds the user to Tachy Goes
// To Coventry. Tachy needs to be hardened for this to be
// effective. For now we'll concentrate on banning.

$vboptions['spambangroup'] = 8; // The usergroup spammers go to.
$vboptions['reportforum'] = 4; // For now use a forum for spam reports. Later make this
// an option.

// Transfer the $_POST data to the variables we want to work with as necessary. This code is identical
// to newthread.php, newreply.php and editpost.php
if (isset($_POST['WYSIWYG_HTML']))
{
require_once('./includes/functions_wysiwyg.php');
$spamcheck['message'] = convert_wysiwyg_html_to_bbcode($_POST['WYSIWYG_HTML'], $foruminfo['allowhtml']);
}
else
{
$spamcheck['message'] = &$_POST['message'];
}

// Grab the blacklist text file.
$blacklist = file_get_contents('blacklist.txt');

// split it into an array broken down line by line (windoze friendly)
$spamlist = preg_split('/\r?\n/', $blacklist);

// Now grab the blacklist template. This template will include user-defined URLs separate from
// the master blacklist.
eval('$blacklist = "' . fetch_template('mtblacklist') . '";');

// Split it as well.
$localspamlist = preg_split('/\r?\n/', $blacklist);

// Merge the lists.
$spamlist = array_merge($spamlist, $localspamlist);

// Use a foreach loop to iterate over the spamlist.
foreach ($spamlist as $spam)
{
// Chop off comment text at the end of some lines as necessary
$spam = trim(preg_replace('/#.*$/', '', $spam));

// Check if the line is now blank because of the above operation, and if so, skip it.
if (strlen($spam) != 0)
{
// Now use a regular expression to check the message for known URLs from the blacklist.
// '#' is a safe delimiter here because every '#' was stripped out above.
if (preg_match('#' . $spam . '#i', $spamcheck['message']))
{
// Ok, true. For now we will go ahead and report the post in a designated forum.
// Later we will choose from a number of branch actions.

// Grab the forum info for the report post forum
$report_foruminfo = fetch_foruminfo($vboptions['reportforum']);

// Create a report post array.
$reportpost = array(
'username' => $vboptions['systemusername'],
'userid' => $vboptions['systemuserid'],
'title' => 'Spam Alert: ' . trim(htmlspecialchars_uni($_POST['subject'])),
'emailupdate' => 9999
);

// This template isn't cached. It's used so rarely; does it need to be?
eval('$reportpost[message] = "' . fetch_template('spam_alert') . '";');

// Call the library containing function build new post
require_once('./includes/functions_newpost.php');

// Call build new post and make the report.
build_new_post('thread', $report_foruminfo, array(), 0, $reportpost, $errors);

// Now begin the banning procedure.

// check to see if there is already a ban record for this user in the userban table
if ($check = $DB_site->query_first("SELECT userid, liftdate FROM " . TABLE_PREFIX . "userban WHERE userid = $bbuserinfo[userid]"))
{
// there is already a record - just update this record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "userban SET
adminid = $vboptions[systemuserid],
bandate = " . TIMENOW . ",
liftdate = 0
WHERE userid = $bbuserinfo[userid]
");
}
else
{
// insert a record into the userban table
$DB_site->query("
INSERT INTO " . TABLE_PREFIX . "userban
(userid, usergroupid, displaygroupid, customtitle, usertitle, adminid, bandate, liftdate)
VALUES
($bbuserinfo[userid], $bbuserinfo[usergroupid], $bbuserinfo[displaygroupid], $bbuserinfo[customtitle], '" . addslashes($bbuserinfo['usertitle']) . "', $vboptions[systemuserid], " . TIMENOW . ", 0)
");
}

// update the user record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "user SET
$bantitlesql
usergroupid = $vboptions[spambangroup],
displaygroupid = $vboptions[spambangroup]
WHERE userid = $bbuserinfo[userid]
");

// Now parse some global templates which haven't been called yet (we arrive here from
// PHPINCLUDE_START).

eval('$timezone = "' . fetch_template('timezone') . '";');
eval('$gobutton = "' . fetch_template('gobutton') . '";');
eval('$spacer_open = "' . fetch_template('spacer_open') . '";');
eval('$spacer_close = "' . fetch_template('spacer_close') . '";');

// parse headinclude, header & footer
eval('$headinclude = "' . fetch_template('headinclude') . '";');
eval('$header = "' . fetch_template('header') . '";');
eval('$footer = "' . fetch_template('footer') . '";');

// Inform the user that they've been spam banned.
eval(print_standard_error('error_nospam'));
}
}
}

?>

I've used preg_split() in place of explode() to make the file splitting Windows friendly.

Comment text should now be stripped out using preg_replace().

The main regular expression tests are done using preg_match(), which I think is faster than eregi() in a lot of cases.

Looking at those changes who'd guess I've developed in Perl a fair bit? :ermm:

I've not tested this code; the line I'm most dubious about is the comment-removing code.
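The three changes described above (CRLF-tolerant splitting, comment stripping and case-insensitive preg_match()) can be pulled out into small functions that are easy to test on their own. This is a sketch for a current PHP version, not code from either project: parse_blacklist() and find_blacklist_match() are hypothetical names, and it assumes each blacklist line is a regex fragment with `#` used only to start a comment.

```php
<?php
// Hypothetical helper (not part of vBulletin or the posted code): turn raw
// blacklist text, from a file or a template, into a list of regex fragments.
function parse_blacklist(string $text): array
{
    $patterns = array();
    // Split on LF or CRLF so files edited on Windows work too.
    foreach (preg_split('/\r?\n/', $text) as $line) {
        // Remove everything from the first '#' onwards, then trim whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line !== '') {
            $patterns[] = $line;
        }
    }
    return $patterns;
}

// Return the first blacklist entry that matches $message, or null if none do.
// '#' is safe as the regex delimiter because parse_blacklist() removed it.
function find_blacklist_match(array $patterns, string $message): ?string
{
    foreach ($patterns as $pattern) {
        // preg_match() returns 1 on a match, 0 on none, false on a bad pattern.
        if (preg_match('#' . $pattern . '#i', $message) === 1) {
            return $pattern;
        }
    }
    return null;
}
```

Using `#` as the delimiter also sidesteps entries that contain `/`, which would break a `/.../i` pattern.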

Michael Morris
03-01-2005, 09:44 AM
> Hi Michael,
>
> I couldn't get hold of you on MSN Messenger so I've penned down some thoughts/questions here:
>
> New code
> I like the style, the comments, and the fact that it's called from PHPINCLUDE_START. Good work :)


Thanks


> Blacklists
> What do you think the best way of storing the blacklist(s) and making it editable is? I wonder if a phrase would be a good plan (even if it was managed through a custom part of the admincp rather than the phrase manager).

Since the master blacklist gets updated a lot I'd like to keep it separate from the local list. Storing it as a .txt file can be optional, but storing it as a template needs to be an option for those whose file systems are set up such that PHP can't write to them (for the same reason vBulletin has the option to keep CSS definitions in the page itself, although this is far less efficient).

The local list needs to be a template for quick access.

At 50K and growing, I don't think it's gonna fit in the phrase system.
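Making file storage optional could start with a writability check that falls back to the template when PHP can't write to disk. A minimal sketch with a hypothetical helper name; it only decides where the list should live and does not read or write anything.

```php
<?php
// Hypothetical helper: return 'file' if PHP can write the blacklist file
// (or create it in its directory), otherwise 'template'.
function choose_blacklist_storage(string $path): string
{
    if (file_exists($path)) {
        return is_writable($path) ? 'file' : 'template';
    }
    // The file doesn't exist yet, so check whether it could be created.
    return is_writable(dirname($path)) ? 'file' : 'template';
}
```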

At some point we need a cron job to go to Jay Allen's site and pull down the updates to the list (he's given his permission for this). To spare his bandwidth, the system should only do a full refresh when requested from the admincp. Most of the time it should download the latest 100 additions about once every 3 to 5 days.

> Multiple fields
> What are your thoughts on breaking down the fields passed into the "spam engine"? I'm thinking along the lines of the way the second version of spamBuster was able to have rules relating to the body text or the subject. Username might be another field worth matching against: lots of spammers seem to use the recipe [username][number], like robby34. Perhaps something to worry about later.

On large boards such as mine there are many legit users with numbers in their user names, so it would be difficult, if not impossible, to make that a usable test.

What would be ideal is an algorithm that iterates over the user's signature, post, and title, extracts all URLs and puts them in arrays, then compares these arrays for matches. Depending on the number of matches we can extract domain names from the arrays and add them to the local list.
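The URL-comparison idea could start out roughly like this. It is a hypothetical sketch: the function names, the link regex and the "seen in at least two fields" threshold are assumptions for illustration, and link detection is deliberately rough (it only looks for `http(s)://` and `www.` prefixes).

```php
<?php
// Hypothetical helper: host names of every http(s):// or www. link in $text,
// lower-cased, with a leading "www." removed and duplicates dropped.
function extract_domains(string $text): array
{
    $domains = array();
    preg_match_all('~\b(?:https?://|www\.)[^\s<>"\'\[\]]+~i', $text, $matches);
    foreach ($matches[0] as $url) {
        // Drop trailing punctuation that usually belongs to the sentence.
        $url = rtrim($url, '.,;:!?)');
        // parse_url() only finds a host when a scheme is present.
        if (stripos($url, 'http') !== 0) {
            $url = 'http://' . $url;
        }
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $domains[] = preg_replace('/^www\./', '', strtolower($host));
        }
    }
    return array_values(array_unique($domains));
}

// Domains that appear in at least $minFields different fields
// (e.g. signature, post body and title).
function shared_domains(array $fields, int $minFields = 2): array
{
    $counts = array();
    foreach ($fields as $text) {
        foreach (extract_domains($text) as $domain) {
            $counts[$domain] = ($counts[$domain] ?? 0) + 1;
        }
    }
    $shared = array();
    foreach ($counts as $domain => $n) {
        if ($n >= $minFields) {
            $shared[] = $domain;
        }
    }
    return $shared;
}
```

Domains returned by shared_domains() would be the candidates for the local list; how many matches should trigger that is left open here.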

The message itself should be scanned for spamminess: repeated use of the $ character, FREE in all caps, and maybe an unusual-words list (user definable) for words that shouldn't occur on a normal basis, viagra for example.

> Are you happy with me going ahead and writing a lower level library that does the spam processing and leaving some of the vBulletin integration (admincp code) to you?

Sounds good. I'll start with the installer to set up the vboptions for this hack.

As far as functions go: right now the code here takes its actions inside the searching loop. To be honest these need to be separate. Set up some kind of static variable to count matches and return it, and put the ban action in a separate function (or look into the possibility of using the existing ban functions). BTW, I noticed that in spam buster you wrote a routine to send mail; there's already a mail function in vBulletin: vbmail(). It's defined in the functions library, which is included on every execution of the vBulletin code.
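Separating the search from the actions might look like this. Instead of a static variable, this hypothetical sketch simply returns a local hit count; the function names, the per-field counting and the ban threshold are assumptions, and the actual report and ban code is left out.

```php
<?php
// Hypothetical sketch: count how many (pattern, field) pairs match.
// Each pattern counts at most once per field.
function count_spam_hits(array $patterns, array $fields): int
{
    $hits = 0;
    foreach ($fields as $text) {
        foreach ($patterns as $pattern) {
            // Assumes patterns contain no '#' (comments already stripped).
            if (preg_match('#' . $pattern . '#i', $text) === 1) {
                $hits++;
            }
        }
    }
    return $hits;
}

// Map a hit count to an action name; the report/ban/moderate code would
// live in its own functions and be chosen from this result.
function choose_spam_action(int $hits, int $banThreshold = 2): string
{
    if ($hits === 0) {
        return 'allow';
    }
    return $hits >= $banThreshold ? 'ban' : 'moderate';
}
```

Keeping the decision in one small function also makes it easy to switch actions on or off from options later.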



> Looking at those changes who'd guess I've developed in Perl a fair bit? :ermm:

I personally avoid PCRE expressions like the plague, but sometimes they're the only way to go :)

UK Jimbo
03-01-2005, 10:38 AM
> Since the master blacklist gets updated a lot I'd like to keep it separate from the local list.

I wasn't suggesting that the two be stored together. I'm very much in favour of having separate local and standard rules.

> Storing it as a .txt file can be optional, but storing it as a template needs to be an option for those whose file systems are set up such that PHP can't write to them (for the same reason vBulletin has the option to keep CSS definitions in the page itself, although this is far less efficient).
>
> The local list needs to be a template for quick access.
>
> At 50K and growing, I don't think it's gonna fit in the phrase system.

Sounds sensible. I'm not much of a template expert: to be accessible in an environment with multiple styles, will this template have to be inherited from a default style?

> At some point we need a cron job to go to Jay Allen's site and pull down the updates to the list (he's given his permission for this). To spare his bandwidth, the system should only do a full refresh when requested from the admincp. Most of the time it should download the latest 100 additions about once every 3 to 5 days.

That should be easy to do by looking at the template.dateline field.

A little wrapper script in ./includes/cron and called via the scheduled tasks is probably the best way to go.
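The decision the scheduled task has to make could be as small as the function below. It is a hypothetical sketch: it assumes the last-update time is available as a Unix timestamp (for example from template.dateline, as suggested) and only picks a mode; downloading from Jay Allen's site and requesting just the latest additions are not shown.

```php
<?php
// Hypothetical scheduler logic for the blacklist cron task.
// $lastUpdate: Unix timestamp of the last successful update.
// $now: current Unix timestamp (time() or TIMENOW in vBulletin).
function blacklist_update_mode(int $lastUpdate, int $now, bool $fullRefreshRequested, int $intervalDays = 3): string
{
    if ($fullRefreshRequested) {
        return 'full';          // only when an admin asks for it
    }
    if ($now - $lastUpdate >= $intervalDays * 86400) {
        return 'incremental';   // fetch only the most recent additions
    }
    return 'none';              // too soon, leave the remote site alone
}
```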

> On large boards such as mine there are many legit users with numbers in their user names, so it would be difficult, if not impossible, to make that a usable test.

I wasn't suggesting that all users with bob90-style names should be banned, just that under a points-based system it could count against them.

> What would be ideal is an algorithm that iterates over the user's signature, post, and title, extracts all URLs and puts them in arrays, then compares these arrays for matches. Depending on the number of matches we can extract domain names from the arrays and add them to the local list.

I was planning to make the library checking function use an input array so you can define different fields in the rules as per the most mature incarnation of spamkiller.

> The message itself should be scanned for spamminess: repeated use of the $ character, FREE in all caps, and maybe an unusual-words list (user definable) for words that shouldn't occur on a normal basis, viagra for example.

I think any use of "free" at all should count towards it being spam. I've caught quite a few with the combination of "free", a URL and "$" or "%" signs in a post.
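A points-based score combining these signals, with the username pattern from earlier as only a minor factor, could look like this. Every rule and weight is hypothetical and untuned, and spam_score() is an illustrative name, not part of either project.

```php
<?php
// Hypothetical points-based scorer; all weights are made up for illustration.
function spam_score(string $username, string $title, string $message): int
{
    $text = $title . "\n" . $message;
    $score = 0;

    $hasFree = preg_match('/\bfree\b/i', $text) === 1;
    if ($hasFree) {
        $score += 1;                 // any "free" counts a little
    }
    if (preg_match('/\bFREE\b/', $text) === 1) {
        $score += 1;                 // all-caps FREE counts a bit more
    }
    $hasUrl = preg_match('~https?://|www\.~i', $text) === 1;
    if ($hasUrl) {
        $score += 1;
    }
    $moneySigns = substr_count($text, '$') + substr_count($text, '%');
    if ($moneySigns >= 2) {
        $score += 1;                 // repeated $ or % signs
    }
    // "free" + a link + $ or % together: weight the combination extra.
    if ($hasFree && $hasUrl && $moneySigns > 0) {
        $score += 2;
    }
    // Names like robby34 only nudge the score; many real users have them.
    if (preg_match('/^[a-z]+\d{2,}$/i', $username) === 1) {
        $score += 1;
    }
    return $score;
}
```

A post would then be reported, moderated or banned depending on which threshold its score crosses.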

> Sounds good. I'll start with the installer to set up the vboptions for this hack.
>
> As far as functions go: right now the code here takes its actions inside the searching loop. To be honest these need to be separate. Set up some kind of static variable to count matches and return it, and put the ban action in a separate function (or look into the possibility of using the existing ban functions). BTW, I noticed that in spam buster you wrote a routine to send mail; there's already a mail function in vBulletin: vbmail(). It's defined in the functions library, which is included on every execution of the vBulletin code.

I'll make stuff as reusable as possible. Certainly having functions to handle logging, banning, etc would make sense and it would make switching them in/out through configuration variables much easier.

The sb_send_mail function was just there to build the mail. vbmail() was used to do the sending :)

function sb_send_mail($vars) {

$mail=array();
$mail[] = "This is an automated email from vB SpamBuster";
$mail[] = "";
$mail[] = "The user <%USER%> has just tried to post the following message:";
$mail[] = "";
$mail[] = "***********************************************";
$mail[] = "<%MESSAGE_TITLE%>";
$mail[] = "***********************************************";
$mail[] = "<%MESSAGE_BODY%>";
$mail[] = "***********************************************";
$mail[] = "";
$mail[] = "The vB SpamBuster system deemed it to be spam after it passed the following tests:";
$mail[] = "<%HITS_STR%>";
$mail[] = "";
$mail[] = "This post has now been put in the moderation queue";

$msg = implode("\n",$mail);

foreach($vars as $k => $v) {
$msg = str_replace("<%$k%>",$v,$msg);
}

$emails = explode(' ',SB_ALERT_EMAILS);
foreach($emails as $email) {
vbmail($email,'vB SpamBuster Alert',$msg);
}
}



> I personally avoid PCRE expressions like the plague, but sometimes they're the only way to go :)

I think their speed is often overlooked. It could be folklore, but I've heard people say that Perl-style regular expressions have been so highly optimised over the years that doing substr/strchr-type operations is often no less CPU intensive.

Right, there's plenty to be getting on with now :D

GetGamer.com
03-19-2005, 05:26 PM
This looks like it's going to be a great mod. I think you're right on target... leveraging Jay Allen's MT Blacklist and providing a local list capability. It's all shaping up great. Any estimate on when it will be available?

oldengine
10-01-2005, 04:08 PM
What happened with this project?

user_not_found
09-22-2007, 07:49 PM
What's going on with this project? Any news?

Dismounted
09-23-2007, 06:21 AM
Did you realize that the last post to this thread is (nearly) 2 years old?

UK Jimbo
09-23-2007, 11:11 AM
https://vborg.vbsupport.ru/showthread.php?t=155242&highlight=spambuster