Hi Michael,
I couldn't get hold of you on MSN Messenger, so I've jotted down some thoughts and questions here:
New code
I like the style, the comments and the fact that it's called from PHPINCLUDE_START. Good work!
Blacklists
What do you think is the best way to store the blacklist(s) and make them editable? I wonder if a phrase would be a good plan (even if it were managed through a custom part of the AdminCP rather than the Phrase Manager).
If we wanted to store them as files and be able to write to them via the web interface, the files would have to be world-writable or writable by the web server user.
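A narrower option than world-writable (0666) is a file that only the web server user can write, for example group-writable (0664) with the web server's group. Saving an edited list from the AdminCP could then look roughly like this minimal sketch. The helper name `save_blacklist()` and the one-entry-per-line format are assumptions, not vBulletin code, and the `LOCK_EX` flag needs PHP 5.1 or later.

```php
<?php
// Hypothetical helper (not part of vBulletin): writes one blacklist entry
// per line, using "\n" line endings.
function save_blacklist($path, $entries)
{
    // Check permissions up front so the admin page can report a clear error
    // instead of silently losing the edit.
    $writable = file_exists($path) ? is_writable($path) : is_writable(dirname($path));
    if (!$writable)
    {
        return false;
    }
    $data = implode("\n", $entries) . "\n";
    // LOCK_EX stops two simultaneous saves from interleaving their writes.
    return file_put_contents($path, $data, LOCK_EX) !== false;
}

// Example: save two made-up entries to a scratch file.
$path = tempnam(sys_get_temp_dir(), 'bl');
$ok = save_blacklist($path, array('casino-online', '\.example\.com # spam host'));
```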
It'd be easy to set up a scheduled task to pull down the latest copy of the MT blacklist.
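The body of such a task might look something like this sketch. `refresh_blacklist()` is a hypothetical helper, and the source location is left as a parameter rather than the real MT blacklist URL. Fetching a remote URL with `file_get_contents()` requires `allow_url_fopen` to be enabled, and the write-then-`rename()` step is only atomic on Unix-like systems when both files are on the same filesystem.

```php
<?php
// Hypothetical scheduled-task helper: copies the blacklist from $source
// (a URL or local path) to $dest without ever leaving $dest half-written.
function refresh_blacklist($source, $dest)
{
    // A remote $source needs allow_url_fopen; @ hides the warning on failure.
    $data = @file_get_contents($source);
    // Keep the existing local copy if the download failed or came back empty.
    if ($data === false || trim($data) === '')
    {
        return false;
    }
    // Write to a temporary file next to $dest, then rename it into place.
    $tmp = $dest . '.tmp';
    if (file_put_contents($tmp, $data) === false)
    {
        return false;
    }
    return rename($tmp, $dest);
}
```

Refusing to overwrite on an empty download means a temporary outage at the source leaves the board with its last good list rather than no list at all.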
Multiple fields
What are your thoughts on breaking down the fields passed into the "spam engine"? I'm thinking along the lines of the second version of spamBuster, which could have rules relating to either the body text or the subject. Username might be another field worth matching against: lots of spammers seem to use the recipe [username][number], like robby34. Perhaps something to worry about later.
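Per-field rules could be as simple as an array of patterns keyed by field name. This sketch is illustrative only: `check_fields()` and the sample patterns are made up. The username rule is the [username][number] recipe above, which will also catch plenty of genuine members, so it's probably better as a warning sign than as a ban trigger on its own.

```php
<?php
// Illustrative rule set: each pattern only applies to the field it's filed under.
$rules = array(
    'message'  => array('casino-online', '\.example\.com'),
    'subject'  => array('cheap\s+pills'),
    // The [username][number] recipe, e.g. robby34.
    'username' => array('^[a-z]+\d+$'),
);

// Hypothetical helper: returns "field: pattern" for every rule that matched.
function check_fields($fields, $rules)
{
    $hits = array();
    foreach ($rules as $field => $patterns)
    {
        if (!isset($fields[$field]))
        {
            continue;
        }
        foreach ($patterns as $pattern)
        {
            // '#' as the delimiter, assuming '#' comments were stripped from
            // the patterns beforehand; 'i' makes the match case-insensitive.
            if (preg_match('#' . $pattern . '#i', $fields[$field]))
            {
                $hits[] = $field . ': ' . $pattern;
            }
        }
    }
    return $hits;
}

$hits = check_fields(array(
    'username' => 'robby34',
    'subject'  => 'Hello',
    'message'  => 'Visit www.example.com today',
), $rules);
```

Returning the list of hits rather than a plain true/false would let the vBulletin side decide later whether one username hit is enough to act on.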
Are you happy for me to go ahead and write a lower-level library that does the spam processing, leaving some of the vBulletin integration (AdminCP code) to you?
I've made some changes to the file-handling and character-substitution code:
PHP Code:
<?php
/*======================================================================*\
|| #################################################################### ||
|| # VbSpamicide # ||
|| # Developed by James Cohen & Michael Morris # ||
|| # Alpha Day 2 # ||
|| # ---------------------------------------------------------------- # ||
|| # ---------------- VBULLETIN IS NOT FREE SOFTWARE ---------------- # ||
|| # http://www.vbulletin.com | http://www.vbulletin.com/license.html # ||
|| #################################################################### ||
\*======================================================================*/
// This script gets called by PHPINCLUDE_START, so check to see if the
// DB is initialized before running.
if (!is_object($DB_site))
{
echo 'You can\'t access this file directly';
exit;
}
// Define our settings. Later we'll use variables from the $vboptions group
// to perform these assignments. For now, let's add them to the $vboptions
// array manually.
$vboptions['systemuserid'] = 2; // This is the user id for the system auto poster.
$vboptions['systemusername'] = 'Messageboard Golem'; // This is the name of the auto poster.
$vboptions['usetachy'] = false; // If set to true, the system also adds the user to
// Tachy Goes to Coventry. Tachy needs to be hardened for
// this to be effective. For now we'll concentrate on banning.
$vboptions['spambangroup'] = 8; // The usergroup spammers go to.
$vboptions['reportforum'] = 4; // For now use a forum for spam reports. Later make this
// an option.
// Transfer the $_POST data to the variables we want to work with as necessary. This code is identical
// to newthread.php, newreply.php and editpost.php
if (isset($_POST['WYSIWYG_HTML']))
{
require_once('./includes/functions_wysiwyg.php');
$spamcheck['message'] = convert_wysiwyg_html_to_bbcode($_POST['WYSIWYG_HTML'], $foruminfo['allowhtml']);
}
else
{
$spamcheck['message'] = &$_POST['message'];
}
// Read the blacklist text file into a single string.
$blacklist = file_get_contents('blacklist.txt');
// Split it into an array line by line (Windows-friendly: handles both \r\n and \n).
$spamlist = preg_split('/\r?\n/', $blacklist);
// Now grab the blacklist template. This template holds user-defined URLs, separate from
// the master blacklist.
eval('$blacklist = "' . fetch_template('mtblacklist') . '";');
// Split it the same way.
$localspamlist = preg_split('/\r?\n/', $blacklist);
// Merge the lists.
$spamlist = array_merge($spamlist, $localspamlist);
// Use a foreach loop to iterate over the spamlist.
foreach ($spamlist as $spam)
{
// Strip comments: everything from the first '#' to the end of the line.
$spam = trim(preg_replace('/#.*$/', '', $spam));
// Skip lines that are now blank (comment-only or empty lines).
if ($spam !== '')
{
// Now use a regular expression to check for known URLs from the blacklist.
// '#' is safe as the delimiter because comments have already been stripped.
if (preg_match('#' . $spam . '#i', $spamcheck['message']))
{
// Ok, true. For now we will go ahead and report the post in a designated forum.
// Later we will choose from a number of branch actions.
// Grab the forum info for the report post forum
$report_foruminfo = fetch_foruminfo($vboptions['reportforum']);
// Create a report post array.
$reportpost = array(
'username' => $vboptions['systemusername'],
'userid' => $vboptions['systemuserid'],
'title' => 'Spam Alert: ' . trim(htmlspecialchars_uni($_POST['subject'])),
'emailupdate' => 9999
);
// This template isn't cached. It's used so rarely, does it need to be?
eval('$reportpost[message] = "' . fetch_template('spam_alert') . '";');
// Load the library containing build_new_post().
require_once('./includes/functions_newpost.php');
// Call build_new_post() to post the report.
build_new_post('thread', $report_foruminfo, array(), 0, $reportpost, $errors);
// Now begin the banning procedure.
// check to see if there is already a ban record for this user in the userban table
if ($check = $DB_site->query_first("SELECT userid, liftdate FROM " . TABLE_PREFIX . "userban WHERE userid = $bbuserinfo[userid]"))
{
// there is already a record - just update this record
$DB_site->query("
UPDATE " . TABLE_PREFIX . "userban SET
adminid = $vboptions[systemuserid],
bandate = " . TIMENOW . ",
liftdate = 0
WHERE userid = $bbuserinfo[userid]
");
}
else
{
// insert a record into the userban table
$DB_site->query("
INSERT INTO " . TABLE_PREFIX . "userban
(userid, usergroupid, displaygroupid, customtitle, usertitle, adminid, bandate, liftdate)
VALUES
($bbuserinfo[userid], $bbuserinfo[usergroupid], $bbuserinfo[displaygroupid], $bbuserinfo[customtitle], '" . addslashes($bbuserinfo['usertitle']) . "', $vboptions[systemuserid], " . TIMENOW . ", 0)
");
}
// Update the user record, moving the user into the spam-ban usergroup.
$DB_site->query("
UPDATE " . TABLE_PREFIX . "user SET
usergroupid = $vboptions[spambangroup],
displaygroupid = $vboptions[spambangroup]
WHERE userid = $bbuserinfo[userid]
");
// Now parse some global templates that haven't been called yet (we arrive here from
// PHPINCLUDE_START).
eval('$timezone = "' . fetch_template('timezone') . '";');
eval('$gobutton = "' . fetch_template('gobutton') . '";');
eval('$spacer_open = "' . fetch_template('spacer_open') . '";');
eval('$spacer_close = "' . fetch_template('spacer_close') . '";');
// parse headinclude, header & footer
eval('$headinclude = "' . fetch_template('headinclude') . '";');
eval('$header = "' . fetch_template('header') . '";');
eval('$footer = "' . fetch_template('footer') . '";');
// Inform the user that they've been spam banned.
eval(print_standard_error('error_nospam'));
}
}
}
?>
I've used preg_split() in place of explode() to make the file splitting Windows-friendly.
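To show why this matters: a blacklist saved on Windows ends each line with \r\n, so explode("\n", ...) leaves a trailing \r on every entry unless something trims it later. A quick comparison with made-up entries (note that old Mac-style files using a bare \r aren't handled by this pattern either):

```php
<?php
// A Windows line ending on the first line, a Unix one on the second.
$text = "casino-online\r\n\\.example\\.com\nviagra";

// explode() only splits on "\n", so the "\r" stays attached to the entry.
$exploded = explode("\n", $text);
// preg_split() with an optional "\r" handles both line-ending styles.
$split = preg_split('/\r?\n/', $text);
```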
Comments are now stripped out using preg_replace().
The main regular expression tests are done with preg_match(), which I think is faster than eregi() in most cases.
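Speed aside, eregi() was deprecated in PHP 5.3 and removed in PHP 7.0, so preg_match() is also the forward-compatible choice. The thing to watch with preg_match() is the pattern delimiter: a blacklist entry containing '/' ends a '/.../i' pattern early and the match fails outright. This sketch uses a made-up entry and assumes '#' comments have already been stripped, which is what makes '#' a safe delimiter.

```php
<?php
// Made-up blacklist entry containing a '/' (a URL path).
$entry   = 'example\.com/promo';
$message = 'See http://EXAMPLE.com/promo now';

// With '#' as the delimiter the '/' is just an ordinary character,
// and the 'i' modifier gives the case-insensitive matching eregi() did.
$hit = preg_match('#' . $entry . '#i', $message);

// With '/' as the delimiter, PHP reads "promo/i" as modifiers, emits an
// "Unknown modifier" warning and returns false instead of 0 or 1.
$broken = @preg_match('/' . $entry . '/i', $message);
```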
Looking at those changes who'd guess I've developed in Perl a fair bit? :ermm:
I haven't tested this code; the line I'm most dubious about is the comment-removal code.
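One way to check the comment-removal step without a live board is to run a few sample lines through it. This standalone sketch compares a greedy `/^(.*)#.*?$/` pattern with a simpler `/#.*$/`; the helper names and sample lines are invented. With the greedy `(.*)`, the capture runs up to the last '#', so a line containing two or more '#' characters comes back still containing '#', and a `strstr($spam, '#')` check would then skip the whole entry.

```php
<?php
// Greedy version: keeps everything before the LAST '#'.
function strip_comment_greedy($line)
{
    return preg_replace('/^(.*)#.*?$/', '$1', $line);
}

// Simpler version: drops everything from the FIRST '#', then trims whitespace.
function strip_comment_simple($line)
{
    return trim(preg_replace('/#.*$/', '', $line));
}

$line   = 'casino-online # added by admin # see #42';
$greedy = strip_comment_greedy($line);
$simple = strip_comment_simple($line);
```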