UK Jimbo, I have found the cause of the problem.
The regexp in question contains the ':' character (inside "http://").
When Spambuster parses the rules, it uses ':' as a separator to split each rule into fields, e.g.:
Code:
regexp:a_nike_dunk:5:any:"/nike dunk/i"
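To make the parsing concrete, here is a minimal sketch of what sb_parse_config does with that example rule, mirroring its loop body for a single line. parse_rule() is a hypothetical helper name used only for illustration; the split, the quote stripping and the array_shift order are copied from the function.

```php
<?php
// parse_rule() is a hypothetical helper, NOT part of Spambuster.
// It mirrors the loop body of sb_parse_config for one rule line.
function parse_rule(string $rule): ?array
{
    // Split on ':' unless it is preceded by a backslash (original separator).
    $bits = preg_split('/(?<!\\\\):/', $rule);

    // type, name, score, field and at least one data element are required.
    if (count($bits) < 5) {
        return null;
    }

    // Strip surrounding double quotes from any field.
    foreach ($bits as $i => $bit) {
        if (preg_match('/^"(.*)"$/', $bit, $m)) {
            $bits[$i] = $m[1];
        }
    }

    return array(
        'type'  => array_shift($bits),
        'name'  => array_shift($bits),
        'score' => array_shift($bits),
        'field' => array_shift($bits),
        'data'  => $bits, // everything left over; data[0] is the regexp
    );
}

var_export(parse_rule('regexp:a_nike_dunk:5:any:"/nike dunk/i"'));
```

Here data[0] comes out as /nike dunk/i, which is what sb_test later hands to preg_match.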
Basically, the culprit is this line in the sb_parse_config function:
Code:
// split up the line
$bits = preg_split('/(?<!\\\\):/',$rule);
This explains why the regexp never matches the post: the rule is also cut at the colon in "http:", so the pattern passed to preg_match is simply not the one we thought we were using.
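A quick way to see this, assuming the a_url rule was written with single colons as regexp:a_url:5:any:"/.*http:\/\//i" (the same rule as in my test file below, before I switched it to '::'):

```php
<?php
// Assumed single-colon version of the a_url rule.
$rule = 'regexp:a_url:5:any:"/.*http:\/\//i"';

// Same split as sb_parse_config: ':' not preceded by a backslash.
$bits = preg_split('/(?<!\\\\):/', $rule);
var_export($bits);
// The colon in "http:" is a split point too, so we get 6 pieces:
// bits[4] is '"/.*http' and the rest of the pattern spills into bits[5].

// '"' is taken as the regexp delimiter and is never closed, so preg_match
// fails and returns false; the @ hides the warning, just as in sb_test.
$result = @preg_match($bits[4], 'nike tn (http://www.nikemaxtn.com/)');
var_dump($result); // bool(false)
```

Since bits[4] ends up as data[0], sb_test calls preg_match with '"/.*http', which can never match anything.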
Suggested fix:
Use '::' as a separator in each rule, rather than ':'.
Replace
Code:
// split up the line
$bits = preg_split('/(?<!\\\\):/',$rule);
With
Code:
// split up the line
$bits = preg_split('/(?<!\\\\)::/',$rule);
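With the fix applied and the rule rewritten with '::' separators, the pattern comes through in one piece. A minimal check using the same a_url rule:

```php
<?php
// The a_url rule with '::' separators, as in the suggested fix.
$rule = 'regexp::a_url::5::any::"/.*http:\/\//i"';

// Split only on '::' (still skipping any preceded by a backslash).
$bits = preg_split('/(?<!\\\\)::/', $rule);

// Five fields; the single ':' in "http:" no longer splits anything.
var_export($bits);

// Strip the surrounding quotes, as sb_parse_config does.
preg_match('/^"(.*)"$/', $bits[4], $m);
$pattern = $m[1];

var_dump(preg_match($pattern, 'nike tn (http://www.nikemaxtn.com/)')); // int(1)
```

One caveat, as far as I can tell: this only moves the problem, since a pattern that itself contains '::' would be cut the same way. That seems much less likely in a spam rule than a single ':'.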
Note: regular expressions are hard to wrap one's brain around (-: especially ones I didn't write myself, or ones written a long time ago. I didn't try to decode this regexp; I just guessed what it does, added another ':' and prayed for the best (-:
I don't know the rationale behind splitting with a regexp, but wouldn't it be easier to use this instead? It is straightforward and requires no belief in external forces :-)
Code:
$bits = explode('::', $rule);
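Thinking about it some more, the rationale may be the lookbehind: (?<!\\\\) (a literal backslash, once PHP and PCRE have each unescaped it) stops the split at any ':' preceded by a backslash. If that reading is right, the original single-colon format might work unchanged by writing the colon as \: inside the rule, assuming nothing else in Spambuster strips the backslash; to PCRE, \: is just a literal colon. explode() gives up that escape mechanism. A small sketch comparing the two:

```php
<?php
// With the ORIGINAL single-colon split, a colon escaped as "\:" survives,
// because the lookbehind skips any ':' preceded by a backslash.
$escaped = 'regexp:a_url:5:any:"/.*http\:\/\//i"';
$bits = preg_split('/(?<!\\\\):/', $escaped);
var_export($bits); // 5 elements; $bits[4] still contains "http\:"

// To PCRE, "\:" is simply a literal colon, so the pattern still matches.
preg_match('/^"(.*)"$/', $bits[4], $m);
var_dump(preg_match($m[1], 'nike tn (http://www.nikemaxtn.com/)')); // int(1)

// explode() has no escape mechanism: it splits on every '::'.
var_export(explode('::', 'a\::b'));               // splits into "a\" and "b"
var_export(preg_split('/(?<!\\\\)::/', 'a\::b')); // stays one piece: "a\::b"
```

For rules without backslashes, explode('::', ...) and the '::' preg_split give the same result.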
My local tests pass. Below is the source of the file I use for local testing. It is a stripped-down version of Spambuster; the rules and the test post are copied straight from the original files, so you can run quick tests without involving vBulletin itself.
Can you share your thoughts on this, and perhaps update Spambuster to include this fix?
Code:
<?php
function sb_parse_config() {
    $rules_raw = <<<TEST
#Anything below this is added by Alex Railean
#regexp:a_jewels:5:any:"/^jewel/i"
#regexp:a_runescape:5:any:"/runescape/i"
#power leveling with spaces between words
#regexp:a_wow_powerleveling:10:any:"/power\s*leveling/i"
#regexp:a_wow_gold:5:any:"/gold/i"
#regexp:a_nike_dunk:5:any:"/nike dunk/i"
#regexp:a_hot:5:any:"/hot/i"
#regexp:a_manga:5:any:"/manga/i"
#regexp:a_jordan_shoes:5:any:"/jordan/i"
regexp::a_url::5::any::"/.*http:\/\//i"
regexp::a_gay::6::any::"/gay/i"
regexp::a_not_too_many_links::15::any::"/(http:\/\/.*){3,}/i"
TEST;
    $rules = preg_split('/\r?\n/', $rules_raw);
    //echo var_export($rules, TRUE);

    // debug log: timestamp plus the raw rule list
    $log = fopen("spam.log", "a");
    fwrite($log, date('c') . "\n"); // strftime() is deprecated as of PHP 8.1
    $tmp = var_export($rules, TRUE);
    fwrite($log, $tmp);

    $data = array();
    foreach ($rules as $rule) {
        // comment lines
        if (strpos($rule, '#') === 0)
            continue;

        // split up the line; both approaches are logged for comparison
        $bits = preg_split('/(?<!\\\\)::/', $rule);
        $bits2 = explode('::', $rule);
        fwrite($log, var_export($bits, TRUE));
        fwrite($log, "----\n");
        fwrite($log, var_export($bits2, TRUE));
        fwrite($log, $rule . "\n");

        // need the right number of arguments
        if (count($bits) < 5)
            continue;

        // strip surrounding double quotes from each field
        for ($i = 0; $i < count($bits); $i++) {
            if (preg_match('/^"(.*)"$/', $bits[$i], $m))
                $bits[$i] = $m[1];
        }

        $test = array();
        $test['type'] = array_shift($bits);
        $test['name'] = array_shift($bits);
        $test['score'] = array_shift($bits);
        $test['field'] = array_shift($bits);
        $test['data'] = $bits;
        $data[$test['name']] = $test;
    }
    fclose($log);

    $tmp = var_export($data, TRUE);
    echo $tmp . "<br><br><br>";
    return $data;
}
//echo sb_parse_config();

// used to perform the test on the post
// function sb_test(&$obj,$table=null) {
function sb_test() {
    // start the total at 0 so "+=" below does not raise an undefined-index notice
    $hits = array('total' => 0);

    // no need to worry about most posts
    // if( $GLOBALS['vbulletin']->userinfo['posts'] > $GLOBALS['vbulletin']->options['spambusterpostcount'] )
    //     return false;

    // parts of the post
    $req = array();
    // $req['title'] = $obj->fetch_field('title',$table);
    // $req['body'] = $obj->fetch_field('pagetext',$table);
    // $req['any'] = $req['title'] ."\n". $req['body'];
    $req['title'] = "title";
    $req['body'] = "nike tn (http://www.nikemaxtn.com/) chaussure nike (http://www.nikemaxtn.com/) nike (http://www.nikemaxtn.com/)";
    $req['any'] = $req['title'] . "\n" . $req['body'];

    // fetch the list of tests
    $tests = sb_parse_config();

    // run each test
    foreach ($tests as $test) {
        //echo "#TEST ".var_export($test). "<br>";
        echo "TEXT " . $req[$test['field']] . "<br>";
        $test_pass = false;

        // regular expression test
        if ($test['type'] == 'regexp') {
            echo "RGXP " . $test['data'][0] . "<br>";
            $test_pass = @preg_match($test['data'][0], $req[$test['field']]);
            echo "RSLT [" . $test_pass . "]<br><br>";
            //echo $req[ $test['field'] ] ." ". $test_pass. "<br>";
        }

        // record the test if it was a hit
        if ($test_pass) {
            $hits[$test['name']] = $test['score'];
            $hits['total'] += $test['score'];
        }
    }
    return $hits;
}

sb_test();
?>
It looks messy, for which I apologize; PHP is simply not my cup of tea...
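For even quicker checks without the echo output and the log file, here is a compact sketch along the same lines. score_post() is a hypothetical name; it treats the whole post as the 'any' field and only handles regexp rules, so it is not a replacement for sb_test():

```php
<?php
// score_post() is a hypothetical helper, NOT a Spambuster function.
// Parses '::'-separated regexp rules and scores a single post.
function score_post(array $rules, string $post): array
{
    $hits = array('total' => 0); // initialised so += does not warn
    foreach ($rules as $rule) {
        $bits = preg_split('/(?<!\\\\)::/', $rule);
        if (count($bits) < 5 || $bits[0] !== 'regexp') {
            continue;
        }
        // Fields: type, name, score, field, then the quoted pattern.
        $pattern = preg_replace('/^"(.*)"$/', '$1', $bits[4]);
        if (preg_match($pattern, $post) === 1) {
            $hits[$bits[1]] = (int) $bits[2];
            $hits['total'] += (int) $bits[2];
        }
    }
    return $hits;
}

$rules = array(
    'regexp::a_url::5::any::"/.*http:\/\//i"',
    'regexp::a_gay::6::any::"/gay/i"',
    'regexp::a_not_too_many_links::15::any::"/(http:\/\/.*){3,}/i"',
);
// title + "\n" + body, like $req['any'] in sb_test
$post = "title\nnike tn (http://www.nikemaxtn.com/) chaussure nike (http://www.nikemaxtn.com/) nike (http://www.nikemaxtn.com/)";
var_export(score_post($rules, $post));
```

With the sample post above, a_url (5) and a_not_too_many_links (15) should fire and a_gay should not, for a total of 20.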