  #100  
01-12-2010, 01:33 PM
gr8dude
 
Join Date: Sep 2005
Location: Moldova
Posts: 12

UK Jimbo, I have found the cause of the problem.

The regexp in the failing rule contains the ':' character (inside "http://").
When Spambuster parses the rules, it uses ':' as the separator to split each rule into its fields, e.g.:
Code:
regexp:a_nike_dunk:5:any:"/nike dunk/i"
Basically, the culprit is in this line of the sb_parse_config function:
Code:
// split up the line
$bits = preg_split('/(?<!\\\\):/',$rule);
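This explains why the regexp is never found in the post: the ':' inside "http://" is treated as a field separator, so the pattern is cut in two and the regexp that actually runs is not the one we thought we were using.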
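To make the failure concrete, here is a small sketch that applies the same split to the URL rule written in the old single-colon format. The helper name split_rule_single_colon is made up for this example; only the preg_split call is taken from Spambuster.

```php
<?php
// Hypothetical helper that mirrors the split done in sb_parse_config.
function split_rule_single_colon($rule) {
    // Split on every ':' that is not preceded by a backslash.
    return preg_split('/(?<!\\\\):/', $rule);
}

// The URL rule in the old single-colon format.
$rule = 'regexp:a_url:5:any:"/.*http:\/\//i"';
$bits = split_rule_single_colon($rule);

echo count($bits), "\n";   // 6 pieces instead of the expected 5
echo $bits[4], "\n";       // "/.*http
echo $bits[5], "\n";       // \/\//i"
```

Neither half is wrapped in a matching pair of quotes any more, so the quote-stripping step leaves them alone, and the first half is not a valid pattern. Since preg_match is called with @, the resulting warning is hidden and the test silently never hits.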

Suggested fix:
Use '::' as a separator in each rule, rather than ':'.

Replace
Code:
// split up the line
$bits = preg_split('/(?<!\\\\):/',$rule);
With
Code:
// split up the line
$bits = preg_split('/(?<!\\\\)::/',$rule);
Note: regular expressions are quite difficult to wrap one's brain around (-: Especially ones I haven't written myself, or ones that were written a long time ago. I didn't try to work through this regexp; I just guessed what it does, added another ':' and hoped for the best (-:
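For anyone who does want to unpack it: read literally, the pattern seems to mean "a colon that is not preceded by a backslash". If that reading is right, a colon inside a rule could also be protected by escaping it. The sketch below is only my interpretation, not something confirmed by the Spambuster author, and the escaped rule is written by hand for the example.

```php
<?php
// The PHP literal '/(?<!\\\\):/' becomes the regex /(?<!\\):/ once PHP has
// unescaped the single-quoted string: "a colon not preceded by a backslash".
$separator = '/(?<!\\\\):/';

// Same URL rule, single-colon format, with the colon in "http:" escaped.
$rule = 'regexp:a_url:5:any:"/.*http\:\/\//i"';
$bits = preg_split($separator, $rule);

echo count($bits), "\n";   // 5
echo $bits[4], "\n";       // "/.*http\:\/\//i"

// Strip the surrounding quotes the same way sb_parse_config does.
preg_match('/^"(.*)"$/', $bits[4], $m);
$pattern = $m[1];

// In PCRE, \: is just a literal colon, so the pattern still finds the URL.
$found = preg_match($pattern, 'nike tn (http://www.nikemaxtn.com/)');
echo $found, "\n";         // 1
```

So escaping the colon would be another way around the problem, although it means every rule author has to remember to do it.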

I don't know what the rationale behind using preg_split here was, but isn't it easier to use this instead? It is straightforward and requires no belief in external forces :-)
Code:
$bits = explode('::', $rule);
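As a quick sanity check of that idea, the sketch below compares both ways of splitting on the three active rules from my test file. The last rule ($escaped) is invented purely to show the one case where the two calls would disagree.

```php
<?php
// Rules copied from the test file, in the '::' format.
$rules = array(
    'regexp::a_url::5::any::"/.*http:\/\//i"',
    'regexp::a_gay::6::any::"/gay/i"',
    'regexp::a_not_too_many_links::15::any::"/(http:\/\/.*){3,}/i"',
);

$all_same = true;
foreach ($rules as $rule) {
    $by_regex   = preg_split('/(?<!\\\\)::/', $rule);
    $by_explode = explode('::', $rule);
    if ($by_regex !== $by_explode) {
        $all_same = false;
    }
}
echo $all_same ? "identical\n" : "different\n";   // identical

// The two only diverge when a backslash sits right before '::'.
$escaped = 'regexp::a_demo::5::any::"/a\::b/"';   // made-up rule
echo count(preg_split('/(?<!\\\\)::/', $escaped)), "\n";  // 5
echo count(explode('::', $escaped)), "\n";                // 6
```

In other words, explode gives the same fields for these rules, but it drops the ability to escape the separator with a backslash, which may or may not matter for existing rule files.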
My local tests are successful. Below is the source of the file I am using for them. It is a stripped-down version of Spambuster; the rules and the tested post are included directly in this file, so you can run quick tests without involving vBulletin itself.


Can you share your thoughts on this, and perhaps update Spambuster to include this fix?



Code:
<?php


function sb_parse_config() {

	
	$rules_raw = <<<TEST
#Anything below this is added by Alex Railean
#regexp:a_jewels:5:any:"/^jewel/i"
#regexp:a_runescape:5:any:"/runescape/i"
#power leveling with spaces between words
#regexp:a_wow_powerleveling:10:any:"/power\s*leveling/i"
#regexp:a_wow_gold:5:any:"/gold/i"
#regexp:a_nike_dunk:5:any:"/nike dunk/i"
#regexp:a_hot:5:any:"/hot/i"
#regexp:a_manga:5:any:"/manga/i"
#regexp:a_jordan_shoes:5:any:"/jordan/i"
regexp::a_url::5::any::"/.*http:\/\//i"
regexp::a_gay::6::any::"/gay/i"
regexp::a_not_too_many_links::15::any::"/(http:\/\/.*){3,}/i"
TEST;
	
	
	$rules = preg_split('/\r?\n/', $rules_raw);
	//echo var_export($rules, TRUE);
	
	$log = fopen("spam.log", "a");
	// date() is used here because strftime() is deprecated as of PHP 8.1
	fwrite($log, date("r") . "\n");
	$tmp = var_export($rules, TRUE);
	fwrite($log, $tmp);
	
	$data = array();

	foreach($rules as $rule) {

		// comment lines
		if( strpos($rule,'#') === 0 )
			continue;

		// split up the line
		$bits = preg_split('/(?<!\\\\)::/',$rule);
		$bits2 = explode('::', $rule);
		fwrite($log, var_export($bits, TRUE));
		fwrite($log, "----\n");
		fwrite($log, var_export($bits2, TRUE));
		fwrite($log, $rule);

		// need the right number of arguments
		if( count($bits) < 5 )
			continue;

		for($i=0;$i<count($bits);$i++) {
			if( preg_match('/^"(.*)"$/', $bits[$i], $m) )
				$bits[$i] = $m[1];
		}


		$test=array();
		$test['type'] = array_shift($bits);
		$test['name'] = array_shift($bits);
		$test['score'] = array_shift($bits);
		$test['field'] = array_shift($bits);
		$test['data'] = $bits;

		$data[ $test['name'] ] = $test;
	}
	
	// close the debug log opened at the top of the function
	fclose($log);

	$tmp = var_export( $data, TRUE );
	echo $tmp."<br><br><br>";

	return $data;
}

//echo sb_parse_config();


// used to perform the test on the post
// function sb_test(&$obj,$table=null) {
function sb_test() {

	// start 'total' at 0 so the += below does not hit an undefined index
	$hits = array('total' => 0);

	// no need to worry about most posts
	// if( $GLOBALS['vbulletin']->userinfo['posts'] > $GLOBALS['vbulletin']->options['spambusterpostcount'] )
		// return false;

	// parts of the post
	$req = array();
	// $req['title'] = $obj->fetch_field('title',$table);
	// $req['body'] = $obj->fetch_field('pagetext',$table);
	// $req['any'] = $req['title'] ."\n". $req['body'];
	$req['title'] = "title";
	$req['body'] = "nike tn (http://www.nikemaxtn.com/)    chaussure nike  (http://www.nikemaxtn.com/)   nike (http://www.nikemaxtn.com/)";
	$req['any'] = $req['title'] ."\n". $req['body'];

	// fetch the list of tests
	$tests = sb_parse_config();
	// run each test
	foreach($tests as $test) {
		//echo "#TEST  ".var_export($test). "<br>";
		echo "TEXT ".$req[ $test['field'] ]."<br>";

		$test_pass=false;

		// regular expression test
		if( $test['type'] == 'regexp' ) {
			echo "RGXP ". $test['data'][0]."<br>";
			$test_pass = @preg_match($test['data'][0],$req[ $test['field'] ]);
			echo "RSLT [".$test_pass."]<br><br>"; 
			//echo $req[ $test['field'] ] ."  ". $test_pass. "<br>";
		}

		// record the test if it was a hit
		if( $test_pass ) {
			$hits[ $test['name'] ] = $test['score'];
			$hits['total'] += $test['score'];
		}
	}

	return $hits;
}

sb_test();

?>
It looks messy, I apologize for that; PHP is simply not my cup of tea...