Quote:
Originally posted by fastforward
This is something I did not take into account. The index routine does indeed strip more than the default vB routine. This was done purposely to eliminate some of the extraneous crap that vB indexes. Obviously that wasn't such a smart move where extended character sets are required. I'll make sure the next version takes it into account.
If you know a bit about regular expressions, it shouldn't be too hard to fix the code.
Sorry, my coding sucks without instructions.

I guess it has something to do with this subroutine?
(Or with sub wordsonly / sub remove_bb_code?)
# index post body
$pagetext =~ s/^\[q[1-9]\]>+.*$//go; # remove all quoted stuff
if (length($pagetext) < 10000) {
    my $text = remove_bb_code("$pagetext");
    $text = wordsonly("$text");
    my @words = split(/\s+/,$text);
    my $words_sel="";
    foreach my $word (@words) {
        if ($word && ((length($word) >= $vbconfig{minsearchlength})) && ((length($word) <= $vbconfig{maxsearchlength}))) {
            $word = $dbh->quote($word);
            $words_sel .= "$word,";
            db_execute("INSERT IGNORE INTO word (title) VALUES ($word)");
        }
    }
    chop $words_sel;
    if ($words_sel) {
        my $wordids = db_fetch("SELECT wordid FROM word WHERE title in ($words_sel)");
        while (my $wid = $wordids->fetchrow_array) {
            db_execute("INSERT IGNORE INTO searchindex (wordid,postid,intitle) VALUES ($wid,$id,0)");
        }
    }
} else {
    console(" *-> Post $id skipped... (too long)\n");
}
}
Or this?
sub remove_bb_code {
    my $text = $_[0];
    my ($bbo,$bbc);
    my $bbcodes = db_fetch("SELECT bbcodetag FROM bbcode");
    while (my $bbcode = $bbcodes->fetchrow_array) {
        $bbo=quotemeta("[".$bbcode."]");
        $bbc=quotemeta("[/".$bbcode."]");
        $text =~ s/$bbo|$bbc//gi; # easy stuff
    }
    $text =~ s/"|<|>/ /gsio;
    $text =~ s/&|<br>|<(\/)?body>|<p>|<(\/)?html>//gsoi;
    $text =~ s/\[size=[0-9]+\]|\[\/size\]//ig; # size
    $text =~ s/\[color=(\"\#)?[A-Za-z0-9]+(\")?\]|\[\/color\]//ig; # color
    $text =~ s/\[url(=)?(")?//ig;
    $text =~ s/(\")?\](.+)\[\/url\]/$2/gi;
    $text =~ s/\[email(=)?(\")?//ig;
    $text =~ s/(\")?\](.+)\[\/email\]/$2/gi;
    $text =~ s/\[font=(\"\#)?[A-Za-z]+(\")?\]|\[\/font\]//ig; # font
    $text =~ s/\[list(=)?[1Aa]?\]|\[\/list(=)?[1Aa]?\]//ig; # list
    $text =~ s/\[\*\]/ - /ig;
    $text =~ s/\[(\/)?code\]//ig;
    return $text;
}
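To show the kind of thing I suspect, here is a minimal sketch. It assumes that wordsonly (whose code I have not pasted) filters the text with an ASCII-only character class. Both subs below are hypothetical stand-ins I made up for illustration, not the script's real code, and the ISO-8859-1 byte range is only my guess at what "extended characters" would need.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for wordsonly(); the real sub is not shown in this thread.
# ASSUMPTION: it keeps only ASCII letters, digits and whitespace,
# so a byte such as 0xE9 (e-acute in ISO-8859-1) is replaced by a space.
sub wordsonly_ascii {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9\s]+/ /g;
    return $text;
}

# Possible variant: also keep the high bytes 0xC0-0xFF, which are
# mostly accented letters in ISO-8859-1 (assumed encoding).
sub wordsonly_latin1 {
    my ($text) = @_;
    $text =~ s/[^A-Za-z0-9\xC0-\xFF\s]+/ /g;
    return $text;
}

# Sample text with e-acute (0xE9) and u-umlaut (0xFC) as single bytes.
my $sample = "caf\xE9 m\xFCnchen, test!";

printf "ascii-only keeps e-acute: %s\n",
    (wordsonly_ascii($sample)  =~ /\xE9/ ? "yes" : "no");
printf "latin1 keeps e-acute:     %s\n",
    (wordsonly_latin1($sample) =~ /\xE9/ ? "yes" : "no");
```

If the real wordsonly looks anything like the first sub, I suppose widening its character class the way the second one does would be the change, but I would rather have that confirmed before I touch it.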
Can you briefly tell me what to look for, so I can try to fix it?
Regards,
Joop