
[SOLVED] Arabic encoding with preg_match_all


omardealo
11-16-2014, 12:20 AM
Hello,

I am trying to detect banned words when users submit a new post, but only the English words are detected. I think the problem is the encoding of the Arabic text. I tried to fix it with iconv("windows-1256", "utf-8", $string), but that did not work. Are there any suggestions?


$wordss = "هالو|مرحبا|google.com";
$bwords = explode("|", $wordss);
//$string = $vbulletin->GPC['message'];
$string = 'BLA BLA مرحبا BLA BLA google.com BLA BLA BLA ';
$matchFound = preg_match_all(
    "/\b(" . implode("|", $bwords) . ")\b/i",
    $string,
    $matches
);
$words = array_unique($matches[0]);
print_r($words);


Output: google.com
Expected: google.com, مرحبا

kh99
11-16-2014, 01:40 AM
Maybe try putting a u at the end of your pattern string:
"/\b(" . implode("|", $bwords) . ")\b/iu"

to tell it to treat the pattern and the string as UTF-8.
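
A minimal sketch of that suggestion, assuming the script file and the input string are both saved as UTF-8 and that the PHP build enables Unicode character properties together with the u modifier. The preg_quote() call and the error check are additions that are not in the original code; preg_quote() keeps the dot in google.com from matching any character.

```php
<?php
// Assumes this file is saved as UTF-8 and $string is UTF-8 as well.
$bwords = explode("|", "هالو|مرحبا|google.com");

// Escape regex metacharacters (assumption: the words are literals, not patterns).
$quoted = array_map(function ($w) {
    return preg_quote($w, "/");
}, $bwords);

$string = 'BLA BLA مرحبا BLA BLA google.com BLA BLA BLA ';

// The u modifier makes PCRE read the pattern and the subject as UTF-8,
// so \b can treat Arabic letters as word characters.
$pattern = "/\b(" . implode("|", $quoted) . ")\b/iu";

$found = preg_match_all($pattern, $string, $matches);
if ($found === false) {
    // preg_match_all() returns false on error, e.g. when the subject is not valid UTF-8.
    echo "regex error code: " . preg_last_error() . "\n";
    $words = array();
} else {
    $words = array_values(array_unique($matches[0]));
}
print_r($words);
```

If preg_match_all() returns false here, that usually points at the input not being valid UTF-8 rather than at the pattern itself.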

omardealo
11-16-2014, 02:09 AM
Yes, I already tried this pattern. I changed /i to /iu and tested it in different places:
- online, in a standalone PHP file -> works, but only with /iu
- on localhost, in a vBulletin plugin -> works, but only with /i
but:
- on localhost, in a standalone PHP file -> does not work
- online, in a vBulletin plugin -> does not work

So I am confused; I don't know what is wrong.


UPDATE:
When I save the PHP files with ANSI encoding, the Arabic results appear with the pattern "/\b(" . implode("|", $bwords) . ")\b/i".
But how do I solve this problem in the plugin?
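
One possible reading of this result is that the word list and the message are not in the same encoding in every setup (for example windows-1256 in one place and UTF-8 in another). A rough sketch of normalizing the message to UTF-8 before matching follows. The helper name to_utf8() is hypothetical, and the assumption that any non-UTF-8 input is windows-1256 is only a guess about the forum's charset.

```php
<?php
// Hypothetical helper: return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is windows-1256.
function to_utf8($text)
{
    // An empty pattern with the u modifier matches only if the subject is valid UTF-8.
    if (@preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('WINDOWS-1256', 'UTF-8', $text);
}

// "مرحبا" as windows-1256 bytes, written as escapes so the file encoding does not matter.
$cp1256 = "BLA \xE3\xD1\xCD\xC8\xC7 BLA";
$utf8   = to_utf8($cp1256);

// The pattern below assumes this file itself is saved as UTF-8.
$hit = preg_match('/\bمرحبا\b/u', $utf8);
var_dump($hit);
```

This is a heuristic: a short windows-1256 string can happen to be valid UTF-8, so knowing the real charset of the forum is more reliable than guessing.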

kh99
11-16-2014, 01:56 PM
I can't get it to work on my test system either, so I'm afraid I'm stumped. I googled to try to find an answer, but the only thing I found was a mention that some versions of PHP may not handle UTF-8 matching correctly.
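
A quick way to check whether a given PHP build can do UTF-8 matching at all is sketched below. The function name pcre_has_unicode_support() is made up for this example; the check only relies on preg_match() and the PCRE_VERSION constant.

```php
<?php
// Hypothetical helper: true when this PHP build can match UTF-8 letters with the u modifier.
function pcre_has_unicode_support()
{
    // \p{L} needs Unicode property support; the u modifier needs UTF-8 support.
    // The Arabic letter meem is given as UTF-8 bytes so the file encoding does not matter.
    return @preg_match('/^\p{L}$/u', "\xD9\x85") === 1;
}

echo "PHP " . PHP_VERSION . ", PCRE " . PCRE_VERSION . "\n";
echo pcre_has_unicode_support()
    ? "Unicode matching available\n"
    : "Unicode matching NOT available\n";
```

Running this on both the localhost and the online server would show whether the two environments differ in PCRE support.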

omardealo
11-16-2014, 02:15 PM
Yes, I also searched Google a lot, thank you.
But I do not think the PHP version is the reason, because the code works in a standalone file on the same site without re-encoding it, yet it does not work in the vBulletin plugin.
Anyway, is there another way to do what I want: match the banned words and print them without problems with the Arabic words?


UPDATE:
I found the solution:

\b detects word boundaries; remove them to get a plain match.


Just use the pattern

"/(" . implode("|", $bwords) . ")/i"

Thanks, kh99.
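
A note on why dropping \b helps: without the u modifier, PCRE by default treats only ASCII letters, digits and the underscore as word characters, so it sees no boundary between a space and an Arabic byte. The trade-off is that a pattern without boundaries also matches banned words inside longer words. The sketch below compares the two, assuming UTF-8 input; the word list, the test string and the lookaround-based boundary are illustrative and not from the original code.

```php
<?php
// Assumes this file and $string are UTF-8. The word list is illustrative.
$bwords = array("مرحبا", "google.com");
$quoted = array_map(function ($w) {
    return preg_quote($w, "/");
}, $bwords);
$alt = implode("|", $quoted);

$string = 'BLA مرحبا BLA mygoogle.community BLA';

// No boundaries: also matches inside longer words.
preg_match_all("/(" . $alt . ")/iu", $string, $loose);

// Unicode-aware boundaries: the match must not touch a letter, digit or underscore.
$edge = '[\p{L}\p{N}_]';
preg_match_all("/(?<!" . $edge . ")(" . $alt . ")(?!" . $edge . ")/iu", $string, $strict);

print_r($loose[0]);  // both words, including the one inside "mygoogle.community"
print_r($strict[0]); // only the standalone Arabic word
```

Whether substring matches are acceptable depends on the word list; for short banned words the stricter form avoids false positives.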