Quote:
Originally Posted by derfelix
Chinese sentences are written with no special delimiters such as space to indicate word boundaries. Existing Chinese NLP systems therefore employ preprocessors to segment sentences into words.
Yes, that was one of the things I discovered too when I looked into this some time ago. There has been a very helpful and knowledgeable Chinese user on this forum (ItsBlack, who has done all the Chinese translations); maybe he will spot this post and comment.
I will look at the keyword problem.
Edited:
Yes, of course, there needs to be a fourth line:
Code:
// 4th pattern: the keyword makes up the whole string
$find[] = '/^(' . $w . '$)/iu';
As far as I can tell, this is all because the special UTF-8 regex characters do not map neatly onto \b: \b matches at the start and end of a line, whereas the UTF-8 specials do not.
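To illustrate why a separate pattern is needed for the string boundaries, here is a small sketch. It assumes the existing patterns use a consuming class such as `[^\p{L}]` ("any character that is not a letter") around the keyword; that is a guess, not the plugin's actual code. A consuming class has to match a real character, so it cannot match at the very start or end of the string. Lookaround assertions check "no letter here" without consuming anything, so one pattern also covers the boundaries. The keyword `keyword` and the variable names are hypothetical.

```php
<?php
// Hypothetical keyword for illustration; always escape it before building a pattern.
$w = preg_quote('keyword', '/');

// Consuming version (assumed style): needs an actual non-letter character
// on each side, so it fails when the keyword is at the start or end.
$consuming = '/[^\p{L}](' . $w . ')[^\p{L}]/iu';

// Lookaround version: asserts "no letter before" and "no letter after"
// without consuming characters, so it also succeeds at the string boundaries.
$lookaround = '/(?<!\p{L})(' . $w . ')(?!\p{L})/iu';

// Compare both patterns on a few subjects (1 = match, 0 = no match).
foreach (['keyword', 'a keyword here', 'keyword here', 'xkeywordx'] as $s) {
    printf("%-16s consuming=%d lookaround=%d\n", $s,
        preg_match($consuming, $s), preg_match($lookaround, $s));
}
```

Note that neither pattern solves the original Chinese problem. Han characters are themselves in `\p{L}`, so in `我喜欢中文` the keyword `中文` is preceded by a letter and the lookbehind fails. Finding that boundary needs a word segmenter, as the quoted post says, not a better regex.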