Thread: Major Additions - Links and Downloads Manager
  #995  
08-30-2008, 06:43 AM
derfelix
 
Join Date: Nov 2001
Posts: 204

Quote:
Originally Posted by AndrewD
Thanks, Felix. Indeed the problem is/was the word boundary. The drawback with removing the word boundary markers is that you end up highlighting substrings in the results which the search itself did not match.
For example, suppose you have a string "happily merrily sadly happilymerrilysadly" and you do a search for merrily.

This should highlight as "happily [merrily] sadly happilymerrilysadly"

and it does with the word boundary flags in the regex.

But without them, it highlights as "happily [merrily] sadly happily[merrily]sadly"

So we need to solve the word boundary problem in utf8.
Well then.. I am happy.. then it is actually a feature..

If you search for "intern" on Google, words like international, internal or internship are highlighted in the title and the description!

I was going to change the search from "word" to "*word*" anyway, because if I search for "luxury" and the only matching entry has the word "luxuryhotels" in its description, I get no results; it does not show up. In that case the highlighting would at least already be done.
---------------------
On the other hand, using LDM as is, it is not a major drawback either:
if you are looking for merrily, it will only show you results where the word "merrily" stands alone, so you do get the correct results. And only if one of those results also contains something like "sadlymerrilysadly" will that be highlighted as well, which I think is a feature!
---------------------
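
To make the trade-off concrete, here is a minimal sketch in Python (not LDM's PHP code). The function name `highlight`, the `whole_words` flag and the square-bracket markers are all made up for illustration; it only shows what the word boundary markers change in the highlighted output.

```python
import re

def highlight(text, term, whole_words=True):
    """Wrap every match of term in [ ]. Hypothetical helper, not LDM code."""
    pattern = re.escape(term)
    if whole_words:
        # \b only matches where a word character meets a non-word character
        pattern = r"\b" + pattern + r"\b"
    return re.sub(pattern, lambda m: "[" + m.group(0) + "]", text,
                  flags=re.IGNORECASE)

text = "happily merrily sadly happilymerrilysadly"

with_boundaries = highlight(text, "merrily", whole_words=True)
without_boundaries = highlight(text, "merrily", whole_words=False)

print(with_boundaries)     # happily [merrily] sadly happilymerrilysadly
print(without_boundaries)  # happily [merrily] sadly happily[merrily]sadly
```

So a whole-word search combined with a "no boundary" highlighter still returns only the standalone hits; the embedded occurrence just gets marked as a side effect.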

So if that is the only drawback, I'm sticking with that solution, especially as PHP 6 is going to have full Unicode support, and I am ready to bet that this problem will be solved in PHP 6!

But at least for the moment, adding the /u modifier to the regex (making it /iu) will help for languages like German, French or Spanish, as the highlighting will then work as you expect.
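
As a rough analogy only (this is Python, not PHP, and the two regex engines are not identical): matching on raw UTF-8 bytes behaves much like a pattern without /u, while matching on decoded text behaves much like /iu. The sample text and the helper names `byte_mode_hit` and `text_mode_hit` are assumptions made up for this sketch.

```python
import re

text = "Über uns: Zimmer mit Frühstück"
term = "über"

def byte_mode_hit(term, text):
    """Match on raw UTF-8 bytes; only ASCII letters count as word characters."""
    pattern = rb"\b" + re.escape(term.encode("utf-8")) + rb"\b"
    return re.search(pattern, text.encode("utf-8"), re.IGNORECASE)

def text_mode_hit(term, text):
    """Match on decoded text; accented letters are word characters too."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE)

byte_result = byte_mode_hit(term, text)
text_result = text_mode_hit(term, text)

print(byte_result)           # None: no boundary before the umlaut, no case folding for it
print(text_result.group(0))  # Über
```

In byte mode the search fails for two reasons at once: the bytes of "Ü" are not word characters, so there is no word boundary in front of them, and case-insensitive matching does not pair "ü" with "Ü".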


Felix
PS: just saw your edit.. testing now!

[EDIT]
Just tested your routine... works fine with the description (not working with keywords), hmmm.

BUT with Chinese there is another problem. I did some reading (I do not understand Chinese).
I was trying to extract content to use as a description; that's how I stumbled onto this article.
It says:
Quote:
Chinese sentences are written with no special delimiters such as space to indicate word boundaries. Existing Chinese NLP systems therefore employ preprocessors to segment sentences into words.
source: http://portal.acm.org/citation.cfm?id=981621

If this is true, I think the "no boundary" version will be the easiest solution for Chinese for the moment.
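
A small sketch of why the boundary markers get in the way here, again in Python rather than PHP and with a sample sentence picked purely for illustration: every Chinese character counts as a word character, so inside a sentence there is no word boundary for \b to match.

```python
import re

sentence = "我喜欢北京烤鸭"  # sample sentence, roughly "I like Peking duck"
term = "北京"               # "Beijing"

# With word boundaries: the neighbouring characters are word characters too,
# so \b cannot match on either side of the term.
bounded = re.search(r"\b" + re.escape(term) + r"\b", sentence)

# Without word boundaries: a plain substring match.
unbounded = re.search(re.escape(term), sentence)

print(bounded)           # None
print(unbounded.span())  # (3, 5)
```

Proper word segmentation, as the quoted article describes, would be the thorough fix; the plain substring match is simply the cheapest thing that finds the term at all.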