vb.org Archive - View Single Post - substr is BAD! fetch_trimmed_title is GOOD!

Hubbitus · #2 07-07-2008, 02:07 PM

Quote:

Originally Posted by Makc666

For example, a Russian letter in UTF-8 will look like:

PHP Code:


			
&_236;

So when you type some word in Russian which consists of 5 letters it will look like:

PHP Code:


			
&_236;&_236;&_236;&_236;&_246;

Excuse me, where does it look like this?
UTF-8 can encode any character in the Unicode standard, and its byte values for the first 128 characters are backwards compatible with ASCII. So, depending on the character, it takes from 1 byte (the 128 ASCII characters) to 4 bytes. Russian Cyrillic characters take 2 bytes each in UTF-8.
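To make the byte counts concrete, here is a minimal sketch that uses only the core strlen() function, which counts bytes. The sample characters and the `$samples` array are my own choice for illustration; the bytes are written as hex escapes so the encoding of the script file does not matter.

```php
<?php
// UTF-8 byte sequences, written as hex escapes.
$samples = array(
    'Latin a'     => "a",                // U+0061
    'Cyrillic ya' => "\xD1\x8F",         // U+044F
    'Euro sign'   => "\xE2\x82\xAC",     // U+20AC
    'Emoji'       => "\xF0\x9F\x98\x80", // U+1F600
);

foreach ($samples as $label => $char) {
    // strlen() reports the number of bytes, not the number of characters.
    echo $label, ': ', strlen($char), " byte(s)\n";
}
```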

Your example is strange, and I can't reproduce it on my system. I guess you meant HTML numeric entities, but in that case the entity must look like

PHP Code:


			
&#236;

not like

PHP Code:


			
&_236;

Quote:

Originally Posted by Makc666

And in this way it is stored in the database.

If my guess about HTML entities is right, you get this behaviour only when the character encodings in use do not match. For example, the vBulletin board uses UTF-8, but the database (or the tables, or maybe only one column) stores data in another, single-byte charset, for example windows-1251 (CP1251). In that case the browser performs this transformation so that characters the form's charset cannot represent can still be stored (provided the charsets are declared correctly in the HTML code).

On my UTF-8 board with the database collation utf8_general_ci I don't have these troubles: all Russian and many other characters are saved as is, without any such transformation.
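If text really is stored as numeric entities, a byte-based cut is even more fragile. A small sketch, assuming the entity form `&#1103;` (U+044F, Cyrillic "ya") purely as an example of what such a stored value might look like; the variable names are mine:

```php
<?php
// "&#1103;" is the numeric entity for U+044F.
$entity  = '&#1103;';
$decoded = html_entity_decode($entity, ENT_QUOTES, 'UTF-8');

echo strlen($entity), "\n";  // bytes used by the entity form
echo strlen($decoded), "\n"; // bytes used by the real UTF-8 character

// Two letters stored as entities, then cut by bytes with substr().
$stored = str_repeat($entity, 2);
$cut    = substr($stored, 0, 10);
echo $cut, "\n"; // the second entity is cut in the middle
```

One visible letter costs 7 bytes as an entity and 2 bytes as real UTF-8, so any length limit counted in bytes means something different in each case.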

Quote:

Originally Posted by Makc666

When you use substr(); function you cut by symbols and not by letters!

I wrote about HTML entities above. But we still have a problem with non-ASCII characters, which take more than 1 byte per character. The PHP function substr() does not handle characters (or symbols, if you wish); it truncates the string by a number of bytes (keep in mind that for ASCII the two are the same)!
So, if you want to cut by characters, you can simply use multibyte functions like mb_substr() (from the mbstring extension) or iconv_substr() (from the "more standard" and widely distributed iconv extension).
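The difference can be sketched like this. `utf8_cut()` is a hypothetical helper name of mine, not something from vBulletin or PHP; it simply picks whichever of the mentioned extensions is loaded and, as an assumption of this sketch, falls back to PCRE in UTF-8 mode if neither is.

```php
<?php
// Hypothetical helper: return the first $length characters of a UTF-8 string.
function utf8_cut($string, $length)
{
    $length = max(0, (int) $length);
    if ($length === 0) {
        return '';
    }
    if (function_exists('mb_substr')) {
        return mb_substr($string, 0, $length, 'UTF-8');
    }
    if (function_exists('iconv_substr')) {
        return iconv_substr($string, 0, $length, 'UTF-8');
    }
    // Last resort: with the "u" modifier "." matches one character, not one byte.
    if (preg_match('#^.{0,' . $length . '}#us', $string, $m) === 1) {
        return $m[0];
    }
    return $string; // invalid UTF-8: leave the input untouched
}

// A 6-letter Russian word: 6 characters, 12 bytes.
$word = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

$bytes = substr($word, 0, 3);  // 3 bytes: one and a half characters
$chars = utf8_cut($word, 3);   // 3 whole characters

var_dump(preg_match('//u', $bytes) === 1); // is the byte cut still valid UTF-8?
var_dump(strlen($chars));                  // bytes in the character cut
```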

And if neither of these extensions is available, you can safely emulate substr() on multibyte strings with a regular expression, in this manner:

PHP Code:


			
// Keep the first $trimusername characters, finish the current word, drop the rest.
$getstats_starter['username'] = preg_replace('#^(.{' . (int)$trimusername . '}.*?)[^\pL].*$#us', '\\1', $getstats_starter['username']);

That's all!! You don't need bulky functions!

Or, for clarity, you can wrap it in a function, if you wish:

PHP Code:


			
function pcre_trim($title, $chars = 70)
{
    // Keep the first $chars characters, finish the current word, drop the rest.
    return preg_replace('#^(.{' . (int)$chars . '}.*?)[^\pL].*$#us', '\\1', $title);
}

// Anywhere after, use it:
$getstats_starter['username'] = pcre_trim($getstats_starter['username'], $trimusername);
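One thing to keep in mind: with the "u" modifier preg_replace() returns NULL when the input is not valid UTF-8. Below is a sketch of a more defensive variant. The name `safe_word_trim()` is hypothetical, and the anchored pattern with an exact character count is an assumption of this example, chosen so that strings shorter than the limit come back unchanged.

```php
<?php
// Hypothetical defensive variant: trim after $chars characters at the end of
// the current word; return the input unchanged if it is short or not valid UTF-8.
function safe_word_trim($title, $chars = 70)
{
    $chars = max(0, (int) $chars);
    $result = preg_replace('#^(.{' . $chars . '}.*?)[^\pL].*$#us', '\\1', $title);

    // preg_replace() gives NULL on error, e.g. malformed UTF-8 with the "u" modifier.
    return ($result === null) ? $title : $result;
}

// 6-letter and 3-letter Russian words separated by a space.
$russian = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 \xD0\xBC\xD0\xB8\xD1\x80";

echo safe_word_trim('The quick brown fox', 6), "\n"; // limit falls inside "quick"
echo safe_word_trim('Hello World', 70), "\n";        // shorter than the limit
echo safe_word_trim($russian, 3), "\n";              // limit falls inside the first word
```

The limit is counted in characters, so the Russian string is cut after its first word even though that word alone is 12 bytes long.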
