  #1  
Old 01-20-2007, 12:37 AM
Makc666
 
Join Date: Dec 2002
Location: MSK-RU
Posts: 392
substr is BAD! fetch_trimmed_title is GOOD!

A little note for all coders.

I was inspecting this mod:
Cyb - Advanced Forum Statistics
https://vborg.vbsupport.ru/showthread.php?t=122986

and saw that PHP's substr() function is used there.
For example:
PHP Code:
$getstats_starter[username] = substr($getstats_starter[username], 0, $trimusername) . '...';
substr() is a poor choice for trimming text to a set length.

Use the fetch_trimmed_title() function instead, which is built into vBulletin.

Why, you ask?
First, vBulletin itself uses this function to trim thread titles.
Second, most forums use UTF-8.

For example, a Russian letter in UTF-8 will look like:
PHP Code:
&_236;
(you have to replace _ with #)
So a five-letter Russian word will look like:
PHP Code:
&_236;&_236;&_236;&_236;&_246;
(you have to replace _ with #)
And this is how it is stored in the database.
That is 30 symbols if you count them.

When you use substr(), you cut by symbols and not by letters!

So if you tell substr() to cut after 27 symbols and apply it to the example above, you will get:
PHP Code:
&_236;&_236;&_236;&_236;&_2
As you can see, the code for the last letter was cut in the middle.

When this is displayed on a web page, you will see a bug.
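
The arithmetic above can be checked with a minimal sketch. It assumes the text is stored as numeric character references exactly as described and, for simplicity, repeats the same reference five times. The variable names and the cleanup pattern at the end are illustrative only, not part of the mod or of vBulletin.

```php
<?php
// Five numeric character references, 6 bytes each, as in the example above.
$stored = str_repeat('&#236;', 5);

// substr() counts bytes, so a limit of 27 lands inside the fifth reference.
$cut = substr($stored, 0, 27);

echo strlen($stored), "\n"; // 30
echo $cut, "\n";            // &#236;&#236;&#236;&#236;&#2

// Illustrative cleanup (an assumption, not vBulletin code): drop an
// unterminated reference left dangling at the end of the string.
$clean = preg_replace('/&#?\w*$/', '', $cut);
echo $clean, "\n";          // &#236;&#236;&#236;&#236;
```

The cleanup step only hides the broken tail; it does not change the fact that the limit was counted in bytes rather than letters.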

That is why you have to use the fetch_trimmed_title() function.
It cuts only at whole words, relying on spaces to decide where to cut.

Returning to
Cyb - Advanced Forum Statistics
https://vborg.vbsupport.ru/showthread.php?t=122986

throughout the code I replaced lines like:
PHP Code:
$getstats_starter[username] = substr($getstats_starter[username], 0, $trimusername) . '...';
with a good one:

PHP Code:
$getstats_starter[username] = fetch_trimmed_title($getstats_starter[username], $trimusername); 
Here is how this function looks in functions.php:

PHP Code:
// #############################################################################
/**
* Trims a string to the specified length while keeping whole words
*
* @param        string  String to be trimmed
* @param        integer Number of characters to aim for in the trimmed string
*
* @return       string
*/ 
  #2  
Old 07-07-2008, 03:07 PM
Hubbitus
 
Join Date: Nov 2005
Location: Russia
Posts: 6

Quote:
Originally Posted by Makc666
For example, a Russian letter in UTF-8 will look like:
PHP Code:
&_236;
So a five-letter Russian word will look like:
PHP Code:
&_236;&_236;&_236;&_236;&_246;
Excuse me, where does it look like this?
UTF-8 can represent any character in the Unicode standard, and its encoding is backwards compatible with ASCII. Depending on the character, it takes from 1 byte (the 128 ASCII characters) to 4 bytes. Russian Cyrillic characters take 2 bytes each in UTF-8.

Your example is strange, and I can't reproduce it on my system. I guess you meant HTML numeric character references, but in that case the reference must look like
PHP Code:
&#236;
not like
PHP Code:
&_236
Quote:
Originally Posted by Makc666
And this is how it is stored in the database.
If my guess about HTML entities is right, you get this behaviour only when character encodings are mismatched. For example, the vBulletin board uses UTF-8, but the data is stored in the database (or in some tables, or maybe only in some rows) in another, single-byte charset such as windows-1251 (CP1251). In that case the transformation is done by the browser, so that characters the form's charset cannot represent can still be submitted (provided the charsets are declared correctly in the HTML).

On my board, which uses UTF-8 with the database collation utf8_general_ci, I don't have this trouble: Russian and many other characters are saved as is, without any such transformation.
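
A small sketch of the distinction drawn above: a numeric character reference is plain ASCII text that stands in for one character, so its byte length says nothing about the UTF-8 length of the character it names. The Cyrillic letter and the variable names are chosen purely for illustration.

```php
<?php
// "&#1055;" is the numeric character reference for the Cyrillic letter "П" (U+041F).
$reference = '&#1055;';

// Decode the reference into a real UTF-8 character.
$letter = html_entity_decode($reference, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo strlen($reference), "\n"; // 7 (bytes of ASCII)
echo strlen($letter), "\n";    // 2 (bytes of UTF-8)
echo bin2hex($letter), "\n";   // d09f
```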

Quote:
Originally Posted by Makc666
When you use substr(), you cut by symbols and not by letters!
I covered HTML entities above. But we still have a problem with non-ASCII characters, which take more than 1 byte per character. PHP's substr() function does not handle characters (or symbols, if you wish); it truncates the string by a number of bytes (which equals the number of characters only for ASCII)!
So, if you want to cut by characters, you can use multibyte functions like mb_substr() (from the mbstring extension) or iconv_substr() (from the more standard and widely distributed iconv extension).
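
As a rough illustration of the difference, the sketch below cuts the same Cyrillic word by bytes and by characters. The sample word and variable names are assumptions for the demo, and the file is assumed to be saved as UTF-8. An empty pattern with the /u modifier is used here only as a quick validity check, since preg_match() returns false when the subject is not valid UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8: 6 Cyrillic letters, 2 bytes each.
$name = 'Привет';

// Byte-based cut: 5 bytes is two whole letters plus half of the third.
$byteCut = substr($name, 0, 5);

// Character-based cut, using whichever extension is available.
if (function_exists('mb_substr')) {
    $charCut = mb_substr($name, 0, 5, 'UTF-8');
} elseif (function_exists('iconv_substr')) {
    $charCut = iconv_substr($name, 0, 5, 'UTF-8');
} else {
    // PCRE fallback: with /u, "." matches one UTF-8 character.
    preg_match('/^.{0,5}/us', $name, $m);
    $charCut = $m[0];
}

echo strlen($name), "\n";              // 12
var_dump(preg_match('//u', $byteCut)); // bool(false): not valid UTF-8
echo $charCut, "\n";                   // Приве
```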

If neither of these extensions is available, you can emulate substr() on multibyte strings with a regular expression, like this:
PHP Code:
$getstats_starter['username'] = preg_replace('#(.{0,' . (int)$trimusername . '}.*?)[^\pL].*#u', '\\1', $getstats_starter['username']);
That's all! You don't need bulky functions!

Or, for clarity, you can wrap it in a function if you wish:
PHP Code:
function pcre_trim($title, $chars = 70)
{
    return preg_replace('#(.{0,' . (int)$chars . '}.*?)[^\pL].*#u', '\\1', $title);
}

// Anywhere after, use it:
$getstats_starter['username'] = pcre_trim($getstats_starter['username'], $trimusername);
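
One caveat with the pattern as posted: it always needs a non-letter to stop at, so a string that already fits the limit but contains a space (for example 'John Smith' with a limit of 70) comes back as 'John'. The sketch below is one possible variant that leaves short strings untouched. The function name trim_to_chars, its parameters, and the '...' suffix are hypothetical and not part of vBulletin; it relies only on PCRE with the /u modifier.

```php
<?php
/**
 * Hypothetical helper (not part of vBulletin): trims UTF-8 text to at most
 * $limit characters, preferring to cut at whitespace.
 */
function trim_to_chars($text, $limit, $suffix = '...')
{
    $n = max(0, min((int) $limit, 65535)); // PCRE caps {0,n} at 65535

    // Longest prefix of up to $n characters that ends right before
    // whitespace or at the end of the string.
    if (preg_match('/^.{0,' . $n . '}(?=\s|\z)/us', $text, $m) === 1) {
        if ($m[0] === $text) {
            return $text; // already within the limit
        }
        $head = rtrim($m[0]);
        if ($head !== '') {
            return $head . $suffix;
        }
    }

    // No usable whitespace: hard cut on a character boundary.
    if (preg_match('/^.{0,' . $n . '}/us', $text, $m) === 1) {
        return $m[0] . $suffix;
    }

    return $text; // not valid UTF-8: leave untouched
}

echo trim_to_chars('John Smith', 70), "\n";        // John Smith
echo trim_to_chars('Привет большой мир', 8), "\n"; // Привет...
echo trim_to_chars('Приветствие', 4), "\n";        // Прив...
```

The first pattern prefers a word boundary; the second is only a fallback for text with no whitespace inside the limit, where a cut in the middle of a word cannot be avoided.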
  #3  
Old 07-08-2008, 04:57 AM
Makc666
 
Join Date: Dec 2002
Location: MSK-RU
Posts: 392

Quote:
Originally Posted by Hubbitus
Excuse me, where does it look like this?
UTF-8 can represent any character in the Unicode standard, and its encoding is backwards compatible with ASCII. Depending on the character, it takes from 1 byte (the 128 ASCII characters) to 4 bytes. Russian Cyrillic characters take 2 bytes each in UTF-8.

Your example is strange, and I can't reproduce it on my system. I guess you meant HTML numeric character references, but in that case the reference must look like
PHP Code:
&#236;
not like
PHP Code:
&_236
If you try to quote your post or my post here with
PHP Code:
&#236;
you will get a letter instead of the code...

Try to post something like:
PHP Code:
&_236;&_236;&_236;&_236;&_236;&_236;
(you have to replace _ with #)

You will get this:
PHP Code:
ìììììì
  #4  
Old 08-29-2008, 06:39 AM
kotlt99
 
Join Date: Aug 2006
Posts: 87

Does this also apply to vBulletin 3.7?
  #5  
Old 08-29-2008, 07:39 AM
Dismounted
 
Join Date: Jun 2005
Location: Melbourne, Australia
Posts: 15,047

This post applies regardless of vBulletin version: it holds as long as the substr() function behaves the way it currently does.