vb.org Archive

Kaelon 09-06-2007 07:52 PM

Converting Special Chars from HTML to UTF-8 ascii standard?
 
Hey there,

I'm using the AddonChat Integration Script and have been working with Chris Duerr, the author, to try and solve this problem: users whose usernames contain special characters (such as accented letters) get an invalid username/password notice. This is because vBulletin stores these special characters as HTML entities.

How can we convert the HTML escape characters to UTF-8 standard ascii characters?

Here is the code cited from the integration script:

Code:

<?php
  header("Content-type: text/plain; charset=iso-8859-1");
  error_reporting(E_ALL & ~E_NOTICE);
  define('NO_REGISTER_GLOBALS', 1);
  define('SESSION_BYPASS', 1);
  define('LOCATION_BYPASS', 1);
  //define('DIE_QUIETLY', 1);
   
  /*
      We lie a little here to let us get through when
      forum read privileges are disabled for non-registered
      users.
  */
  define('THIS_SCRIPT', 'login');   
  $_REQUEST['do'] = 'register';
  require_once('./global.php');     
  require_once('./chat_global.php');
 
  $username = $_REQUEST['username'];
  $password = $_REQUEST['password'];
 
  /*
      Uncomment the following to support non-ASCII UTF-8 characters
      Requires PHP Multibyte String (mbstring) Extension
  */
  $username = mb_convert_encoding($username, "HTML-ENTITIES", "UTF-8");
  $password = mb_convert_encoding($password, "HTML-ENTITIES", "UTF-8");
 
 
  if(!$SIGMACHAT_VB_AUTHENTICATE) die("DISABLED");
 
  # Fetch User Info from Database..
  $uid = 0;
  if ($userinfo = $db->query_first('SELECT userid, usergroupid, membergroupids, password, salt FROM ' . TABLE_PREFIX . 'user WHERE username = "' . addslashes(htmlspecialchars_uni($username)) . '"'))
  {
      # Invalid Password
      if (($userinfo['password'] != $password) && ($userinfo['password'] != md5(md5($password) . $userinfo['salt'])))
          $auth = 0;
      else
      {
          # Collect the primary and secondary usergroups (array keys quoted)
          $usergroups = explode(',', $userinfo['membergroupids']);
          $usergroups[] = $userinfo['usergroupid'];

          $auth = 0;
          foreach($usergroups as $ug)
          {
              if( ($auth < 3) && (in_array($ug, $SIGMACHAT_AUTH_GRANTACCESS)) ) $auth = 3;
              if( ($auth < 2) && (in_array($ug, $SIGMACHAT_AUTH_ADMINACCESS)) ) $auth = 2;
              if( ($auth < 1) && (in_array($ug, $SIGMACHAT_AUTH_ACCESS)) ) $auth = 1;
              if(in_array($ug, $SIGMACHAT_AUTH_NOACCESS)) { $auth = 0; break; }
          }
          $uid = $userinfo['userid'];
      }
  }
  else
      $auth = $SIGMACHAT_AUTH_GUEST;
   
   
  $result_string = "SCRAS^1.1\nAUTH^$auth\nUID^$uid\n";
 
  if($SIGMACHAT_ENABLE_LINK_PROFILE) $result_string .= "SITE_LINK^Profile^$SIGMACHAT_FORUM_URL/chat_func_profile.php\n";
  if($SIGMACHAT_ENABLE_LINK_ADDBUDDY) $result_string .= "SITE_LINK^Add Buddy^$SIGMACHAT_FORUM_URL/chat_func_addbuddy.php\n"; 
  if($SIGMACHAT_ENABLE_LINK_PM) $result_string .= "SITE_LINK^Prv. Message^$SIGMACHAT_FORUM_URL/chat_func_pm.php\n";
  if($SIGMACHAT_ENABLE_LINK_EMAIL) $result_string .= "SITE_LINK^eMail^$SIGMACHAT_FORUM_URL/chat_func_email.php\n"; 
  if($SIGMACHAT_ENABLE_LINK_FINDPOSTS) $result_string .= "SITE_LINK^Find Posts^$SIGMACHAT_FORUM_URL/chat_func_findposts.php\n"; 
  if($SIGMACHAT_ENABLE_LINK_FORUM_IGNORE) $result_string .= "SITE_LINK^Forum Ignore^$SIGMACHAT_FORUM_URL/chat_func_ignore.php\n";     
 
  print($result_string); 
 
?>

Update -- I've tried using html_entity_decode by calling as follows:

Code:

$username = html_entity_decode($username);
$password = html_entity_decode($password);

... where the "uncomment the following" comment is indicated in the above code. That didn't work, tragically.

Paul M 09-06-2007 08:22 PM

There is a function in vb called unhtmlspecialchars()

From the documentation:

Code:

Returns a string where HTML entities have been converted back to their original characters

string unhtmlspecialchars (string $text, [boolean $doUniCode = false])

string $text: String to be parsed

boolean $doUniCode: Convert unicode characters back from HTML entities?


Kaelon 09-06-2007 08:50 PM

Thanks, Paul! However, that didn't seem to work. I added:

Code:

        $username = unhtmlspecialchars($username);
        $password = unhtmlspecialchars($password);

... alongside the previous mb_convert_encoding lines, and I was still getting invalid returns from the system. Judging by the code above, is there a more sensible place to call unhtmlspecialchars() so that the login validates? Thanks!

Kaelon 09-09-2007 04:21 PM

Latest information from Chris Duerr, the original hack author:

Quote:

Originally Posted by cduerr
I'm not familiar with that command -- but it almost seems like you'd want to do the reverse; that is convert the special chars to their HTML representation. Sometimes function names can be confusing though, so you may have the right function.

Do you know the usage of the command? Ideally it would be a drop-in replacement for the mb_convert_encoding commands -- it'll be one of the first commands you run in the script.

What we typically do when debugging this sort of thing is to write the output data to a text file (using php file commands within the authentication script) as there is no easy way to simply echo the information to the console when using special characters. This may help by first printing the raw data we send, then print the data as you've converted it, and finally print the raw data stored in the database for comparison to gauge your progress.

Accordingly, is the opposite of unhtmlspecialchars() just htmlspecialchars()?
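
The file-based debugging cduerr describes might look like the following minimal sketch. The helper name log_bytes and the log path are hypothetical and not part of the integration script. The hex dump is an addition, on the assumption that accented characters may not display reliably when the log is viewed.

```php
<?php
// Hypothetical helper (not from the AddonChat script): append a labelled
// value to a log file together with its raw bytes in hex.
function log_bytes(string $file, string $label, string $value): void
{
    $line = sprintf("%s: %s [hex: %s]\n", $label, $value, bin2hex($value));
    file_put_contents($file, $line, FILE_APPEND | LOCK_EX);
}

// Assumed log location; any path the web server can write to would do.
$logFile = sys_get_temp_dir() . '/chat_auth_debug.log';

// "José" as UTF-8 bytes, standing in for $_REQUEST['username'].
$raw = "Jos\xC3\xA9";
log_bytes($logFile, 'raw username', $raw);

// Stand-in for whichever conversion is being tested.
log_bytes($logFile, 'converted username', htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
```

Logging the username read back from the user table with the same helper would give the third line to compare against. Passwords are probably best left out of such a log.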

Paul M 09-09-2007 06:03 PM

I didn't really read your code, you asked about decoding, which was what I answered.

Looking at your code then yes, you need to do the opposite: you want to encode your username to match vb. The vb function is htmlspecialchars_uni(), but I believe vb does more than just that.
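
To illustrate why the direction matters, here is a small sketch using only PHP's built-in html_entity_decode(). The stored and submitted values are hypothetical, chosen on the assumption that the database holds a numeric entity while the chat client sends UTF-8.

```php
<?php
// Hypothetical values for illustration only.
$stored = '&#1605;';   // a character as it might sit in the user table (entity form)
$input  = "\xD9\x85";  // the same character arriving from the chat client as UTF-8

// Decoding the input changes nothing: it contains no entities to decode,
// so it still does not match the stored form.
$decodedInput = html_entity_decode($input, ENT_QUOTES, 'UTF-8');
$matchesAfterDecodingInput = ($decodedInput === $stored);

// Decoding the stored value, by contrast, yields the UTF-8 input.
$decodedStored = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
$matchesAfterDecodingStored = ($decodedStored === $input);

var_dump($matchesAfterDecodingInput, $matchesAfterDecodingStored);
```

Since the query compares against the stored column, the practical route is to convert the submitted name into the stored form, which is the encoding direction described here.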

Kaelon 09-12-2007 02:39 PM

Thanks, Paul. I gave that a shot, but strangely, still no luck. Specifically, I used:

$username = htmlspecialchars_uni($username);
$password = htmlspecialchars_uni($password);

... and I still got invalid returns from the system. Then looking further, I also saw that the chat_auth.php code provided by Chris Duerr had already apparently done this analysis:

Code:

  # Fetch User Info from Database..
  $uid = 0;
  if ($userinfo = $db->query_first('SELECT userid, usergroupid, membergroupids, password, salt FROM ' . TABLE_PREFIX . 'user WHERE username = "' . addslashes(htmlspecialchars_uni($username)) . '"'))
  {
      # Invalid Password
      if (($userinfo['password'] != $password) && ($userinfo['password'] != md5(md5($password) . $userinfo['salt'])))
          $auth = 0;
      else
...


Paul M 09-12-2007 05:32 PM

You need to look in the user datamanager to see what other conversions vb does.

Kaelon 09-12-2007 06:48 PM

Quote:

Originally Posted by Paul M (Post 1337758)
You need to look in the user datamanager to see what other conversions vb does.

Sounds good. Where can I find the user datamanager?

Paul M 09-12-2007 07:52 PM

class_dm_user.php in the includes folder.

Grim77 05-03-2008 06:55 AM

Kaelon -- Just curious if we ever found a solution to this? I'm working on the 3.7 mod now, and would like to find a solution that doesn't require a non-standard php library.

Kaelon 05-06-2008 03:10 PM

Quote:

Originally Posted by Grim77 (Post 1506547)
Kaelon -- Just curious if we ever found a solution to this? I'm working on the 3.7 mod now, and would like to find a solution that doesn't require a non-standard php library.

Hi Grim77,

Unfortunately, no. Any of my users that have special characters in their usernames (such as accents, which are very common in Romance languages such as Spanish and French) have never been able to log in to our chat room properly. My recommendation would be to definitely allow special characters in the future.

Let me know how your progress goes with regards to this.

Thanks,
Juan

Grim77 05-06-2008 04:40 PM

Ok, Juan -- We're working on the next update for v3.7 now. I'll look into this and see what we can do. :)

Kaelon 05-06-2008 04:49 PM

Quote:

Originally Posted by Grim77 (Post 1510677)
Ok, Juan -- We're working on the next update for v3.7 now. I'll look into this and see what we can do. :)

Great, thanks, Chris!

Grim77 05-10-2008 12:20 AM

The release candidate is now online for v3.7 integration. You can get it from http://forums.addoninteractive.com/s...ead.php?t=3915

If you prefer to stick with 3.5/3.6, here is the code I've found to work. Admittedly I've only tested it on v3.7, but I don't think the way usernames are stored in the database has changed.

mb_convert_encoding almost does the trick, but not quite. I found the following code posted at php.net and modified it so that HTML character codes are only used for UTF-8 characters above the 8-bit range. It also allows for special characters (like '<'), though some usernames with these special characters aren't permitted by the AddonChat chat software.

I've tested it using various English, Spanish and Arabic characters, and it seems to be working. Again though, if you're running v3.7 -- just download the release candidate and let me know if you run into any problems :)

PHP Code:

   /*
      UTF-8 to Numeric HTML Entity Conversion
         Credit to: http://us3.php.net/manual/en/function.utf8-decode.php#75941
         ** Modified to only return HTML entities for characters out of 8 bit ASCII range.
         ** Modified to use htmlspecialchars_uni() function.
   */

   function utf8_to_html ($data)
   {
      return htmlspecialchars_uni(preg_replace("/([\\xC0-\\xF7]{1,1}[\\x80-\\xBF]+)/e", '_utf8_to_html("\\1")', $data));
   }

   function _utf8_to_html ($data)
   {
      $ret = 0;
      foreach((str_split(strrev(chr((ord($data{0}) % 252 % 248 % 240 % 224 % 192) + 128) . substr($data, 1)))) as $k => $v)
         $ret += (ord($v) % 128) * pow(64, $k);

      if($ret < 256)
         return chr($ret);

      return "&#$ret;";
   }

To use, insert the above code at the end of your authentication script, then find the following code:
PHP Code:

if ($userinfo = $db->query_first('SELECT userid, usergroupid, membergroupids, password, salt FROM ' . TABLE_PREFIX . 'user WHERE username = "' . addslashes(htmlspecialchars_uni($username)) . '"'))

and replace it with:
PHP Code:

if ($userinfo = $db->query_first('SELECT userid, usergroupid, membergroupids, password, salt FROM ' . TABLE_PREFIX . 'user WHERE username = "' . $db->escape_string(utf8_to_html($username)) . '"'))
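
The posted snippet relies on the /e modifier of preg_replace and the $data{0} string offset syntax, both of which were removed in later PHP versions (PHP 7.0 and PHP 8.0 respectively). As a rough sketch of the same conversion for newer PHP, assuming valid UTF-8 input, something like the following could be used. The function name utf8_to_numeric_entities is hypothetical, and the vBulletin-specific htmlspecialchars_uni() call is left out so the example runs on its own.

```php
<?php
// Sketch only: a stand-in for the posted utf8_to_html(), without the
// vBulletin htmlspecialchars_uni() wrapper. The function name is hypothetical.
function utf8_to_numeric_entities(string $data): string
{
    // Match a UTF-8 lead byte plus its continuation bytes (byte-wise, no /u flag).
    $result = preg_replace_callback(
        '/[\xC0-\xF7][\x80-\xBF]{1,3}/',
        function (array $m): string {
            $bytes = $m[0];
            $len   = strlen($bytes);

            // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
            $cp = ord($bytes[0]) & (0xFF >> ($len + 1));

            // Each continuation byte contributes its low 6 bits.
            for ($i = 1; $i < $len; $i++) {
                $cp = ($cp << 6) | (ord($bytes[$i]) & 0x3F);
            }

            // As in the posted code: code points below 256 become a single
            // byte, everything else becomes a numeric entity.
            return $cp < 256 ? chr($cp) : "&#$cp;";
        },
        $data
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $data;
}

echo utf8_to_numeric_entities("\xD9\x85"), "\n"; // prints &#1605;
```

In the authentication script the result would still need to go through htmlspecialchars_uni() and $db->escape_string() before being placed in the query, as in the replacement line above.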



All times are GMT.