  #1  
03-22-2009, 03:32 PM
SBlueman
 
Join Date: Jan 2006
Posts: 717
Thanks given: 0 times
Thanked 0 times in 0 posts
Fixing Curly Quotes & Em Dashes

As many know, the scourge of curly quotes and em dashes can frustrate you to no end. Some XML readers can't read those characters, and you end up with a mess on your hands.

I recently found this online and was wondering if there is a way to implement it in vBulletin:

http://www.snipe.net/2008/12/fixing-...dashes-in-php/

Quote:
The curly quotes, or "smart quotes", generated by Microsoft Word and other applications can be a real headache to developers. If you've built an administration area for your content publishers, and the publishers frequently compose their posts in Word and then copy+paste into your form to publish to the web, you may run into the situation where the curly quotes are replaced by your browser's version of an unrecognized symbol, often a question mark. This can be particularly frustrating when Word-generated characters such as these curly quotes or em dashes break content-generated XML feeds, even after you've been careful enough to convert "normal" HTML special characters so that your XML would be valid. Fortunately, there is an easy workaround.

Rather than try to convince your publishers to stop using Word to compose their content, the easier (and more effective) solution will be to replace the curly quotes with "normal" quotes before the data is inserted into the database.

The function below will convert curly quotes and em dashes into standard quotes and dashes ("-"). If you've got a handful of classes or functions that you routinely use as part of your data scrubbing process (to clean data before it gets sent to the server), you may want to include this function in that group, so that you don't ever have to think about it again.
PHP Code:
function convert_smart_quotes($string)
{
    // Windows-1252 bytes: curly single quotes (145, 146),
    // curly double quotes (147, 148) and the em dash (151).
    $search = array(chr(145),
                    chr(146),
                    chr(147),
                    chr(148),
                    chr(151));

    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');

    return str_replace($search, $replace, $string);
}
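A quick usage sketch, assuming the incoming text is a single-byte Windows-1252 string (that is the encoding in which bytes 145 through 151 are the curly quotes and the em dash). The sample sentence is made up for illustration, and the function is repeated only so the snippet runs on its own:

```php
<?php
// Same function as in the quoted article, repeated so this runs standalone.
function convert_smart_quotes($string)
{
    $search  = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');

    return str_replace($search, $replace, $string);
}

// Hypothetical sample: Word-style quotes and an em dash as Windows-1252 bytes.
$pasted = chr(147) . 'It' . chr(146) . 's fine' . chr(148) . ' ' . chr(151) . ' she said';

$clean = convert_smart_quotes($pasted);
echo $clean, "\n";

// Escape for the feed only after the conversion, so it sees plain ASCII.
$xml_safe = htmlspecialchars($clean, ENT_QUOTES);
echo $xml_safe, "\n";
```

The order matters here: converting first means the escaping step only ever sees characters it knows how to handle.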
  #2  
03-25-2009, 04:47 AM
SBlueman
 
Join Date: Jan 2006
Posts: 717

Anyone?
  #3  
03-26-2009, 05:56 AM
SBlueman
 
Join Date: Jan 2006
Posts: 717

Hello? McFly?
  #4  
03-28-2009, 03:05 AM
SBlueman
 
Join Date: Jan 2006
Posts: 717

Seriously....anyone???
  #5  
04-30-2010, 09:51 PM
juanune2
 
Join Date: Apr 2010
Posts: 1

I hate replying to a month-old thread, but I wasn't even a vBulletin user a month ago. : )

I ran into the same problem, and... really can't believe that it exists. This is a glaring hole, and you really couldn't use vBulletin in a production environment without fixing it (at least not as more than a toy). I chose vBulletin over other solutions because of some of the power that it offers wrt communities, so I won't go into a rant... as I still believe I made the right choice.

I'm honestly not sure why this exists. I've found the issue not just with smart quotes, but also with things like the ellipsis character (…).

I've done some investigation through the code and have determined that there isn't a good way of doing this without touching half of the code in the system, as there is no general-purpose text-cleaning function. Note that since they're high ASCII values, modifying these is something that should be done immediately upon ingestion, but there doesn't appear to be a good way of doing that. There are simply too many ways to enter text from the client side, and no general text parser in the script (not that you should trust client parsing). It will also be language dependent.
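A minimal sketch of what cleaning "immediately upon ingestion" could look like in plain PHP, outside of vBulletin. The helpers contains_cp1252_punctuation() and clean_request_array() are hypothetical names, not vBulletin functions, and the field names are made up. It assumes the submitted text is single-byte Windows-1252; on multi-byte UTF-8 input the same byte range appears inside valid characters, so this would mangle it, which is part of why the fix is language dependent.

```php
<?php
// Hypothetical helper: true if the string holds any byte in the 0x80-0x9F
// range, where Windows-1252 keeps its curly quotes, dashes and ellipsis.
function contains_cp1252_punctuation($string)
{
    return preg_match('/[\x80-\x9F]/', $string) === 1;
}

// Hypothetical helper: run a cleaning callback over every string in a
// (possibly nested) request array such as $_POST.
function clean_request_array(array $input, $cleaner)
{
    array_walk_recursive($input, function (&$value) use ($cleaner) {
        if (is_string($value) && contains_cp1252_punctuation($value)) {
            $value = call_user_func($cleaner, $value);
        }
    });

    return $input;
}

// Stand-in for a submitted form.
$fake_post = array(
    'title'   => 'Plain title',
    'message' => 'Wait' . chr(133) . ' what?',
    'poll'    => array('option1' => chr(147) . 'Yes' . chr(148)),
);

$cleaned = clean_request_array($fake_post, function ($text) {
    return str_replace(
        array(chr(133), chr(147), chr(148)),
        array('...', '"', '"'),
        $text
    );
});

print_r($cleaned);
```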

That being said, there is still a bad way of doing it, and I've implemented a brute-force method. Note... I have only done cursory testing with this. I have found no issues as of yet, but please make sure that you do some sanity checks.

STEP 1: [REQUIRED!]
#1 on the list of things to do before using my hack is to move ALL attachments, images, profile images, and all other binary data out of the database and into the filesystem. I have no issues with this, since I'm one of those guys that doesn't believe that you should ever have this kind of data in there in the first place.

Here's info on how, and why/why not, to do that: the vBulletin docs for moving attachments, and the vBulletin docs for moving user pictures.
STEP 2
Create a patch file. My file is called functions_custom.php, and lives in a 'custom' directory off of my root. It is a bit verbose (I chose a format that could be easily read), but here is the full text:
PHP Code:
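<?php
// Maps Windows-1252 "high ASCII" bytes to plain ASCII stand-ins.
// An empty replacement strips the character.
function convert_extraspecial_chars($string)
{
    // Uncomment the following line to turn the functionality off.
    //return $string;

    $search  = array();
    $replace = array();

    $search[]  = chr(130);  // single low quote
    $replace[] = '\'';
    $search[]  = chr(131);  // florin sign
    $replace[] = '';
    $search[]  = chr(132);  // double low quote
    $replace[] = '';
    $search[]  = chr(133);  // ellipsis
    $replace[] = '...';
    $search[]  = chr(134);  // dagger
    $replace[] = '';
    $search[]  = chr(135);  // double dagger
    $replace[] = '';
    $search[]  = chr(136);  // circumflex accent
    $replace[] = '';
    $search[]  = chr(137);  // per mille sign
    $replace[] = '';
    $search[]  = chr(138);  // capital S with caron
    $replace[] = '';
    $search[]  = chr(139);  // left single angle quote
    $replace[] = '';
    $search[]  = chr(140);  // capital OE ligature
    $replace[] = '';
    $search[]  = chr(169);  // copyright sign (chr(175) is the macron, not this)
    $replace[] = '(c)';
    $search[]  = chr(174);  // registered sign
    $replace[] = '(r)';

    $search[]  = chr(145);  // left single quote
    $replace[] = '\'';
    $search[]  = chr(146);  // right single quote
    $replace[] = '\'';
    $search[]  = chr(147);  // left double quote
    $replace[] = '"';
    $search[]  = chr(148);  // right double quote
    $replace[] = '"';
    $search[]  = chr(149);  // bullet
    $replace[] = '*';
    $search[]  = chr(150);  // en dash
    $replace[] = '-';
    $search[]  = chr(151);  // em dash
    $replace[] = '-';
    $search[]  = chr(152);  // small tilde
    $replace[] = '';
    $search[]  = chr(153);  // trademark sign
    $replace[] = 'tm';
    $search[]  = chr(154);  // small s with caron
    $replace[] = '';
    $search[]  = chr(155);  // right single angle quote
    $replace[] = '\'';
    $search[]  = chr(156);  // small oe ligature
    $replace[] = '';

    return str_replace($search, $replace, $string);
}
?>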
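For the sanity checks mentioned above, here is one way they might look, together with a more compact variant of the table. This is only a sketch: convert_extraspecial_chars_map() is a hypothetical name, it covers just a handful of the Windows-1252 bytes, and it uses strtr() with a single lookup array in place of the paired search/replace arrays.

```php
<?php
// Hypothetical alternative to the search/replace arrays: one lookup table
// passed to strtr(), which replaces every key in a single pass.
function convert_extraspecial_chars_map($string)
{
    static $map = null;
    if ($map === null) {
        $map = array(
            chr(133) => '...',  // ellipsis
            chr(145) => "'",    // left single quote
            chr(146) => "'",    // right single quote
            chr(147) => '"',    // left double quote
            chr(148) => '"',    // right double quote
            chr(150) => '-',    // en dash
            chr(151) => '-',    // em dash
            chr(153) => 'tm',   // trademark sign
        );
    }

    return strtr($string, $map);
}

// Sanity check 1: plain ASCII must come back untouched.
$ascii = 'Nothing "special" here - just text.';
var_dump(convert_extraspecial_chars_map($ascii) === $ascii);

// Sanity check 2: the mapped bytes must all be gone afterwards.
$word = chr(145) . 'quoted' . chr(146) . chr(133) . ' Acme' . chr(153);
$out  = convert_extraspecial_chars_map($word);
var_dump($out);
var_dump(preg_match('/[\x80-\x9F]/', $out) === 0);
```

Checks like these are cheap to rerun whenever the table changes, which matters because one misplaced entry shifts every mapping after it.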
STEP 3
Hook your patch into the only place that seems to be a culmination point for input parsing, which is escape_string() inside of class_core.php (around line 717):
PHP Code:
    function escape_string($string)
    {
        // Extra cleaning pass: convert the special characters before escaping.
        require_once('../custom/functions_custom.php');
        $string = convert_extraspecial_chars($string);

        if ($this->functions['escape_string'] == $this->functions['real_escape_string'])
        {
            return $this->functions['escape_string']($string, $this->connection_master);
        }
        else
        {
            return $this->functions['escape_string']($string);
        }
    }
What this does is force an extra cleaning pass for all info that passes into the DB, stripping certain high-ascii values. If you tried to skip step 1, you'll wind up corrupting 99% of binary files uploaded to the system... so don't do that.
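To see why binary data gets corrupted, here is a small standalone sketch. The payload is a made-up run of eight bytes, not a real image, and strip_some_high_bytes() is a hypothetical cut-down version of the cleaning pass with only two of the mappings:

```php
<?php
// A reduced version of the cleaning pass: just two of the mappings.
function strip_some_high_bytes($data)
{
    return str_replace(array(chr(133), chr(153)), array('...', 'tm'), $data);
}

// Made-up "binary" payload: eight arbitrary bytes, two of which happen to
// be 0x85 and 0x99 (133 and 153).
$binary = "\x89PNG" . chr(133) . "\x00\x10" . chr(153);

$after = strip_some_high_bytes($binary);

printf("before: %d bytes, %s\n", strlen($binary), bin2hex($binary));
printf("after:  %d bytes, %s\n", strlen($after), bin2hex($after));
var_dump($after === $binary);
```

The "cleaned" payload is three bytes longer and no longer matches the original, which is exactly what would happen to an attachment or avatar stored in the database.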

I can't comment on general usage, since I've only worked through this on one pre-production installation, but it is working nicely here. As such, put this through a test pass before using it. If it works, great... put some comments in here. If it doesn't... maybe I can offer some assistance.

-- j

--------------- Added 05-01-2010 at 03:23 AM ---------------

Ok, after dealing with this a bit more...

I changed the map file so that the mappings strip out high ascii characters instead of putting them through an HTML entities setup. Although... the real problem lies in how the database calls are structured. Again, the fix would mean changing every call in the system, and I'm sure that nobody is keen to do that.
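For comparison, a sketch of what an "HTML entities" style mapping could look like. This is an assumption about the earlier approach, which isn't shown in this thread; the function name is hypothetical, and the entity names are the standard HTML ones.

```php
<?php
// Hypothetical: map Windows-1252 punctuation bytes to named HTML entities
// instead of stripping them or flattening them to ASCII.
function cp1252_punctuation_to_entities($string)
{
    $map = array(
        chr(133) => '&hellip;',
        chr(145) => '&lsquo;',
        chr(146) => '&rsquo;',
        chr(147) => '&ldquo;',
        chr(148) => '&rdquo;',
        chr(150) => '&ndash;',
        chr(151) => '&mdash;',
    );

    return strtr($string, $map);
}

$converted = cp1252_punctuation_to_entities(chr(147) . 'Hi' . chr(148) . chr(151) . 'bye');
echo $converted, "\n";
```

One drawback of this route: named entities such as these are defined for HTML but not for plain XML, so a strict XML reader can still choke on them unless they are declared or written as numeric references.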

All times are GMT.

