vb.org Archive - View Single Post - Show Thread Enhancements - AME - Auto Media Embedding (youtube, Amazon, google, myspace, etc...)

The Geek · #88 06-29-2007, 06:22 PM

I thought I would put down some basics of creating regexps for AME, as many users want to create their own. This isn't an exhaustive list, nor am I an expert on regexps; it is just a rough, basic overview to get you started.

AME takes the regexp you provide and wraps it in [url] tags, boundaries and the codes that make the regexp case insensitive, so you don't have to worry about doing that. You do have to remember, though, that your regexp needs to match the entire URL. If it doesn't, it won't qualify as a match in AME.
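
As a rough illustration of the "match the entire URL" rule, here is a minimal Python sketch. It only approximates what AME does: re.fullmatch and re.IGNORECASE stand in for AME's own wrapping (which is not shown in this post), and the example.com URL and pattern are made up for the demonstration.

```python
import re

# Hypothetical pattern and URL, used only to illustrate whole-URL matching.
pattern = r"http://www\.example\.com/watch\?v=\w+"
url = "http://www.example.com/watch?v=abc123&feature=related"

# A match anchored only at the start succeeds: the pattern fits the beginning of the URL...
prefix_hit = re.match(pattern, url, re.IGNORECASE) is not None

# ...but a whole-string match fails, because "&feature=related" is left over.
whole_hit = re.fullmatch(pattern, url, re.IGNORECASE) is not None

# With the case-insensitive flag set, upper case in the URL does not matter.
upper_hit = re.fullmatch(
    pattern, "HTTP://WWW.EXAMPLE.COM/watch?v=abc123", re.IGNORECASE
) is not None

print(prefix_hit, whole_hit, upper_hit)
```

The middle result is the one to watch for: a regexp that only covers the front of the URL is not enough.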

The best way to test regular expressions (that I have found) is with RegexBuddy. This is what I do:

Start up RegexBuddy
Click the test tab
Tick the 'case insensitive' option
In the box below the tabs, I paste the URL I want to create a regexp for. You need to be able to identify which part of the URL is the part you want to extract. In this instance, I am trying to create a regexp for http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng== so I want to extract the MzEwODYwfDg2NzY0Ng part.
I then paste everything leading up to the part I want to extract into the top window like this: http://www.clipfish.de/player.php?videoid=
I then escape the special characters in the URL (here the . and the ?) with the \ character like this: http://www\.clipfish\.de/player\.php\?videoid=. At this stage, RegexBuddy should have highlighted your test URL up to the part we want to extract. If it hasn't, then you are missing something.
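
If you don't have a regexp tester to hand, the escaping step can be checked with a few lines of Python. This is only a sketch using Python's re module as a stand-in for the tester; the variable names are mine.

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

unescaped = "http://www.clipfish.de/player.php?videoid="
escaped = r"http://www\.clipfish\.de/player\.php\?videoid="

# Unescaped, "?" makes the preceding "p" optional instead of matching a
# literal "?", so the pattern cannot get past "player.php".
no_match = re.match(unescaped, url, re.IGNORECASE)

# Escaped, the pattern matches the URL up to the part we want to extract.
m = re.match(escaped, url, re.IGNORECASE)
matched_prefix = m.group(0)

print(no_match)
print(matched_prefix)
```

The unescaped . is less obvious because it still matches a literal dot, but it would also match any other character in that position.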

I now need to define a character class that will allow me to match the pattern I am after. If the pattern is only word characters (i.e. letters, digits and underscores), then I can use [\w]. If it is letters only, then I would use [a-z] (AME makes the regexp case insensitive, so that covers upper case too). If it is digits only, then I can use [\d].
I can also specify additional characters that can appear. For instance, if I wanted a class that allows word characters and hyphens, I could do [\w-] (the underscore is already covered by \w). If I wanted only letters, hyphens and underscores, I could do [a-z_-].
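
To get a feel for what each class accepts, here is a small Python sketch. The helper all_in_class is my own name, not part of AME; it checks every character of a string against a class, one character at a time, with the case-insensitive flag on.

```python
import re

def all_in_class(char_class, text):
    """Return True if every single character of text is matched by char_class."""
    return all(re.fullmatch(char_class, ch, re.IGNORECASE) for ch in text)

results = {
    "word chars, mixed case": all_in_class(r"[\w]", "MzEwODYwfDg2NzY0Ng"),
    "word chars incl. underscore": all_in_class(r"[\w]", "my_id"),
    "letters only vs digits": all_in_class(r"[a-z]", "abc123"),
    "digits only": all_in_class(r"[\d]", "310860"),
    "letters, hyphen, underscore": all_in_class(r"[a-z_-]", "my-clip_name"),
    "word chars vs hyphen": all_in_class(r"[\w]", "my-id"),
}

for label, ok in results.items():
    print(label, ok)
```

Note that the hyphen sits at the end of [a-z_-]; placed between two characters it would be read as a range instead.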

In my case, where I am trying to extract MzEwODYwfDg2NzY0Ng, a word character class works fine: [\w]

The problem is that a character class on its own only matches a single character. In other words, my match would be http://www.clipfish.de/player.php?videoid=M, NOT http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng== which is what I want!
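
A quick Python check shows the single-character behaviour (again just a sketch with Python's re module, not AME itself):

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

# The class [\w] with no repetition after it consumes exactly one character.
pattern = r"http://www\.clipfish\.de/player\.php\?videoid=[\w]"

m = re.match(pattern, url, re.IGNORECASE)
matched = m.group(0)

print(matched)  # stops right after the "M"
```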

This is where special characters come in.

. will match any single character that is NOT a line break
* will match whatever comes before it 0 or more times
+ will match whatever comes before it 1 or more times
? will match whatever comes before it 0 or 1 time
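
The four special characters can be tried out on some throwaway strings. This is a minimal Python sketch; the helper name whole and the test strings are made up for the demonstration.

```python
import re

def whole(pattern, text):
    """Return True if pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None

# "." is any one character, except a line break
dot = (whole(r"a.c", "abc"), whole(r"a.c", "a\nc"))

# "*" repeats the item before it 0 or more times
star = (whole(r"ab*", "a"), whole(r"ab*", "abbb"))

# "+" repeats the item before it 1 or more times, so at least one "b" is required
plus = (whole(r"ab+", "a"), whole(r"ab+", "abbb"))

# "?" makes the item before it optional: 0 or 1 "b", never 2
optional = (whole(r"ab?", "a"), whole(r"ab?", "ab"), whole(r"ab?", "abb"))

print(dot, star, plus, optional)
```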

So, to make my character class work, I use [\w]+

So now my regexp looks like:

http://www\.clipfish\.de/player\.php\?videoid=[\w]+

Now, that will match, but I need to capture whatever is matched by the [\w]+ part. That's where ()'s come into play. If I do this:

http://www\.clipfish\.de/player\.php\?videoid=([\w]+)

Then I get the text that the [\w]+ part matched.
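
In Python terms, the brackets create group 1, which can be read back after a match. This is a sketch only; AME exposes the captured text in its own way.

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="
pattern = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)"

m = re.match(pattern, url, re.IGNORECASE)

everything = m.group(0)  # all the text the regexp matched
video_id = m.group(1)    # only the text inside the first ( )

print(everything)
print(video_id)
```

Notice that neither result includes the trailing == since = is not a word character.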

However!!! It still won't match the entire URL yet because there are these annoying == signs at the end! Since we are not sure how and when they will appear, let's just create another class to accommodate whatever else may come after.

[&\w;=+_-]* That class says "match any single character that is an &, a word character (letter, digit or underscore), a semicolon, an equals sign, a plus or a hyphen", 0 to an unlimited number of times (the asterisk says that!). That means that any of those characters may or may not appear, but nothing outside of that class can appear (for instance, a %).
So my final regexp looks like:

http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*
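
To see what the trailing class buys us, here is a Python sketch that insists on a whole-URL match, which is my approximation of AME's entire-URL rule. The helper name whole_url_match is made up.

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

without_tail = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)"
final = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*"

def whole_url_match(pattern, text):
    """Match pattern against all of text, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE)

# Without the trailing class the "==" is left over, so there is no match.
before = whole_url_match(without_tail, url)

# With it, the whole URL matches and group 1 still holds only the id.
after = whole_url_match(final, url)
video_id = after.group(1)

# A character outside the class (here "%") still blocks the match.
with_percent = whole_url_match(final, url + "%20")

print(before, video_id, with_percent)
```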

And in the case of AME, I can put $p1 in the replacement HTML to get the 'movie' id which in this case is MzEwODYwfDg2NzY0Ng.
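
The idea behind $p1 can be sketched in Python as a plain text substitution. How AME performs the replacement internally is an assumption on my part, and the replacement HTML below (including the player.example.invalid address) is purely hypothetical, not the real clipfish embed code.

```python
import re

final = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*"

# Hypothetical replacement HTML; $p1 marks where the first captured group goes.
replacement_html = '<embed src="http://player.example.invalid/?id=$p1">'

def render(url):
    """Return the replacement HTML for url, or None if the URL does not match."""
    m = re.fullmatch(final, url, re.IGNORECASE)
    if m is None:
        return None
    return replacement_html.replace("$p1", m.group(1))

html = render("http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng==")
not_ours = render("http://www.example.com/somethingelse")

print(html)
print(not_ours)
```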

nJoy
