A challenge for regular expressions gurus


amykhar
06-03-2003, 03:20 PM
I need to figure out how to strip a certain style of number from URLs. The challenge: I know the format of the number, but the actual number will be different every time. The second challenge? The number can appear in different places in the URL.

I need to strip numbers that look like this: 102-5152600-7203338

from urls that look like this: http://www.amazon.com/exec/obidos/tg/browse/-/548166/102-6125745-6132915

or this:
http://www.amazon.com/exec/obidos/tg/detail/-/B00004W430/qid=/br=1-4/ref=br_lf_outlet_4//104-5184292-0799155?v=glance&s=outlet&n=548166

or whatever other crazy format Amazon decides to use.

amykhar
06-03-2003, 04:39 PM
I'm making progress. Turns out, I can lop off everything after the pattern of numbers that I am looking for. And, I believe the reg exp that I need to find looks like this:

/[0-9](3)-[0-9](7)-[0-9](7)


Off to play some more :D

filburt1
06-03-2003, 04:42 PM
Dammit, you should have waited longer so I could come to the rescue with my regexp genius! :p

Remember to have the trailing /, too :)

amykhar
06-03-2003, 05:18 PM
It's not working yet. Still time for you to be my knight in shining armor.

Arg! I HATE regular expressions!

amykhar
06-03-2003, 05:23 PM
Here's the problem: I need to find the pattern, and then preserve everything BEFORE the pattern. I can discard the rest.

I figured this would work:


list($url,$rest) = split("/[0-9](3)-[0-9](7)-[0-9](7)/",$string);


but, either it's not finding the pattern, or it's not splitting it properly.

Amy

gerlando
06-03-2003, 05:44 PM
I'm just learning regexes myself, but I think you should simplify things and make sure it's finding the pattern before you try doing anything else with it (like the split).

The error in your regex is the parens. They should be curly brackets, i.e. [0-9]{3}-[0-9]{7}-[0-9]{7}. Other than that it will work. I tested it in JavaScript since I don't have a PHP server available. Here's what I used for reference:

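var referrer = "http://www.amazon.com/exec/obidos/tg/browse/-/548166/102-6125745-6132915";
// {n} means "exactly n of the preceding item"; (n) would match a literal digit n instead
var regex = /[0-9]{3}-[0-9]{7}-[0-9]{7}/;
var ar = regex.exec(referrer); // null if the pattern is not found
Response.write(ar[0]); // write out the matched number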
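
Building on that test, here is a rough JavaScript sketch of the "keep everything BEFORE the pattern" step Amy described. It is only an illustration under a few assumptions: stripSessionId is a made-up helper name, the number is always 3-7-7 digits, and any slashes directly in front of the number should be dropped along with it.

```javascript
// Sketch only: stripSessionId is a hypothetical helper, not part of any library.
// Assumes the number is always 3 digits, 7 digits, 7 digits, joined by hyphens.
function stripSessionId(url) {
  // \/* also consumes any slashes right before the number,
  // so ".../ref=br_lf_outlet_4//104-..." leaves no stray "/" behind.
  var match = /\/*[0-9]{3}-[0-9]{7}-[0-9]{7}/.exec(url);
  // No match: hand the URL back unchanged rather than guessing.
  if (match === null) {
    return url;
  }
  // Keep only what comes before the match; the rest is discarded.
  return url.slice(0, match.index);
}

// The original pattern with parens looks for a literal "3" and "7",
// so it does not match the number format from the first post.
var parensMatches = /[0-9](3)-[0-9](7)-[0-9](7)/.test("102-6125745-6132915");
```

Here parensMatches comes out false, which is why the split never found anything. A URL with no such number passes through stripSessionId untouched.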

amykhar
06-03-2003, 05:53 PM
Bless you for finding the error :) That's what comes from having bad eyes and a tutorial with fine print :D

filburt1
06-03-2003, 06:02 PM
Hint: anything you need returned in preg_match or preg_match_all should be in parens; for example:

preg_match("/([A-Za-z0-9]*)stuff([0-9]*)/", $somestring, $match);
echo "<pre>";
print_r($match);
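
The same capture-group idea can be applied to the Amazon URLs from earlier in the thread. Below is a loose sketch in JavaScript (the language gerlando tested with), not PHP; splitAmazonUrl is a hypothetical name, and the 3-7-7 digit format is assumed from the first post. As with preg_match, index 0 of the result holds the whole match and each pair of parens gets its own index after that.

```javascript
// Sketch only: splitAmazonUrl is a hypothetical helper used for illustration.
function splitAmazonUrl(url) {
  // Group 1: everything before the number (lazy, so it stops at the first one).
  // \/* sits outside the groups, so separating slashes are not captured.
  // Group 2: the number itself, assumed to be 3-7-7 digits.
  var pattern = /^(.*?)\/*([0-9]{3}-[0-9]{7}-[0-9]{7})/;
  var match = pattern.exec(url);
  if (match === null) {
    return null; // no number of that shape in the URL
  }
  return { prefix: match[1], id: match[2] };
}
```

Called on the first example URL, prefix would be the part ending in /548166 and id would be 102-6125745-6132915.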