vb.org Archive

Help needed with parsing (https://vborg.vbsupport.ru/showthread.php?t=31876)

Saint 10-30-2001 10:15 PM

I'm new to PHP and need help parsing some HTML code from an external webpage (with consent from the owner)
and outputting it on the index.php of vB.

I don't need to parse the whole webpage though,
just some of it.

Can anyone help?

MrLister 10-31-2001 01:01 AM

post the code here.... you'll get much more response... if the code is too big then post the problem area.

Mark Hensler 10-31-2001 05:13 AM

This will involve pattern matching. You'll need to know what's before and after the text you want.

If you want to read docs on some of the functions you might use...
eregi(), file(), preg_match()
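
To make the "before and after" idea concrete, here is a minimal sketch using preg_match(). The sample markup in $html is made up for illustration and stands in for a page that has already been fetched; it is not the real remote page.

```php
<?php
// Hypothetical sample markup, standing in for a fetched page.
$html = '<TD><a name="login">Login Server:</a></TD>';

// Capture whatever sits between the known "before" text (the <a ...> tag)
// and the known "after" text (":</a>").
// U = ungreedy matching, i = case-insensitive matching.
if (preg_match('|<a name="[^"]*">(.*):</a>|Ui', $html, $m)) {
    $label = $m[1];   // text captured by the first (...) group
} else {
    $label = null;    // the pattern did not match at all
}

echo $label, "\n";
```

The text captured by the parentheses ends up in $m[1]; $m[0] holds the whole matched string.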

Saint 10-31-2001 12:26 PM

Quote:

Originally posted by MrLister
post the code here.... you'll get much more response... if the code is too big then post the problem area.

The HTML code or the PHP code?

I know nothing about PHP; I'm still learning. :(

If it's the HTML code, yes I can paste it here.

Saint 10-31-2001 01:16 PM

This is the HTML from the site I want to parse

<TABLE WIDTH="70%" >
<TR>
<TD><FONT SIZE=+1><a name="patch">Patch Server:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 57h 20m 06s</TD>
</TR>
<TR>
<TD><FONT SIZE=+1><a name="login">Login Server:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 97h 49m 06s</TD>
</TR>
<TD><FONT SIZE=+1><a name="AOLLegends">AOL Legends:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 1h 31m 06s&nbsp;&nbsp;<A HREF="http://ultima.lightning.net/uo/en/history/AOLLegends.html"><FONT SIZE="-2">[details]</FONT></a></TD>
</TR>


Note that I only need to parse some of it, not the whole HTML, so some stripping needs to be done.
i.e. I only need to parse the code I highlighted in red above.
The page that I'm parsing the HTML from refreshes every 60 secs.

MrLister 10-31-2001 01:38 PM

As Mark already mentioned, try looking into eregi(), file(), and preg_match() on php.net. I'm pretty sure there are a few scripts that do something like this; you could look them up at hotscripts.com and read the source to get an idea from there.

Saint 10-31-2001 01:55 PM

Ok thanks

Mark Hensler 10-31-2001 03:03 PM

What you'll be doing is pattern matching. So, you have to know what is surrounding the text you want.

Do you only want those two pairs in red?
Or, do you want anything in this pattern:
<TR>
<TD><FONT SIZE=+1><a name="login">TEXT TEXT TEXT</a></FONT></TD>
<TD>TEXT TEXT TEXT HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 97h 49m 06s</TD>
</TR>

When you're pattern matching, you want to be very specific.
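
One way to see why being specific matters: a loose pattern with greedy (.*) groups can swallow several entries at once. The sketch below uses a made-up one-line sample, not the real page, and compares the same pattern with and without the U (ungreedy) modifier.

```php
<?php
// Hypothetical sample: two entries on a single line.
$html = '<a name="patch">Patch Server:</a> ... <a name="login">Login Server:</a>';

// Greedy: each (.*) grabs as much as it can, so the whole line
// becomes ONE oversized match ending at the last ":</a>".
preg_match_all('|<a name="(.*)">(.*):</a>|', $html, $greedy);

// Ungreedy (U): each (.*) stops as early as the rest of the
// pattern allows, so every entry is matched separately.
preg_match_all('|<a name="(.*)">(.*):</a>|U', $html, $lazy);

$greedyCount = count($greedy[0]);   // number of full matches, greedy
$lazyNames   = $lazy[2];            // second group of every match, ungreedy

echo "greedy matches: $greedyCount\n";
echo "ungreedy names: " . join(', ', $lazyNames) . "\n";
```

With the greedy version the first group ends up containing markup from both entries, which is rarely what you want when scraping repeated rows.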

Saint 10-31-2001 07:39 PM

I just want the 2 pairs in red.

In total there are about 14 such pairs on that page.

But I'll output them differently on my page:
I want to replace his image file with my own image file.
But I need to know which image file is on his page at that time, because there are 2 types, a grnball.gif and a redball.gif.

I'll name mine the same too, but they will be different pics.

Thanks for replying Mark.

Mark Hensler 10-31-2001 08:12 PM

Wait.. do you want only those 2 pairs (login server, aol legends), or all 14 pairs? (I'm looking at pairs as the text and image)

Some "Quickie Code" (untested)
PHP Code:

// suck the remote file into a string
$remote_site = join('', file("http://remote.domain.com/index.php"));

preg_match_all(
    "|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
    $remote_site, $matches);

for ($i=0; $i<count($matches[3]); $i++) {
    $name = $matches[3][$i];
    $image = $matches[5][$i];
    if (strstr($image,'grnball.gif')) {
        // green ball
    }
    else {
        // red ball
    }

    // do your thingy
}

functions docs: file(), join(), preg_match_all(), strstr()

Good Luck,

Saint 11-01-2001 05:16 PM

Quote:

Originally posted by Mark Hensler
Wait.. do you want only those 2 pairs (login server, aol legends), or all 14 pairs? (I'm looking at pairs as the text and image)

Sorry, all 14 pairs.
Yes, pairs as in the text and image.


I'm still trying to absorb your code.
Am a newbie at this. :(


Thanks

Mark Hensler 11-02-2001 01:44 AM

Let me try to break it down for you..
PHP Code:

// suck the remote file into a string
$remote_site = join('', file("http://remote.domain.com/index.php"));

// now, pattern match for the desired text, in this case,
// $matches[3] will contain the value of the first red block (the name-like thingy)
// $matches[5] will contain the value of the second red block (the image source)
preg_match_all(
    "|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
    $remote_site, $matches);

/**
 * $matches now looks like this:
 * $matches[3][0] = first match for the name block
 * $matches[5][0] = first match for the image block
 * $matches[3][1] = second match for the name block
 * $matches[5][1] = second match for the image block
 * etc.
 */

// loop through all the matches
for ($i=0; $i<count($matches[3]); $i++) {
    // put the name/image info into more user friendly variables
    $name = $matches[3][$i];
    $image = $matches[5][$i];

    // find out what the image source was...
    if (strstr($image,'grnball.gif')) {
        // the image source contains "grnball.gif"
    }
    else {
        // the image source does not contain "grnball.gif",
        // so it must be "redball.gif"
    }

    // do your thingy
    // you might print a new table using the $name/$image from the other site
}

I hope that helps (probably not =P). If you have a specific question, those are easier to answer.
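
For anyone who wants to see the $matches layout for themselves, here is a small sketch on a made-up two-row sample (a simplified pattern with three groups, not the one above). By default preg_match_all() groups results by capture group first and by match number second.

```php
<?php
// Hypothetical two-row sample, each entry on its own line.
$html = "<a name=\"patch\">Patch Server:</a><IMG SRC=\"grnball.gif\">\n"
      . "<a name=\"login\">Login Server:</a><IMG SRC=\"redball.gif\">\n";

preg_match_all('|<a name="(.*)">(.*):</a><IMG SRC="(.*)"|Ui', $html, $matches);

// $matches[0][$i] = the whole text of match number $i
// $matches[2][$i] = the name block of match number $i
// $matches[3][$i] = the image source of match number $i
$pairs = array();
for ($i = 0; $i < count($matches[0]); $i++) {
    $pairs[] = $matches[2][$i] . ' => ' . $matches[3][$i];
}

print_r($pairs);
```

So one loop over the match index visits every pair; nothing has to be repeated by hand per pair.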

Saint 11-02-2001 06:18 AM

Does that mean I have to repeat that 14 times, once for each pair?

And add $matches(0) for all the pairs?

Mark Hensler 11-02-2001 06:37 AM

No, it is already looping through all the pairs. See where I said "// do your thingy"?

Try it.. just make a new file, and throw this in there....
PHP Code:

<?php
$remote_site = join('', file("http://remote.domain.com/index.php") );

preg_match_all(
    "|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
    $remote_site, $matches);

for ($i=0; $i<count($matches[3]); $i++) {
    $name = $matches[3][$i];
    $image = $matches[5][$i];

    echo "|" . $name . "|" . $image . "|";

    if (strstr($image,'grnball.gif')) {
        echo "the image is a green ball" . "|";
    }
    else {
        echo "the image is a red ball" . "|";
    }
    
    echo "<br>\n";
}
?>


Saint 11-02-2001 06:54 AM

Warning: file("http://ulitma.lightning.net/uo/index.html") - Undefined error: 0 in /usr/local/www/vhosts/nettiq.com/htdocs/serverstats.php on line 2

Warning: Bad arguments to join() in /usr/local/www/vhosts/nettiq.com/htdocs/serverstats.php on line 2

I got these errors when I tried to run the PHP script.

Mark Hensler 11-02-2001 02:57 PM

That URL doesn't work for me.
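
The second warning follows from the first: file() returns false when it cannot read the URL, and join() is then handed something that is not an array. A minimal sketch of guarding against that is below; fetch_page() is a hypothetical helper name and the path is deliberately made up so that the read fails.

```php
<?php
// Hypothetical helper: returns the page as one string, or null on failure.
function fetch_page($url)
{
    // @ silences the warning; file() returns false when it cannot read.
    $lines = @file($url);
    if ($lines === false) {
        return null;
    }
    return join('', $lines);
}

$page = fetch_page('/no/such/file.html');   // made-up path, expected to fail
if ($page === null) {
    echo "Could not fetch the remote page\n";
}
```

Checking the return value this way gives one clear message instead of a chain of warnings.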

Saint 11-02-2001 03:12 PM

my mistake
typo
http://ultima.lightning.net/uo/index.html

I corrected it, and when I run the PHP
it gives me a blank screen.

Mark Hensler 11-02-2001 06:28 PM

try this:
PHP Code:

<?php
echo "Yes, I'm running<BR>\n";

$remote_site = join('', file("http://remote.domain.com/index.php") );

preg_match_all(
    "|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
    $remote_site, $matches);

echo "begining loop<BR>\n";

for ($i=0; $i<count($matches[3]); $i++) {
    $name = $matches[3][$i];
    $image = $matches[5][$i];

    echo "|" . $name . "|" . $image . "|";

    if (strstr($image,'grnball.gif')) {
        echo "the image is a green ball" . "|";
    }
    else {
        echo "the image is a red ball" . "|";
    }
    
    echo "<br>\n";
}
?>


Saint 11-02-2001 06:38 PM

trying now. :D

Saint 11-02-2001 06:43 PM

Nope.

I only get these 2 lines:

Yes, I'm running
begining loop

I should just change the URL in

$remote_site = join('', file("http://remote.domain.com/index.php") );

to http://ultima.lightning.net/uo/index.html, right?

The rest of the code stays the same.

Mark Hensler 11-02-2001 07:12 PM

$remote_site = join('', file("http://remote.domain.com/index.php") );

should become

$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

Saint 11-02-2001 11:42 PM

Yup.

That's how I did it.

Only got those 2 lines

Mark Hensler 11-03-2001 04:06 AM

I have some time right now.. let me try playing with it.

Mark Hensler 11-03-2001 04:25 AM

OK.. this works for me. You can edit it to say whatever you need..
PHP Code:

<?php
// echo "Yes, I'm running<BR>\n";

$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
    "|<td>(.*)<a name=\"(.*)\">(.*):</a>(.*)</td>(.*)<td>(.*)<img src=\"(.*)\"(.*)</td>|Usi",
    $remote_site, $matches);

// echo "begining loop<BR>\n";

for ($i=0; $i<count($matches[0]); $i++) {
    $name = $matches[3][$i];
    $image = $matches[7][$i];

//    echo "|" . $name . "|" . $image . "|";

    if (strstr($image,'grnball.gif')) {
//        echo "the image is a green ball" . "|";
        echo "$name is <font color='#00CC00'>online</font><BR>\n";
    }
    else {
//        echo "the image is a red ball" . "|";
        echo "$name is <font color='#FF0000'>offline</font><BR>\n";
    }
} //END for
?>
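
One likely reason the earlier pattern found nothing is the missing s modifier: without it, "." does not match a newline, so a (.*) group cannot span from one line of the HTML to the next. The sketch below shows the difference on a made-up fragment with a simplified pattern; it is not the real page.

```php
<?php
// Hypothetical fragment: the link and the image are on different lines.
$html = "<TR>\n<TD><a name=\"login\">Login Server:</a></TD>\n"
      . "<TD><IMG SRC=\"grnball.gif\" HEIGHT=17> UP!</TD>\n</TR>\n";

$pattern = '<a name="(.*)">(.*):</a>(.*)<img src="(.*)"';

// Without s: the third (.*) would have to cross a newline, so nothing matches.
$withoutS = preg_match_all("|$pattern|Ui", $html, $a);

// With s: "." also matches newlines, so the groups can span lines.
$withS = preg_match_all("|$pattern|Usi", $html, $b);

echo "matches without s: $withoutS\n";
echo "matches with s:    $withS\n";
```

preg_match_all() returns the number of full matches, which makes it easy to compare the two runs.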


Saint 11-03-2001 04:45 AM

That works!

Thanks a lot for your time Mark!

Mark Hensler 11-03-2001 05:46 AM

With table, and images...
PHP Code:

<?php
$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
    "|<td>(.*)<a name=\"(.*)\">(.*):</a>(.*)</td>(.*)<td>(.*)<img src=\"(.*)\"(.*)</td>|Usi",
    $remote_site, $matches);

echo "<table border=1 cellpadding=1 cellspacing=0>\n";

for ($i=0; $i<count($matches[0]); $i++) {
    $name = $matches[3][$i];
    $image = $matches[7][$i];
    
    echo " <tr>\n";
    echo "  <td>\n";
    echo "\t<font face='verdana, arial' size=1>";
    echo $name;
    echo "</font>\n";
    echo "  </td>\n";
    echo "  <td>\n";
    
    if (strstr($image, 'grnball.gif')) { 
        echo "\t<img src='http://nettiq.com/images/image1.gif'>\n"; 
    } 
    else { 
        echo "\t<img src='http://nettiq.com/images/image2.gif'>\n"; 
    }
    
    echo "  </td>\n";
    echo " </tr>\n";
    
} //END for

echo "</table>\n";
?>


Mark Hensler 11-05-2001 02:48 PM

New pattern:
Code:

<tr bgcolor="#ffc858">
<td><font size="+1"><a name=Catskills>Catskills:</a></font></td>
<td width="230" nowrap><img height=17 src="http://ultima.lightning.net/uo/img/grnball.gif" width=17
align=top> UP! for 6h 00m 05s</td>
<td width="170"><font size="-1"><a class=tbl href="http://ultima.lightning.net/uo/en/history/Catskills.html">details &gt;&gt;</a></font></td>
</tr>

revised version:
PHP Code:

<?php
$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
    "|<tr.*<a name=.*>(.*):</a>.*src=\"(.*)\".*</tr>|Usi",
    $remote_site, $matches);

echo "<html>\n";
echo "<body>\n";

echo "<table border=0 cellpadding=0 cellspacing=0 align=center>\n";

for ($i=0; $i<count($matches[0]); $i++) {
    $name = $matches[1][$i];
    $image = $matches[2][$i];
    
    echo " <tr>\n";
    echo "  <td>";
    
    if (strstr($image, 'grnball.gif')) { 
        echo "<img src='http://nettiq.com/images/image1.gif'>"; 
    } 
    else { 
        echo "<img src='http://nettiq.com/images/image2.gif'>"; 
    }
    
    echo "</td>\n";
    echo "  <td>\n";
    echo "\t<font face='verdana,arial,helvetica' size='1'>&nbsp;";
    echo $name;
    echo "</font><BR>\n";
    echo "  </td>\n";
    echo " </tr>\n";
    
} //END for

echo "</table>\n";
echo "<br>\n";

echo "<center>\n";
echo "<font face='verdana,arial,helvetica' size='1'>\n";
echo "<a href='" . htmlspecialchars($_SERVER['PHP_SELF']) . "'>Refresh</a>\n";
echo "</font>\n";
echo "</center>\n";

echo "</body>\n";
echo "</html>\n";
?>

It's always a pain when the remote site changes their pattern. 8[
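
A different technique that may be less sensitive to such changes is to parse the HTML with PHP's DOM extension instead of a regular expression. This is only a sketch, assuming the DOM extension is available; it is not what the scripts above do, the sample is trimmed from the new pattern shown earlier, and $status is a made-up variable name.

```php
<?php
// Sample row trimmed from the new pattern above, wrapped in a table.
$html = <<<HTML
<table>
<tr bgcolor="#ffc858">
<td><font size="+1"><a name=Catskills>Catskills:</a></font></td>
<td width="230" nowrap><img height=17 src="http://ultima.lightning.net/uo/img/grnball.gif" width=17
align=top> UP! for 6h 00m 05s</td>
</tr>
</table>
HTML;

libxml_use_internal_errors(true);   // keep parser complaints about old markup quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$status = array();
// Every table row that contains both a named anchor and an image.
foreach ($xpath->query('//tr[.//a[@name] and .//img]') as $row) {
    $anchor = $xpath->query('.//a[@name]', $row)->item(0);
    $img    = $xpath->query('.//img', $row)->item(0);

    $name = rtrim(trim($anchor->textContent), ':');   // drop the trailing colon
    $src  = $img->getAttribute('src');

    $status[$name] = (strpos($src, 'grnball.gif') !== false) ? 'online' : 'offline';
}

print_r($status);
```

Because the lookup goes by element and attribute name, the order of the img attributes or extra attributes on the row should not matter.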


All times are GMT.