Go Back   vb.org Archive > vBulletin 3 Discussion > vB3 Programming Discussions
  #1  
Old 02-20-2005, 10:54 AM
Michael Morris's Avatar
Michael Morris Michael Morris is offline
 
Join Date: Nov 2003
Location: Knoxville TN
Posts: 774
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default How do I hunt down and store all URL's

I'd like to write a function to look over a message's contents and extract all the sites in all the URL's of the message for later storage in the database.

Any ideas?
Reply With Quote
  #2  
Old 02-20-2005, 11:15 AM
Dean C's Avatar
Dean C Dean C is offline
 
Join Date: Jan 2002
Location: England
Posts: 9,071
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

This would be a pretty mammoth task, probably best done with MySQL regexp's:

http://dev.mysql.com/doc/mysql/en/regexp.html
Reply With Quote
  #3  
Old 02-20-2005, 07:58 PM
Michael Morris's Avatar
Michael Morris Michael Morris is offline
 
Join Date: Nov 2003
Location: Knoxville TN
Posts: 774
Благодарил(а): 0 раз(а)
Поблагодарили: 0 раз(а) в 0 сообщениях
Default

Let me explain what I'm doing a little more clearly. I'm working on a hack designed to make spamming my boards a bit harder (actually, a LOT harder). I've noticed that comment spammers may forge IP's and email addresses, but they can't forge the destination URL or else you can't go to their damn site.

So I'm writing a program that gives moderators the option to "spam ban" a user. When they do the program pulls up all their messages, scans their messages, and presents a list of all URL's present. The moderator can then uncheck any he doesn't want to consider spam - the rest are added to a table in the database either as a domain (banning all it's pages) or page (banning just that page).

When users with less than 20 posts make a post the system compares their newly submitted message against the database of known bad URL's. If a match is found the poster is immediately put into a temporary ban pool and their post is set to invisible so that it has to be moderated. The next time a moderator logs in they get an alert that a spam attempt has been made and then they can go to a review screen and determine whether the system's temp ban was appropriate and confirm it if it was. They're also alerted to which entry in the database prompted the message to be spam banned.

Currently I have the message comparison working - that's simply a while loop over the database with a stristr statement. On the first match the function returns true and jumps out of the while loop. If there's no match the function returns false.

I think the answer to the question I posted is going to be based around preg_match_all. But I could use help on a good query screen to extract the site domain out of the url. There's an example of this on www.php.net that I might be playing with.
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT. The time now is 03:59 PM.


Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2024, vBulletin Solutions Inc.
X vBulletin 3.8.12 by vBS Debug Information
  • Page Generation 0.03641 seconds
  • Memory Usage 2,174KB
  • Queries Executed 13 (?)
More Information
Template Usage:
  • (1)SHOWTHREAD
  • (1)ad_footer_end
  • (1)ad_footer_start
  • (1)ad_header_end
  • (1)ad_header_logo
  • (1)ad_navbar_below
  • (1)ad_showthread_beforeqr
  • (1)ad_showthread_firstpost
  • (1)ad_showthread_firstpost_sig
  • (1)ad_showthread_firstpost_start
  • (1)footer
  • (1)forumjump
  • (1)forumrules
  • (1)gobutton
  • (1)header
  • (1)headinclude
  • (1)navbar
  • (3)navbar_link
  • (120)option
  • (3)post_thanks_box
  • (3)post_thanks_button
  • (1)post_thanks_javascript
  • (1)post_thanks_navbar_search
  • (3)post_thanks_postbit_info
  • (3)postbit
  • (3)postbit_onlinestatus
  • (3)postbit_wrapper
  • (1)spacer_close
  • (1)spacer_open
  • (1)tagbit_wrapper 

Phrase Groups Available:
  • global
  • inlinemod
  • postbit
  • posting
  • reputationlevel
  • showthread
Included Files:
  • ./showthread.php
  • ./global.php
  • ./includes/init.php
  • ./includes/class_core.php
  • ./includes/config.php
  • ./includes/functions.php
  • ./includes/class_hook.php
  • ./includes/modsystem_functions.php
  • ./includes/functions_bigthree.php
  • ./includes/class_postbit.php
  • ./includes/class_bbcode.php
  • ./includes/functions_reputation.php
  • ./includes/functions_post_thanks.php 

Hooks Called:
  • init_startup
  • init_startup_session_setup_start
  • init_startup_session_setup_complete
  • cache_permissions
  • fetch_postinfo_query
  • fetch_postinfo
  • fetch_threadinfo_query
  • fetch_threadinfo
  • fetch_foruminfo
  • style_fetch
  • cache_templates
  • global_start
  • parse_templates
  • global_setup_complete
  • showthread_start
  • showthread_getinfo
  • forumjump
  • showthread_post_start
  • showthread_query_postids
  • showthread_query
  • bbcode_fetch_tags
  • bbcode_create
  • showthread_postbit_create
  • postbit_factory
  • postbit_display_start
  • post_thanks_function_post_thanks_off_start
  • post_thanks_function_post_thanks_off_end
  • post_thanks_function_fetch_thanks_start
  • post_thanks_function_fetch_thanks_end
  • post_thanks_function_thanked_already_start
  • post_thanks_function_thanked_already_end
  • fetch_musername
  • postbit_imicons
  • bbcode_parse_start
  • bbcode_parse_complete_precache
  • bbcode_parse_complete
  • postbit_display_complete
  • post_thanks_function_can_thank_this_post_start
  • tag_fetchbit_complete
  • forumrules
  • navbits
  • navbits_complete
  • showthread_complete