PDA

View Full Version : Thoughts on a SQL query for removing spam


RideMonkey!
08-13-2013, 03:16 PM
I've written up the following to do some database purging of spam that has accumulated. I've commented what each section is designed to do - could someone take a look and see if A) it makes sense, or B) I'm missing anything?

# Remove all moderated visitor messages
delete from `visitormessage` where visitormessage.state = 'moderation'

# Ban all users who have left a visitor message containing a URL, have no posts and haven't logged on in 6 months
update `user`
left join (visitormessage)
on (user.userid=visitormessage.postuserid)
set user.usergroupid='14', user.displaygroupid='14'
where visitormessage.pagetext like '%url%' and
user.posts < 1 and
user.lastactivity < UNIX_TIMESTAMP('2013-02-12 00:00:00')


# Delete all visitor messages for banned users with fewer than 5 posts
delete `visitormessage` from `visitormessage`
left join (user)
on (user.userid=visitormessage.postuserid)
where user.usergroupid='14' and
user.posts < 5

# Delete user homepages for banned users, or those with no posts
update `user` set homepage='' where usergroupid='14'
update `user` set homepage='' where posts < 1

# Delete signatures for banned users
update `usertextfield`
left join (user)
on (user.userid = usertextfield.userid)
set signature=''
where user.usergroupid='14'

# Delete URL-containing signatures from users with no posts
update `usertextfield`
left join (user)
on (user.userid = usertextfield.userid)
set usertextfield.signature=''
where usertextfield.signature like '%url%' and
user.posts < 1

kh99
08-13-2013, 03:42 PM
I'm not an sql expert or anything but they seem to do what the comments suggest. The only thing I'm wondering is, does %url% match a url? I thought it would match text that literally has 'url' in it somewhere. ETA: oh, maybe you're matching the 'url' in the bbcode tag? I guess that could work.

RideMonkey!
08-13-2013, 03:59 PM
Yes, I'm looking for the bbcode tag with URL in it. "http" doesn't work because you don't have to include that in the URL tag, and "www" would not catch anything that had a subdomain that wasn't www... so, I'm figuring this will catch the majority. I may go back through and do a more granular cleanup but selecting these statements out of the database is eliciting thousands of spam visitor messages & profiles so I figured I'd start with a bulk remove and decide where to go from there.

Thanks for the feedback. Do I need to run any cleanup tasks in VB after doing this? I couldn't find any counters for visitor messages so my assumption was there wasn't anything additional.