vb.org Archive

vb.org Archive (https://vborg.vbsupport.ru/index.php)
-   Modification Requests/Questions (Unpaid) (https://vborg.vbsupport.ru/forumdisplay.php?f=112)
-   -   Is there a way to restrict how often guests can refresh? (https://vborg.vbsupport.ru/showthread.php?t=319955)

spamgirl 08-20-2015 02:44 PM

Is there a way to restrict how often guests can refresh?
 
I was wondering if there is any add-on that can limit how frequently guests are allowed to refresh? I'd like members to be able to refresh as much as they want, but guests to be limited, regardless of the server load. Thanks for your help!

Elite_360_ 08-20-2015 03:29 PM

There is no way to stop someone from refreshing their browser.

Max Taxable 08-20-2015 03:34 PM

Leverage browser caching of static content; that way the browser doesn't re-download the full page weight on a refresh. It will only fetch elements it didn't already encounter on the first load.

For example, if your site loads 400 KB, a refresh should only transfer 1 or 2 percent of that, because the rest is cached.
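
For instance, a minimal .htaccess sketch of what that means in practice, assuming Apache with mod_expires enabled (the one-week lifetimes are placeholders, not recommendations):

<IfModule mod_expires.c>
# Cache static assets in the browser so a refresh re-downloads
# only the page itself, not the images, CSS, and scripts.
ExpiresActive On
ExpiresByType image/png "access plus 1 week"
ExpiresByType image/gif "access plus 1 week"
ExpiresByType image/jpeg "access plus 1 week"
ExpiresByType text/css "access plus 1 week"
ExpiresByType application/javascript "access plus 1 week"
</IfModule>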

spamgirl 08-20-2015 03:40 PM

It's actually to ensure people aren't using scripts to scrape our site. We don't want to turn off public access, but we do want people to stop taking content from our site and reposting it elsewhere. Having to track down their host information and file a copyright complaint is getting to be a real time suck.

We'd just like an error to be shown if they refresh more than once a minute, which I know is possible when server load is high (if it's above x, then certain membergroups see an error message while other membergroups do not). I'd even be happy if the page content only updated once a minute.
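
For reference, the nearest web-server-level equivalent is Apache's mod_evasive; a sketch, assuming it is installed, is below. Note that it throttles every visitor by IP, members included, which is exactly the limitation here:

<IfModule mod_evasive20.c>
# Sketch only (server config, not .htaccess): more than 2 hits on the
# same page within 60 seconds earns that IP a 403 for the next
# 60 seconds. All values are illustrative.
DOSPageCount 2
DOSPageInterval 60
DOSSiteCount 100
DOSSiteInterval 60
DOSBlockingPeriod 60
</IfModule>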

Max Taxable 08-21-2015 12:31 AM

You need "Ban Spiders by User Agent" then, a good comprehensive list of bad bots is available and contains most of the known content scrapers, and you can add any you see to the list as well.

spamgirl 08-21-2015 12:48 AM

Quote:

Originally Posted by Max Taxable (Post 2553358)
You need "Ban Spiders by User Agent" then, a good comprehensive list of bad bots is available and contains most of the known content scrapers, and you can add any you see to the list as well.

The problem is that it's a single person scraping our site for their own, and I don't know their IP; otherwise I'd just ban them. :(

Max Taxable 08-21-2015 03:48 PM

You get the IP and their user agent string while they are on your site, from the Who's Online (WoL) page or even the server logs.

But let me get this straight - you want to restrict the reload of all visitors, because you have one person manually scraping content?
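
If the server's log format doesn't already record the user agent, Apache's standard "combined" log format captures both the client IP (%h) and the User-Agent header:

# Standard Apache "combined" log format: client IP, request, status,
# referer, and user agent, all timestamped.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog logs/access_log combined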

spamgirl 08-21-2015 04:01 PM

Quote:

Originally Posted by Max Taxable (Post 2553404)
You get the IP and their user agent string while they are on your site, from the Who's Online (WoL) page or even the server logs.

But let me get this straight - you want to restrict the reload of all visitors, because you have one person manually scraping content?

We have hundreds of guests on the site; I have no way to determine which one is the scraper.

I just want to temporarily slow them until we can figure out what's going on. If you have a better idea, I'd be happy to take your advice. :)

Max Taxable 08-22-2015 01:19 AM

Quote:

Originally Posted by spamgirl (Post 2553405)
We have hundreds of guests on the site; I have no way to determine which one is the scraper.

I just want to temporarily slow them until we can figure out what's going on. If you have a better idea, I'd be happy to take your advice. :)

I would solve this problem by installing Paul M's "Track Guest Visits" and studying the log it provides daily, looking for IP addresses that load a lot of pages. That mod tracks visitors that way. It also gives you their user agent, tells you exactly which pages they visited, and timestamps everything.

You must identify the bad actor and stop IT, not penalize all visitors. If you slow down your page loading or otherwise restrict visitors, get ready for a hit from Google in your search results and PageRank.

spamgirl 08-22-2015 11:50 AM

Quote:

Originally Posted by Max Taxable (Post 2553427)
I would solve this problem by installing Paul M's "Track Guest Visits" and studying the log it provides daily, looking for IP addresses that load a lot of pages. That mod tracks visitors that way. It also gives you their user agent, tells you exactly which pages they visited, and timestamps everything.

You must identify the bad actor and stop IT, not penalize all visitors. If you slow down your page loading or otherwise restrict visitors, get ready for a hit from Google in your search results and PageRank.

Yeah, you're right. :( I'll give that a try, thank you!

Zachery 08-22-2015 04:58 PM

If you want to stop people from scraping your site, don't put it on the internet.

TheLastSuperman 08-22-2015 08:14 PM

Quote:

Originally Posted by Zachery (Post 2553448)
If you want to stop people from scraping your site, don't put it on the internet.

I know you weren't sitting there all riled up, intentionally posting something to sound mean or rude, yet I thought back to an old saying from when we were kids that most of us were taught: "If you don't have anything nice to say, don't say anything at all." That's not you, in my opinion. Since tone is always missing online I can't assume, but do you ever re-read what you type and realize it's not offering one bit of help sometimes? I think the OP has a valid concern and wants helpful suggestions, not a reply that can't be taken any other way but as being a smarty-pants.

Spamgirl,

I think Max had an excellent idea... it may take more time to review the logs for certain guests with Paul's mod, but if you do it now and find who you think the culprit is, it might help! Remember though that overseas a person can unplug their modem/router and BAM, instant new IP address. So if they happen to be somewhere that can happen, let's hope they only scrape content and aren't toooooo web savvy :cool:.

spamgirl 08-22-2015 08:28 PM

Quote:

Originally Posted by TheLastSuperman (Post 2553454)
Spamgirl,

I think Max had an excellent idea... it may take more time to review the logs for certain guests with Paul's mod, but if you do it now and find who you think the culprit is, it might help! Remember though that overseas a person can unplug their modem/router and BAM, instant new IP address. So if they happen to be somewhere that can happen, let's hope they only scrape content and aren't toooooo web savvy :cool:.

FWIW, I get what Zachery is saying, but that doesn't mean I won't try to at least stem the flow. If we sit back and don't fight, we let the monsters win, and I refuse to do that in any situation. Nothing is hopeless. :)

Anyhoo, I agree that Max had an excellent idea! Already three IPs are sticking out like sore thumbs, and one of them seems to be the culprit (with a scraper I didn't even know about potentially being a second problem user). Based on their shitty web design skills, I'm hopeful that means they aren't tech savvy at all. :) Thank you all so much for your advice!

--------------- Added 23 Aug 2015 at 15:31 ---------------

I've found the IPs and tried to block them with .htaccess. I included my own IP in order to test it, but I am still able to access the forum; I just can't see the CSS or images. Here is what I did:

Order allow,deny
# With "Order allow,deny", Deny is evaluated after Allow and wins,
# so these IPs should be blocked even though "Allow from all" follows.
Deny from ###.#.#.
Deny from ###.#.#.
Deny from ###.#.#.
Allow from all

Does anyone know why it would be so wonky? I put it in the main folder of my forum (html1). My site is hosted on EC2, if that matters. I tried it last week and it worked, so I don't know why it wouldn't now...
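
One possibility, assuming the EC2 instance runs Apache 2.4: the 2.2-style Order/Deny directives above are only honored there if mod_access_compat is loaded; the native 2.4 syntax uses Require instead. Also, if the forum sits behind a proxy or load balancer, Apache may be seeing the proxy's IP rather than the visitor's. A sketch of the same block in 2.4 syntax:

# Native Apache 2.4 access control (same elided IP prefixes as above;
# partial addresses are written without a trailing dot for "Require ip").
<RequireAll>
Require all granted
Require not ip ###.#.#
</RequireAll>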

Zachery 08-24-2015 11:03 PM

Sometimes the truth hurts, but it's important to understand the limitations of what you can do. You can ban an IP, but the scraper will probably just change it and come back.

You can make it so only registered users can view content, but then your search rankings go down.

You can make some content pay-only, but chances are if it's stuff people want, someone will steal it, and hopefully they don't do it with a stolen credit card.

I do think you should fight, just be ready for the long haul.

If they're actually stealing and rehosting your content on their site, you could try a DMCA, but it may or may not work.

spamgirl 08-24-2015 11:22 PM

Quote:

Originally Posted by Zachery (Post 2553580)
Sometimes the truth hurts, but it's important to understand the limitations of what you can do. You can ban an IP, but the scraper will probably just change it and come back.

You can make it so only registered users can view content, but then your search rankings go down.

You can make some content pay-only, but chances are if it's stuff people want, someone will steal it, and hopefully they don't do it with a stolen credit card.

I do think you should fight, just be ready for the long haul.

If they're actually stealing and rehosting your content on their site, you could try a DMCA, but it may or may not work.

I've been doing the DMCA, but they just change hosts every day. Now I'm blocking by IP and just redoing it constantly. I've actually found *multiple* scrapers since installing the Track Guests extension, go figure. :/ I'll just keep up the good fight and hope I annoy them into scraping someone else, lol.

bridge2heyday 08-28-2015 10:00 AM

Is this what you are looking for?
Limited Guest Viewing -- Motivate Guests to Register

Dave 08-28-2015 10:59 AM

It's not easy to prevent people from scraping your site: IPs can be changed, proxies can be used, and headers can be spoofed (making user-agent detection useless).

One way to deter scrapers is to add a JavaScript check that visitors must pass before they can view your site; CloudFlare does this to mitigate certain DDoS attacks. However, people could still simply visit your site in a normal browser and save each page individually to their desktop.

spamgirl 08-28-2015 03:33 PM

Quote:

Originally Posted by bridge2heyday (Post 2553813)
Is this what you are looking for?
Limited Guest Viewing -- Motivate Guests to Register

That is! Thank you so much. :)

Zachery 08-30-2015 10:48 PM

Quote:

Originally Posted by spamgirl (Post 2553822)
That is! Thank you so much. :)

FYI, sometimes search engines can penalize you for this. It also won't work for anyone who blocks cookies, or who decides to use specific user agents that are generally whitelisted.

More often than not it just leads to:

- more users leaving your site
- some users registering just to view content, but not participating.

Max Taxable 08-31-2015 12:30 AM

Quote:

Originally Posted by Zachery (Post 2553923)
FYI, sometimes search engines can penalize you for this.

Pretty sure spiders are immune to it, if memory serves; I used it for a while.

But for the rest, you're right. All it really does is irritate people.

Zachery 08-31-2015 10:55 PM

It can be considered content cloaking.

spamgirl 08-31-2015 11:07 PM

I'm not actually planning to use it; the Track Guests extension was extremely helpful. I was just thankful that it was suggested. :)


All times are GMT. The time now is 01:05 PM.

Powered by vBulletin® Version 3.8.12 by vBS
Copyright ©2000 - 2025, vBulletin Solutions Inc.
