Geographic Redundancy / Multi-Master for VB - What's going to bite me?


neatbrian
01-30-2008, 03:42 PM
Greetings,

I've set up a geographically redundant VB with the following configuration:

- One server in a data-center on the east coast running VB
- A second server in a data-center on the west coast running VB


Each server runs its own copy of VB and its own copy of MySQL.

Database replication is done in real time using MySQL's multi-master circular replication (updates to one database are reflected in real time on the other end). This works amazingly well. Security is ensured through a back-end encrypted SSH tunnel linking the two MySQL instances together.
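One thing that commonly bites two writable masters is colliding auto-increment keys. MySQL provides the `auto_increment_increment` and `auto_increment_offset` settings for this; the post does not say whether they were used here, so the values below are assumptions. A minimal Python sketch of how interleaved IDs stay disjoint:

```python
from itertools import count, islice

def auto_increment_ids(offset: int, increment: int):
    """Yield the IDs one master would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)

# Assumed settings: both masters step by 2, east starts at 1, west at 2.
east_ids = list(islice(auto_increment_ids(offset=1, increment=2), 5))
west_ids = list(islice(auto_increment_ids(offset=2, increment=2), 5))

# The two sequences never overlap, so inserts on either side cannot
# claim the same primary key.
no_collisions = set(east_ids).isdisjoint(west_ids)
```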

Site/file replication is done with rsync every 60 seconds (again, using the SSH tunnel for encryption).

Load balancing is done with a simple round-robin DNS setup. Primary DNS is on one side, secondary DNS on the other. This gives me a 50/50 load between the two servers without needing to rely on a single-point-of-failure load balancer.

Failover is accomplished by using a short TTL in the DNS (60 seconds) -- if one side disappears, the other side alters the zone file automatically to remove the failed site. When the failed side comes back up, it is added back into the DNS rotation. This keeps the maximum "downtime" for a user to roughly 60 seconds (the TTL value) if their server disappears.
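The failover rule described above can be sketched roughly as follows. This is only an illustration of the decision, not the actual script: the IP addresses, the record name, and the "never publish an empty zone" safeguard are all assumptions.

```python
from typing import Dict, List

TTL_SECONDS = 60  # the short TTL mentioned in the post

def records_to_publish(health: Dict[str, bool], all_ips: List[str]) -> List[str]:
    """Return the IPs that should stay in the round-robin rotation."""
    alive = [ip for ip in all_ips if health.get(ip, False)]
    # Assumed safeguard: if every check fails, the monitor itself may be
    # cut off, so keep the full set rather than publish an empty zone.
    return alive if alive else list(all_ips)

def render_zone_lines(name: str, ips: List[str], ttl: int = TTL_SECONDS) -> List[str]:
    """Format one A record per surviving IP in zone-file syntax."""
    return [f"{name} {ttl} IN A {ip}" for ip in ips]

# Hypothetical documentation-range addresses for the two sites.
SITES = ["192.0.2.10", "198.51.100.10"]
surviving = records_to_publish({"192.0.2.10": True, "198.51.100.10": False}, SITES)
zone_lines = render_zone_lines("www.example.com.", surviving)
```

When the failed site passes its health check again, the same function puts it back into the list on the next run.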


So far, it's working out really well. We've really been able to handle some heavy loads with absolute EASE... but there are a couple of issues I'm worried about:

1) Temp-file cleanup becomes a bit of a nightmare since neither server can really know for sure that it's safe to wipe out a temp file that might have come from the other side.

2) Cron-based events... is it safe to run these on BOTH sides?? What will happen??

3) What about things like emails for thread-subscriptions... So far I've not heard any complaints, but are both servers going to attempt to send them out? I'm operating under the assumption that the server who accepted the post to the "subscribed thread" is responsible for launching this event... so I should be fine..

4) Anything else I'm too green to understand about VB... What do I not know about VB that could bite me later in this configuration?

Alfa1
01-30-2008, 08:21 PM
Excellent idea!

*subscribes*

Marco van Herwaarden
01-31-2008, 07:07 AM
I have no experience with this kind of setup, but an interesting idea.

1) Is it wise to synchronize temp files between the servers?

2) If O.S. cronjobs: Yes you will need to ensure yourself that each job only runs once.
If vB Scheduled Tasks: This is based on status values stored in the database. If the database is synchronised then this should not be a problem.

DigitalCrowd
01-31-2008, 01:47 PM
I have set up this configuration before, with one datacenter in Europe and another in the United States. It does work really well with VB. I don't currently run this setup, as I don't have a box in Europe anymore, but I did get comments from those geographically closer to that server that it loaded much faster for them because the latency was significantly lower.

The best part about this setup is literally you could shut off one server for 48 hours and turn it back on and everything syncs back up, no promoting a slave to a master and then trying to switch that back at a later date.

I did this a year or two ago, so I may have forgotten some of the small details or problems I ran into. I did use Nettica for DNS failover; they offer a low-cost service (not as low-cost as your own solution) that monitors your servers from the "outside" and changes DNS on the fly.

I would also redirect traffic that was located in the EU to the EU server and even had a latency test for registered users to pick which server was best for them automatically.

Cron-based events... is it safe to run these on BOTH sides?? What will happen??

You are safe with VB scheduled tasks, but if you have any CRON scripts that run against your site or database to fire off events, then you'd only want to initiate them on one server, and if that server goes down, allow the other server to perform the task. You can also take it a bit further: offset the cron time by 5 minutes or so on the server that is not primary for cron, and have it check whether those crons have been run and, if not, run them itself.
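A rough sketch of that offset-and-check idea, assuming the primary records a "last run" timestamp somewhere both servers can read (for example a row in the replicated database). The function names and the timestamp source are hypothetical.

```python
import time
from typing import Callable, Optional

OFFSET_SECONDS = 5 * 60  # the "5 minutes or so" offset from the post

def should_run_fallback(last_run: Optional[float], scheduled_at: float, now: float) -> bool:
    """True if the secondary should run the job because the primary did not."""
    if now < scheduled_at + OFFSET_SECONDS:
        return False  # still inside the primary's window
    # Run only if there is no record of a run at or after the scheduled time.
    return last_run is None or last_run < scheduled_at

def run_if_needed(job: Callable[[], None], last_run: Optional[float],
                  scheduled_at: float, now: Optional[float] = None) -> bool:
    """Run the job on the secondary when the primary missed it; report whether it ran."""
    current = time.time() if now is None else now
    if should_run_fallback(last_run, scheduled_at, current):
        job()
        return True
    return False
```

The secondary's crontab entry would call `run_if_needed` five minutes after the primary's slot, so a healthy primary always gets the first chance.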

What about things like emails for thread-subscriptions... So far I've not heard any complaints, but are both servers going to attempt to send them out? I'm operating under the assumption that the server who accepted the post to the "subscribed thread" is responsible for launching this event... so I should be fine..

Yea, this would be a VB initiated cron/scheduled task, and you'd be fine.

Anything else I'm too green to understand about VB... What do I not know about VB that could bite me later in this configuration?

You pretty much have to store attachments and avatars/pics in the database in this setup. You can get around it, but you lose the "it just works" part of this configuration.

I remember having some issues with the configuration: the two masters would eventually get out of sync after some replication error. So sometimes you'd have different sets of data; someone would post, then end up at the other IP after a TTL expiration and wonder where their post went. I don't think some of the issues I had would exist without the huge latency between my masters, some 170ms as I remember, as opposed to what you probably have between coasts in the USA. The key was to run a script that would monitor your masters and make sure they were in sync and, if not, try to correct the issue automatically, or else notify you and either consider one of the boxes "failed" and drop it out of DNS or do whatever is best for you. The good news is, once the replication kicks back in, it will all sync up.
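A minimal sketch of the decision such a monitor script might make. It assumes each master's `SHOW SLAVE STATUS` row has already been fetched into a dict; `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master` are real columns of that output, while the lag threshold, the retry count and the action names are made up for illustration.

```python
from typing import Mapping, Optional

MAX_LAG_SECONDS = 30  # hypothetical threshold; tune for your own link

def replication_state(status: Mapping[str, Optional[str]]) -> str:
    """Classify one master's replication status as 'ok', 'lagging' or 'broken'."""
    if status.get("Slave_IO_Running") != "Yes" or status.get("Slave_SQL_Running") != "Yes":
        return "broken"
    lag = status.get("Seconds_Behind_Master")
    if lag is None:  # MySQL reports NULL when the lag cannot be determined
        return "broken"
    return "lagging" if int(lag) > MAX_LAG_SECONDS else "ok"

def next_action(state: str, retries_left: int) -> str:
    """Try an automatic fix first; otherwise notify and treat the box as failed."""
    if state == "ok":
        return "none"
    if state == "lagging":
        return "watch"
    return "restart_replication" if retries_left > 0 else "notify_and_drop_from_dns"
```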

Make sure both boxes have IDENTICAL clocks and sync to the same time server on a very frequent basis. The further the two boxes drift out of sync, the more often you need to sync them. I remember having issues where someone would post a response to someone else, but it would show up before the original post, because of some clock issues.

It was fun, I enjoyed setting it up. I was asked recently how I had that set up and if I'd do it again, and I said that while it worked and I would recommend it for the right situation, it was more prone to glitches than a master-slave configuration. You can accomplish the same thing by having a slave mysql on the second server and its own caching server. While you still have added latency between the servers (because of the geo-distance), so anyone hitting that secondary box could have slightly slower access, it does meet all your goals, and if the master server goes down, you promote the slave to master and keep going forward. It's not as easy to get everything back up and running in the original config as it was with master-master, but that should happen only rarely.

Last thing I can think of, is when you do VB upgrades or anything that uses the DB, you can't just shut down one server, perform the upgrade, bring it up and then do it all over again on the other. You still have to take down the site, update the php files on BOTH servers, and run upgrade.php, then bring it back up. Because they both share the same database, if you turn one off, the other goes off.

As I said earlier, I am sure there are things I'm forgetting... but nice to see someone else do this. I don't feel like such a rebel anymore. :)

What people considering this setup need to remember is to ask WHY they need a geo-distant solution. If your motivation is strictly failover in the event that the primary datacenter has an outage, goes out of business, gets hit by a tornado, or is covered in volcanic ash, then you're best off running a "hot stand-by" that doesn't need to have all the meat and potatoes of your primary server but could take the load, even if only temporarily, until you figure out how long the primary server will be out and whether you need to bring up another box or are holding steady on the backup.

For those looking to set up a server in, say, Europe and one in the United States, where the primary motivation is to provide a faster site to users in their respective home locations: this can be accomplished by loading up a server closer to them with all your attachments, images and the "bulk" of your web page, without actually running VB on it. The "text" can still come from your primary server, or you could set up a "slave" type setup so that it could be a hot fail-over if needed. Don't load balance between the boxes; just send traffic that is closer to the secondary box there when it seems appropriate for those users.

Everyone SHOULD keep a copy of their database and files at some other location outside of the datacenter their server is in. That is just a given and most people don't do that. They might backup to another drive in their server or to a "backup space" provided by the datacenter.

Scott

FlyBoy73
02-01-2008, 11:49 PM
Excellent information. Thanks!

Alfa1
02-02-2008, 08:08 PM
I would love to read more experiences with this setup.
How about bandwidth costs? With the database being sent to the other server frequently, this seems like a costly adventure.

DigitalCrowd
02-03-2008, 10:23 AM
I would love to read more experiences with this setup.
How about bandwidth costs? With the database being sent to the other server frequently, this seems like a costly adventure.

Bandwidth usage is very small compared to regular web traffic. You're only transferring the changes in the data between the databases, plus whatever is needed to sync your web directories.

mikellogg
02-13-2008, 07:15 PM
Round-robin DNS isn't going to help people connect to a closer server. They will randomly connect to either the east or west coast server.

For those looking to set up a server in, say, Europe and one in the United States, where the primary motivation is to provide a faster site to users in their respective home locations: this can be accomplished by loading up a server closer to them with all your attachments, images and the "bulk" of your web page, without actually running VB on it. The "text" can still come from your primary server, or you could set up a "slave" type setup so that it could be a hot fail-over if needed. Don't load balance between the boxes; just send traffic that is closer to the secondary box there when it seems appropriate for those users.
That sounds like the best we can achieve, though you will need to use something other than round-robin DNS to let browsers know which is closest.

DigitalCrowd
02-13-2008, 07:50 PM
You can use round robin, just use whichever server it hits to make the decision on which is closer and redirect the browser to a dns name associated with the closer server.

mikellogg
02-13-2008, 08:10 PM
You can use round robin, just use whichever server it hits to make the decision on which is closer and redirect the browser to a dns name associated with the closer server.
Request goes to www.forums.com
Then you redirect to east.forums.com or west.forums.com?
That's not DNS round-robin, that's redirecting :)

I would love to be proven wrong. I've been looking for a (cheap) solution to this issue for a while. DNS round-robin distributes the load, but it uses round-robin, which, by definition, is semi-random. I've been using it for three years now.

DigitalCrowd
02-13-2008, 08:35 PM
Sorry, the thread was aging and I didn't go back and read everything in detail. I never said to use round robin to do redirects (from my scan of the thread); however... it is possible, yet almost pointless.

Browser originates request to www.forum.com....

Gets random IP from DNS server.... for SERVER B....

Server B sees the IP would be best suited for SERVER A... then does a redirect, thus leaving round robin DNS.

But this goes back to my original point: if your goal is to serve traffic from a geo-closer server based on IP address, then round robin is not what you want; just redirect. Although, where there's a will, it can be done. ;)
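The redirect step above could look something like this. It is a sketch only: the host names east.forums.com and west.forums.com come from the example earlier in the thread, while the per-user latency figures and the plain-HTTP URL are assumptions.

```python
from typing import Mapping, Optional

def closest_host(latency_ms: Mapping[str, float]) -> str:
    """Pick the host with the lowest measured latency for this visitor."""
    # Raises ValueError on an empty mapping; callers should measure first.
    return min(latency_ms, key=lambda host: latency_ms[host])

def redirect_target(current_host: str, latency_ms: Mapping[str, float],
                    path: str) -> Optional[str]:
    """Return a redirect URL if another server is closer, else None."""
    best = closest_host(latency_ms)
    if best == current_host:
        return None  # round robin already landed on the closer box
    return f"http://{best}{path}"

# Hypothetical measurements for one visitor who landed on the west server.
measured = {"east.forums.com": 20.0, "west.forums.com": 85.0}
target = redirect_target("west.forums.com", measured, "/index.php")
```

Whichever server round robin hands the browser to makes the decision, and the browser follows the redirect to the closer host name from then on.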