Quote:
Originally Posted by mute
We have 2 post indexes, one for our live post table and one for our archived post table. Each has 30 million posts. I don't see a point in sharding the post indexes aside from being able to take advantage of multiple CPUs when indexing.
The way I see it, if I can keep the old indexes online while I do a full reindex, I don't really care how long the full reindex takes since (at least in our case), the search server is just a slave database server and not our primary.
Splitting up the posts index into several sources has more advantages than just re-indexing. As kmike alluded to, you can set up "agents" on your server so that when a person does a search, it searches all the sources in parallel, using one source per CPU.
For most people though, even those with large indexes, we are all probably just using 1 CPU without realizing it and still getting search times of less than one second...
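For anyone curious what the "agents" setup looks like, here is a rough sketch of a sphinx.conf fragment, not a tested config. It splits the posts into two parts by post ID and ties them together with a distributed index. The names (posts_base, posts_part0, posts_part1, posts_dist), the paths, the credentials, and the two-way split are all made up for illustration. The table and column names assume a stock vBulletin post table, and the agent lines assume searchd is listening on localhost port 9312, so adjust those to your install.

```
# Shared connection settings (placeholder credentials)
source posts_base
{
    type     = mysql
    sql_host = localhost
    sql_user = sphinx
    sql_pass = secret
    sql_db   = forum
}

# Each source pulls half of the posts, split on post ID
source posts_part0 : posts_base
{
    sql_query = SELECT postid, title, pagetext FROM post WHERE postid % 2 = 0
}

source posts_part1 : posts_base
{
    sql_query = SELECT postid, title, pagetext FROM post WHERE postid % 2 = 1
}

index posts_part0
{
    source = posts_part0
    path   = /var/data/sphinx/posts_part0
}

index posts_part1
{
    source = posts_part1
    path   = /var/data/sphinx/posts_part1
}

# Searches go against this index. One part is searched locally,
# the other through an agent that points back at the same searchd,
# so the two parts are searched at the same time.
index posts_dist
{
    type  = distributed
    local = posts_part0
    agent = localhost:9312:posts_part1
}
```

The idea is to add one more part and one more agent line per extra CPU you want to use. Parts listed as "local" are searched one after another, which is why the extra parts go in as agents instead.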
This week I'm going to be swamped working on some new AMD processor reviews, so work on the new Sphinx code will have to be set aside until Tuesday next week.
I'm super pumped though. After diving into the documentation and code, I realized it's really not that bad at all...