#751
01-12-2010, 04:33 PM
amcd
Join Date: Oct 2004
Posts: 218

16.5 mil posts
488k threads
vb 3.6

no plans to move to vb 4 until everyone else does it, too.

boolean and phrase search are needed. been missing them.

spending for a search solution - no problem. spending 2k - no way.

Quote:
I would really love an updated version that runs on 3.6 that takes advantage of all the Sphinx goodies, since I don't see us moving to 4.0 for close to a year. All of the custom code we've written has to be ported and tested, and being the lone admin on a site this big has my hands full a lot of the time.
echo
#752
01-13-2010, 08:14 PM
eoc_Jason
Join Date: Dec 2001
Location: Houston, TX
Posts: 493

Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts are added when your delta cron job runs (most sites run it every 2-5 min)...
2. Changes in # of views, last poster, deleted threads / posts, etc. should be real-time updates.
3. Edits to the title or post text will not be picked up until the next full re-index (usually nightly), unless the edited post is still within the delta index.

Will have boolean searching, phrase, etc...
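
To make constraint 3 concrete, here is a toy model of a main + delta pair in Python. It is not Sphinx and not the code from this hack; the class and field names are made up for illustration, and it assumes the delta is selected by "postid greater than the highest postid in the main index", which is a common way to set up a delta.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    postid: int
    text: str


@dataclass
class MainDeltaIndex:
    """Toy model of a main + delta index pair (hypothetical, not Sphinx)."""
    main: dict = field(default_factory=dict)   # postid -> text as it was indexed
    delta: dict = field(default_factory=dict)
    max_main_id: int = 0                       # highest postid in the main index

    def full_reindex(self, posts):
        # Nightly job: rebuild main from scratch and empty the delta.
        self.main = {p.postid: p.text for p in posts}
        self.delta = {}
        self.max_main_id = max(self.main, default=0)

    def delta_reindex(self, posts):
        # Cron job every few minutes: only posts newer than the main index.
        self.delta = {p.postid: p.text for p in posts
                      if p.postid > self.max_main_id}

    def search(self, word):
        merged = {**self.main, **self.delta}
        return sorted(pid for pid, text in merged.items()
                      if word in text.lower().split())


posts = [Post(1, "sphinx search rocks"), Post(2, "mysql fulltext is slow")]
idx = MainDeltaIndex()
idx.full_reindex(posts)

posts.append(Post(3, "sphinx delta index"))        # new post: lands in the delta
posts[1].text = "sphinx replaced mysql fulltext"   # edit to an old post
idx.delta_reindex(posts)

found = idx.search("sphinx")
print(found)  # post 2 is missing until the next full re-index
```

The new post 3 shows up after the delta run, while the edit to post 2 stays invisible because post 2 lives in the main index, which still holds the old text.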
#753
01-13-2010, 09:26 PM
mute
Join Date: Dec 2002
Location: Phoenixville, PA
Posts: 152

Quote:
Originally Posted by eoc_Jason View Post
Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts are added when your delta cron job runs (most sites run it every 2-5 min)...
2. Changes in # of views, last poster, deleted threads / posts, etc. should be real-time updates.
3. Edits to the title or post text will not be picked up until the next full re-index (usually nightly), unless the edited post is still within the delta index.

Will have boolean searching, phrase, etc...
One thing I'd like to have that we don't currently have, is properly ordered search results. If you don't do full reindexing on a regular basis, they tend to get really out of order.
#754
01-15-2010, 10:47 AM
kmike
Join Date: Oct 2002
Posts: 169

Apart from using Sphinx to search for similar threads, you can also use it to generate post excerpts with the search keywords highlighted when in the "Show search results as posts" mode.
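
For readers who have not seen excerpt generation, here is a small pure-Python illustration of the idea. It is only a stand-in: in a real setup this would presumably be done by the Sphinx API's BuildExcerpts call, and the function name, parameters and markers below are my own assumptions, not Sphinx's implementation.

```python
import re


def build_excerpt(text, keywords, around=5, before="<b>", after="</b>"):
    """Return a snippet around the first keyword hit, with hits wrapped in markers."""
    words = text.split()
    wanted = {k.lower() for k in keywords}

    def norm(word):
        # Strip punctuation so "Sphinx," still matches "sphinx".
        return re.sub(r"\W+", "", word).lower()

    hits = [i for i, w in enumerate(words) if norm(w) in wanted]
    if not hits:
        # No keyword found: fall back to the start of the post.
        head = " ".join(words[:2 * around])
        return head + (" ..." if len(words) > 2 * around else "")

    start = max(0, hits[0] - around)
    end = min(len(words), hits[0] + around + 1)
    out = [f"{before}{w}{after}" if norm(w) in wanted else w
           for w in words[start:end]]
    return ("... " if start > 0 else "") + " ".join(out) + \
           (" ..." if end < len(words) else "")


post = ("We moved the forum search to Sphinx last year "
        "and the load on MySQL dropped a lot.")
snippet = build_excerpt(post, ["sphinx"], around=3)
print(snippet)
```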

Our stats: almost 14 million posts, 1.1 million threads, 300k users, vB 3.8.
We're using our own Sphinx implementation since it predates the hack in this thread.

We got rid of the obscure search and sort modes though (such as sorting by the number of views or replies), and there was not a single complaint from our members. I don't think you should focus too much on 100% compliance with the default search. Having too many document attributes will inflate the index size, resulting in more I/O and more sluggish performance.
If you are worried about the need to edit the default search form template, you could always clone it, make the necessary changes and ship it with the product.
#755
01-15-2010, 06:26 PM
eoc_Jason
Join Date: Dec 2001
Location: Houston, TX
Posts: 493

Thanks for the feedback guys. Another thing I'm pondering: instead of working off just a main + delta index, break the total post count up and constantly rotate smaller indexes...

E.g. if a site has 10,000,000 posts, have 10 indexes with 1,000,000 posts each, then rebuild one of the indexes in turn, say hourly. This would be a shift from the typical one massive re-index nightly (or however often you do it). In theory, too, the last index would contain the most recent posts and could be re-indexed more often.

I dunno, that's just a thought... My concern right now is the core code for searching, the indexes themselves can be manipulated differently at a later time as that is transparent to everything else.
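
The arithmetic behind that idea, as a rough Python sketch. The function names are hypothetical, and it assumes postids are reasonably dense; with lots of deleted posts the shards would end up uneven in actual post count.

```python
def shard_ranges(max_postid, shards):
    """Split postids 1..max_postid into contiguous, near-equal (lo, hi) ranges."""
    if shards < 1 or max_postid < shards:
        raise ValueError("need at least one postid per shard")
    size, extra = divmod(max_postid, shards)
    ranges, lo = [], 1
    for i in range(shards):
        # The first `extra` shards take one more id so nothing is left over.
        hi = lo + size - 1 + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def shard_to_rebuild(hour, shards):
    """Round-robin choice of which shard an hourly job would rebuild."""
    return hour % shards


ranges = shard_ranges(10_000_000, 10)
print(ranges[0], ranges[-1])
print(shard_to_rebuild(13, 10))
```

With 10 shards and an hourly rotation, every shard is rebuilt once every 10 hours; the last range could additionally be put on a tighter schedule since it holds the newest posts.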
#756
01-16-2010, 05:20 AM
kmike
Join Date: Oct 2002
Posts: 169

Quote:
Originally Posted by eoc_Jason View Post
Thanks for the feedback guys. Another thing I'm pondering: instead of working off just a main + delta index, break the total post count up and constantly rotate smaller indexes...
That's what we're doing, too, though the delta is still there. The bonus is that you can set up a distributed index with the number of agents equal to the number of CPUs, as described here, to take advantage of all CPUs in the server. However, it's more of a manual operation; it would be hard to generate a partitioned sphinx.conf automatically.
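
To give a feel for what a partitioned config involves, here is a minimal Python sketch that prints one source/index pair per shard plus a distributed index on top. Everything site-specific is an assumption: the parent source `post_base` (which would hold the database connection settings), the index names, the path, the port, and the column list, which is based on the stock vBulletin post table. It only covers the mechanical templating; attributes, table prefixes, the delta and empty shards are not handled, which is where the manual work comes in.

```python
import os


def distributed_conf(max_postid, shards=None, host="localhost", port=9312):
    """Return sphinx.conf text for `shards` post indexes plus one distributed index."""
    shards = shards or os.cpu_count() or 1
    step = -(-max_postid // shards)  # ceiling division: postids per shard
    parts, members = [], []
    for i in range(shards):
        lo, hi = i * step + 1, min((i + 1) * step, max_postid)
        parts.append(
            f"source post_src_{i} : post_base\n"
            "{\n"
            "    sql_query = SELECT postid, title, pagetext FROM post "
            f"WHERE postid BETWEEN {lo} AND {hi}\n"
            "}\n\n"
            f"index post_{i}\n"
            "{\n"
            f"    source = post_src_{i}\n"
            f"    path = /var/lib/sphinx/post_{i}\n"
            "}\n"
        )
        # First shard is searched in-process; the rest go through agents
        # that point back at the same searchd, so each can use its own CPU.
        members.append(f"    local = post_{i}" if i == 0
                       else f"    agent = {host}:{port}:post_{i}")
    parts.append("index posts_dist\n{\n    type = distributed\n"
                 + "\n".join(members) + "\n}\n")
    return "\n".join(parts)


conf = distributed_conf(10_000_000, shards=4)
print(conf)
```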
#757
01-17-2010, 09:17 PM
eoc_Jason
Join Date: Dec 2001
Location: Houston, TX
Posts: 493

kmike - thanks for that info, I must have overlooked that in the docs...

Just curious, how much of a performance difference did you see using the distributed process?

I kind of got sidetracked today... A good friend's wife just got out of the hospital, so I was there for a while. Then I was coding some anti-spammer measures for my forum registration process...
#758
01-17-2010, 09:49 PM
mute
Join Date: Dec 2002
Location: Phoenixville, PA
Posts: 152

We have 2 post indexes, one for our live post table and one for our archived post table. Each has 30 million posts. I don't see a point in sharding the post indexes aside from being able to take advantage of multiple CPUs when indexing.

The way I see it, if I can keep the old indexes online while I do a full reindex, I don't really care how long the full reindex takes since (at least in our case), the search server is just a slave database server and not our primary.
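
Keeping the old indexes online during a rebuild is what the indexer's --rotate option is for: the new files are built next to the live ones and searchd swaps them in afterwards. A small Python helper for building such command lines from a cron wrapper might look like this. The config path and the index name `post_delta` are assumptions, not names from this thread.

```python
import shlex


def indexer_command(indexes, config="/etc/sphinx/sphinx.conf", rotate=True):
    """Build an argv list for Sphinx's indexer; an empty list means --all."""
    argv = ["indexer", "--config", config]
    if rotate:
        # Build new index files alongside the live ones and let searchd
        # switch over afterwards, so searches keep working meanwhile.
        argv.append("--rotate")
    argv.extend(indexes if indexes else ["--all"])
    return argv


delta_cmd = indexer_command(["post_delta"])  # e.g. every few minutes
full_cmd = indexer_command([])               # e.g. nightly
print(shlex.join(delta_cmd))
print(shlex.join(full_cmd))
```

To actually run one of these, pass the list to `subprocess.run(argv, check=True)`.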
#759
01-18-2010, 11:20 AM
Kevlar
Join Date: Nov 2001
Location: Ft. Lauderdale, FL.
Posts: 93

The only thing I am waiting on before converting to vB4 is sphinx (or a working search alternative). The rest of the little stuff I modded I can do with or without until those developers get upgrades.

1.3 million threads
18 million posts
#760
01-18-2010, 12:13 PM
kris
Join Date: Nov 2001
Posts: 8

mute, can you share how you archived the post table? What changes did you make in the code and in MySQL? I want to move my old posts to a separate post_archive table, but I am not sure how I can join those tables from the vBulletin code.

eoc_Jason,
my forum has 200k threads and 10 million posts, vB 3.8.4. I have only one database (no slave), an nginx webserver, and a Core i7 with 12GB RAM.

I installed Sphinx on the server, and over SSH it works great, but from the modded search.php it behaves very strangely. Sometimes when I search for keywords with the option "show results as posts" it returns a "no results" message, but if I change the option to "show results as threads" with the same keywords, I get a good number of results shown as threads.

Searching for a user's posts does not work at all: search.php?do=finduser&u=xxx always gives a blank screen, with no PHP errors in the log or anywhere else.