vb.org Archive > Community Discussions > Forum and Server Management
  #141  
10-17-2006, 07:40 AM
ALanJay

Join Date: Jun 2002
Location: London
Posts: 46

Quote:
Originally Posted by orban
ALanJay: I would not remove the asserts, because they might create invalid requests to the searchd. Also the being processed is a vB thing.
OK - does anyone else get "assert" warnings?

What I have done is switch the assert warnings off by adding

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

to sphinxapi.php.

As far as I can see, the assert errors are generated because the asserts all check that values are integers, while some of the inputs default to numbers held as text strings, or to plain text strings.
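A minimal sketch of why such asserts fire, assuming (as described above) that the client's checks are of the form `assert(is_int($value))`; I haven't confirmed the exact checks in every sphinxapi.php version. Values that come out of `explode()` or form input are strings, so `is_int()` rejects them even when they look like numbers, and casting each element first satisfies the check. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: reports which values would fail an is_int()-style assert.
function failsIntAssert(array $values): array
{
    $failing = array();
    foreach ($values as $value) {
        if (!is_int($value)) {
            // Numeric strings such as "12" are still strings, so is_int() is false.
            $failing[] = $value;
        }
    }
    return $failing;
}

$fromForm = explode(',', '12,34');          // ["12", "34"] (strings)
$cast     = array_map('intval', $fromForm); // [12, 34] (integers)

var_dump(failsIntAssert($fromForm)); // both strings are reported
var_dump(failsIntAssert($cast));     // empty array
```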

These warnings don't seem to affect the output, which works pretty well. But with some of the more complex searches it is possible to produce array warnings, i.e.:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

So this warning implies that one of the arguments is the wrong datatype. Checking back through the code, on lines 34 and 50 these are set to:

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

Is this the issue? Should they be numeric?

For anyone interested this is now live at:

www.digitalspy.co.uk/forums/

We have 11,158,584 posts and 464,239 threads, and the main data file is a little over 4 GB in size.

It is still a work in progress, but it does seem to produce the correct results.

Quote:
Originally Posted by ALanJay
But with some of the more complex searches it is possible to produce array warning errors ie

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))
After much thought we realised that we don't use the $Coventry feature, and I suspect that is the reason it does not work. As I'm not sure what $Coventry should resolve to, I have removed the whole line from my implementation. It seems to say: if the user is not a moderator and the post's author has been sent to Coventry, then leave that result out. As we have nobody in the second category, removing it seems to be the best short-term solution.
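Rather than deleting the line, a more defensive option is to call `in_array()` only when `$Coventry` really is an array. The warning says the second argument has the wrong datatype, so `$Coventry` is presumably unset or a non-array value in this vB 3.0 setup (I haven't checked what vB 3.0 actually stores there). A minimal sketch, with the `can_moderate()` result passed in as a plain boolean; the function name is hypothetical.

```php
<?php
// Hypothetical stand-in for the condition in sphinx.php: should a match be hidden?
// $canModerate replaces can_moderate($docinfo[$sphinx_forumid_group]).
function hideCoventryMatch(bool $canModerate, int $authorId, $coventry): bool
{
    // Only consult $coventry when it is an array, so in_array() never
    // receives the wrong datatype (the source of the warning).
    return !$canModerate
        && is_array($coventry)
        && in_array($authorId, $coventry);
}

var_dump(hideCoventryMatch(false, 42, null));    // bool(false): no Coventry list at all
var_dump(hideCoventryMatch(false, 42, [7, 42])); // bool(true): author is in Coventry
var_dump(hideCoventryMatch(true, 42, [7, 42]));  // bool(false): moderators see everything
```

With no usable Coventry list, nothing is hidden, which matches the behaviour of simply removing the line while keeping the check for boards that do use the feature.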

I'm not sure whether this is a difference between 3.0.x and 3.5/3.6, but I thought I would share my thoughts on this, as it kept me on my toes and I now have a much better understanding of the way the code works.


PS: the docinfo[$sphinx????] elements turn the group defaults into numerical output as required. I'm still not sure why the assert errors are being seen, though; I will delve deeper.

PPS: Well, after more searching and playing I am no further forward as to why the assert warnings are occurring. Trying to force the elements to be integers with intval breaks the code, so I am left with a system that seems to work but generates warnings that I have switched off. I assume no one using 3.6 is having these issues with the assert warnings?
  #142  
10-17-2006, 09:58 AM
orban

Join Date: Jan 2005
Posts: 445

Why does intval() break any code?

And maybe the $Coventry variable is something else in vB 3.0...

I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0
  #143  
10-17-2006, 11:07 AM
ALanJay

Join Date: Jun 2002
Location: London
Posts: 46

Quote:
Originally Posted by orban
Why does intval() break any code?
That is a very good question. I suspect I am not using it 100% correctly, but in the simplest example I changed line 32 from

$sphinx_groups2 = $sphinx_userids;

to

$sphinx_groups2 = intval($sphinx_userids);

Seemed to cause odd behaviour.
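One likely explanation for the odd behaviour: `$sphinx_userids` is an array (it comes from `explode()`), and `intval()` applied to a whole array does not convert its elements. It returns 1 for a non-empty array and 0 for an empty one, so `$sphinx_groups2` becomes a single integer instead of a list of user IDs. A small sketch comparing that with element-by-element conversion via `array_map()`; the sample IDs are made up.

```php
<?php
$sphinx_userids = explode(',', '15,27,3'); // ["15", "27", "3"]

// intval() on a whole array collapses it to 1 (non-empty) or 0 (empty).
$collapsed = intval($sphinx_userids);

// array_map() applies intval() to each element and keeps the list.
$sphinx_groups2 = array_map('intval', $sphinx_userids);

var_dump($collapsed);      // int(1)
var_dump($sphinx_groups2); // the integers 15, 27 and 3
```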

I was also looking at whether to use it in:

if (!empty($userids)) $sphinx_userids = explode(',', $userids);
else $sphinx_userids = array();
if ($forumchoice != '') $sphinx_groups = explode(',', $forumchoice);

But I wasn't sure I could use it in this context.
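In `explode()` lines like these, the conversion can be applied straight to the result. A sketch, assuming `$userids` and `$forumchoice` are comma-separated strings as in the quoted code; the `csvToIntArray()` helper is hypothetical, and its trim/empty-filter step is an addition to cope with stray spaces or trailing commas.

```php
<?php
// Hypothetical helper: turn "15, 27,3," into [15, 27, 3] with real integers.
function csvToIntArray(string $csv): array
{
    $parts = array_map('trim', explode(',', $csv));
    // Drop empty pieces left by doubled or trailing commas.
    $parts = array_filter($parts, function ($p) { return $p !== ''; });
    // array_values() re-indexes so the result is a plain 0..n-1 list.
    return array_values(array_map('intval', $parts));
}

$userids = '15, 27,3,';
$forumchoice = '';

if (!empty($userids)) $sphinx_userids = csvToIntArray($userids);
else $sphinx_userids = array();
if ($forumchoice != '') $sphinx_groups = csvToIntArray($forumchoice);

var_dump($sphinx_userids); // the integers 15, 27 and 3
```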

My problem is that I don't entirely understand the logic of what is going on here (but I'm learning as I go along). I'm not sure why I am seeing the warnings when the searches still generate perfect results.

Depending on the search, each of the calls "SetGroups", "SetGroups2" and "SetGroups3" generates these warnings, but because these take arrays I need to build each array with integers, and I assume not with numbers stored as text(?)

Quote:
Originally Posted by orban
And maybe the $Coventry variable is something else in vB 3.0...
It is possible. From talking to my system admin, it lets you stop users from doing certain things. After thinking about this I don't think it is an issue, as we don't use it. So for me, removing it solves the problem: the second element of the if statement, which checks whether the user has been sent to Coventry, isn't necessary.

Quote:
Originally Posted by orban
I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0
No problem; without your code we wouldn't have been able to do this at all, so thanks so much.

I assume you don't see any of the assertion errors in vB 3.6?

Anyway, as you can see (if you register on our site), the Sphinx search does work, very smoothly and quickly, and it is a great solution for offloading the search function from the main database.

One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?
  #144  
10-21-2006, 08:51 AM
Swamper

Join Date: Oct 2001
Posts: 19

Quote:
Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search "Find Threads Started by User" which is extremly slow. Do you have the same problem with 3.6?
Why not have that specific search just redirect to the standard vB search.php? It's fast.

----

Found my way here via the Big Boards thread on vB.com. Wow, I'm going to get on this right away! We're moving from a heavily modded, 6.5+ million post vB2 to 3.6 in the coming weeks. For over a year now we've survived only because our search was split into separate tables by date range (updated nightly and stored on another drive), with 'Search this Thread', 'View New Posts' and 'Find all posts by User' acting on the live post table.
  #145  
10-23-2006, 05:48 AM
kmike

Join Date: Oct 2002
Posts: 169

Quote:
Originally Posted by Swamper
We're moving from a heavy modded 6.5+ million post vB2 to 3.6 in the coming weeks
Be warned that vB 3.6 is much more CPU demanding than vB2 (and even vB3), so you'd better beef up your web frontend(s) before the final switch.

Quote:
Originally Posted by orban
Let's assume you have

thread1 - 100 times "word"
thread2 - 50 times "word"
thread3 - 10 times "word"
thread4-50 5 times "word"

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine) this will only return like 30 threads. So you're missing out 20......I'm actually seeing this on very common words (when searching post and "show as threads").
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.
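The effect orban describes can be shown with a small simulation: if only the first N matching posts (1000 here) are kept before they are grouped into threads, any thread whose posts all rank below the cut-off disappears from "show as threads" results. The data and the thread-by-thread ranking below are made up for illustration; they are not how vB or Sphinx actually orders matches.

```php
<?php
// Count how many distinct threads survive when only the first $limit
// matching posts are kept. $postThreadIds holds the thread ID of each
// matching post, in ranked order.
function threadsWithinLimit(array $postThreadIds, int $limit): int
{
    return count(array_unique(array_slice($postThreadIds, 0, $limit)));
}

// Made-up data: 50 threads with 50 matching posts each (2500 posts),
// ranked thread by thread.
$postThreadIds = array();
for ($thread = 1; $thread <= 50; $thread++) {
    for ($i = 0; $i < 50; $i++) {
        $postThreadIds[] = $thread;
    }
}

echo threadsWithinLimit($postThreadIds, 1000), "\n"; // 20 of the 50 threads
echo threadsWithinLimit($postThreadIds, 2500), "\n"; // all 50
```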
  #146  
10-23-2006, 08:49 AM
orban

Join Date: Jan 2005
Posts: 445

Quote:
Originally Posted by kmike
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.
Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.
  #147  
10-23-2006, 02:40 PM
ALanJay

Join Date: Jun 2002
Location: London
Posts: 46

Quote:
Originally Posted by orban
It's because they are both arrays, or a string of comma seperated numbers?

You'd have to use array_walk, lemme know if you need help.
I eventually worked this out but never managed to get it to work successfully. I assume some difference between the way 3.0.x and 3.6 handle these causes a problem. Because it is only a warning I have left it. Maybe next time there is an opportunity to play I will have another go with array_walk, if I can fathom the syntax to get everything switched from numerals as text to integers.

Quote:
Originally Posted by orban
With or without key words?

If it's without it's using the default search and I can't really help with that.
Without keywords, which I now understand is why it is slow. We have removed it from our choices; one minute to bring back the answer was a little long.


Overall, it has now been running for a week, and once we sorted a few things out it has been excellent. Using your cool current and delta index setup, the databases are updated every 15 minutes and the whole site is reindexed every night.

Thanks for the ideas this has been an excellent tool and remarkably easy to implement.
  #148  
10-23-2006, 02:56 PM
orban

Join Date: Jan 2005
Posts: 445

function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Glad to hear it works for you!
  #149  
10-23-2006, 03:08 PM
ALanJay

Join Date: Jun 2002
Location: London
Posts: 46

Quote:
Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.
I will have a play - thanks.

Quote:
Originally Posted by orban
Glad to hear it works for you!
It seems to. Of the various hacks and attempts to solve the text search issue, this one seems to have delivered on its goals. There are still a few things I don't understand that would probably improve performance, but overall it works well.

Maybe you have some ideas on the issues:

morphology = none
stopwords =
min_word_len = 3
charset_type = sbcs
}

What do morphology and stopwords do, and how are they best used?

and

mem_limit = 256M
}

mem_limit is used when creating the index. Does anyone have views on a sensible optimum value? We are running this on a machine with 8 GB of RAM, and as it started at 32M I didn't want to make it too big, but the indexer still complains that it could be better.


==============

Looking in the original configuration file, I think I have a handle on morphology, min_word_len and charset_type.

Would I be right in saying that the stopwords file is a list of words NOT to index?

If so, does anyone have a good list of 2- and 3-letter words that can safely be removed from an index?

==============

Looking on the sphinxsearch forums, there is discussion of creating stopword lists, and the indexer can produce a list of the most-used words for you to work with, i.e.:

/usr/local/bin/indexer --config sphinx.conf --rotate --buildstops sphinx-stop.txt 1000 --buildfreqs

This builds a file with the most commonly used words in the index and how often each one occurs.

If I understand this correctly it should allow you to remove a few of the obvious things.
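To narrow that list down to short words, a small filter can pull the 2- and 3-letter entries out of the `--buildstops`/`--buildfreqs` output. This is a sketch under assumptions: it assumes each output line looks like `word count`, and that a stopwords file with one word per line is acceptable; check both against the Sphinx documentation for your version. With `min_word_len = 3` as in the config above, words shorter than three characters should already be skipped by the indexer, so in practice the 3-letter entries are the ones that matter. The list still needs a human eye, since some short words may be meaningful on a given board.

```php
<?php
// Hypothetical helper: pick short words out of the word list produced by
// "indexer --buildstops ... --buildfreqs". Assumes lines like "word 12345";
// lines without a count also work.
function shortStopwords(array $lines, int $maxLen = 3, int $minLen = 2): array
{
    $words = array();
    foreach ($lines as $line) {
        $fields = preg_split('/\s+/', trim($line));
        $word = $fields[0];
        $len = strlen($word); // byte length, fine for a single-byte charset
        if ($word !== '' && $len >= $minLen && $len <= $maxLen) {
            $words[] = $word;
        }
    }
    return $words;
}

// In practice: $lines = file('sphinx-stop.txt', FILE_IGNORE_NEW_LINES);
$lines = array('the 912345', 'and 801234', 'forum 50000', 'is 40000', '');
$stop = shortStopwords($lines);

// Write one word per line, e.g. with file_put_contents() into the stopwords file.
echo implode("\n", $stop), "\n"; // the, and, is (one per line)
```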

Quote:
Originally Posted by orban
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.
Hi,

Looking at the code in sphinx.php:

Code:
if ($titleonly)
{
        // searching thread titles
        $sphinx_index = $sphinx_thread_index_name;
        $sphinx_groups2 = $sphinx_userids;
        $sphinx_forumid_group =  'group';
        $sphinx_switch_group =   'group3'; //firstpostid
        $sphinx_userid_group =   'group2';
        // only titles, nothing to weight
        $sphinx_weights = array ( 1 );
}
Where do you put the array_walk manipulation?

As far as I can tell, the values set in the various lines above all need to be processed this way.

Or do you implement it here:

Code:
$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( $sphinx_weights );
// $cl->SetLimits ( 0, $vboptions['maxresults'] );
$cl->SetLimits ( intval(0), intval($vboptions['maxresults']) );
$cl->SetMatchMode ( SPH_MATCH_ALL );
$cl->SetGroups ( $sphinx_groups );
$cl->SetGroups2 ( $sphinx_groups2 );
$cl->SetGroups3 ( $sphinx_groups3 );
$cl->SetGroups4 ( $sphinx_groups4 );
$cl->SetGroups5 ( $sphinx_groups5 );
$cl->SetSortMode ( $sphinx_sort );
i.e.

$cl->SetGroups4 ( (array_walk( $sphinx_groups4, "intvalArray") );

I assume it doesn't matter that sometimes the array will be one element long.
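One catch with the `SetGroups4` line above: `array_walk()` changes the array in place (through the `&$item` reference) and returns `true`/`false`, not the array, so passing its return value to `SetGroups4()` would hand over a boolean. The conversion has to happen on its own line first. A sketch using orban's `intvalArray` callback; `StubClient` is a hypothetical stand-in for `SphinxClient`, only there so the snippet runs on its own.

```php
<?php
// orban's callback: converts each element in place via the reference.
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

// Hypothetical stand-in for SphinxClient; it just records what it receives.
class StubClient
{
    public $groups4;
    public function SetGroups4($groups) { $this->groups4 = $groups; }
}

$sphinx_groups4 = array('7'); // a one-element array works the same way

// Wrong: array_walk() returns a bool, so SetGroups4 would receive true.
// $cl->SetGroups4(array_walk($sphinx_groups4, 'intvalArray'));

// Right: convert first, then pass the (now integer) array.
array_walk($sphinx_groups4, 'intvalArray');
$cl = new StubClient();
$cl->SetGroups4($sphinx_groups4);

var_dump($cl->groups4); // one element, int(7)
```

If a one-liner is preferred, `$cl->SetGroups4(array_map('intval', $sphinx_groups4));` does the same job, because `array_map()` returns the converted array instead of modifying it in place.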

=====================================

Quote:
Originally Posted by orban

Quote:
Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search "Find Threads Started by User" which is extremly slow. Do you have the same problem with 3.6?


With or without key words?

If it's without it's using the default search and I can't really help with that.


Quote:
Originally Posted by Swamper
Why not have that specific search just redirect to the standard vB search.php? It's fast.


Searches without keywords already are redirected to the default search.
Just curious, orban: having done some more checks, a plain user search ("Find Threads Started by User") takes over a minute with the size of files we have, and from what you are saying this is the standard vB search. Once you add an additional keyword search string, it all works much faster because it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?
  #150  
10-24-2006, 09:16 AM
kmike

Join Date: Oct 2002
Posts: 169

Quote:
Originally Posted by orban
Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.
Well, it's their own fault then ;-)

All times are GMT.

