Searchlog


TosaInu
08-09-2004, 10:41 AM
Hello,

Our site has about 500,000 posts. It's not the biggest site, but some posts are pages long, so an efficient searchlog is pretty important (all the more because we want to run other hacks as well). I read about the SQL fulltext search: in some ways it's interesting for us and in others it's not.

-It's possible to exclude some forums from the searchlog, but the SQL fulltext is all or nothing (as far as I understand it). Our board has an Off Topic forum whose content is 'volatile'. The topics shouldn't be deleted, but storing all those posts in the searchlog, where they only get in the way of searches for real content, isn't a good idea either.

A post database with an SQL fulltext index is about as large as a post database with a searchlog. The searchlog can be made smaller, though (people who have access to the SQL configuration can probably gain there). An optimized searchlog is better for storage, and I guess it will beat the SQL fulltext in speed.

-The SQL search omits short words. For us it's necessary to keep some site-specific ones, and in the searchlog that is easy to do. I estimate we have about 50 words shorter than 3 letters that are prime subjects of our site. The searchlog allows us to index them.

The searchlog lacks some options, though, that would make it the perfect solution for us. The main one is a badword list. Storing tens of thousands of records for variants of $@#!, cowstuff, horsetool, dollar and pound sterling prices of products, words merged with &tags like &34my, numbers, yes..yes, yeh, yes, yes? and dozens of their variants, hello, hallo, ciao, current, altogether, nice, mine, yours and so on is not efficient. The word mine alone has 3631 records.

A tool to delete such entries from an existing searchlog would also be great. I know it's possible to run SQL queries in, say, phpMyAdmin, but that is error-prone and time-consuming.

A PHP script that lists the wordlog and lets you select the words you want to strip would be convenient. The script would store the selected word IDs in an array and delete the corresponding records from the postindex:

DELETE FROM vb_postindex
WHERE WORDID = deleteWORDID

I lack the knowledge to write even the most basic PHP script. I think such a tool would be a great help in optimizing the searchlog, and I would certainly use it. Could someone write it, please?
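
A minimal sketch of the deletion step described above, written in Python with an in-memory SQLite table standing in for the board's MySQL database (the post asks for PHP; this only illustrates the logic). The vb_postindex table and its wordid column come from the query above. The postid column, the function name delete_words and the sample rows are assumptions made for illustration.

```python
import sqlite3


def delete_words(conn, word_ids, table="vb_postindex"):
    """Delete every postindex row for the given word IDs; return the row count."""
    # Coerce to int so only numbers ever reach the SQL statement.
    ids = [int(w) for w in word_ids]
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    # The table name cannot be a bound parameter; it is a trusted constant here.
    cur = conn.execute(
        f"DELETE FROM {table} WHERE wordid IN ({placeholders})", ids
    )
    conn.commit()
    return cur.rowcount


# Demo on a throwaway table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vb_postindex (wordid INTEGER, postid INTEGER)")
conn.executemany(
    "INSERT INTO vb_postindex VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (3, 12)],
)
deleted = delete_words(conn, [1, 3])
print(deleted)  # 3
```

Passing all selected IDs in one IN (...) list means one statement instead of one per word. On a live board it would be sensible to back up the table first, since the delete cannot be undone.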

TosaInu
08-09-2004, 12:52 PM
There is a badword list after all: vb\includes\searchwords.php. That's nice. Would it be possible to make a cleaning tool that automatically appends the words it removes to the badword list? Train it, so to speak.
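
The "training" idea could look roughly like the sketch below. It does not read or write searchwords.php, whose exact format is not shown in this thread. Instead it assumes a hypothetical plain-text file with one word per line. The function name append_badwords and the file name badwords.txt are made up for the example.

```python
import os
import tempfile


def append_badwords(path, words):
    """Append words that are not yet listed to the file; return the words added."""
    try:
        with open(path, encoding="utf-8") as f:
            known = {line.strip().lower() for line in f if line.strip()}
    except FileNotFoundError:
        known = set()  # first run: no list yet

    added = []
    for word in words:
        word = word.strip().lower()
        if word and word not in known:
            known.add(word)
            added.append(word)

    if added:
        with open(path, "a", encoding="utf-8") as f:
            f.writelines(w + "\n" for w in added)
    return added


# Demo in a temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "badwords.txt")
first = append_badwords(path, ["hello", "Ciao", "hello"])
second = append_badwords(path, ["ciao", "mine"])
print(first, second)  # ['hello', 'ciao'] ['mine']
```

A cleaning tool would call this with the titles of the words it has just deleted from the postindex, so that they are skipped the next time posts are indexed.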

Liquid1ce
08-09-2004, 12:59 PM
TosaInu

Totally off topic, but maybe you should change your forum's default skin to your own :)

TosaInu
08-09-2004, 01:08 PM
Hello Liquid1ce,

You mean on this board? I'm lost again, what benefit does that give?

TosaInu
08-09-2004, 03:08 PM
I ran some manual queries and stripped some greetings and the names of political factions that are currently used as insults:

SQL query:
DELETE FROM `anvb3_postindex` WHERE wordid = '5972'; # Affected rows: 1944
DELETE FROM `anvb3_postindex` WHERE wordid = '5974'; # Affected rows: 5270
DELETE FROM `anvb3_postindex` WHERE wordid = '5986'; # Affected rows: 350
DELETE FROM `anvb3_postindex` WHERE wordid = '6009'; # Affected rows: 1215
DELETE FROM `anvb3_postindex` WHERE wordid = '6016'; # Affected rows: 1654
DELETE FROM `anvb3_postindex` WHERE wordid = '6024'; # Affected rows: 11481
DELETE FROM `anvb3_postindex` WHERE wordid = '6025'; # Affected rows: 18
DELETE FROM `anvb3_postindex` WHERE wordid = '6039'; # Affected rows: 1
DELETE FROM `anvb3_postindex` WHERE wordid = '6040'; # Affected rows: 607
DELETE FROM `anvb3_postindex` WHERE wordid = '6041'; # Affected rows: 27
DELETE FROM `anvb3_postindex` WHERE wordid = '6102'; # Affected rows: 761
DELETE FROM `anvb3_postindex` WHERE wordid = '6104'; # Affected rows: 13805
DELETE FROM `anvb3_postindex` WHERE wordid = '6123'; # Affected rows: 3631
DELETE FROM `anvb3_postindex` WHERE wordid = '6130'; # Affected rows: 158
DELETE FROM `anvb3_postindex` WHERE wordid = '6152'; # Affected rows: 137
DELETE FROM `anvb3_postindex` WHERE wordid = '6174'; # Affected rows: 10
DELETE FROM `anvb3_postindex` WHERE wordid = '6175'; # Affected rows: 14
DELETE FROM `anvb3_postindex` WHERE wordid = '6189'; # Affected rows: 3046
DELETE FROM `anvb3_postindex` WHERE wordid = '6190'; # Affected rows: 776
DELETE FROM `anvb3_postindex` WHERE wordid = '6194'; # Affected rows: 2
DELETE FROM `anvb3_postindex` WHERE wordid = '6195'; # Affected rows: 16
DELETE FROM `anvb3_postindex` WHERE wordid = '6209'; # Affected rows: 2
DELETE FROM `anvb3_postindex` WHERE wordid = '6211'; # Affected rows: 18
DELETE FROM `anvb3_postindex` WHERE wordid = '6222'; # Affected rows: 289
DELETE FROM `anvb3_postindex` WHERE wordid = '6238'; # Affected rows: 13
DELETE FROM `anvb3_postindex` WHERE wordid = '6242'; # Affected rows: 704
DELETE FROM `anvb3_postindex` WHERE wordid = '6256'; # Affected rows: 897
DELETE FROM `anvb3_postindex` WHERE wordid = '6261'; # Affected rows: 304

"Affected rows" ("Getroffen rijen" in my Dutch phpMyAdmin) is the number of rows each query deleted.

In total, roughly 47,000 rows were deleted.
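
Picking word IDs like the ones above is easier with a list of the words that take up the most rows. Below is a rough sketch of such a listing, again in Python with SQLite as a stand-in for MySQL. It assumes a word table with wordid and title columns next to the postindex table. The table names vb_word and vb_postindex, the function name top_words and the sample data are illustrative assumptions, so check them against the real schema and table prefix.

```python
import sqlite3


def top_words(conn, limit=20):
    """Return (wordid, title, rows) for the words with the most postindex rows."""
    return conn.execute(
        """
        SELECT w.wordid, w.title, COUNT(*) AS n
        FROM vb_postindex AS p
        JOIN vb_word AS w ON w.wordid = p.wordid
        GROUP BY w.wordid, w.title
        ORDER BY n DESC, w.wordid
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Demo with a few hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vb_word (wordid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE vb_postindex (wordid INTEGER, postid INTEGER);
    INSERT INTO vb_word VALUES (1, 'hello'), (2, 'mine'), (3, 'yours');
    INSERT INTO vb_postindex VALUES
        (1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 12);
    """
)
rows = top_words(conn, limit=2)
print(rows)  # [(1, 'hello', 3), (2, 'mine', 2)]
```

The words at the top of this list are the ones whose removal saves the most rows, so they are the first candidates to review.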