File Based Posts Cache : Believe It


Trigunflame
07-12-2004, 10:00 PM
Ok, well, I sort of have this idea... I'm not so sure everyone will agree with it, but for my own purposes I find it to be better.

What I want to do is keep a flatfile-based cache of posts, organized into their own folders by thread ID, with each post stored in its own array just as it would appear in the MySQL table.

For example, say you want to cache thread ID 5454. The final file output depends on one thing: when you run the script, you decide how many posts go into each file before it creates another one.

Thread 5454 has 5000 posts and you want 1000 posts per file, so the structure would be:
/(urforum)/cache/5/4/5/4/5454_0.php
/(urforum)/cache/5/4/5/4/5454_1.php
/(urforum)/cache/5/4/5/4/5454_2.php
/(urforum)/cache/5/4/5/4/5454_3.php
/(urforum)/cache/5/4/5/4/5454_4.php

If another thread, say 3526, has 500 posts with a limit of 1000 posts per file, the structure would be:
/(urforum)/cache/3/5/2/6/3526.php

This is all done to limit the size of each generated file. Since most people have their posts per page set anywhere from 20 to 50, there is no point in loading a whole thread at once when you can split it into multiple files.
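A minimal sketch of the path layout described above, assuming one directory level per digit of the thread ID and a `_N` suffix only when a thread spans more than one file. The function names and the `$cacheRoot` parameter are hypothetical; the actual cache.php is not shown in this thread.

```php
<?php
// Hypothetical helpers; the naming rules are inferred from the example paths.

// How many cache files a thread needs (at least one).
function cache_chunk_count(int $postCount, int $postsPerFile): int
{
    return max(1, (int) ceil($postCount / $postsPerFile));
}

// Full path of one cache file for a thread.
function cache_chunk_path(string $cacheRoot, int $threadId, int $chunk, int $chunkCount): string
{
    // One directory level per digit of the thread ID: 5454 -> 5/4/5/4
    $dirs = implode('/', str_split((string) $threadId));
    // Single-file threads get no suffix; multi-file threads get _0, _1, ...
    $name = $chunkCount > 1 ? "{$threadId}_{$chunk}.php" : "{$threadId}.php";
    return rtrim($cacheRoot, '/') . "/{$dirs}/{$name}";
}
```

With 5000 posts at 1000 per file, `cache_chunk_count()` gives 5, and chunk 4 of thread 5454 maps to `5/4/5/4/5454_4.php` under the cache root.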

Now, back to the point. I personally prefer to grab files directly from the filesystem rather than from the MySQL DB when possible, especially if some noobs try to DoS me, because MySQL will always take the hit the worst due to queries. Even without that, file-based reads will generally be faster and less CPU intensive.

This is really just an alpha script to show proof of concept, at least for the part that makes the files.

Basically, you create a 'cache' directory, chmod it 777, upload this file, and point your browser to it like so:

http://ursite.com/urforum/cache/cache.php?t=threadid&pp=howmanypostperfile

And voilà, it creates the files. My own problem so far is with memory limit errors. I've tried my best to cut down the memory usage; testing on my own server it generally works fine, but when I used it on a large thread, i.e. 12,000 posts, it wants to cough up. Maybe someone here can shed some light.

Also, another interesting side effect: if you run the script with a really low setting, say 50 posts per file on a 500-post thread (the exact numbers don't really matter), and look at the file size of each successive file, you will notice it grows by around 30 KB each time, even though the file size should stay roughly steady for every file until the last. Keeping the files small is the whole point of spreading the thread across files in the first place.
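Since the script itself is not shown here, the cause of the growing files can only be guessed at. One common cause, offered purely as an assumption, is a row buffer that is never emptied between files, so each file repeats all earlier posts; that would also explain memory climbing on large threads. A sketch that keeps at most one file's worth of posts in memory (`build_cache()`, `write_chunk()` and the always-suffixed file names are hypothetical):

```php
<?php
// Sketch only: $posts stands in for a row-by-row fetch from the post table.

// Write one file's worth of rows as a serialized array.
function write_chunk(string $dir, int $threadId, int $chunk, array $rows): void
{
    file_put_contents("{$dir}/{$threadId}_{$chunk}.php", serialize($rows));
}

// Split a thread into files of $postsPerFile rows; returns the file count.
function build_cache(iterable $posts, string $dir, int $threadId, int $postsPerFile): int
{
    $buffer = [];
    $chunk  = 0;
    foreach ($posts as $row) {
        $buffer[] = $row;
        if (count($buffer) === $postsPerFile) {
            write_chunk($dir, $threadId, $chunk++, $buffer);
            $buffer = []; // reset, so the next file does not repeat earlier posts
        }
    }
    if ($buffer !== []) {
        write_chunk($dir, $threadId, $chunk++, $buffer); // final partial file
    }
    return $chunk;
}
```

Passing a generator or an unbuffered result set as `$posts` means the whole thread never has to sit in memory at once.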

Take a look and tell me what you think. Once I figure this out, I know I will be implementing it, because I personally like file-based caching. Sure, this will probably double the size of your total forum, but then again, if you have a dedicated server like me, it doesn't really matter.

Oh, by the way, what I had planned was that every time a new thread or new post is created, it is automatically added to the cache, so there is no need to manually rebuild the cache every time.
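A rough sketch of how a new post might be added to the cache without a full rebuild, assuming every file is named `<threadid>_<n>.php` and holds a plain serialized array. `cache_append()` is a hypothetical name, and the read-modify-write below is not safe against two posts arriving at the same moment.

```php
<?php
// Sketch only: append one post row to a thread's newest cache file.
function cache_append(string $dir, int $threadId, array $post, int $postsPerFile): int
{
    // Find the newest chunk by probing for the next file name.
    $chunk = 0;
    while (is_file("{$dir}/{$threadId}_" . ($chunk + 1) . '.php')) {
        $chunk++;
    }
    $file = "{$dir}/{$threadId}_{$chunk}.php";
    $rows = is_file($file)
        ? unserialize(file_get_contents($file), ['allowed_classes' => false])
        : [];
    if (!is_array($rows)) {
        throw new RuntimeException("Unreadable cache file: {$file}");
    }
    if (count($rows) >= $postsPerFile) {
        // Newest file is full, so start the next one.
        $chunk++;
        $rows = [];
        $file = "{$dir}/{$threadId}_{$chunk}.php";
    }
    $rows[] = $post;
    file_put_contents($file, serialize($rows), LOCK_EX);
    return $chunk; // index of the file the post landed in
}
```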

nexialys
07-13-2004, 12:11 PM
before trying to help or test, i have some questions:

1- serverside, what is the cost in resources ?!... because reading files is more resource-expensive than reading the SQL cache...

2- does moving a thread make a real mess in your cache files ?!

3- archive browsing ?!

4- on Windows server, that load of directories would crash the server... solution ?!

5- what is the real use of this anyway ?!.... real points, not only dreaming that it may work!

;)

nexialys
07-13-2004, 12:13 PM
my personal need for such a system would be to archive - really archive - closed threads... so when a thread is locked or closed, instead of browsing it from the DB, it would read the files in a text way, like most blogs do...

so ursite.com/thread/1/53 would load the post #53 in the thread #1 ...
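If such a URL were combined with the posts-per-file layout from the first post (an assumption, since the two ideas come from different posters), finding post #53 comes down to integer division. `locate_post()` is a hypothetical helper:

```php
<?php
// Hypothetical lookup: which cache file holds the Nth post of a thread
// (1-based), and at which position inside that file.
function locate_post(int $postNumber, int $postsPerFile): array
{
    $index = $postNumber - 1; // zero-based position in the thread
    return [
        'chunk'  => intdiv($index, $postsPerFile),
        'offset' => $index % $postsPerFile,
    ];
}
```

With 1000 posts per file, post #53 sits in chunk 0 at offset 52, and post #1001 is the first entry of chunk 1.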

Trigunflame
07-13-2004, 12:17 PM
1. Actually, no. MySQL is a hell of a lot more resource intensive than reading a flatfile into an array.

2. Why would that make a mess? The post data is the only thing stored, and that's done by thread ID. It doesn't matter what forum you move the thread to oO

3. I don't see why not.

4. It's using the same concept vBulletin does for folder limits, separating them into 4/3/5/6/7/ etc.

5. To save resources. A direct file-based read will be quicker in most cases, especially under load, than having to send a query to MySQL, have MySQL fetch the data, and then send it back through the link.

So yes..there is a valid reason..

ps. dont double post.

Andreas
07-13-2004, 12:23 PM
Hmm .. how would you enforce permissions on viewing threads with this?
Just everybody could access the files ...

And it would be kinda static - no display of ppl currently browsing the thread, no online indicators, ...

Trigunflame
07-13-2004, 12:27 PM
Kirby... what are you talking about? cache.php is just a proof-of-concept file to generate the cache.

The post data is stored as serialized arrays containing everything that is normally present in the rows of the post table, line for line.

The actual code I will write will be implemented directly into the showthread/newpost functions, to handle reading from the cache instead of from the MySQL DB.

It's not like I'm trying to remake a whole forum. Please understand, all this does is save post array data to a flatfile. It does not mess with permissions or any of that crap; it simply reads the data from the files instead of from the MySQL DB.

Note: The posts will still be stored in the DB; they just won't be read from there when people view threads.
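Because the database stays the source of truth, the read path can treat the cache as optional. A minimal sketch, assuming each cache file holds a serialized array of post rows; `read_posts()` and the `$loadFromDb` callable are hypothetical stand-ins for whatever showthread would actually do:

```php
<?php
// Sketch: try the cache file first, fall back to the database on a miss.
function read_posts(string $file, callable $loadFromDb): array
{
    if (is_readable($file)) {
        $raw = file_get_contents($file);
        if ($raw !== false) {
            // allowed_classes => false: only plain arrays and scalars are restored
            $rows = @unserialize($raw, ['allowed_classes' => false]);
            if (is_array($rows)) {
                return $rows; // cache hit, no query issued
            }
        }
    }
    return $loadFromDb(); // missing or corrupt file: use the normal query
}
```

A corrupt or missing file then costs one ordinary query instead of breaking the page.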

Andreas
07-13-2004, 12:33 PM
> Basically, you create a 'cache' directory, chmod it 777, upload this file and point ur browser to it like so
>
> Please understand, all this does is saved Post data to a flatfile, this does not mess with permissions or any of that crap, its just simply using the files to read the data, instead of from the mysql db.


AFAIK 777 is world-readable, so just anyone could view the thread - no matter if he has permission to do so or not.

Trigunflame
07-13-2004, 12:35 PM
Lol, Kirby... it's not that hard to write a <?php die; ?> at the top of the files to keep anything from being shown through the browser. Or did you not know that trick? :ermm:

ps. 777 is "word-writable/readable"
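A small sketch of the guard trick, assuming the guard is written as the first line and stripped again before unserializing. `CACHE_GUARD`, `cache_write()` and `cache_read()` are hypothetical names, not taken from cache.php:

```php
<?php
/* If the file is requested through the web server, PHP runs the guard
   line and exits before any post data can be sent to the browser. */
const CACHE_GUARD = "<?php die; ?>\n";

// Write the guard line followed by the serialized post rows.
function cache_write(string $file, array $rows): void
{
    file_put_contents($file, CACHE_GUARD . serialize($rows), LOCK_EX);
}

// Read the rows back, or return null if the file is missing or malformed.
function cache_read(string $file): ?array
{
    $raw = @file_get_contents($file);
    if ($raw === false || strncmp($raw, CACHE_GUARD, strlen(CACHE_GUARD)) !== 0) {
        return null;
    }
    $rows = @unserialize(substr($raw, strlen(CACHE_GUARD)), ['allowed_classes' => false]);
    return is_array($rows) ? $rows : null;
}
```

The forum code reads the file as plain data, so the guard only takes effect when someone requests the .php file directly.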

Andreas
07-13-2004, 12:40 PM
Ack. That's the point I missed in your description :)
I thought you were just generating data files (although I should have noticed the .php - lol).

Anyway, I'd really fear the idea of having some 100.000 thread files floating around.

> ps. 777 is "word-writable/readable"
Hmm ... how does Word come into play here? :D

Trigunflame
07-13-2004, 12:45 PM
> Ack. That's the point I missed in your description :)
> I thought you were just generating data files (although I should have noticed the .php - lol).
>
> Anyway, I'd really fear the idea of having some 100.000 thread files floating around.
>
> > ps. 777 is "word-writable/readable"
> Hmm ... how does Word come into play here? :D

When you make typos from staying up nearly 2 days straight like me OO

Erwin
07-13-2004, 08:27 PM
This is a great idea and an interesting implementation method. There is a lot of potential for uses, and it would really reduce server load if it is possible to completely remove the need to connect to the database for normal thread perusing by guests, spiders or users. It would be most useful for closed and old threads, which do not need dynamic updates of post information.

I look forward to updates on this concept. :)

AWS
07-13-2004, 09:06 PM
> This is a great idea and an interesting implementation method. There is a lot of potential for uses, and it would really reduce server load if it is possible to completely remove the need to connect to the database for normal thread perusing by guests, spiders or users. It would be most useful for closed and old threads, which do not need dynamic updates of post information.
>
> I look forward to updates on this concept. :)

Well said, Erwin, my thoughts exactly. I have said many times that vBulletin needs a true archive, which would take closed or older threads and put them in flatfiles to be read from there. This seems to be taking it one step further. Having all threads read from static files would reduce the server load immensely. The script itself would be resource intensive on the initial run, but once it only has to do a few threads, say every 15 minutes, it wouldn't be bad at all.

I hope this becomes a reality. Someone started something like this for vb2 and never finished it.

nexialys
07-13-2004, 09:25 PM
so be it... i'm happy my questions are answered well... ;) - i'm a bit devil's advocate so i ask strange things!

... this hack answers my need for an archive way to do things... hope to see the ACP soon!

feature request: archive a complete thread in zip or tar when the topic is closed or deleted... so we can make a followup.. ;)

Xenon
07-13-2004, 09:33 PM
hmm, i think i know what i'll do in my Holidays ^^

i'll port my vb Archive to vb3 ;)

Actually to get back to this hack, i'm with erwin, that seems to be a great idea.

nexialys
07-13-2004, 09:37 PM
also, once we're in... would be good to have a specific archive browser for this cache... so if we want to keep these cached threads in a different pattern, we can have a different template to display them... i like that idea, not used for everyone though!

Trigunflame
07-14-2004, 02:58 AM
> This is a great idea and an interesting implementation method. There is a lot of potential for uses, and it would really reduce server load if it is possible to completely remove the need to connect to the database for normal thread perusing by guests, spiders or users. It would be most useful for closed and old threads, which do not need dynamic updates of post information.
>
> I look forward to updates on this concept. :)

I actually hadn't thought of that, Erwin. My general idea was for it to be seamlessly integrated into the forum itself, but now that you mention it, it really is a great idea, considering it would not have to use "any" connection.

Also, if MySQL went down, the potential would be there for people to still read threads in some cases, if I can put together a script that uses vBulletin's code to reproduce a queryless forum.

Ps. Anyone got any ideas on reducing memory usage, or on the sequential increase in file size? I know it's not that big of a deal, as we can adjust the memory limit, but I want this hack to be available to everyone, not just those who can manually tweak their settings.

Erwin
07-14-2004, 06:31 AM
> I actually hadn't thought of that, Erwin. My general idea was for it to be seamlessly integrated into the forum itself, but now that you mention it, it really is a great idea, considering it would not have to use "any" connection.
>
> Also, if MySQL went down, the potential would be there for people to still read threads in some cases, if I can put together a script that uses vBulletin's code to reproduce a queryless forum.
>
> Ps. Anyone got any ideas on reducing memory usage, or on the sequential increase in file size? I know it's not that big of a deal, as we can adjust the memory limit, but I want this hack to be available to everyone, not just those who can manually tweak their settings.

Basically, you would be creating a new way of storing threads and posts as parsed HTML in files, rather than in the MySQL database. :) Very interesting concept! vB3 can already store parsed HTML in the database to save on queries and load, but imagine if you could do this in real time in files, and really only update the threads daily if a new post is posted... you would cut down server load by a lot, especially for large sites like mine, by eliminating database queries altogether for normal thread reading! :)

lazytown
07-14-2004, 08:59 AM
Some of you are stating that this would reduce server load, but I am confused by this. Programs like UBB Classic originally used a flat-file type system for the entire board. They then upgraded it to "Threads", which uses MySQL. Threads can handle many more simultaneous users and posts. Isn't that also what makes vBulletin able to handle many more users and posts? I previously used UBB, which did use flat files, and it was much more server intensive.

-Victor

PS: Perhaps this is only true if users do searches, etc.? I'm almost certain MySQL will use fewer resources on a search than flat files.

Trigunflame
07-14-2004, 09:54 AM
You are correct, vissa, on the search aspect. Although I have not done real-time benchmarks, I'm almost certain that a fulltext search in MySQL would be faster than an ereg or array manipulation followed by a search inside the result of a flatfile.

But that is not what this script is doing. You will still be using MySQL to search, and the posts will still be stored in MySQL. This script will only change the part of vBulletin that actually shows the posts, which will be drawn from a flatfile. That WILL be faster and less CPU intensive than a query of result rows from MySQL, and I should also mention it uses less memory.

In regard to Threads, I doubt that is what UBB is referring to. They probably just coined that name in reference to "threads" as in topics. "Real" threads are kernel-dependent things. It depends on whether your BSD/*nix/etc. build supports threading, for which extended support can be compiled into MySQL if you build it yourself.

To support optimized multithreading you have to build a kernel with modified file handles, a new glibc and the latest and greatest LinuxThreads module, then build your MySQL using other-libc. It's a pretty complicated process, but it lets you optimize the range of MySQL by about 400% or more. In tests I've done, as well as in various books, it's shown that a multithreading modification can take a fully optimized MySQL from being able to handle several hundred active connections to well over 4000, limited only by your file handle count.

The only problem is that a lot of people don't know that trick and instead rely on getting multiple DB servers, when they could solve their problem with a semi-complex recompile.

Back to the subject: flatfile will always be faster for fetching results, and since these files are opened in read mode, they can support multiple reads at once; there is no locking when only reading.
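Readers can skip locks only as long as they never see a half-written file. One common way to arrange that, not something stated in this thread, is to write to a temporary file and rename it over the old one; on POSIX filesystems a rename within the same filesystem replaces the target atomically. `cache_replace()` is a hypothetical name:

```php
<?php
// Sketch: replace a cache file so that readers see either the old
// version or the new one, never a partially written file.
function cache_replace(string $file, string $contents): bool
{
    // Temp file in the same directory, so rename() stays on one filesystem.
    $tmp = $file . '.' . uniqid('tmp', true);
    if (file_put_contents($tmp, $contents) === false) {
        return false;
    }
    if (!rename($tmp, $file)) {
        @unlink($tmp); // clean up if the swap failed
        return false;
    }
    return true;
}
```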

Freezerator
07-14-2004, 10:23 AM
Interesting,

but will disk space grow huge?

For instance: I have almost 900.000 posts and 60.000 threads.
If one cached thread takes up 30 KB, it will cost me around 1800 MB of disk space.

That's almost twice my database size?

nexialys
07-14-2004, 10:31 AM
that's what i said... used for archiving, not pure editing/search.

what i would suggest is to focus on one part of the possibilities... some here are suggesting it can become a great archive system, for closed topics and archive system, and i think it can be the best of both worlds... because actually it's one of the things that is missing most here...

also, would be cool to merge with a semi-RSS feed, like on most blogs ... once we control this technique, personal blogs for our members will be easy to deal with, as there is no need to use the db for blogs...

anyway, back to work/test!

Erwin
07-14-2004, 10:43 AM
> Some of you are stating that this would reduce server load, but I am confused by this. Programs like UBB Classic originally used a flat-file type system for the entire board. They then upgraded it to "Threads", which uses MySQL. Threads can handle many more simultaneous users and posts. Isn't that also what makes vBulletin able to handle many more users and posts? I previously used UBB, which did use flat files, and it was much more server intensive.
>
> -Victor
>
> PS: Perhaps this is only true if users do searches, etc.? I'm almost certain MySQL will use fewer resources on a search than flat files.

UBB classic used CGI to serve up dynamic information based on a flat-file system. It's the CGI/Perl-based code that is the drain on the server.

However, what we are talking about here is static HTML files, not CGI/PHP based - that is a world of difference from UBB. :) We are talking about loading up static HTML files.

Great for threads to be read by guests who can't post anyway.

Mind you, I'm sure forum programmers would have thought of this, and if it was this simple to implement, it would have been done already. :) I'm sure there would be some overhead, which will probably come from the frequent updating of these static HTML files, which would probably override any if not all of the savings in server resources.

Zachery
08-02-2004, 11:04 PM
> UBB classic used CGI to serve up dynamic information based on a flat-file system. It's the CGI/Perl-based code that is the drain on the server.
>
> However, what we are talking about here is static HTML files, not CGI/PHP based - that is a world of difference from UBB. :) We are talking about loading up static HTML files.
>
> Great for threads to be read by guests who can't post anyway.
>
> Mind you, I'm sure forum programmers would have thought of this, and if it was this simple to implement, it would have been done already. :) I'm sure there would be some overhead, which will probably come from the frequent updating of these static HTML files, which would probably override any if not all of the savings in server resources.

I could see issues with the dynamic content for posts. If there was constant writing to the file system, it would be almost as bad as using MySQL, if not worse.

TosaInu
11-30-2004, 05:15 PM
Interesting.

Is it possible to use more than one SQL database, from another server within the same network? We have a large Off Topic forum, and it would be nice if the topics and posts of that one were served from the other database/server we have.

buro9
12-06-2004, 08:57 PM
Nice thread, I'm a bit of a caching geek so let me point you to a few things:

Firstly... take a look at the cache cannon hack I was making for VB 2.x, it was abandoned as I was making it for a very specific person who was on a safe_mode protected box and it wouldn't work with safe_mode... however it did work fine on a normal box.

Anyhow, link for that:
https://vborg.vbsupport.ru/showthread.php?t=36000

That created 100% flat-file, browsable forums, threads and posts.

For searchability you would have to consider something that indexes the threads separately (such as a site indexing tool), or simply embed a Google search box ;)

Secondly, if you want to improve caching for existing items but leave them in the database, consider using one of the PEAR cache libraries:
http://pear.php.net/packages.php?catpid=3&catname=Caching

Thirdly... and possibly the most interesting... dump your posts, threads and forums data to XML files, embed at the top of these the path of an applicable XSLT file to render them, and offer an XPATH way to search, sort and browse them.

That is technically possible and feasible, though it will depend on your skills with XML, XSLT and XPATH as to whether you can make that work. I tell you one thing though... it would be a hell of a thing to see and would allow you to change the presentation of it over time in a way that static HTML would not.

The XSLT transformation gets shifted onto the browser btw... server side transformation would be intensive to say the least.
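A minimal sketch of the XML dump idea using PHP's built-in DOM extension rather than the PEAR packages. The element names, the `thread.xsl` path, `posts_to_xml()` and the column names `postid`, `username` and `pagetext` are assumptions made for illustration:

```php
<?php
// Sketch: dump post rows to XML and point the browser at an XSLT file
// through an xml-stylesheet processing instruction.
function posts_to_xml(int $threadId, array $posts, string $xslHref): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $doc->appendChild($doc->createProcessingInstruction(
        'xml-stylesheet',
        'type="text/xsl" href="' . htmlspecialchars($xslHref, ENT_QUOTES) . '"'
    ));
    $thread = $doc->createElement('thread');
    $thread->setAttribute('id', (string) $threadId);
    $doc->appendChild($thread);
    foreach ($posts as $row) {
        $post = $doc->createElement('post');
        $post->setAttribute('id', (string) $row['postid']);
        foreach (['username', 'pagetext'] as $field) {
            $node = $doc->createElement($field);
            // Text nodes are escaped automatically on output.
            $node->appendChild($doc->createTextNode((string) $row[$field]));
            $post->appendChild($node);
        }
        $thread->appendChild($post);
    }
    return $doc->saveXML();
}
```

The same file can then be queried server-side with `DOMXPath`, for example `/thread/post[username="someone"]`, while the browser applies the stylesheet for display.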

Helpful starters there:
A SQL 2 XML package:
http://pear.php.net/package/XML_sql2xml

XML Transformer package:
http://pear.php.net/package/XML_Transformer

XML XPath Queries package:
http://pear.php.net/package/XML_XPath