In the previous entry, I wrote about parsing the StackOverflow XML dump files.
This entry is about setting up SQLite full text search for the StackOverflow posts data. In Smalltalk, set up the database schema:
The key SAX parsing callback method becomes, in essence, as follows:
On my computer, running through the 6+ million posts took about an hour. The resultant SQLite database file is about 6.5GB in size. Here are the comparative file sizes:
Next, let's set up full text search on this database:
I didn't time how long the INSERTs took. Setting up full text search boosted the database's file size:
Now search the database using the sqlite3 shell tool:
% sqlite3 so.db sqlite> select count(*) from stags where tags match 'smalltalk'; 332 sqlite> select max(id) from stags where tags match 'smalltalk'; 7260027 sqlite> select body, tags from posts where id = 7260027; <p>what's the best server side programming language for web development if we keep very good performance and a lot of database interaction of the script with server in mind. i am already working on a project with php, i'm fearing that my application might have a lot of traffic, i just want to use only the server side language which is best in terms of performance, and please also guide me any major step i should take to prevent my application from crashing when it will face a huge traffic, what major step i might need to take to stop myself crying after launching my applicaition, i heard that twitter use ruby, how's ruby's performance? my concern is only and only performance and database interaction, thanks in advance.</p> |<php><python><ruby-on-rails><ruby><smalltalk> sqlite>