I last wrote about football.db in a Jul 2014 blog post. Four years have gone by, and the World Cup is here again. This time around, I've started building a Glorp descriptor system for the World Cup data.
football.db's data is described in YAML files which are assembled into SQLite databases using tools written in Ruby. From the constructed 2014 and 2018 World Cup databases, I've created SQL dumps and placed them in the repo. To get an SQLite database from the 2018 SQL dump file:
In its current state, footballdb-Glorp allows querying the initial World Cup first round group membership.
This Glorp descriptor may not be completed in time for this World Cup which is starting in a few days, but will be in time for the next one for sure! :-) Load it thusly:
From its documentation, SQLite does not have a separate Boolean storage class. Instead, Boolean values are stored as integers 0 (false) and 1 (true).
I've added Boolean handling to UDBC-SQLite. When writing to an SQLite database from Pharo, true is written as integer 1 and false as integer 0. SQLite uses dynamic typing, and any column in an SQLite database, except an INTEGER PRIMARY KEY column, may be used to store a value of any type, irrespective of the column's type declaration. As such, when writing Boolean values to a database, UDBC-SQLite does not check the database's type declaration.
When reading an SQLite database, UDBC-SQLite does check a column's type declaration: If a column is Boolean, UDBC-SQLite reads 1 as true, 0 as false, NULL as nil, and any other integer values raises an exception. I've encountered real world data where the string "t" means true and "f" means false for a Boolean column, so UDBC-SQLite handles these cases too.
Glorp has been similarly updated. Loading GlorpSQLite, from my development fork for now, installs both UDBC-SQLite and Glorp:
All Glorp unit tests should pass. Tested on Linux using fresh 32- and 64-bit 60540 images.
An SQLite extension is built as a .so/dylib/dll shared library file. Let's use SQLite's rot13 extension as our example. The source file rot13.c is located in the SQLite source code's ext/misc directory. To build the rot13 extension, also download the amalgamation. Unzip the amalgamation and copy rot13.c into its directory. Build the extension:
Verify that the extension works:
For use with Pharo, copy rot13.so into the Pharo VM directory where all the other .so files are.
Next steps are done in Pharo. For the purpose of this blog post, I downloaded a fresh Pharo 60536 64-bit image. Start the image and install GlorpSQLite from the Catalog browser, which installs the latest UDBC-SQLite. (This also installs Glorp, of course. If you run Glorp's unit tests from Test Runner you should get all 890 tests passed.)
In a playground, run this snippet and Transcript should show, first, the text "no such function: rot13" and then the rot13 outputs.
Note the messages #enableExtensions and #loadExtension:. For security reasons, extension loading is disabled by default.
Tested on Linux latest Pharo 64-bit 60536.
I've updated ConfigurationOfGlorp for Pharo 6 and added catalog methods to ConfigurationOfGlorpSQLite.
Take a fresh Pharo 60365 image, the latest as of yesterday's download. Launch it, open the Catalog Browser, and install GlorpSQLite from there.
Run the Glorp tests in TestRunner. The result should be green, with all 889 tests passed and 12 tests skipped. The database file is sodbxtestu.db in your image directory.
(On Pharo 5, Glorp runs and passes total of 953 tests. Something to look into.)
I've created ConfigurationOfGlorpSQLite on STH.
Take a fresh Pharo 5 image. Make a script like the following and run it on the image:
When done, fire up the image again, and run the Glorp tests in TestRunner. The result should be green, with all 953 tests passed and 12 tests skipped. The database file is sodbxtestu.db in your image directory.
I'm pleased to announce the release of Glorp-SQLite3 for Pharo 5.
Developed and tested on Linux. Known working on Windows 7. Your Pharo 5 VM needs to be able to find libsqlite3.so or the Windows equivalent.
Take a fresh Pharo 5 image. Run the following:
Run the Glorp tests in TestRunner. All tests should pass, with 12 tests skipped. The database file is sodbxtestu.db in your image directory.
SQLite 3.9.0, released in Oct 2015, introduced the json1 extension that provides functions to manage JSON content stored in an SQLite database. The extension source code is included in the SQLite source amalgamation and is enabled by the -DSQLITE_ENABLE_JSON1 option.
The following assumes that the json1 functionality is available. As sample data, I use the stocks data set from jsonstudio.com. Here's what the wrapped first line of the JSON data file looks like partially:
To load the file into an SQLite database using NBSQLite3:
Note the invocation of the json() SQL function, which verifies that its argument is a valid JSON string and returns a minified version of that JSON string.
Let's inspect the created database with the sqlite3 shell tool:
The json_extract(X, P1, P2, ...) function extracts and returns one or more values from the JSON structure X:
The json_each() table-valued function walks the JSON value provided as its argument and returns one row for each element. The schema for the table returned by json_each() is as follows:
Here's json_each() in action:
Let's run the same query in Pharo:
Transcript displays thusly. As we can see, SQLite and NBSQLite3 handle numerical values in the JSON data properly.
Notice that the first row is well-formed JSON. If we use json_tree() instead of json_row(), that JSON structure will be returned as a key/value row as well.
Finally, let's read the first JSON record using NeoJSON:
This is powerful stuff. Well worth a look.
Dolphin Smalltalk 7 has been released as open source. Many thanks to Object Arts.
I spent an afternoon updating the mostly unused Windows partition on my laptop, installing various development tools, and playing with Dolphin 7. As a proof of concept, NBSQLite3 now runs on Dolphin 7. :-)
Happy new year!
SQLcipher "is an open source extension to SQLite that provides transparent 256-bit AES encryption of database files." SQLcipher provides the same API as the SQLite Encryption Extension (SEE) by D Richard Hipp, the original developer of SQLite.
I've added SQLcipher/SEE's API to NBSQLite3. By convention, on Unix, SQLcipher produces a shared library named libsqlcipher.so, while the SQLite shared library is named libsqlite3.so. NBSQLite3 switches between the two libraries based on the messages #beSQLcipher and #beSQLite to the NBSQLite3FFI class.
Here's a demonstration code snippet using the keying pragma in SQL:
NBSQLcipherExample class>>examplePragma contains a longer version of the above snippet that includes reopening the encrypted database file without the keying pragma and using the SQLite library.
Tested on Linux Mint. Code updated in Smalltalk Hub. Some refactoring to be expected, because the above snippet using the keying pragma is the only test I've done.
% openssl sha256 libsqlcipher.so SHA256(libsqlcipher.so)= 441cbc559a4f38a018121c6d86caa0cf0fb2c5b2a57c353cc09a4e048ec8ebe8 % ldd libsqlcipher.so linux-gate.so.1 => (0xf77da000) libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7569000) libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf754d000) libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf739e000) /lib/ld-linux.so.2 (0xf77dd000)
For good measure, I've also put up a copy of sqlcipher built at the same time. It requires readline.
% openssl sha256 sqlcipher SHA256(sqlcipher)= 4ccb3cf2064d41675406a55c8404a8877a40541dd9830009f4c0e203468e3d7b % ldd sqlcipher linux-gate.so.1 => (0xf770a000) libreadline.so.6 => /lib/i386-linux-gnu/libreadline.so.6 (0xf76a7000) libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf76a2000) libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7685000) libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf74d7000) libtinfo.so.5 => /lib/i386-linux-gnu/libtinfo.so.5 (0xf74b5000) /lib/ld-linux.so.2 (0xf770d000)
A couple of links to set the scene, one from 2004, the other from 2005. Their titles relate to this post, although their content doesn't really.
While Pharo is an immersive interactive programming environment, it can also be used for operating system command line-oriented scripting. Take a freshly downloaded standard Pharo 4 image, here renamed p4cli.image:
Let's see how to use NBSQLite3's online backup functionality to back up a live SQLite database through a Pharo script.
% pharo -vm-display-null p4cli.image --list Currently installed Command Line Handlers: Fuel Loads fuel files config Install and inspect Metacello Configurations from the command line save Rename the image and changes file update Load updates printVersion Print image version st Loads and executes .st source files test A command line test runner clean Run image cleanup eval Directly evaluates passed in one line scripts
Firstly, install the current bleedingEdge version of NBSQLite3.
Now make a script of the backup code snippet seen in my previous blog post, changing the database file names. In this case, I am using the script to make a copy of my server's firewall log database, which is live and constantly updated by the logging daemon. The script is named 'sqlitebackup.st'.
% pharo -vm-display-null p4cli.image config \ http://www.smalltalkhub.com/mc/Pharo/MetaRepoForPharo40/main \ ConfigurationOfNBSQLite3 --install=bleedingEdge 'Installing ConfigurationOfNBSQLite3 bleedingEdge' Loading 1.2 of ConfigurationOfNBSQLite3... Fetched -> NBSQLite3-Core-PierceNg.7 --- ... ... ... ...finished 1.2% %
Run the script under Unix "time":
% time pharo -vm-display-null p4cli.image st --quit sqlitebackup.st pharo -vm-display-null p4cli.image st --quit sqlitebackup.st 0.26s user 0.10s system 53% cpu 0.675 total % ls -l /var/log/ulog/ulogd.db* /tmp/*.db -rw-r--r-- 1 pierce pierce 11636736 May 17 20:49 /tmp/ulogd-backup.db -rw-r--r-- 1 ulog ulog 11643904 May 17 20:50 /var/log/ulog/ulogd.db
The database files aren't identical, because the logging daemon has written yet more data since the backup was taken.
SQLite provides an online backup API that operates incrementally, so that the source database file is locked only when it is actually read from, allowing other applications to use the source database while the backup is in progress.
I've added the backup API to NBSQLite3. Here's a demonstration code snippet using OpenFootball's World Cup 2014 database:
Let's look at the two databases:
% pharo -vm-display-null p4cli.image config \ % ls -l *.db -rw-r--r-- 1 pierce pierce 1101824 May 16 10:58 backup.db -rw-r--r-- 1 pierce pierce 1101824 Jul 22 2014 wc2014.db % openssl md5 backup.db wc2014.db MD5(backup.db)= 408494059b67fc3c7d39b672ce7e525e MD5(wc2014.db)= 28ab0865e87920ca2f2b1f0de710c622
Ok, the two files have the same size, but aren't bit-wise identical. Now try sqldiff that comes with SQLite:
% pharo -vm-display-null p4cli.image config \ % sqldiff --help Usage: sqldiff [options] DB1 DB2 Output SQL text that would transform DB1 into DB2. % sqldiff wc2014.db backup.db %
No output, meaning the two files are identical structurally and content-wise, according to sqldiff. Let's make sure:
The output shows that sqldiff and the SQLite library are in cahoots to make two database files appear identical even when they are not. Haha, no, I mean, the output gives us some assurance that the backup API operated as advertised and created a proper backup of the database.
% sqldiff wc2014.db backup.db DELETE FROM goals WHERE id=169; DELETE FROM goals WHERE id=170; DELETE FROM goals WHERE id=171;
Now, the demonstration code snippet is somewhat contrived, because it backed up a totally quiescent source database. Next step is to verify backing up a database in active use.
Code updated in Smalltalk Hub. Unit tests and ConfigurationOf update to come.
SQLite does not support date/time as a built-in data type. Instead, SQLite provides date and time functions to deal with ISO 8601-format strings, floating point values representing Julian days, and Unix epoch time integers.
I've added date/time handling to NBSQLite3, so that storing a DateAndTime object results in an ISO 8601 string in the database, and reading such a string from the database, provided the column type is declared as DateTime, yields a DateAndTime object.
Here's an example using the low-level API:
And here's the example using the high-level API:
I've integrated NBSQLite3 into Glorp on the current Pharo 4.0 v40592 beta image.
Firstly, install NBSQLite3 (TorstenBergmann.7) and then Glorp (TorstenBergmann.42) from the config browser.
Then, using the Monticello browser, open the NBSQLite3 repository and load the packages NBSQLite3-Glorp and NBSQLite3-Test-Glorp. Next, open the Glorp repository and load the packages Glorp-PierceNg.97.mcz and GlorpTests-PierceNg.44.mcz.
In a workspace/playground, execute the following:
Open Test Runner and run the Glorp tests.
Tested on Linux Mint 17 and OSX Mavericks. 2 fewer tests passed on Linux.
Curiously, #testLargeBlob, #testLargeClob and #testLargeText passed on the Pharo 3 image that I wrote this code on.
The database file created by the tests is sodbxtest.db.
Made really good progress with NBSQLite3 for Glorp.
On the failed tests:
#testReadTimestamp - SQLite3 does not support date/time/timestamp types natively. More investigation needed. #testDateCast - The test itself says, "This test will fail on SQLite which has no DB Date type; see test comment." #testDeleteAndReinsertTwiceWhenRemovingFromAnExclusiveCollection - This test is expected to result in a primary key conflict. Without studying the code, my conjecture is that the test's failing has to do with SQLite's primary key handling. #testInt8 - Integer overflow on 32-bit SQLite, I think. #testPreparedStatementsAreReused - PostgresV2 doesn't support prepared statements. NBSQLite3 does. Mayhaps this is why? More investigation needed. #testUpdatingComment - Test passes on OSX Mavericks but fails on Linux Mint 17. No idea why. #testUpdatingOtherThing - Test passes on OSX Mavericks but fails on Linux Mint 17. No idea why.
The entire GlorpOptimisticLockingTest and GlorpTimestampTest suites are skipped, because some of the tests fail, and foobars the Pharo-SQLite interface, causing many subsequent tests to fail, requiring restart of the Pharo image. Still need to look into these.
Making good progress with NBSQLite3 for Glorp.
Sven van Caekenberghe has written a very nice tutorial implementing a Reddit clone in Pharo using Seaside, Glorp and PostgreSQL. Sven also makes available a prebuilt image containing the application.
Seeing that the image contains Glorp working with the PostgresV2 driver, I set about integrating NBSQLite3 with Glorp. After about an afternoon's work, I now have Reddit.st working with Glorp+NBSQLite3.
There is still much to be done to get Glorp fully working with NBSQLite3: Some tests apparently expect Glorp proxies, but are getting OrderedCollections, and one particular test rendered my Linux X session non-responsive to mouse and keyboard, except for screen brightness key chords!
NBSQLite3 now does callback from C to Smalltalk. As a start, the tracing API, which "is invoked at various times when an SQL statement is being run".
The Transcript output looks thusly:
Test incorporated and passes on OSX Mountain Lion and Linux Mint 17 for Pharo 3. Get the source on SS3.
I've been playing with the OpenFootball data using ROE. ROE provides RAArrayRelation, which shares a common superclass with RASQLRelation. As its name implies, RAArrayRelation uses an in-memory array object and does not involve any underlying SQL database.
In my OpenFootball data crunching, I tried JOINing instances of both. When the left side of the JOIN is the RASQLiteRelation instance, the operation threw an SQLite exception "no such table" because ROE expected the right side to also be an RASQLiteRelation on the same underlying database. When the left side is the RAArrayRelation instance, the operation sends #tuples to the right side, which resulted in a DNU. I've tried a few implementations of RASQLiteRelation>>tuples, but so far no success. Even if I get a working implementation, the implication of #tuples is that all matching rows from the SQLite database are loaded into memory, which is not always performant or feasible.
The reason for using an RAArrayRelation is to create a new relation unlike any existing one. The alternative is to create an RASQLiteRelation instance on a new, empty table. However, ROE does not support creating tables in the underlying database. Of course, I can use the given database connection to CREATE TABLE via direct SQL, but that doesn't feel satisfying.
I've committed my integration of ROE with NBSQLite3 to SmalltalkHub.
Tested on Pharo 3 on OSX Mountain Lion and Linux Mint 17 based on Ubuntu 14.04. All 23 tests of RATestSQLiteSemantics pass.
football.db is a "free open public domain football (soccer) database". Instructions to construct an SQLite data file from the data are on the website. The database consists of 40 tables; its schema looks normalized. With the World Cup 2014 and dependent data, the file weighs in at ~1MB.
Let's see what we can do with the data using ROE.
Each league/tournament is given an event ID; the World Cup's event_id is 8.
The above code snippet results in a RAProjection instance. We can see its content by the following (pretty-printed by hand):
(Ivory Coast's proper name in the output doesn't look right in Pharo. The SQLite database is correct though.)
I've started a page on NBSQLite3. Comments welcome.
ROE is "a library for manipulating relational queries as first class Smalltalk expressions, and generating SQL from them."
Not bad for an afternoon's hacking, now ROE works with NBSQLite3 and all tests pass.
(Yes, I notice these screenshots are of rather poor quality.)
I like SQLite. SQLite is small, fast and reliable. As mentioned in this old blog post, I was extending the Squeaksource SQLite FFI wrapper to support parameter binding. Since then, Torsten Bergmann has also picked it up and now tends to it on SmalltalkHub.
Taking the chance to learn NativeBoost, with hints from Masashi Umezawa's PunQLite, I took my SQLite FFI changes and created NBSQLite3. NBSQLite3's small test suite runs successfully on Pharo 2.0 (OSX, Linux) and the current Pharo 3.0 beta on OSX.
Here's a multithreading example:
To run this example, evaluate "NBSQLite3Example multithreading". The output looks thusly:
The SQLite shared library is compiled with -DSQLITE_USE_URI=1 and -DSQLITE_THREADSAFE=2.
In the previous entry, I wrote about parsing the StackOverflow XML dump files.
This entry is about setting up SQLite full text search for the StackOverflow posts data. In Smalltalk, set up the database schema:
The key SAX parsing callback method becomes, in essence, as follows:
On my computer, running through the 6+ million posts took about an hour. The resultant SQLite database file is about 6.5GB in size. Here are the comparative file sizes:
Next, let's set up full text search on this database:
I didn't time how long the INSERTs took. Setting up full text search boosted the database's file size:
Now search the database using the sqlite3 shell tool:
% sqlite3 so.db sqlite> select count(*) from stags where tags match 'smalltalk'; 332 sqlite> select max(id) from stags where tags match 'smalltalk'; 7260027 sqlite> select body, tags from posts where id = 7260027; <p>what's the best server side programming language for web development if we keep very good performance and a lot of database interaction of the script with server in mind. i am already working on a project with php, i'm fearing that my application might have a lot of traffic, i just want to use only the server side language which is best in terms of performance, and please also guide me any major step i should take to prevent my application from crashing when it will face a huge traffic, what major step i might need to take to stop myself crying after launching my applicaition, i heard that twitter use ruby, how's ruby's performance? my concern is only and only performance and database interaction, thanks in advance.</p> |<php><python><ruby-on-rails><ruby><smalltalk> sqlite>
I like SQLite. SQLite is small, fast and reliable. There is an SQLite FFI wrapper on Squeaksource which is very simple to use. Let's start with this database:
Run the following snippet in a workspace:
The output is shown in the Transcript thus:
The underlying SQLite API used by withResultForQuery:do: is sqlite3_get_table, which is "a legacy interface preserved for backwards compatibility". Using this API, at the C level, NULL values are NULL pointers, and all other values are in their UTF-8 zero-terminated string representation. Not very interesting at the Smalltalk level, either, as seen from the Transcript.
SQLite also supports "prepared statements", aka "parameter binding". A prepared statement is compiled once (using sqlite3_prepare, into an SQLite VDBE virtual machine program), then executed multiple times (using sqlite3_step). Parameters are bound using sqlite3_bind_XXX calls, and values are retrieved using sqlite3_column_XXX calls. Parameter binding defends against SQL injection attacks.
I'm extending the SQLite FFI wrapper to support some of the parameter binding APIs. (Well, until enough to do my stuff.) Here's an example code snippet:
The Transcript displays the following: