SQLite and JSON

24 Apr 2016

SQLite 3.9.0, released in Oct 2015, introduced the json1 extension that provides functions to manage JSON content stored in an SQLite database. The extension source code is included in the SQLite source amalgamation and is enabled by the -DSQLITE_ENABLE_JSON1 option.

The following assumes that the json1 functionality is available. As sample data, I use the stocks data set from jsonstudio.com. Here's what the wrapped first line of the JSON data file looks like partially:

{ "_id" : { "$oid" : "52853800bb1177ca391c17ff" }, "Ticker" : "A", "Profit
Margin" : 0.137, "Institutional Ownership" : 0.847, "EPS growth past 5
years" : 0.158, "Total Debt/Equity" : 0.5600000000000001, "Current Ratio" :
3, "Return on Assets" : 0.089, "Sector" : "Healthcare", "P/S" : 2.54,
"Change from Open" : -0.0148, "Performance (YTD)" : 0.2605, "Performance
(Week)" : 0.0031, "Quick Ratio" : 2.3, "Insider Transactions" : -0.1352,

To load the file into an SQLite database using NBSQLite3:

| jr db |
jr := (FileSystem / 'tmp' / 'stocks.json') asFileReference readStream.
jr ascii.
db := NBSQLite3Connection openOn: '/tmp/s.db'.
[   db execute: 'create table s (d json)'.
    db execute: 'begin'.
    [ jr atEnd ] whileFalse: [ 
        db execute: 'insert into s values (json(?))' "<== Note."
            with: (Array with: jr nextLine) ] 
    db execute: 'commit'.
] ensure: [ db close ]
jr close.

Note the invocation of the json() SQL function, which verifies that its argument is a valid JSON string and returns a minified version of that JSON string.

Let's inspect the created database with the sqlite3 shell tool:

$ wc -l stocks.json
6756 stocks.json
$ sqlite3 s.db
SQLite version 3.9.2 2015-11-02 18:31:45
Enter ".help" for usage hints.
sqlite> select count(*) from s;
6756
sqlite> .schema
CREATE TABLE s (d json);
sqlite> 

The json_extract(X, P1, P2, ...) function extracts and returns one or more values from the JSON structure X:

sqlite> select json_extract(d, '$.Sector', '$.Company') 
   ...> from s 
   ...> where json_extract(d, '$.Ticker') = 'AAPL';
["Consumer Goods","Apple Inc."]
sqlite> 

The json_each() table-valued function walks the JSON value provided as its argument and returns one row for each element. The schema for the table returned by json_each() is as follows:

CREATE TABLE json_tree(
    key ANY,             -- key for current element relative to its parent
    value ANY,           -- value for the current element
    type TEXT,           -- 'object','array','string','integer', etc.
    atom ANY,            -- value for primitive types, null for array & object
    id INTEGER           -- integer ID for this element
    parent INTEGER,      -- integer ID for the parent of this element
    fullkey TEXT,        -- full path describing the current element
    path TEXT,           -- path to the container of the current row
    json JSON HIDDEN,    -- 1st input parameter: the raw JSON
    root TEXT HIDDEN     -- 2nd input parameter: the PATH at which to start
);

Here's json_each() in action:

sqlite> select key, value 
   ...> from s, json_each(s.d) 
   ...> where json_extract(d, '$.Ticker') = 'AAPL'
   ...> limit 10;
_id|{"$oid":"52853800bb1177ca391c1809"}
Ticker|AAPL
Profit Margin|0.217
Institutional Ownership|0.621
EPS growth past 5 years|0.424
Total Debt/Equity|0.14
Current Ratio|1.7
Return on Assets|0.186
Sector|Consumer Goods
P/S|2.75
sqlite> 

Let's run the same query in Pharo:

| q db v |
q := 'select key, value from s, json_each(s.d) where json_extract(d, "$.Ticker") = "AAPL" limit 10'.
db := NBSQLite3Connection openOn: '/tmp/s.db'.
[   (db execute: q) rows do: [ :r |
        Transcript show: (r at: 'key') asString, ' = '.
        v := r at: 'value'.
        Transcript show: v asString, ' (', v class asString, ')'; cr ]
] ensure: [ db close ]

Transcript displays thusly. As we can see, SQLite and NBSQLite3 handle numerical values in the JSON data properly.

_id = {"$oid":"52853800bb1177ca391c1809"} (ByteString)
Ticker = AAPL (ByteString)
Profit Margin = 0.217 (Float)
Institutional Ownership = 0.621 (Float)
EPS growth past 5 years = 0.424 (Float)
Total Debt/Equity = 0.14 (Float)
Current Ratio = 1.7 (Float)
Return on Assets = 0.186 (Float)
Sector = Consumer Goods (ByteString)
P/S = 2.75 (Float)

Notice that the first row is well-formed JSON. If we use json_tree() instead of json_row(), that JSON structure will be returned as a key/value row as well.

Finally, let's read the first JSON record using NeoJSON:

| q db  |
q := 'select json_extract(d, "$") as data from s limit 1'.
db := NBSQLite3Connection openOn: '/tmp/s.db'.
[   (NeoJSONReader on: ((db execute: q) rows first at: 'data') readStream)
        next explore
] ensure: [ db close ].

This is powerful stuff. Well worth a look.

Tags: NoSQL, SQLite

UnQLite Jx9 Part II

10 Jul 2015

Masashi Umezawa graciously granted me write-access to the PunQLite wrapper repository. (Thanks, Masashi!) I've now added functionality to set Jx9 variables from Smalltalk.

Here's an example. The variable $inject is to be set from Smalltalk.

if (db_exists("users")) {
  db_drop_collection("users");
}
db_create("users");
db_store("users", $inject); 
$extract = db_fetch("users");

Here's the Smalltalk code:

| j db inject extract |

j := '...'. "Above Jx9 program."

db := PqDatabase openOnMemory.
[   db jx9Do: [ :exe |
        exe compile: j.
        inject := exe newStructure: 'inject'.
        inject at: 'name' put: 'Jim'.
        inject at: 'age' put: 29.
        exe inject: inject.
        exe release: inject.
        exe execute.
        extract := exe @ 'extract'.
        Transcript show: extract isJsonObject asString; cr.
        exe release: extract ]
] ensure: [ db close ]

Transcript should show "true". Round-trip set/get tests between Pharo and the Python wrapper by Charles Leifer were successful.

I added an assert-filled version of the above code as a unit test. Running the test by itself succeeds. Running it from Test Runner fails...

I won't go figure, though. I don't intend to spend any more time on Jx9. Using Jx9 with Smalltalk effectively requires managing Jx9 source as strings within Pharo. I might as well stick with SQLite and SQL strings in that case. If I must have key-value stores I can always use 2-column tables.

The Python wrapper carries this note: "The authors of UnQLite, Symisc, have informed me that UnQLite is no longer being developed actively."

Tags: NoSQL

UnQLite Jx9

29 Jun 2015

I'm playing with Masashi Umezawa's PunQLite wrapper for unQLite. UnQLite provides both a key-value store and, interestingly, a JSON document store based on an embedded programming language named Jx9.

Here's an example Jx9 program:

if (db_exists("students")) {
  db_drop_collection("students");
}
db_create("students");
$s = { name: "james", age: 26 };
db_store("students", $s); 
$sid = $s.__id;

And here's Smalltalk code to execute the above Jx9 program:

| j db |

j := '...'. "Above Jx9 program."

db := PqDatabase openOnMemory.
[   db jx9Do: [ :exe |
        exe compile: j.
        exe execute.
        Transcript show: exe @ 'sid' asString; cr ]
] ensure: [ db close ]

PunQLite supports extracting values from an executed Jx9 program, as in "exe @ 'sid'" above. PunQLite currently does not support UnQLite's APIs for exchanging JSON objects between Smalltalk and an Jx9 program which provide functionality similar to parameter binding in SQL APIs. That means Jx9 programs, including data to be stored, has to be constructed from strings... As OWASP says about NoSQL injection:

Because these NoSQL injection attacks may execute within a procedural
language, rather than in the declarative SQL language, the potential
impacts are greater than traditional SQL injection. 
Tags: NoSQL, security

Seven Databases for Pharo and Squeak

07 Dec 2013

I'm reading the book Seven Databases in Seven Weeks by Eric Redmond and Jim Wilson. The databases covered are PostgreSQL, Riak, HBase, MongoDB, CouchDB, Neo4J and Redis.

There are several Pharo/Squeak libraries for PostgreSQL:

Riak:

HBase runs on JVM. It supports a RESTful HTTP API, Thrift, and a Java API. The first is probably the easiest way to write a Smalltalk interface.

MongoDB:

  • Estaban Lorenzano's Voyage is an object persistence layer that works with MongoDB.
  • MongoTalk, by Kent Beck and others.

CouchDB:

Neo4J is a graph database. It is provides a RESTful API. I've not played with Neo4J, but I'd imagine the Smalltalk environment, and by extension any Smalltalk object persistence mechanism, make up a graph database. Probably speaking from ignorance here, but I'm not sure what interest a Smalltalk programmer will have in a graph database written in Java. :-)

Finally, there is Redis Client by Mike Hale and others.

I haven't finished the book, but so far I haven't seen any discussion on authentication or security of these HTTP-speaking NoSQL databases. If the database is lacking authentication or SSL, and if your threat model covers that, probably the easiest is to put these behind a proxy. And, for database and other such connectivity from the Smalltalk client, I suggest SpsSplitPasswordStore.

Tags: NoSQL, OODB, PostgreSQL