From its documentation, SQLite does
not have a separate Boolean storage class. Instead, Boolean values are
stored as integers 0 (false) and 1 (true).
I've added Boolean handling to UDBC-SQLite. When writing to an SQLite
database from Pharo, true is written as integer 1 and false as integer 0.
SQLite uses dynamic typing, and any column in an SQLite database, except an
INTEGER PRIMARY KEY column, may be used to store a value of any type,
irrespective of the column's type declaration. As such, when writing
Boolean values to a database, UDBC-SQLite does not check the database's
column type declarations.
When reading an SQLite database, UDBC-SQLite does check a column's type
declaration: If a column is Boolean, UDBC-SQLite reads 1 as true, 0 as
false, and NULL as nil; any other integer value raises an exception. I've
encountered real world data where the string "t" means true and "f" means
false for a Boolean column, so UDBC-SQLite handles these cases too.
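The reading rules above can be sketched outside Pharo. The following is a minimal illustration using Python's standard sqlite3 module, not UDBC-SQLite's actual code; the function name decode_boolean and the table flags are hypothetical, chosen only for this example.

```python
import sqlite3

def decode_boolean(value):
    """Map a raw value read from a Boolean-declared column to True, False or None."""
    if value is None:            # NULL reads as "no value" (nil in Pharo)
        return None
    if value in (1, "t"):        # integer 1 or the string "t" mean true
        return True
    if value in (0, "f"):        # integer 0 or the string "f" mean false
        return False
    raise ValueError(f"not a Boolean value: {value!r}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (id INTEGER PRIMARY KEY, ok BOOLEAN)")
# Dynamic typing: the BOOLEAN column accepts integers, NULL and text alike.
conn.executemany("INSERT INTO flags (ok) VALUES (?)",
                 [(1,), (0,), (None,), ("t",), ("f",)])
raw = [row[0] for row in conn.execute("SELECT ok FROM flags ORDER BY id")]
decoded = [decode_boolean(value) for value in raw]
print(raw)
print(decoded)
```

The write side needs no check because SQLite stores whatever it is given; only the read side has to interpret what it finds.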
Glorp has been similarly updated. Loading GlorpSQLite, from my
development fork for now, installs both UDBC-SQLite and Glorp:
All Glorp unit tests should pass. Tested on Linux using fresh 32- and 64-bit
An SQLite extension is built as a .so/dylib/dll shared library file. Let's
use SQLite's rot13 extension as our example. The source file rot13.c is
located in the SQLite source code's ext/misc directory.
To build the rot13 extension, also download the SQLite amalgamation. Unzip the
amalgamation and copy rot13.c into its directory. Build the extension:
Verify that the extension works:
For use with Pharo, copy rot13.so into the Pharo VM directory where all the
other .so files are.
Next steps are done in Pharo. For the purpose of this blog post, I
downloaded a fresh Pharo 60536 64-bit image. Start the image and install
GlorpSQLite from the Catalog browser, which installs the latest
UDBC-SQLite. (This also installs Glorp, of course. If you run Glorp's unit
tests from Test Runner, all 890 tests should pass.)
In a playground, run this snippet and Transcript should show, first, the
text "no such function: rot13" and then the rot13 outputs.
Note the messages #enableExtensions and #loadExtension:. For security
reasons, extension loading is disabled by default.
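The same before-and-after behaviour can be reproduced without a compiled extension. The sketch below uses Python's standard sqlite3 module and, as a stand-in for loading rot13.so, registers an application-defined rot13 function with create_function. It is not the Pharo snippet, and try_rot13 is a hypothetical helper name. Python's sqlite3 also offers enable_load_extension and load_extension for real shared libraries, on builds that support them.

```python
import codecs
import sqlite3

def try_rot13(connection, text):
    """Run SELECT rot13(?) and return the result, or the error message on failure."""
    try:
        return connection.execute("SELECT rot13(?)", (text,)).fetchone()[0]
    except sqlite3.OperationalError as error:
        return str(error)

conn = sqlite3.connect(":memory:")

# Before anything is registered, SQLite does not know the function.
before = try_rot13(conn, "Hello, world!")

# Stand-in for loading the extension: register a Python implementation of rot13.
conn.create_function("rot13", 1, lambda s: codecs.encode(s, "rot_13"))
after = try_rot13(conn, "Hello, world!")

print(before)
print(after)
```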
The Hello World intro to machine learning is usually by way of Iris
flower classification or MNIST handwritten digit recognition. In
this post, I describe training a neural network in Pharo to perform
handwritten digit recognition. Instead of the MNIST dataset, I'll use the
smaller UCI Optical Recognition of Handwritten Digits (optdigits) dataset.
According to its description, this dataset consists of two files:
optdigits.tra, 3823 records
optdigits.tes, 1797 records
Each record consists of 64 inputs and 1 class attribute.
Input attributes are integers in the range 0..16.
The class attribute is the class code 0..9, i.e., it denotes the digit that the 64 input attributes encode.
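The record layout described above can be sketched as a small parser. This is a Python illustration with the standard csv module, not the NeoCSV code; parse_records is a hypothetical name, and the two sample rows are made up to show the shape of a record rather than taken from the dataset.

```python
import csv
import io

def parse_records(text):
    """Split each CSV row into (64 input integers, class code)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        values = [int(field) for field in row]
        if len(values) != 65:
            raise ValueError(f"expected 65 fields, got {len(values)}")
        inputs, label = values[:64], values[64]
        if not all(0 <= v <= 16 for v in inputs):
            raise ValueError("input attribute outside 0..16")
        if not 0 <= label <= 9:
            raise ValueError("class code outside 0..9")
        records.append((inputs, label))
    return records

# Made-up sample rows, for illustration only.
sample = "\n".join([
    ",".join(["0"] * 63 + ["16", "7"]),
    ",".join(["1"] * 64 + ["3"]),
])
records = parse_records(sample)
print(len(records), records[0][1], records[1][1])
```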
The files are in CSV format. Let's use the excellent NeoCSV package to
read the data:
Next, install MLNeuralNetwork by Oleksandr Zaytsev:
MLNeuralNetwork operates on MLDataset instances, so modify the CSV reader
Note MLMnistReader>>onehot:, which creates a 'one-hot' vector for each
digit. One-hot vectors suit classification because each digit gets its own
position in the target vector. They are easy to understand "pictorially":
For the digit 0, [1 0 0 0 0 0 0 0 0 0].
For the digit 1, [0 1 0 0 0 0 0 0 0 0].
For the digit 3, [0 0 0 1 0 0 0 0 0 0].
For the digit 9, [0 0 0 0 0 0 0 0 0 1].
Since there are over 5,000 records, we precompute the one-hot vectors and
reuse them, instead of creating one vector per record.
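A minimal sketch of the precompute-and-reuse idea, in Python rather than Pharo; the names onehot, ONEHOT and target_for are hypothetical and do not mirror MLMnistReader. Tuples are used here so that a vector shared by many records cannot be modified by accident.

```python
def onehot(digit, classes=10):
    """Return a one-hot vector with a 1 at the position of the digit."""
    return tuple(1 if i == digit else 0 for i in range(classes))

# Build the ten vectors once, up front.
ONEHOT = [onehot(digit) for digit in range(10)]

def target_for(label):
    """Look up the shared vector for a class code instead of building a new one."""
    return ONEHOT[label]

print(target_for(0))
print(target_for(3))
print(target_for(9))
```

Every record labelled 3 then refers to the same vector object, so the roughly 5,000 records need only ten vectors in total.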
Now create a 3-layer neural network of 64 input, 96 hidden, and 10 output
neurons, set it to learn from the training data for 500 epochs, and test
From the inspector, we can see that the network got records 3, 6 and 20 wrong.
In the inspector's code pane, the following snippet shows that the
network's accuracy is about 92%.
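For readers who want to see how such an accuracy figure is typically computed, here is a Python sketch. It is not the inspector snippet from the post, and it assumes the network's prediction is the index of the largest of its ten outputs. The names predicted_digit and accuracy are hypothetical, and the sample outputs are made up and shortened to three classes.

```python
def predicted_digit(outputs):
    """Index of the largest output, taken as the predicted class."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

def accuracy(output_vectors, labels):
    """Fraction of records whose predicted class equals the actual class."""
    correct = sum(1 for outputs, label in zip(output_vectors, labels)
                  if predicted_digit(outputs) == label)
    return correct / len(labels)

# Made-up outputs for four records, for illustration only.
outputs = [
    [0.9, 0.05, 0.05],   # predicts 0
    [0.1, 0.7, 0.2],     # predicts 1
    [0.2, 0.3, 0.5],     # predicts 2
    [0.6, 0.3, 0.1],     # predicts 0, but the actual class is 1
]
labels = [0, 1, 2, 1]
result = accuracy(outputs, labels)
print(result)
```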
Not bad for a first attempt. However, the dataset's source states that
the K-Nearest Neighbours algorithm achieved up to 98% accuracy on the
testing set, so there's plenty of room for improvement for the network.
Here's a screenshot showing some of the 8x8 digits with their predicted and
actual values. I don't know about you, but the top right "digit" looks
more like a smudge to me than any number.