Notes on SQLite

I think SQLite a perfect database engine for projects that have a limited user scope. For instance, if you have an app on a phone, or you’re managing data for some statistical analysis, or you’re prototyping an app. It is not a good choice if you need lots of concurrency, two or more people need to write data to the database at once. SQLite is very fast if you use it properly

In Ubuntu you use SQLite3 to use the SQLite command line interface.

sudo apt-get install sqlite3

The package R package for SQLite frequently when I have a large set of complex data that I’m working with. However, I found that SQLite databases can frequently be easily corrupted. Here are some notes when using SQLite to get the best out of it.

Use indexes

I think the first problem you may encounter when trying to pull data out of a database with millions of rows is that it will feel slow. I have found that it sometimes is faster to build an index then run the search. Regardless, you should always be using indices on your tables.

create index if not exists indexname 
on tablename (columnname1, columnname2);

Although, if all you are doing is inserting data, and you have no interest in fetching data from a table. You could wait to build the index until you are done with your ingestion phase. Otherwise it updates the index every insert.

Create backups

A good way to make backups of the database is to use the .dump command and then restore it later. Make sure you aren’t making any writes to the databse when this backup is being performed.

sqlite3 databasename.db .dump > databasename.bak
sqlite3 restoreddatabase < databasename.bak
Use Transactions

I’m actually not sure that data retrieval is benefitted from transactions at all (I don’t know how they would). But data insertions certainly are. A transaction is a great way to prevent a database from being corrupted. As a practice now I do my inserts as a function. I open the database, begin a transaction, do a bulk insert, then commit, and disconnect from the database. I feel like this practice has led to far fewer corruptions.

I used to just keep a connection open during my entire analysis session. The connection string was at the top of my script files, and the disconnect at the bottom. This could be a problem if there is an error, or R crashes, etc. By using transactions and only opening a connection when you need it you are buffered from the inevitble crashing and errors.

insertMyData <- function(d) {
	con <- dbConnect(SQLite(), 'dbname.db')
	dbGetPreparedQuery(con, 'insert into thetable (name, funstuff) values (:name, :funstuff)',
	          = d)

Related Posts

A Simple Network Analysis

Introduction to SNA in R: A simple network analysis

Storing R Objects in SQL Tables

Keep your analyses and prepared rdata objects indexed in a database.

Getting Network Data In and Out of R

Another part of Intro the SNA in R. Imporing and exporting data, cleaning and preparing it.

R and Networks

The resources and tools available to you once you start is vast. Let’s get a lay of the land

An Introduction to Network Analysis in R

Table of contents for my Introduction to Network Analysis in R series.

Using Jekyll

Installing and testing Jekyll

Pixyll in Action

See what the different elements looks like. Your markdown has never looked better. I promise.