Storing R Objects in SQL Tables


I am running network simulations. But at this time I’m not certain what sorts of analyses should be run on the resulting networks. So I want to store the network objects for later analysis. I could store each network as a .rda file with the save command, but that would leave me with a folder full of thousands, or even millions, of files. That’s unseemly.

It’s not even about speed for me; it’s about cleanly managing the data. Plus I can store different attributes and information about each simulation (its size, the time it took, the parameters used to create it, etc.). Then I can search and recall the results according to different parameters. It would be perfect.

Let’s say this is the object I want to store. It’s a list of graph objects generated by a Watts-Strogatz model. I could vary the parameters, but this is just a proof of concept for now. I want to store each graph as a row in a SQLite table.

# igraph provides the Watts-Strogatz small-world generator
library(igraph)

gs <- list()
for(i in 1:10)
  gs[[i]] <- watts.strogatz.game(1, 100, 5, 0.05)

Blobs

So SQLite is the database of choice here. It stores everything in a single file that I can keep in a Dropbox folder. I like that. And SQLite has a datatype called BLOB, which stores a blob of byte data exactly as it was input.

Here I create the table in the database. I only have an _id column for indexing, but I could add other columns for the size of the network or the different parameters of the WS model.
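Everything below assumes an open RSQLite connection called con, set up with something like this (the filename graphs.db is just my placeholder):

library(RSQLite)
# a single-file database; this is the file that lives in the Dropbox folder
con <- dbConnect(SQLite(), dbname = "graphs.db")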

dbGetQuery(con, 'create table if not exists graphs 
                 (_id integer primary key autoincrement, 
                  graph blob)')
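
As a sketch of that idea (the extra column names here are hypothetical), the table could carry the simulation parameters alongside the blob, which is what would let me search and recall results by parameter later:

# hypothetical richer schema: store the WS parameters alongside the graph
dbGetQuery(con, 'create table if not exists graphs_meta
                 (_id integer primary key autoincrement,
                  size integer, nei integer, p real,
                  graph blob)')

# e.g. recall only the graphs generated with rewiring probability 0.05
dbGetQuery(con, 'select * from graphs_meta where p = 0.05')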

Serialize

I referred to the unit tests for RSQLite to see how to do a blob insert. We need the R object condensed into a single item we can insert into the database. There are apparently many ways to accomplish this. One way is dump, which writes out the structure of an R object. It usually dumps to a file that can be sourced, but I believe you can export it to a character string. I attempted to export it to a character string, without luck; it also spawned warnings about an inadequate deparse or something.

But then I learned of serialize, which does exactly what I want: convert an R object to a vector of raw bytes. The line below converts the list into a data.frame with a single column in which each row is the raw vector for one graph. The I function forces the data.frame to store each whole vector as a single entry.

df <- data.frame(g = I(lapply(gs, function(x) { serialize(x, NULL)})))

# And insert it
dbGetPreparedQuery(con, 'insert into graphs (graph) values (:g)', bind.data=df)
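
One caveat: dbGetPreparedQuery has since been deprecated in newer versions of RSQLite. I haven’t tested it here, but on current RSQLite/DBI the same insert should look roughly like this (new_blob just wraps the list of raw vectors):

# untested sketch for newer RSQLite: dbExecute() with bound parameters
library(blob)
dbExecute(con, 'insert into graphs (graph) values (:g)',
          params = list(g = new_blob(lapply(gs, serialize, NULL))))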

Retrieve the result

Now we can select the data out of the database and unserialize it. It’s pretty simple.

df2 <- dbGetQuery(con, "select * from graphs")
gs2 <- lapply(df2$graph, 'unserialize')
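
A quick check that the round trip was lossless (comparing vertex and edge counts is enough for my purposes):

# the recovered graph should match the original in size
vcount(gs2[[1]]) == vcount(gs[[1]])  # TRUE
ecount(gs2[[1]]) == ecount(gs[[1]])  # TRUE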

And now the compulsory network image:

g <- gs2[[1]]
V(g)$size <- log(betweenness(g)) + 1
V(g)$color <- "#66c2a4"
V(g)$frame.color <- "#238b45"
plot(g, vertex.label = NA)

[Figure: the plotted network, "Sqlite blob test"]

The Full Code
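
Putting the snippets above together, the whole script looks roughly like this (the filename and the closing dbDisconnect are my additions):

library(igraph)
library(RSQLite)

# open the single-file database
con <- dbConnect(SQLite(), dbname = "graphs.db")

# simulate ten Watts-Strogatz networks
gs <- list()
for(i in 1:10)
  gs[[i]] <- watts.strogatz.game(1, 100, 5, 0.05)

# a table with an id and a blob column
dbGetQuery(con, 'create table if not exists graphs
                 (_id integer primary key autoincrement,
                  graph blob)')

# serialize each graph to raw bytes and insert
df <- data.frame(g = I(lapply(gs, function(x) { serialize(x, NULL)})))
dbGetPreparedQuery(con, 'insert into graphs (graph) values (:g)', bind.data=df)

# read the rows back and unserialize
df2 <- dbGetQuery(con, "select * from graphs")
gs2 <- lapply(df2$graph, 'unserialize')

# plot one of the recovered graphs
g <- gs2[[1]]
V(g)$size <- log(betweenness(g)) + 1
V(g)$color <- "#66c2a4"
V(g)$frame.color <- "#238b45"
plot(g, vertex.label = NA)

dbDisconnect(con)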
