20120711

Query and update Databases, CSV files, Excel spreadsheets, MongoDB, CouchDB and more in Java

Introduction to MetaModel 3

The other day we released version 3 of MetaModel, a project that I have thoroughly enjoyed working on lately. Let me share a bit of my enthusiasm and try to convince you that this is the greatest data access library there is for Java.

First, let me make one thing clear: MetaModel is NOT an ORM (Object-Relational Mapping) framework. MetaModel does not do any mapping to your domain object model. On the contrary, MetaModel is oriented towards working with the data model that already exists in your datastores (databases, files etc.) as it is physically represented. So the model of MetaModel is indeed a meta model, just as it is a metadata model - it works with the concepts of tables, columns, rows, schemas, relationships etc. But it is also oriented towards abstracting away all the cruft of having to deal with the physical interactions with each individual data storage technology. So unlike most ORMs, MetaModel allows you to work with arbitrary data models, stored in arbitrary technologies such as relational (JDBC) databases, text file formats, Excel spreadsheets, NoSQL databases (currently CouchDB and MongoDB) and more.

Here's an overview of the scope of MetaModel, depicted in our "module diagram".

It's important to be able to access, query and update all of these different datastore technologies in the same way. This is basically the cross-platform story all over again - but with MetaModel, the frontier of cross-platform moves beyond the freedom of Operating System to also cover the freedom of datastore technology.
In addition it is important that this common data access abstraction is elegant, scalable and flexible. I hope to convince you of this with a few examples.

Code examples

Everything you do with MetaModel always starts with a DataContext object. A DataContext represents the basis for any operation with your datastore. Typically one would use the DataContextFactory to get an instance for the datastore of interest. We'll assume you are working on an Excel spreadsheet "people.xlsx":
DataContext dc = DataContextFactory.createExcelDataContext(new File("people.xlsx"));
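The DataContextFactory has similar factory methods for the other supported datastore types. A minimal sketch under assumptions - the file names and JDBC URL are hypothetical, and the method names follow MetaModel's factory API as I recall it for version 3:

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;

public class DataContextSketch {
    public static void main(String[] args) throws Exception {
        // CSV file (hypothetical file name)
        DataContext csv = DataContextFactory.createCsvDataContext(new File("people.csv"));

        // JDBC database - any java.sql.Connection will do (hypothetical in-memory URL)
        Connection con = DriverManager.getConnection("jdbc:h2:mem:example");
        DataContext jdbc = DataContextFactory.createJdbcDataContext(con);
    }
}
```

The point being: whichever factory method you use, the DataContext you get back is queried and updated in exactly the same way.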
Easy. Now let's explore the structure of this spreadsheet. We can do so either generically by traversing the graph of schemas, tables and columns - or by names if we already know what we are looking for:
// getting column by path
Column customerNameColumn = dc.getColumnByQualifiedLabel("customers.name"); 
// traversing all schemas, tables, columns
Schema schema = dc.getDefaultSchema();
Table[] tables = schema.getTables();
Column[] columns = tables[0].getColumns(); 
// step-wise getting specific elements based on names
Table customersTable = schema.getTableByName("customers");
Column customerBalanceColumn = customersTable.getColumnByName("balance");

Queries

Now let's fire some queries. This is where it gets interesting! Our approach builds upon basic knowledge of SQL, but without all the dialects and runtime differences. Technically we express queries in a completely type-safe manner by using the traversed metadata objects above. But we can also put in String literals and more when it is convenient to get the job done.
// Simple query: Get all customer fields for customers with a credit balance greater than 10000.
DataSet ds = dc.query().from(customersTable).select(customersTable.getColumns())
    .where(customerBalanceColumn).greaterThan(10000).execute();
// Slightly more advanced query: Join customers with their associated sales representatives
// and group the result to count which sales reps have the most customers
Column salesRepId = customersTable.getColumnByName("sales_rep_id");
Column employeeId = dc.getColumnByQualifiedLabel("employees.id");
DataSet groupedDs = dc.query()
    .from(customersTable).innerJoin("employees").on(salesRepId, employeeId)
    .selectCount().and("employees.name")
    .groupBy("employees.name").execute();
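The DataSet returned by execute() works like a cursor over the result. A minimal sketch of consuming one, assuming the next()/getRow()/close() methods of MetaModel's DataSet and reusing the column objects traversed above:

```java
// iterate the rows of a DataSet - and remember to close it when done
DataSet ds = dc.query().from(customersTable)
        .select(customerNameColumn, customerBalanceColumn).execute();
try {
    while (ds.next()) {
        Row row = ds.getRow();
        // look up values by the same Column objects used in the query
        Object name = row.getValue(customerNameColumn);
        Object balance = row.getValue(customerBalanceColumn);
        System.out.println(name + ": " + balance);
    }
} finally {
    ds.close();
}
```

Closing the DataSet matters because for file-based datastores it typically releases the underlying file handle or stream.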

You can even grab the Query as an object and pass it on to methods and compositions which will, e.g., modify it for optimization or other purposes.
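In code this looks something like the following - a sketch assuming the query builder's toQuery() method and DataContext.executeQuery(...), as found in MetaModel's query API:

```java
// build a Query object without executing it right away
Query q = dc.query().from(customersTable)
        .select(customerNameColumn).toQuery();

// ... pass it around and let other code modify it, e.g. cap the result size
q.setMaxRows(100);

// and execute it whenever you are ready
DataSet ds = dc.executeQuery(q);
```

Since the Query is just an object, it lends itself well to caching, logging and programmatic rewriting before execution.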

Updates and changes

The last missing piece of the puzzle is making changes to your data. We've spent a lot of time creating an API that is well suited for this task across all types of datastores. Since there's a big difference in how different datastores treat updates, and MetaModel tries to unify that, we wanted an API which clearly demarcates when you're doing an update, so that there is no doubt about transactional bounds, the scope of a batch update and so on. This is why you provide all your updates in a closure-style object called an UpdateScript.
Let's do a series of updates.
// Batch #1: Create a table and insert a few records
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
        Table muppets = cb.createTable(schema, "muppets")
                .withColumn("name").ofType(VARCHAR)
                .withColumn("profession").ofType(VARCHAR).execute();
        cb.insertInto(muppets).value("name","Kermit the frog")
                .value("profession","TV host").execute();
        cb.insertInto(muppets).value("name","Miss Piggy")
                .value("profession","Diva").execute();
    }
});
// Batch #2: Update and delete a record
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
        cb.update("muppets").value("profession","Theatre host")
               .where("name").isEquals("Kermit the frog").execute();
        cb.deleteFrom("muppets")
               .where("name").like("%Piggy").execute();
    }
});
// Batch #3: Drop the table
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
       cb.dropTable("muppets").execute();
    }
});
As you can see, using the UpdateScripts we've encapsulated each batch operation. If the datastore supports transactions, this is also the point of transactional control. If not, MetaModel will provide appropriate synchronization to avoid race conditions, so you can safely perform concurrent updates even on Excel spreadsheets and CSV files.

Wrapping up...

I am extremely happy working both as a developer and a consumer of MetaModel. I hope this blog/tutorial was a good kick-start and that it got you excited about the library. Please give MetaModel a spin and share your thoughts and impressions.