20120711

Query and update Databases, CSV files, Excel spreadsheets, MongoDB, CouchDB and more in Java

Introduction to MetaModel 3

The other day we released version 3 of MetaModel, a project that I have thoroughly enjoyed working on lately. Let me share a bit of my enthusiasm and try to convince you that this is the greatest data access library there is for Java.

First let me also say, just to make it clear: MetaModel is NOT an ORM (Object-Relational Mapping) framework. MetaModel does not doing any mapping to your domain object model. Contrary, MetaModel is oriented towards working with the data model that already exists in your datastores (databases, files etc.) as it is physically represented. So the model of MetaModel is in deed a meta model, just as it is a metadata model - it works with the concepts of tables, columns, rows, schemas, relationships etc. But it is also oriented towards abstracting away all the cruft of having to deal with the physical interactions with each individual data storage technology. So unlike most ORMs, MetaModel allows you to work with arbitrary data models, stored in arbitrary technologies such as relational (JDBC) databases, text file formats, Excel spreadsheets, NoSQL databases (currently CouchDB and MongoDB) and more.

Here's an overview of the scope of MetaModel, depicted in our "module diagram".

It's important to be able to access, query and update all of these different datastore techonologies in the same way. This is basically the cross-platform story all over again. But the frontier of cross-platform is with MetaModel being moved to also entail datastore technologies, whereas it was previously mostly about the freedom of Operating System.
In addition it is important that this common data access abstraction is elegant, scalable and flexible. I hope to convince you of this with a few examples.

Code examples

Everything you do with MetaModel always starts with a DataContext object. A DataContext represents the basis for any operation with your datastore. Typically one would use the DataContextFactory to get an instance for the datastore of interest. We'll assume you are working on an Excel spreadsheet "people.xlsx":
DataContext dc = DataContextFactory.createExcelDataContext(new File("people.xlsx"));
Easy. Now let's explore the structure of this spreadsheet. We can do so either generically by traversing the graph of schemas, tables and columns - or by names if we already know what we are looking for:
// getting column by path
Column customerNameColumn = dc.getColumnByQualifiedLabel("customers.name"); 
// traversing all schemas, tables, columns
Schema schema = dc.getDefaultSchema();
Table[] tables = schema.getTables();
Column[] columns = tables[0].getColumns(); 
// step-wise getting specific elements based on names
Table customersTable = schema.getTableByName("customers");
Column customerBalanceColumn = customerTable.getColumnByName("balance");

Queries

Now let's fire some queries. This is where it gets interesting! Our approach builds upon basic knowledge of SQL, but without all the dialects and runtime differences. Technically we express queries in a completely type safe manner by using the traversed metadata objects above. But we can also put in String literals and more when it is convenient to get the job done.
// Simple query: Get all customer fields for customers with a credit balance greater than 10000.
DataSet ds = dc.query().from(customersTable).select(customersTable.getColumns())
    .where(customerBalanceColumn).greaterThan(10000).execute();
// Slightly more advanced query: Join customers with their associated sales representatives
// and group the result to count which sales reps have the most customers
Column salesRepId = customersTable.getColumnByName("sales_rep_id");
Column employeeId = dc.getColumnByQualifiedLabel("employees.id");
DataSet ds = dc.query()
    .from(customersTable).innerJoin("employees").on(salesRepId, employeeId)
    .selectCount().and("employees.name")
    .groupBy("employees.name").execute();

You can even grab the Query as an object and pass it on to methods and compositions which will eg. modify it for optimization or other purposes.

Updates and changes

The last missing piece of the puzzle is making changes to your data. We've spent a lot of time creating an API that is best suited for this task across all types of datastores. Since there's a big difference in how different datastores treat updates, and MetaModel tries to unify that, we wanted an API which clearly demarcates when you're doing an update, so that there is no doubt about transactional bounds, scope of a batch update and so on. This is why you need to provide all your updated in a closure-style object called an UpdateScript.
Let's do a series of updates.
// Batch #1: Create a table and insert a few records
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
        Table muppets = cb.createTable(schema, "muppets")
                .withColumn("name").ofType(VARCHAR)
                .withColumn("profession").ofType(VARCHAR).execute();
        cb.insertInto(muppets).value("name","Kermit the frog")
                .value("profession","TV host").execute();         cb.insertInto(muppets).value("name","Miss Piggy")                 .value("profession","Diva").execute();     } });
// Batch #2: Update and delete a record
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
        cb.update("muppets").value("profession","Theatre host")
               .where("name").equals("Kermit the frog").execute();
        cb.deleteFrom("muppets")
               .where("name").like("%Piggy").execute();
    }
});
// Batch #3: Drop the table
dc.executeUpdate(new UpdateScript() {
    public void run(UpdateCallback cb) {
       cb.dropTable("muppets").execute();
    }
});
As you can see, using the UpdateScripts we've encapsulated each batch operation. If the datastore supports transactions, this is also the point of transactional control. If not, MetaModel will provide appropriate synchronization to avoid race conditions, so you can safely perform concurrent updates even on Excel spreadsheets and CSV files.

Wrapping up...

I am extremely happy working both as a developer of and a consumer of MetaModel. I hope you felt this blog/tutorial was a good kick-start and that you got excited by the library. Please give MetaModel a spin and share your thoughts and impressions.

6 comments:

Nayan Gohil said...

Hi,

when I'm trying to update the row, it gets updated and printed(cut-paste) after the last row of the sheet and which makes actual row blank. So this results in many blank rows.

this is what I'm trying to do
private void updateInternalStepStatus() {

Column statusColumn = internalStepsTable.getColumnByName("Status");
Column internalActionsColumn = internalStepsTable.getColumnByName("Internal Actions");
UpdateableDataContext excel = DataContextFactory.createExcelDataContext(new File("spreadsheet.xlsx"));
Schema schema = dataContext.getDefaultSchema();
Table[] tables = schema.getTables();

//selecting internal steps sheet
Table internalStepsTable = tables[2];

excel.executeUpdate(new UpdateScript() {
@Override
public void run(UpdateCallback cb) {

for(int i = 0; i < listOfInternalActions.size(); i++){

String internalAction = listOfInternalActions.get(i);
String status = listOfStatus.get(i);

// update the status of internal actions
cb.update(internalStepsTable).where(internalActionsColumn).eq(internalAction).value(statusColumn, status).execute();

}

}
});

}

Please advise on how to avoid this blank rows?

If you can give a snap of code/link of exampls that would be helpful. Also. with newer version I know new classes are added to make it easier, so I'd appreciate your help on that front too with samples.

Thanks!!

Kasper Sørensen said...

Hi Nayan,

I think this was a bug that prevailed in some of the earlier 3.x version... Latest MetaModel version is 3.4.7 (in eobjects.org branch) and 4.0.0 (in apache.org branch)... Maybe you can give one of the new versions a spin and see if it helps?

Nayan Gohil said...

Thanks Kasper, for quick response!!

I'm currently using the 3.4.4 version and have tried to find the latest one.. but only able to find 3.4.4

can you help me with the link for latest MetaModel version either of 3.4.7 (in eobjects.org branch) or 4.0.0 (in apache.org branch).

Thanks !!

Kasper Sørensen said...

You can find v. 3.4.7 in the central Maven repos.

http://central.maven.org/maven2/org/eobjects/metamodel/

But if you're using 3.4.4 I am getting a little unsure if it will help ... Can you make a small unittest or main method to demonstrate your issue? Could use it as a baseline contribution for fixing the issue.

Nayan Gohil said...

Hey Kasper,

I have tried with v 3.4.7 but still the result is same. Its updating from last row and creating empty rows. :(

Please assist as I'm not getting why it's behaving like this, do I have to change/set any config settings or anything m I missing here?

Thanks!!

Kasper Sørensen said...

Hey Nayan,

I suggest you raise this as a bug on the metamodel mailing lists: http://metamodel.incubator.apache.org/#mailinglists

And again, if you can manage to provide a unittest or something like that, it always helps to get people activated.