20130618

Introducing Apache MetaModel

Recently we were able to announce an important milestone in the life of our project MetaModel - it is being incubated into the Apache Software Foundation! Obviously this brings a lot of new attention to the project, and with it lots and lots of questions about what MetaModel is good for. We didn't grant this project to Apache just for fun, but because we wanted to maximize its value, both for us and for the industry as a whole. So in this post I'll try to explain the scope of MetaModel, how we use it at Human Inference, and what you might use it for in your products or services.


First, let's recap the one-liner for MetaModel:
MetaModel is a library that encapsulates the differences and enhances the capabilities of different datastores.
In other words - it's all about making sure that the way you work with data is standardized, reusable and smart.

But wait, don't we already have things like Object-Relational-Mapping (ORM) frameworks to do that? After all, a framework like OpenJPA or Hibernate will allow you to work with different databases without having to deal with the different SQL dialects etc. The answer is of course yes, you can use such frameworks for ORM, but MetaModel is by choice not an ORM! An ORM assumes an application domain model, whereas MetaModel, as its name implies, treats the datastore's metadata as its model. This allows for much more dynamic behaviour, but it also means that MetaModel is mainly applicable to a specific range of application types: those that deal with more or less arbitrary data, or dynamic data models, as their domain.

At Human Inference we build just this kind of software product, so we make great use of MetaModel! The two predominant applications that use MetaModel in our products are:
  • HIquality Master Data Management (MDM)
    Our MDM solution is built on a very dynamic data model. With this application we want to allow multiple data sources to be consolidated into a single view - typically to create an aggregated list of customers. In addition, we take third party sources in and enrich the source data with them. So as you can imagine there's a lot of mapping of data models going on in MDM, and also quite a wide range of database technologies. MetaModel is one of the cornerstones in making this happen. Not only does it mean that we can onboard data from a wide range of sources, it also means that these sources can vary a lot from each other and that we can map them using metadata about fields, tables etc.
  • DataCleaner
    Our open source data quality toolkit DataCleaner is obviously also very dependent on MetaModel. Actually MetaModel started as a kind of derivative project from the DataCleaner project. In DataCleaner we allow the user to rapidly register new data and immediately build analysis jobs using it. We wanted to avoid building code that's specific to any particular database technology, so we created MetaModel as the abstraction layer for reading data from almost anywhere. Over time it has gained richer and richer querying capabilities as well as write access, making MetaModel essentially a full CRUD framework for ... anything.
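
To make the "abstraction layer" idea a bit more concrete, here is a minimal sketch in plain Java. Note that this is NOT MetaModel's actual API: the names Datastore, CsvDatastore, InMemoryDatastore and valuesOf are hypothetical and only there to illustrate the principle of writing the reading code once against a common contract, no matter where the data lives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; none of these types are part of MetaModel.
final class DatastoreSketch {

    /** Minimal common contract: column names (the metadata) plus rows of values. */
    interface Datastore {
        List<String> columnNames();
        List<List<String>> rows();
    }

    /** Datastore backed by CSV text whose first line is the header. */
    static final class CsvDatastore implements Datastore {
        private final List<String> header;
        private final List<List<String>> rows = new ArrayList<>();

        CsvDatastore(String csv) {
            String[] lines = csv.trim().split("\\R");
            header = Arrays.asList(lines[0].split(","));
            for (int i = 1; i < lines.length; i++) {
                rows.add(Arrays.asList(lines[i].split(",")));
            }
        }

        @Override public List<String> columnNames() { return header; }
        @Override public List<List<String>> rows() { return rows; }
    }

    /** Datastore backed by in-memory arrays, standing in for e.g. a database table. */
    static final class InMemoryDatastore implements Datastore {
        private final List<String> header;
        private final List<List<String>> rows = new ArrayList<>();

        InMemoryDatastore(String[] columns, String[][] data) {
            header = Arrays.asList(columns);
            for (String[] row : data) {
                rows.add(Arrays.asList(row));
            }
        }

        @Override public List<String> columnNames() { return header; }
        @Override public List<List<String>> rows() { return rows; }
    }

    /** Written once against the interface: collects all values of a named column. */
    static List<String> valuesOf(Datastore datastore, String column) {
        int index = datastore.columnNames().indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        List<String> result = new ArrayList<>();
        for (List<String> row : datastore.rows()) {
            result.add(row.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        Datastore csv = new CsvDatastore("id,name\n1,Alice\n2,Bob\n");
        Datastore memory = new InMemoryDatastore(
                new String[] {"name", "country"},
                new String[][] {{"Carol", "DK"}, {"Dave", "NL"}});

        // The same reading code works for both sources.
        System.out.println(valuesOf(csv, "name"));
        System.out.println(valuesOf(memory, "name"));
    }
}
```

The point of the sketch is that valuesOf knows nothing about CSV parsing or arrays. The real library obviously goes a lot further than this (querying, write access and many more datastore types), but the principle is the same.
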
I've often pondered the question: what could other people be using MetaModel for? I can obviously only provide an open answer, but some ideas that have popped up (in my head or in the community's) are:
  • Any application that needs to onboard/intake data of multiple formats
    Oftentimes people need to be able to import data from many sources, files etc. MetaModel makes it easy to "code once, apply to any data format", so you save a lot of work. This use-case is similar to what the Quipu project is using MetaModel for.
  • Model Driven Development (MDD) tools
    Design tools for MDD are often used to build domain models and at some point translate them to the physical storage layer. MetaModel provides not only "live" metadata from a particular source, but also in-memory structures for e.g. building and mutating virtual tables, schemas, columns and other metadata. By encompassing both the virtual and the physical layer, MetaModel provides a lot of the groundwork to build MDD tools on top of it.
  • Code generation tools
    Tools that generate code (or other digital artifacts) based on data and metadata. Traversing metadata in MetaModel is very easy and uniform, so you could use this information to build code, XML documents etc. to describe or utilize the data at hand.
  • An ORM for anything
    I said just earlier that MetaModel is NOT an ORM. But if you look at MetaModel from an architectural point of view, it could very well serve as the data access layer of an ORM's design. Obviously all the object-mapping would have to be built on top of it, but then you would also have an ORM that maps to not just JDBC databases, like most ORMs do, but also to file formats, NoSQL databases, Salesforce and more!
  • Open-ended analytical applications
    Say you want to figure out e.g. if particular words appear in a file, a database or whatever. You would normally have to know a lot about the file format, the database fields or similar constructs. But with MetaModel you can instead automate this process by traversing the metadata and querying whatever fields match your predicates. This way you can build tools that "just take a file" or "just take a database connection" and let them loose to figure out their own query plan and so on.
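
The last idea, traversing metadata and querying whatever fields match a predicate, can be sketched in a few lines of plain Java. Again, this is only an illustration under my own assumptions: Table, Column, ColumnType and findColumnsContaining are hypothetical names and not MetaModel's API, and the "datastore" is just a couple of in-memory tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only; none of these types are part of MetaModel.
final class MetadataSearchSketch {

    enum ColumnType { TEXT, NUMBER }

    /** Column metadata: a name and a coarse type. */
    static final class Column {
        final String name;
        final ColumnType type;

        Column(String name, ColumnType type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Table metadata together with its rows; values are ordered like the columns. */
    static final class Table {
        final String name;
        final List<Column> columns;
        final List<Object[]> rows;

        Table(String name, List<Column> columns, Object[]... rows) {
            this.name = name;
            this.columns = columns;
            this.rows = Arrays.asList(rows);
        }
    }

    /**
     * Walks the table and column metadata, scans only the columns accepted by
     * the filter, and returns a "table.column" label for each column that has
     * at least one value containing the word.
     */
    static List<String> findColumnsContaining(
            List<Table> schema, Predicate<Column> columnFilter, String word) {
        List<String> hits = new ArrayList<>();
        for (Table table : schema) {
            for (int i = 0; i < table.columns.size(); i++) {
                Column column = table.columns.get(i);
                if (!columnFilter.test(column)) {
                    continue; // the metadata says this column is not of interest
                }
                for (Object[] row : table.rows) {
                    Object value = row[i];
                    if (value != null && value.toString().contains(word)) {
                        hits.add(table.name + "." + column.name);
                        break; // one match is enough for this column
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Table customers = new Table("customers",
                Arrays.asList(new Column("id", ColumnType.NUMBER),
                        new Column("name", ColumnType.TEXT),
                        new Column("city", ColumnType.TEXT)),
                new Object[] {1, "Alice", "Arnhem"},
                new Object[] {2, "Bob", "Copenhagen"});
        Table orders = new Table("orders",
                Arrays.asList(new Column("id", ColumnType.NUMBER),
                        new Column("note", ColumnType.TEXT)),
                new Object[] {10, "Ship to Arnhem"},
                new Object[] {11, "Gift wrap"});

        // Only text columns are scanned; nothing about the tables is hard-coded here.
        List<String> hits = findColumnsContaining(
                Arrays.asList(customers, orders),
                column -> column.type == ColumnType.TEXT,
                "Arnhem");
        System.out.println(hits);
    }
}
```

Running this prints [customers.city, orders.note]. The search code never mentions a table or column by name. It discovers them from the metadata, which is exactly what lets such a tool be pointed at a source it has never seen before.
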
If I were to point at a few of today's buzzwords, I would say MetaModel can play a critical role in implementing services for things such as data federation, data virtualization, data consolidation, metadata management and automation.

And obviously this is also part of our incentive to make MetaModel available for everyone, under the Apache license. We are in this industry to make our products better, and we believe that cooperating to build the best foundation will benefit both us and everyone else who reuses it and contributes to it.