20130618

Introducing Apache MetaModel

Recently we were able to announce an important milestone in the life of our project MetaModel: it is being incubated at the Apache Software Foundation! Obviously this has brought a lot of new attention to the project, and raised lots and lots of questions about what MetaModel is good for. We didn't grant this project to Apache just for fun, but because we wanted to maximize its value, both for us and for the industry as a whole. So in this post I'll try to explain the scope of MetaModel, how we use it at Human Inference, and what you might use it for in your products or services.


First, let's recap the one-liner for MetaModel:
MetaModel is a library that encapsulates the differences and enhances the capabilities of different datastores.
In other words - it's all about making sure that the way you work with data is standardized, reusable and smart.

But wait, don't we already have things like Object-Relational Mapping (ORM) frameworks to do that? After all, a framework like OpenJPA or Hibernate will allow you to work with different databases without having to deal with the different SQL dialects etc. The answer is of course yes, you can use such frameworks for ORM, but MetaModel is by choice not an ORM! An ORM assumes an application domain model, whereas MetaModel, as its name implies, treats the datastore's metadata as its model. This allows for much more dynamic behaviour, but it also means that MetaModel is mainly applicable to a specific range of application types: those that have more or less arbitrary data, or dynamic data models, as their domain.

At Human Inference we build exactly this kind of software product, so MetaModel is very useful to us! The two predominant applications that use MetaModel in our products are:
  • HIquality Master Data Management (MDM)
    Our MDM solution is built on a very dynamic data model. With this application we want to allow multiple data sources to be consolidated into a single view, typically to create an aggregated list of customers. In addition, we take in third-party sources and use them to enrich the source data. So as you can imagine there's a lot of mapping of data models going on in MDM, and also quite a wide range of database technologies. MetaModel is one of the cornerstones of making this happen. Not only does it mean that we can onboard data from a wide range of sources; it also means that these sources can vary a lot from each other and that we can map them using metadata about fields, tables etc.
  • DataCleaner
    Our open source data quality toolkit DataCleaner is obviously also very dependent on MetaModel. Actually, MetaModel started as a kind of derivative project from the DataCleaner project. In DataCleaner we allow the user to rapidly register new data and immediately build analysis jobs using it. We wanted to avoid building code that's specific to any particular database technology, so we created MetaModel as the abstraction layer for reading data from almost anywhere. Over time it has gained richer and richer querying capabilities as well as write access, making MetaModel essentially a full CRUD framework for ... anything.
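
To make the "abstraction layer" idea a bit more concrete, here is a minimal sketch in plain Java. Note that the types and methods in it (Datastore, CsvDatastore, InMemoryDatastore, valuesOf) are hypothetical names invented for this illustration and are not MetaModel's actual API. The CSV parsing is also deliberately naive (no quoting or escaping). The point is only that the consuming code is written once against the interface and does not care where the data comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a datastore abstraction; these types are NOT MetaModel's API.
class DatastoreSketch {

    // Minimal read contract: metadata (column names) plus the rows themselves.
    interface Datastore {
        List<String> columnNames();
        List<List<String>> rows();
    }

    // Datastore backed by CSV text whose first line is the header.
    // Naive parsing: splits on commas only, with no support for quoted values.
    static final class CsvDatastore implements Datastore {
        private final List<String> header;
        private final List<List<String>> rows = new ArrayList<>();

        CsvDatastore(String csv) {
            String[] lines = csv.split("\n");
            header = Arrays.asList(lines[0].split(",", -1));
            for (int i = 1; i < lines.length; i++) {
                rows.add(Arrays.asList(lines[i].split(",", -1)));
            }
        }

        public List<String> columnNames() { return header; }
        public List<List<String>> rows() { return rows; }
    }

    // Datastore backed by values that are already in memory.
    static final class InMemoryDatastore implements Datastore {
        private final List<String> columns;
        private final List<List<String>> rows;

        InMemoryDatastore(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }

        public List<String> columnNames() { return columns; }
        public List<List<String>> rows() { return rows; }
    }

    // Written once against the interface: collects all values of one named column.
    static List<String> valuesOf(Datastore datastore, String column) {
        int index = datastore.columnNames().indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        List<String> values = new ArrayList<>();
        for (List<String> row : datastore.rows()) {
            values.add(row.get(index));
        }
        return values;
    }

    public static void main(String[] args) {
        Datastore csv = new CsvDatastore("id,name\n1,Ann\n2,Bob");
        Datastore memory = new InMemoryDatastore(
                Arrays.asList("name", "city"),
                Arrays.asList(Arrays.asList("Cy", "Oslo"), Arrays.asList("Di", "Rome")));
        // The same call works for both sources.
        System.out.println(valuesOf(csv, "name"));
        System.out.println(valuesOf(memory, "name"));
    }
}
```

Supporting a new kind of source in a design like this means adding one more implementation of the interface, while code such as valuesOf stays unchanged.
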

I've often pondered the question: what could other people be using MetaModel for? I can obviously only give an open-ended answer, but some ideas that have popped up (in my head or in the community's) are:
  • Any application that needs to onboard/intake data of multiple formats
    Oftentimes people need to be able to import data from many sources, files etc. MetaModel makes it easy to "code once, apply on any data format", so you save a lot of work. This use-case is similar to what the Quipu project is using MetaModel for.
  • Model Driven Development (MDD) tools
    Design tools for MDD are often used to build domain models and at some point translate them to the physical storage layer. MetaModel provides not only "live" metadata from a particular source, but also in-memory structures for e.g. building and mutating virtual tables, schemas, columns and other metadata. By encompassing both the virtual and the physical layer, MetaModel provides a lot of the groundwork to build MDD tools on top of it.
  • Code generation tools
    Tools that generate code (or other digital artifacts) based on data and metadata. Traversing metadata in MetaModel is very easy and uniform, so you could use this information to build code, XML documents etc. to describe or utilize the data at hand.
  • An ORM for anything
    I said just earlier that MetaModel is NOT an ORM. But if you look at MetaModel from an architectural point of view, it could very well serve as the data access layer of an ORM's design. Obviously all the object-mapping would have to be built on top of it, but then you would also have an ORM that maps to not just JDBC databases, like most ORMs do, but also to file formats, NoSQL databases, Salesforce and more!
  • Open-ended analytical applications
    Say you want to figure out e.g. if particular words appear in a file, a database or whatever. Normally you would have to know a lot about the file format, the database fields or similar constructs. But with MetaModel you can instead automate this process by traversing the metadata and querying whatever fields match your predicates. This way you can build tools that "just take a file" or "just take a database connection" and let them loose to figure out their own query plan and so on.
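
The last idea, traversing metadata and querying whatever fields match a predicate, can be sketched roughly as follows. This is plain Java with hypothetical stand-in classes (Column, Table, findWord and sampleTables are made up for this illustration and are not MetaModel's real classes or methods), and the "datastore" is just a couple of in-memory tables. It only shows the shape of the technique: first select columns from metadata, then search the values of the selected columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins for datastore metadata; NOT MetaModel's real classes.
class MetadataSearch {

    static final class Column {
        final String name;
        final Class<?> type;

        Column(String name, Class<?> type) {
            this.name = name;
            this.type = type;
        }
    }

    static final class Table {
        final String name;
        final List<Column> columns;
        final List<Object[]> rows; // each row holds one value per column, in column order

        Table(String name, List<Column> columns, List<Object[]> rows) {
            this.name = name;
            this.columns = columns;
            this.rows = rows;
        }
    }

    // Walks every table, keeps the columns accepted by the metadata predicate, and
    // reports "table.column" for each kept column containing the word in some row.
    static List<String> findWord(List<Table> tables, Predicate<Column> columnFilter, String word) {
        List<String> hits = new ArrayList<>();
        String needle = word.toLowerCase();
        for (Table table : tables) {
            for (int i = 0; i < table.columns.size(); i++) {
                Column column = table.columns.get(i);
                if (!columnFilter.test(column)) {
                    continue; // decided from metadata alone, no data is read
                }
                for (Object[] row : table.rows) {
                    Object value = row[i];
                    if (value != null && value.toString().toLowerCase().contains(needle)) {
                        hits.add(table.name + "." + column.name);
                        break; // one match per column is enough
                    }
                }
            }
        }
        return hits;
    }

    // Small made-up data set used by the example below.
    static List<Table> sampleTables() {
        Table customers = new Table("customers",
                Arrays.asList(new Column("id", Integer.class), new Column("name", String.class)),
                Arrays.asList(new Object[] {1, "Acme Corp"}, new Object[] {2, "Globex"}));
        Table orders = new Table("orders",
                Arrays.asList(new Column("customer_id", Integer.class), new Column("note", String.class)),
                Arrays.asList(new Object[] {1, "rush order for ACME"}, new Object[] {2, null}));
        return Arrays.asList(customers, orders);
    }

    public static void main(String[] args) {
        // Search only text columns; the caller never names a table or a column.
        List<String> hits = findWord(sampleTables(), c -> c.type == String.class, "acme");
        System.out.println(hits);
    }
}
```

The caller supplies only a predicate over column metadata and a search word, so the same tool could be pointed at any source that can describe its own tables and columns.
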
To name a few of today's buzzwords, I would say MetaModel can play a critical role in implementing services for things such as data federation, data virtualization, data consolidation, metadata management and automation.

And obviously this is also part of our incentive to make MetaModel available for everyone, under the Apache license. We are in this industry to make our products better and believe that cooperation to build the best foundation will benefit both us and everyone else that reuses it and contributes to it.

9 comments:

Shameer Kunjumohamed said...

I am sure this project is going to be a big success, as it solves a problem currently faced by many enterprise systems, especially in the cloud. The fact that it facilitates data standardization and aggregation from diverse data sources will make it widely accepted by the industry. I hope it takes care of performance and compatibility issues in real time too. A few questions arise, such as: (1) Does it cover ACID transactions across different data sources? (2) Annotations? (3) A Spring plugin? (4) IDE support, etc.?

Kasper Sørensen said...

Hi Shameer,

Thanks for your encouraging words :) Let me see if I can answer your questions properly:

(1) MetaModel does utilize whatever transactional properties are available for the individual data sources. For JDBC that typically means ACID, while on the other end, for CSV files, it only means synchronized writes. We currently do not have any specific support for XA transactions if that's what you're referring to. But I guess that could become a theme in future development.

(2) Which annotations are you referring to? Usually annotations at the data layer are used for mapping (ORM) and we don't have anything like that. MetaModel is "just" a library which you can use underneath your own annotation processors, but we don't want to tie it to any specific ORM style.

(3) The same idea applies here, although making a Spring module would be quite unintrusive, so I would consider it a bit more. When you have a "plain Java" library, it's quite easy to integrate with Spring, Guice, CDI or whatever dependency injection you would want.

(4) Any Java IDE would be able to deal nicely with MetaModel.

Shameer Kunjumohamed said...

Kasper,

(1) Nice to know that you consider XA transactions as a feature.
(2) I was referring to a set of standardized (MetaModel-specific) annotations, so that all I have to do is annotate some POJOs to map where the data comes from and where it is going (similar to JPA), and say READ, SAVE etc. Much of my boilerplate code would be eliminated by the annotations. I would imagine a few native annotations or a Spring plugin that does that job, with annotations plus a simple XML config.
(3) A developer can always build a Spring integration for any library on his own. However, as a lazy developer, I would love to have a readily available (and standard) plugin that saves me the effort. In fact, these plugins make the library more popular and more easily adopted.
(4) IDE support - do you have wizards to generate the code (or annotations) visually? A high hope ;)

Kasper Sørensen said...

I'm taking notes of your points.

Obviously the XA transaction story is a big one, and would need a bit of a push from committed people. If this is high on your agenda, I suggest joining the MetaModel mailing list at Apache and helping to move it forward :-)

Story 2 and 3 sound rather quick to do actually. We do want to keep the dependencies on XML files and so on to a minimum, but I imagine a convention based mapping with getters and setters of POJOs or something like that would be convenient and straight forward.

Regarding IDE support - nope, and I am not sure you actually need it. Try giving MetaModel a spin for some simple examples, and you'll see that there's not really anything that you need to generate :-) In my opinion at least...

Mark Nuttall said...

At first glance it seems interesting so I will need to dig deeper.

But also at first glance it seems that JBoss has some "similar" projects. Mainly Teiid seems the closest. There is ModeShape too. While Hibernate OGM is not really the same, it does show that the product can be used for more than just ORM.

Kasper Sørensen said...

I don't know those projects in depth, but I can try to compare based on my rough impressions:

* Teiid: This seems to be very web service oriented and to provide something more like a data consolidation layer that is accessible in a SOA architecture. Configuration is done through XML files and it seems to be meant as a ready-to-use application for that.

In contrast, MetaModel is a core Java library. That means you have programmatic access to the data and you're not going to a web service for your data. MetaModel cannot be deployed on its own; it needs an application to utilize it for something.

* ModeShape looks more like a JSR-283 (JCR) implementation. That's again something quite different. It's a concrete application that provides a content repository with a specific set of metadata.

* Hibernate OGM - still a mapping framework :-) That's not what we're talking about here. MetaModel's strength is in the fact that it's driven by metadata, not by a priori assumptions about the data.

Ramesh said...

Kasper,

Congratulations on the Apache acceptance of MetaModel. Very interesting project indeed.

I saw your comments about Teiid (http://teiid.org), I read through the web pages about MetaModel, and I wanted to make sure I give the correct context of what Teiid is and does.

Teiid used to be called MetaMatrix in a prior life and was a commercial company. Red Hat acquired MetaMatrix and released the code as open source as Teiid.

Teiid is a full-fledged data virtualization and data integration engine. Using it, you can access disparate sources as MetaModel does, but it also provides further facilities to build a virtual schema, based upon the schemas you have imported from RDBMSs, files etc., and deploy it into Teiid. The user can then issue any ANSI SQL statement against this virtual schema using JDBC or ODBC. It can also expose the same data over REST (OData), or as SOAP-based services. It has a very good query optimizer, which is cost- and rule-based. Teiid provides Eclipse-based tooling for virtual schema creation. It also supports XML document models, XA transactions etc.

The deployment model is an XML file or an artifact generated by the Teiid Designer tool, but it specifies DDL for the metadata underneath.

One difference I saw, and was impressed by, is MetaModel's fluent API. Teiid also has an embedded mode, where users can deploy it into their own application, but access is still through JDBC, and it does not provide any custom libraries for access. The idea is to abstract the sources away from the user.

Thanks and Good Luck.

Ramesh..

Azzuwan said...

Hi Kasper,
I just want to say thanks for MetaModel. It's an awesome product. I hope it will quickly pass the incubator stage and become a first-tier Apache project.

Kasper Sørensen said...

Thank you for the cheering and nice words Azzuwan :) I hope so too.