20080803

Considering MetaModel functions

This blog entry could just as well have been a feature request but I'm going to kick-start it with a couple of thoughts I have for one of the crucial improvements to MetaModel that I've been dreaming about.

The last couple of weeks have brought considerable interest in MetaModel, largely thanks to articles posted on the server side and infoq. It's been great to get the message out and it's also sparked a lot of great ideas from users/evaluators on the discussion forum. A couple of them have been requests that we build more advanced SELECT items into the query model. In this post I'm going to discuss type-casting and extraction functions and how they can be made possible using the new IQueryRewriter interface.

The idea about query rewriting had been going on for some time, of course inspired by Hibernate's dialects. The thing was though, that for a start I wanted to skip dialect handling completely in order to get to know how far one could actually go without having to do any "hacking" in SQL. It worked out quite well but now that we need to incorporate more advanced, non-standardised features, we will of course need to be able to manipulate with the standard output. This is what the query rewriter is for, and in particular the AbstractQueryRewriter helps you do. I've made my first query rewriting "hack" today - using the TOP function for limiting the result set size, which is (as far as I know) only available in MySQL.

What we need to do now is expand the Query model API. We need to incorporate type casting. My thoughts are:

  • We must take an interface-first approach - how would one most appropriately like to type-cast a select item in a query? I'm thinking that we should add a "castAs(ColumnType type)" method on SelectItem.
  • Because not all of the ColumnType's are supported by all databases we should consider making a more abstract type enum. Something that will only contain a couple of more basic types like String, Integer, Decimal, Date, Boolean.
  • We should use the query rewriting approach to generate the actual SQL cast syntax. Some databases use the CAST(x AS y) function, others use special-purpose functions like TO_NUMBER(x).
Another feature that I want to include in MetaModel is functions for calculating or extracting something on behalf of a column. Let's take for example the YEAR(x) function (or in some databases the EXTRACT(YEAR FROM x) function).
  • One would initially just think that we should add this function to the FunctionType enum and then take it from there. But actually it's quite a different type of function. While SUM, COUNT etc. are aggregate functions, the YEAR function is a single-value function, ie you can't call YEAR on a set of values.
  • Therefore we should consider a rename of FunctionType to AggregateFunction and create a new enum, CalculationFunction (or maybe we can come up with a better name?)
  • We can use the same approach as before (query rewriting) to handle different dialects, but we need to make sure that we pick function names that are widely accepted and understandable to the user. Personally I prefer YEAR(x) over EXTRACT(YEAR FROM x) as the syntax is clearer and there are no constants inside the parameter, which is more java-ish. The downside is that we will then also need a MONTH(x), DAY(x) etc. function but that's not a biggie I think.
One last note - we should also consider if it's reasonable to keep using enums. Maybe we should switch to interfaces (and constants in the interface to ensure no API changes) for the sake of extensibility.

2 comments:

Porter Woodward said...

Query Rewriting...

I'd consider looking around at various other OSS solutions for that piece. Why not interface with something that generates tried and true SQL for the relational database interface? No sense rewriting that from scratch - especially since people have already spent a ton of development time on getting projects like Hibernate (and others) to emit syntactically correct SQL for a variety of databases.

That won't really cover interfacing with CSV, XLS, etc. But the SQL thing is pretty well covered ground at this point.

TOP is also supported in MS-SQL Server:

SELECT TOP 1 * FROM TableName

Type Casting can be tricky business. Many solutions use either a .properties file, or an XML file to specify generic casts of things like varchar -> string, etc. It occurred to me that a good place for that might be the "Schema" object. Perhaps using a fluent style interface one could do the following:

Schema schema = dc.getDefaultSchema();

schema.typeCast(DataType.VARCHAR, java.lang.String).typeCast(...

Could also be useful to utilize the DataContext object there (ask it what the backing store is for the schema - CSV, MySQL, Oracle, etc).

Or something similar. That way the type mapping is established clearly; Also could be done via .properties file - or XML, or via annotations... Depending on how many ways you want people to be able to slice bread.

Possibly the Table object should have a similar interface; allowing for the override of the mappings established at the schema level. So when retrieving values from a table, the code would first check to see if the Table had a type mapping for that type - and if not default to the one specified at the schema level.

Hard to know what the best way to accomplish that might be though - making the DB do the cast, or doing it in Java. With JDBC it has functions like getString(), etc. - so it's possible that such type casting can be done by simply selecting the appropriate method call.

One of the downsides to using constant ints as emums is the compiler won't catch problems for you (from Eff. Java 2nd. ed):

public ststic final int APPLE_FUJI = 0;
public ststic final int APPLE_PIPPIN = 1;
public ststic final int APPLE_GRANNY_SMITH = 2;

public ststic final int ORANGE_NAVEL = 0;
public ststic final int ORANGE_TEMPLE = 0;
public ststic final int ORANGE_BLOOD = 0;

So - now you can do this:

int i = (APPLE_FUJI - ORANGE_TEMPLE) / APPLE_PIPPIN;

You can also do:

if(APPLE_FUJI == ORANGE_NAVEL) {

And you get _no_ type safety - you're able to hand an APPLE to a method that possibly expects an ORANGE (because ultimately they're all just ints, not actual types at all).

Kasper Sørensen said...

When referring to type-casting I meant within the query. Typically in SQL you can do something like this:

SELECT CAST(myInteger AS VARCHAR) FROM foobar

or

SELECT TO_NUMBER(myStringContainingANumber) FROM foobar