April 2008

20080423

A meta domain model

In the field of object-orientation it has for some years been common to talk about modelling the domain, that is, to create domain models. This makes perfect sense if you're a car dealer or a pet shop, i.e. a business with a specific domain. But what about those development projects that do not apply to a specific domain, but to all kinds of domains?

This was the situation when I decided to start the DataCleaner project some months ago. What I needed here was a domain model concerned with data sources, their structures and the data they contain. I searched the Open Source offerings for this and to my surprise didn't find any solid attempts to do it. Sure, there are object-relational frameworks like Hibernate which enable you to create your own domain model objects and "easily" map them to a database, but they don't let you map the database itself to objects that represent the structure of the database.

Initially I just started making such a model myself - I mean, how hard could it be? The result was the DataCleaner metadata & data layer, which works fine in DataCleaner, but wasn't quite developed for reuse in other applications. So now I've started creating a new project, which I'm calling eobjects.dk MetaModel. The MetaModel project takes off where the DataCleaner metadata & data layer stops. We have classes like Schema, Table, Column etc., but we will try to remove (encapsulate) any tie to JDBC, because if you want to see a messy API (or more correctly: messy implementations), then look at JDBC. We will also use MetaModel to take advantage of "new" language constructs in Java like enumerations, generics etc.

Here are some of the plans for MetaModel:

Schema model: Schema (and unification of the very ambiguous Catalog and Schema terminology in JDBC), Table, Column, ColumnType (enum), TableType (enum) etc.
Query model: Query, SelectClause, FromClause, WhereClause, GroupByClause, HavingClause, OrderByClause etc.
Data model: Dataset (with Streaming and keep-in-memory options), Row

My hope is to make an API that lets you interact with your database in a type-safe manner and avoid query problems, hardcoded literals in the code etc.
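
To make the plans above a bit more concrete, here is a minimal sketch of what a type-safe schema and query model could look like. Everything in it (SketchColumnType, SketchTableType, SketchColumn, SketchTable, SketchQuery and their methods) is a hypothetical stand-in invented for illustration, not the MetaModel API, and the SQL it builds does no identifier quoting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types belong to the real MetaModel project.
enum SketchColumnType { VARCHAR, INTEGER, DATE }

enum SketchTableType { TABLE, VIEW }

final class SketchColumn {
    private final String name;
    private final SketchColumnType type;

    SketchColumn(String name, SketchColumnType type) {
        this.name = name;
        this.type = type;
    }

    String getName() { return name; }
    SketchColumnType getType() { return type; }
}

final class SketchTable {
    private final String name;
    private final SketchTableType type;
    private final List<SketchColumn> columns = new ArrayList<SketchColumn>();

    SketchTable(String name, SketchTableType type) {
        this.name = name;
        this.type = type;
    }

    // Creates a column, registers it on this table and returns it for later use in queries.
    SketchColumn addColumn(String columnName, SketchColumnType columnType) {
        SketchColumn column = new SketchColumn(columnName, columnType);
        columns.add(column);
        return column;
    }

    String getName() { return name; }
    SketchTableType getType() { return type; }
    List<SketchColumn> getColumns() { return Collections.unmodifiableList(columns); }
}

final class SketchQuery {
    private final List<SketchColumn> selected = new ArrayList<SketchColumn>();
    private SketchTable from;

    // Columns are passed as objects, so a misspelled column name cannot sneak into the query.
    SketchQuery select(SketchColumn... columns) {
        for (SketchColumn column : columns) {
            selected.add(column);
        }
        return this;
    }

    SketchQuery from(SketchTable table) {
        this.from = table;
        return this;
    }

    String toSql() {
        if (from == null || selected.isEmpty()) {
            throw new IllegalStateException("Both select and from must be set");
        }
        StringBuilder sql = new StringBuilder("SELECT ");
        for (int i = 0; i < selected.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(selected.get(i).getName());
        }
        sql.append(" FROM ").append(from.getName());
        return sql.toString();
    }
}
```

The point of the sketch is only that column and table names are typed once, when the schema objects are created, and then passed around as objects instead of strings.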

20080419

Querying data with DataCleaner-core

I promised the other day that I would return to the topic of using the metadata & data layer of DataCleaner-core. So here's what we'll do:

1) Open up a connection to the database (this is plain old JDBC). Here's how to do it with Derby, but any database could do:

Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
Connection con = DriverManager.getConnection("jdbc:derby:my_database;");

2) Let's create a schema object for an easier, object-oriented way of accessing the data.

JdbcSchemaFactory schemaFactory = new JdbcSchemaFactory();
ISchema[] schemas = schemaFactory.getSchemas(con);

Note that by default the JDBC schema factory only retrieves relations of type "TABLE" from the database. In some situations you may want to broaden this restriction, for example to include views:

JdbcSchemaFactory schemaFactory = new JdbcSchemaFactory();
schemaFactory.setTableTypes(new String[] {"TABLE","VIEW"});
ISchema[] schemas = schemaFactory.getSchemas(con);

3) Let's try exploring our metadata, consisting of schemas, tables and columns.

ITable productTable = schemas[0].getTableByName("Products");
IColumn[] productColumns = productTable.getColumns();
IColumn productCodeColumn = productTable.getColumnByName("product_code");

//This next int represents one of the constants in java.sql.Types.
int productCodeType = productCodeColumn.getColumnType();
boolean isProductCodeLiteral = MetadataHelper.isLiteral(productCodeType);
boolean isProductCodeNumber = MetadataHelper.isNumber(productCodeType);
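
For readers unfamiliar with java.sql.Types, here is a rough sketch of what such a literal/number classification could look like. JdbcTypeClassifier is a hypothetical stand-in written for this post; it is not the MetadataHelper from DataCleaner-core, and which types count as "literal" or "number" is my own assumption.

```java
import java.sql.Types;

// Hypothetical stand-in, not DataCleaner's MetadataHelper.
final class JdbcTypeClassifier {

    private JdbcTypeClassifier() {
    }

    // Assumption: "literal" means a character-based type.
    static boolean isLiteral(int jdbcType) {
        switch (jdbcType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
            case Types.CLOB:
                return true;
            default:
                return false;
        }
    }

    // Assumption: "number" covers the integer, floating point and decimal types.
    static boolean isNumber(int jdbcType) {
        switch (jdbcType) {
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
            case Types.BIGINT:
            case Types.FLOAT:
            case Types.REAL:
            case Types.DOUBLE:
            case Types.NUMERIC:
            case Types.DECIMAL:
                return true;
            default:
                return false;
        }
    }
}
```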

4) Time to make a query or two. Let's start off by just querying the whole table and then querying two specific columns.

JdbcDataFactory dataFactory = new JdbcDataFactory();
IData someDataThatWeWillDiscard = dataFactory.getData(con, productTable);
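// anotherColumn is any other IColumn of productTable, obtained the same way as productCodeColumn above
IData data = dataFactory.getData(con, productTable, productCodeColumn, anotherColumn);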
while (data.next()) {

IRow row = data.getRow();
int count = data.getCount();
System.out.println("Observed " + count + " rows with product code: " + row.getValue(productCodeColumn));

}

Notice IData.getCount(), which is crucial to understand. The data factory will try to generate a GROUP BY query to reduce the traffic between database server and client. Sometimes this is not possible though (for example for TEXT types in Derby, where GROUP BY is not allowed). The getCount() method returns how many occurrences there are of this distinct combination of values, represented by the IRow interface. So make sure to always check the count: there may be fewer rows in the result than in the actual database, because the results have been compressed!
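
To illustrate why the count matters, here is a small in-memory sketch of the same idea: identical rows are collapsed into one distinct row plus a count, and the real number of rows is the sum of the counts. GroupedRowsSketch and its methods are hypothetical names made up for this example; this is not how the data factory is implemented, it only mimics the effect of a GROUP BY with a COUNT.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "compressed" results; not part of DataCleaner-core.
final class GroupedRowsSketch {

    private GroupedRowsSketch() {
    }

    // Collapses identical rows into distinct rows with an occurrence count.
    static Map<List<Object>, Integer> compress(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new LinkedHashMap<List<Object>, Integer>();
        for (List<Object> row : rows) {
            Integer previous = counts.get(row);
            counts.put(row, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    // The number of rows in the underlying data is the sum of all counts,
    // not the number of distinct rows in the compressed result.
    static int totalRows(Map<List<Object>, Integer> compressed) {
        int total = 0;
        for (int count : compressed.values()) {
            total += count;
        }
        return total;
    }
}
```

If you only counted loop iterations over the compressed result you would get the number of distinct combinations, which is exactly the mistake the count is there to prevent.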

Observe in general how strongly typed an API this is. In other data-oriented APIs one would have to type in the same column name in several places (at least in the query and when iterating through the results), but with the DataCleaner-core metadata and data layer we get a completely object-oriented and type-safe way to do this. The amazing thing about this API is also that we could just as well have done the same thing with flat files or other data source types.

20080418

Getting uispec4j to run in Hudson on a headless linuxbox

I've been fiddling around with this problem for some time now and I finally got all the pieces together so I guess I'd better share my newfound knowledge on these obscure topics that I hope I'll never have to encounter again.

It all started with a new fine and dandy test framework called uispec4j, that we wanted to use for DataCleaner's GUI. Uispec4j is supposedly "Java GUI testing made simple" and so they caught our attention because the code coverage of DataCleaner GUI was not that impressive (yet, if you read this blog post and a lot of time has passed, it may hopefully be looking better).

So we started off by creating some neat unittests for DataCleaner GUI, using uispec4j. Hurray. They worked fine and dandy on our Windows development machines so we uploaded them to the repository and into the Continuous Integration loop. This is where all hell broke loose.

First off, our Continuous Integration server was headless (i.e. no screens, monitors, displays, whatever, just a remote console). Surely this wouldn't do because uispec4j requires a window manager to use for emulating the Java GUI. Fair enough, I installed X with the Xfce window manager:

apt-get install xorg xfce4

Then came the next problem. When starting X a fatal error occurred, telling me that no screens were installed. That seems fairly reasonable, but what the heck should I do about it? I decided to install a VNC server to host a remote screen. This would hopefully rid me of my troubles, since I didn't have the (physical) room for installing a monitor for the damn thing.

apt-get install vncserver

After configuring the vncserver I tried running my tests... Next obstacle: Telling Java which screen to use. This required setting the DISPLAY environment variable in /etc/profile:

export DISPLAY=:1

Now came the time for some mind-boggling uispec4j errors. I found out that uispec4j only works with Motif on Linux, so you have to append "-Dawt.toolkit=sun.awt.motif.MToolkit" to your command line like this:

mvn install -Dawt.toolkit=sun.awt.motif.MToolkit

every time you need to build the damn thing. Sigh, this wasn't something that my Continuous Integration system (Hudson) was built for, so I started to edit various batch scripts to see if it would work when I appended the damn "-Dawt.toolkit=sun.awt.motif.MToolkit" parameter to my container's startup script, but no. Instead I found out that you could set the MAVEN_OPTS environment variable, so I did that in /etc/profile:

export MAVEN_OPTS="-Dawt.toolkit=sun.awt.motif.MToolkit"

But that didn't work either because Hudson doesn't comply with the damn thing :( I tried to set that "awt.toolkit" system property using some static initializers (which I generally think is a poor, poor, poor thing to do in Java), but guess what? Uispec4j is filled with static initializers as well, so I had no guarantee that my static initializer would be the first one to run. (edit: Apparently I might be wrong in this claim about uispec4j, check out the comments for more details).

Finally I got a new version of Hudson that had a per-project configuration of MAVEN_OPTS and that did the job. The last issue was actually a JVM issue. I had to change the runtime user of my J2EE container to be the same user that hosts the VNC server instance. If you try to access another user's desktop, the JVM turns fatal. So don't touch my desktop or you'll get your fingers burnt!

Ah, and a last thing about GUI testing: Make sure to set the Locale in your JUnit setUp methods, or else the unittests won't be portable between computers with different languages when you assert on the labels of UI elements.
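
A minimal sketch of that Locale trick, written as a plain helper so it doesn't depend on any particular JUnit version. FixedLocaleFixture is a hypothetical name, and the choice of Locale.ENGLISH is just an assumption; use whatever language your label assertions are written in. In a real test you would call these two methods from your JUnit setUp and tearDown methods.

```java
import java.util.Locale;

// Hypothetical helper: pins the default Locale during a test and restores it afterwards.
final class FixedLocaleFixture {

    private Locale original;

    // Call from the test's setUp method, before any UI component is created.
    void setUp() {
        original = Locale.getDefault();
        Locale.setDefault(Locale.ENGLISH);
    }

    // Call from the test's tearDown method so other tests see the machine's own Locale again.
    void tearDown() {
        if (original != null) {
            Locale.setDefault(original);
        }
    }
}
```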

I once heard a very wise colleague and fellow developer say:

"You should test functionality and domain models through unittesting and test UI through UI!"

...

20080417

Maven-ing your way around DataCleaner

There seems to be quite some frustration among old ANT users switching to Maven, so I thought I would make a small post about the main differences and various hacks that are useful to know as a Maven user. The good thing about ANT is that you can always hack your way around a problem and it's quite easy to find the problems that are stopping you. The bad thing is of course that the build files seem to grow enormously and that you have to enforce your project infrastructure with some kind of common JAR download area like an FTP server or something similar. In contrast Maven focuses not on the build as a process, but more on the content of the build, because 99% of the time the process of building a Java project is pretty much the same, so why not omit the "how" completely and only focus on the "what" of your build? This "what" is configured in the pom.xml file!

Admittedly, that was not my primary reason for choosing Maven! :) The thing that won me over was of course the dependency handling system, which I really love and loathe a bit at the same time. What you need to be aware of about the dependencies is this:

Maven automagically creates a local repository for all the JARs you use in your projects.

There's also a central repository where maven will download the JARs from, if they are not found in the local one.

If you are working offline or behind a proxy and you need a new JAR you're bound to mess this up :( When Maven can't find its JARs in the central repository or locally it will blacklist them!

You can however delete the blacklisting by removing (part of) the local repository, which is found in ~/.m2/repository:

windows: C:/Documents and Settings/[username]/.m2/repository

linux: /home/[username]/.m2/repository

OK, so that was the background knowledge you had to know - now for some of the build goals. The most used maven goal is "install", often prefixed with "clean", like this:

mvn clean install

The install goal will build the project, run the unittests, verify that everything worked and then install the resulting JAR/WAR/Whatever into your local repository. This means that you can then use the project as a dependency to another project, smart eh?
And now for some other commonly used goals:

mvn site

Create a nifty project site with all sorts of nice information and reports (javadoc, unittests, codecoverage etc. depending on your configuration).

mvn install -Dmaven.test.skip=true

Ah, the "skip test" parameter. I spent a long time figuring that one out. This is handy if you're working with several projects at the same time and you've (consciously) broken the build and want to keep on using the dependency.

mvn jetty:run

My new DataCleaner-webmonitor favourite. This will bring up a Jetty container with DataCleaner-webmonitor running on localhost. This of course requires a little configuration in pom.xml, I'm sure you can figure it out, just find the plugin elements that have to do with jetty :)

20080416

Traversing schemas with DataCleaner-core

Since it's a new framework, a lot of you guys probably wonder how to use DataCleaner as a Java API. Unlike a lot of other tools that I've seen in the Business Intelligence domain, DataCleaner was built bottom-up from a developer's perspective and the User Interface was added on afterward, so using the DataCleaner-core API can be a real pleasure ... (so much for user-orientation, I'll have to elaborate on that another time)

Let's take a look at how to get the data of a CSV file. This will give us the data of the file using the same interfaces as JDBC-databases, excel-files and possibly other data sources.

ISchemaFactory schemaFactory = new CsvSchemaFactory();
File file = new File("my_file.csv");
ISchema[] schemas = schemaFactory.getSchemas(file);

Or in the case of a JDBC connection:

ISchemaFactory schemaFactory = new JdbcSchemaFactory();
Connection connection = DriverManager.getConnection("jdbc:my:database://localhost/foobar");
ISchema[] schemas = schemaFactory.getSchemas(connection);

The schemas retrieved here can be accessed in a very natural way and with a strong domain model, unlike traversing schemas in JDBC. Here are some examples:

ITable[] tables = schemas[0].getTables();
IColumn[] columns = tables[0].getColumns();
String columnName = columns[0].getName();

Handling the schemas this way serves an obvious purpose. We can now design our profiles, our validation rules etc. in a very uniform way that can be reused across data source types. We'll talk about that next time :)
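
As a small taste of what "uniform" means here, the sketch below walks schemas, tables and columns without knowing whether they came from a CSV file or a database. SchemaLike, TableLike, ColumnLike and ColumnNameCollector are hypothetical stand-ins that only mirror the three methods used in the examples above (getTables, getColumns, getName); they are not the DataCleaner-core interfaces themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins mirroring only the methods shown in this post.
interface ColumnLike {
    String getName();
}

interface TableLike {
    ColumnLike[] getColumns();
}

interface SchemaLike {
    TableLike[] getTables();
}

final class ColumnNameCollector {

    private ColumnNameCollector() {
    }

    // Works on any schema implementation, regardless of the underlying data source type.
    static List<String> collect(SchemaLike[] schemas) {
        List<String> names = new ArrayList<String>();
        for (SchemaLike schema : schemas) {
            for (TableLike table : schema.getTables()) {
                for (ColumnLike column : table.getColumns()) {
                    names.add(column.getName());
                }
            }
        }
        return names;
    }
}
```

Code written against the interfaces like this never has to care which factory produced the schemas.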

kasper's source