20080416

Traversing schemas with DataCleaner-core

Being a new framework a lot of you guys probably wonder how to use DataCleaner as a Java API. Unlike a lot of other tools around that I've seen in the Business Intelligence domain DataCleaner was built bottom-up from a developers perspective and the User Interface was added on afterward so to use the DataCleaner-core API can be a real pleasure ... (so much for user-orientation, I'll have to elaborate on that another time)

Let's take a look at how to get the data of a CSV file. This will give us the data of the file using the same interfaces as JDBC-databases, excel-files and possibly other data sources.


ISchemaFactory schemaFactory = new CsvSchemaFactory();
File file = new File("my_file.csv");
ISchema[] schemas = schemaFactory.getSchemas(file);

Or in the case of a JDBC connection:

ISchemaFactory schemaFactory = new JdbcSchemaFactory();
Connection connection = DriverManager.getConnection("jdbc:my:database://localhost/foobar");
ISchema[] schemas = schemaFactory.getSchemas(connection);

The schemas retrievede here can be accessed in a very natural way and with a strong domain model, unlike traversing schemas in JDBC. Here's some examples:

ITable[] tables = schemas[0].getTables();
IColumn[] columns = tables[0].getColumns();
String columnName = columns[0].getName();

Handling the schemas this way serves an obvious purpose. We can now design our profiles, our validation rules etc. in a very uniform way that can be reused accross data source types. We'll talk about that next time :)

No comments: