Since DataCleaner is a new framework, a lot of you are probably wondering how to use it as a Java API. Unlike many other tools I've seen in the Business Intelligence domain, DataCleaner was built bottom-up from a developer's perspective, and the user interface was added afterward, so using the DataCleaner-core API can be a real pleasure... (so much for user orientation; I'll have to elaborate on that another time).
Let's take a look at how to get at the contents of a CSV file. This exposes the file through the same interfaces used for JDBC databases, Excel files and possibly other data sources.
ISchemaFactory schemaFactory = new CsvSchemaFactory();
File file = new File("my_file.csv");
ISchema[] schemas = schemaFactory.getSchemas(file);
Or in the case of a JDBC connection:
ISchemaFactory schemaFactory = new JdbcSchemaFactory();
Connection connection = DriverManager.getConnection("jdbc:my:database://localhost/foobar");
ISchema[] schemas = schemaFactory.getSchemas(connection);
The schemas retrieved here can be accessed in a very natural way through a strong domain model, unlike traversing schemas in JDBC. Here are some examples:
ITable[] tables = schemas[0].getTables();
IColumn[] columns = tables[0].getColumns();
String columnName = columns[0].getName();
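To make the traversal above a bit more concrete, here is a minimal sketch of a routine that walks every schema, table and column and collects the column names. It does not use the real DataCleaner-core classes: the ISchema, ITable and IColumn interfaces below are hypothetical stand-ins that declare only the methods shown in this post, and SchemaWalker, collectColumnNames and schemaOf are names made up for the illustration. The in-memory schema simply takes the place of whatever a schema factory would return.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaWalker {

    // Hypothetical stand-ins for the DataCleaner-core interfaces.
    // Only the methods used in this post are declared; the real
    // interfaces are assumed to offer more than this.
    interface IColumn { String getName(); }
    interface ITable { IColumn[] getColumns(); }
    interface ISchema { ITable[] getTables(); }

    // Walks schemas -> tables -> columns and collects the column names.
    // Nothing in here knows whether the schemas came from a CSV file,
    // a JDBC connection or somewhere else.
    static List<String> collectColumnNames(ISchema[] schemas) {
        List<String> names = new ArrayList<>();
        for (ISchema schema : schemas) {
            for (ITable table : schema.getTables()) {
                for (IColumn column : table.getColumns()) {
                    names.add(column.getName());
                }
            }
        }
        return names;
    }

    // Illustration-only helper: builds an in-memory schema with one
    // table per array of column names.
    static ISchema schemaOf(String[]... columnNamesPerTable) {
        ITable[] tables = new ITable[columnNamesPerTable.length];
        for (int t = 0; t < tables.length; t++) {
            String[] names = columnNamesPerTable[t];
            IColumn[] columns = new IColumn[names.length];
            for (int c = 0; c < names.length; c++) {
                String name = names[c];
                columns[c] = () -> name;
            }
            tables[t] = () -> columns;
        }
        return () -> tables;
    }

    public static void main(String[] args) {
        ISchema[] schemas = {
            schemaOf(new String[] {"id", "name"}, new String[] {"email"})
        };
        System.out.println(collectColumnNames(schemas)); // prints [id, name, email]
    }
}
```

With the real API, the array passed to such a routine would presumably be the result of schemaFactory.getSchemas(...), and the loop itself would stay the same for either factory.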
Handling the schemas this way serves an obvious purpose. We can now design our profiles, our validation rules etc. in a very uniform way that can be reused across data source types. We'll talk about that next time :)