20101026

Preview of the UI in DataCleaner 2

Lately I've been blogging a lot about AnalyzerBeans, which is the name of the new engine of DataCleaner from version 2.0 onwards. As AnalyzerBeans is maturing and nearing a state where it is usable, I have now also taken the first development steps on the roadmap for DataCleaner 2. As a techie I would like to put as much emphasis as possible on the technical capabilities of AnalyzerBeans, but honestly it doesn't do much good without a good user interface. So just as AnalyzerBeans was (and is) an attempt to rewrite the functional/logical part of DataCleaner, the new UI will be an attempt to deliver a user experience that feels new, exciting, more responsive and interactive. The "sketches" for the new UI are being drawn these days - I'll take you through a few examples.

In the two screenshots below you can see the source data selection and a transformation of this source data. The source selection is pretty similar to the existing DataCleaner UI but notice the new transformation-oriented features. In the example below I want to use a "Name standardizer" transformation which will turn my "real_name" column into four (virtual) columns: First name, Last name, Middle name and Titulation. Similarly I can convert data types, concatenate, tokenize, parse etc.
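
To make the idea of one physical column feeding several virtual columns a bit more concrete, here is a minimal Java sketch. It is not AnalyzerBeans code and does not use its API: the class name, the tiny title list and the naive whitespace-splitting rule are all assumptions made purely for illustration. Only the four output column names are taken from the example above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT the actual "Name standardizer" of DataCleaner.
public class NameStandardizerSketch {

    // Assumed, deliberately tiny list of titles (lower case, without trailing dot).
    private static final List<String> TITLES = Arrays.asList("mr", "mrs", "ms", "dr", "prof");

    /** Turns one "real_name" value into four virtual column values. */
    public static Map<String, String> standardize(String realName) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("First name", "");
        columns.put("Last name", "");
        columns.put("Middle name", "");
        columns.put("Titulation", "");

        String trimmed = realName == null ? "" : realName.trim();
        if (trimmed.isEmpty()) {
            return columns; // nothing to split, all virtual columns stay empty
        }

        String[] tokens = trimmed.split("\\s+");
        int start = 0;

        // Treat the first token as a title if it matches the assumed list.
        String candidate = tokens[0].toLowerCase().replaceAll("\\.$", "");
        if (tokens.length > 1 && TITLES.contains(candidate)) {
            columns.put("Titulation", tokens[0]);
            start = 1;
        }

        int nameTokens = tokens.length - start;
        columns.put("First name", tokens[start]);
        if (nameTokens > 1) {
            columns.put("Last name", tokens[tokens.length - 1]);
        }
        if (nameTokens > 2) {
            // Everything between first and last name is treated as middle name(s).
            String[] middle = Arrays.copyOfRange(tokens, start + 1, tokens.length - 1);
            columns.put("Middle name", String.join(" ", middle));
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(standardize("Dr. John Quincy Adams"));
    }
}
```

The point of the sketch is only the shape of the operation: one input value in, several named output values out, which later steps (such as a profile) can then treat like ordinary columns.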

Another thing that is much needed in the existing DataCleaner UI is more elaborate configuration options for the various profiles. In the screenshot below you'll see the new and improved version of the Pattern finder, which includes a new set of configuration options. Notice that both my physical columns (real_name) and my virtual columns (as mentioned before) are available for the Pattern finder.

There are a lot of other exciting things going into the new DataCleaner version, but I will save some news for later :) For now, I can only invite everyone to try it out. All you have to do is:
> mkdir datacleaner_dev
> cd datacleaner_dev
> svn co http://eobjects.org/svn/AnalyzerBeans/trunk AnalyzerBeans
> cd AnalyzerBeans
> mvn install
> cd ..
> svn co http://eobjects.org/svn/DataCleaner/trunk DataCleaner
> cd DataCleaner
> mvn install
> java -jar target/DataCleaner-2.0-SNAPSHOT.jar
Good luck and let us know what you think :-)

PS: Maybe I should note that even though the new version is usable, there are still a lot of things NOT working. If you're wondering whether something odd is a bug or a feature that has simply not been implemented yet - don't hesitate to ask.