Under the cover of night we've released a new version of DataCleaner today (version 2.4.2). Officially it's a minor release, because very little has changed in the User Interface: only a few bugfixes and minor enhancements have been introduced. But one potentially major feature has been added in the inner workings of DataCleaner: the ability to persist the results of your DQ analysis jobs. Although this feature still has very limited User Interface support, it has full support in the command line interface, which I would argue is actually sufficient for the purposes of establishing a data quality monitoring solution. Later on I do expect there to be full (and backwards compatible) support in the UI as well.
So what is it, and how does it work?
Well, basically it is just two new parameters to the command line interface:
-of (--output-file) FILE : File in which to save the result of the job
-ot (--output-type) [TEXT | HTML | SERIALIZED] : How to represent the result of the job

Here's an example of how to use it. Notice that I use the file extension .analysis.result.dat, which is currently the only extension that the UI recognizes as a result file.

> DataCleaner-console.exe -job examples\employees.analysis.xml -ot SERIALIZED -of employees.analysis.result.dat

Now start up DataCleaner's UI and select "File -> Open analysis job...". You'll see that the produced file can now be opened.
Notice also that there's an HTML output type, which is also quite neat and easy to parse with an XML parser. The SERIALIZED format is richer though, and includes the information needed for more refined, programmatic access to the results. For instance, you might deserialize the whole file using the regular Java serialization API and access it as an AnalysisResult instance. That way you could, for example, create a timeline of a particular metric and track changes to the data that you are monitoring.

Update: Please read my follow-up blog post about the plans to include a full Data Quality monitoring solution as of DataCleaner 3.0.
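To make the deserialization idea a bit more concrete, here is a minimal sketch of reading a SERIALIZED result file with the regular Java serialization API. It rests on a few assumptions: the class name ResultReader and the method readResult are made up for this illustration, the file is assumed to hold a single top-level object, and the result is kept as a plain Object so that the sketch compiles without DataCleaner on the classpath. To actually load a DataCleaner result, the DataCleaner jars need to be on the classpath (otherwise readObject fails with a ClassNotFoundException), and you would then cast the returned object to AnalysisResult.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of DataCleaner.
public class ResultReader {

    // Reads one object from a file written with standard Java serialization.
    // Assumption: the result file contains a single top-level object.
    public static Object readResult(String path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(Paths.get(path))))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the file name used in the command line example.
        String path = args.length > 0 ? args[0] : "employees.analysis.result.dat";
        Object result = readResult(path);
        // With DataCleaner on the classpath you would cast to AnalysisResult here.
        System.out.println("Deserialized an instance of " + result.getClass().getName());
    }
}
```

Running the command line job on a schedule with a different -of file name each time, and then loading those files one by one like this, would be one way to collect the data points for a timeline.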