A colorful value distribution

A few weeks ago I was devoting some attention to the charts in DataCleaner. Of special interest are the value distribution charts, which have caused some discussions...

Anyway, here's a proposal which includes nicer (IMO) coloring, a "distinct count" measure, a dedicated "<blank>" keyword and a few other niceties.

Value distribution chart proposals

You can expect to see this live in DataCleaner 2.3, due in August.


Proposal for writing data in MetaModel 2.0

Hi everyone,

For a long time we've had a lot of people asking "can I use MetaModel to not only read varying data formats, but also to write data?". So far the answer has been "no, MetaModel is a read-only API". But lately, at Human Inference, we've been working on a proposal for an API that lets you write to the same DataContexts in MetaModel as you read from.

Here's a glimpse of the API, by example. Currently we have fluent APIs for creating tables and inserting rows:

UpdateableDataContext dc = ...
Schema schema = dc.getDefaultSchema();
Table table = dc.createTable(schema, "my_table");
dc.insertInto(table).value("id",1).value("name","john doe").execute();
dc.insertInto(table).value("id",2).value("name","jane doe").execute();

This API has so far been implemented successfully for Excel spreadsheets, CSV files and JDBC databases - our three most used datastore types.
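To illustrate the mechanics behind such a fluent API, here's a minimal, self-contained sketch of how an insert builder along these lines could be structured. Note that InsertBuilder and its rendering to an SQL string are purely my own illustration, not MetaModel's actual classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only -- "InsertBuilder" is a hypothetical class,
// not MetaModel's actual implementation.
class InsertBuilder {
    private final String table;
    private final Map<String, Object> values = new LinkedHashMap<>();

    InsertBuilder(String table) {
        this.table = table;
    }

    // Each value(...) call records a column/value pair and returns the
    // builder itself -- this chaining is what makes the API "fluent".
    InsertBuilder value(String column, Object value) {
        values.put(column, value);
        return this;
    }

    // A real execute() would write to the underlying datastore; this one
    // just renders an INSERT statement so the sketch stays self-contained.
    String execute() {
        StringBuilder cols = new StringBuilder();
        StringBuilder vals = new StringBuilder();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            if (cols.length() > 0) {
                cols.append(", ");
                vals.append(", ");
            }
            cols.append(e.getKey());
            vals.append(e.getValue() instanceof String
                    ? "'" + e.getValue() + "'" : e.getValue());
        }
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
    }
}
```

So `new InsertBuilder("my_table").value("id", 1).value("name", "john doe").execute()` yields `INSERT INTO my_table (id, name) VALUES (1, 'john doe')`. The appeal of the builder style is that the insert reads as one statement, while each value(...) call can still validate its column incrementally.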

You can find the work-in-progress of the proposal in SVN at:

We would like to get your reactions on the API proposal. Does it suit your needs, and do you like the approach? Will it be acceptable to launch 2.0 with just these "CREATE TABLE" and "INSERT" operations, or will other operations (such as DELETE, UPDATE, DROP, ALTER) be needed before it makes for a viable solution for you guys?

Best regards,

Update 2011-07-11
A few people have provided feedback (thank you for that), and some performance tests on our side revealed that we need to apply a more batch-friendly approach, one which also has better encapsulation and isolation properties for multiple and large updates. So we've instead applied a pattern similar to Spring's template pattern or Akka's atomic STM pattern. The idea is that the user supplies an UpdateScript, which will be executed in isolation, like this:
UpdateableDataContext dc = ...
final Schema schema = dc.getDefaultSchema();
dc.executeUpdate(new UpdateScript() {
  public void run(UpdateCallback callback) {
    Table table = callback.createTable(schema, "my_table");
    callback.insertInto(table).value("id",1).value("name","john doe").execute();
    callback.insertInto(table).value("id",2).value("name","jane doe").execute();
  }
});
At first sight it might not look quite as elegant, but I think that in the big picture this pattern is actually a lot nicer. First of all, it gives you a very clear understanding of exactly where in your code you modify your data. It also makes it a lot easier to write e.g. fallback scripts in case something goes wrong with your update. And for datastore types that support transactions (e.g. JDBC databases), it makes it possible for us to easily demarcate the transactional boundaries.
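To make the isolation point concrete, here's a stripped-down, self-contained sketch of the template pattern at work. The UpdateScript, UpdateCallback and TemplateDataContext classes below are simplified stand-ins of my own, not MetaModel's real implementation; the point is only that the framework, not the user, decides where the "transaction" begins and ends:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the proposed UpdateScript interface.
interface UpdateScript {
    void run(UpdateCallback callback);
}

// Collects the statements a script wants to execute, without applying them.
class UpdateCallback {
    final List<String> statements = new ArrayList<>();

    void insertInto(String table, String values) {
        statements.add("INSERT INTO " + table + " VALUES (" + values + ")");
    }
}

// The "template": executeUpdate owns the boundaries of the whole update.
class TemplateDataContext {
    final List<String> committed = new ArrayList<>();

    void executeUpdate(UpdateScript script) {
        UpdateCallback callback = new UpdateCallback();
        script.run(callback);                  // may throw; nothing committed yet
        committed.addAll(callback.statements); // commit the whole batch at once
    }
}
```

A script that throws halfway through leaves `committed` untouched, which is the all-or-nothing behavior that real transactional boundaries would give you on e.g. a JDBC database.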

... Please keep posting feedback!