August 2008

20080826

We're moving eobjects.dk to a new server

Hello everybody. This is a practical announcement ...

We're in the process of moving eobjects.dk to a new server.

Please return shortly to access the new and improved eobjects.dk!

Update

We're finally finished, but the IP address change will need a few hours to propagate worldwide. If you're still being redirected to this page, it's because the DNS changes haven't taken effect yet. The new eobjects.dk website is online! Impatient people who are suffering from the slow DNS propagation can also access the website directly by its IP address (though minor glitches are to be expected this way).

Update

By now, all DNS changes should be complete, so go ahead and enjoy the new eobjects.dk website!

20080825

Development/snapshot release of DataCleaner 1.4

We've released a development/snapshot release of DataCleaner 1.4 in order to get early feedback on all the improvements and new features, as well as to provide our users with up-to-date functionality. In my own opinion, the development release is just as stable and "safe to use" as 1.3, but of course it lacks some of the manual testing that we put into the real releases.

You can download the development release from our SourceForge download site.

Here's a short list of fixes since DataCleaner 1.3:

Better memory handling and garbage collection
Reference columns in drill-to-details windows
Better error handling when loading schemas
Quoting of string values in visualized tables (in order to distinguish empty strings and white spaces)
New profile: Value Distribution, which is an improved version of the Repeated Values profile. The Value Distribution profile has an option to configure the top/bottom n values to include in the result.
Better control of profile result column width.
Bugfix: Copy to clipboard functions now work properly.
Bugfix: Scrollbars added to visualized tables.
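To make the Value Distribution idea concrete, here's a minimal Java sketch of how a top/bottom n distribution could be computed. It is not DataCleaner's actual implementation: the ValueDistribution class and its top/bottom methods are hypothetical names, and breaking ties between equally frequent values alphabetically is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a value distribution profile (not DataCleaner's code):
// count every distinct value, then report the n most or least frequent ones.
final class ValueDistribution {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Records one observed value; this sketch assumes values are non-null.
    void add(String value) {
        counts.merge(Objects.requireNonNull(value, "value"), 1, Integer::sum);
    }

    // The n most frequent values, most frequent first.
    List<String> top(int n) {
        return firstN(true, n);
    }

    // The n least frequent values, least frequent first.
    List<String> bottom(int n) {
        return firstN(false, n);
    }

    private List<String> firstN(boolean mostFrequentFirst, int n) {
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        if (mostFrequentFirst) {
            byCount = byCount.reversed();
        }
        // Assumption: equal counts are ordered alphabetically to keep results stable.
        Comparator<Map.Entry<String, Integer>> byValue = Map.Entry.comparingByKey();
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(byCount.thenComparing(byValue));

        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Calling add() for every value in a column and then top(n) and bottom(n) gives the two halves of the result; a real profile would of course also report the count next to each value.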

20080819

New eobjects.dk website

Hi everybody,

I'm excited about the upcoming release of a new eobjects.dk website design. The website will hopefully be launched pretty soon; we're doing it as part of a general server move. We're moving to the laboratory at Copenhagen Business School (CBS) called Business of Open Source Software and Standards (BOSSS). With BOSSS we'll have much better bandwidth and a better performing server, as well as better physical security.

The new website will be based on Trac 0.11 (currently we use 0.10) and will feature a lot of improvements for visitors, users and contributors:

A better theming engine lets us use a more flexible website design, with wiki pages appearing as menu items.
A whole new news page, which will be used for announcements on the progress of our projects.
A lot of other improvements that come with the Trac upgrade.

Here's a little screenshot of the new webpage design (work in progress):

Any comments are welcome!

20080803

Considering MetaModel functions

This blog entry could just as well have been a feature request, but I'm going to kick-start it with a couple of thoughts on one of the crucial improvements to MetaModel that I've been dreaming about.

The last couple of weeks have brought considerable interest in MetaModel, largely thanks to articles posted on TheServerSide and InfoQ. It's been great to get the message out, and it has also sparked a lot of great ideas from users and evaluators on the discussion forum. A couple of them have been requests to build more advanced SELECT items into the query model. In this post I'm going to discuss type-casting and extraction functions, and how they can be made possible using the new IQueryRewriter interface.

The idea of query rewriting has been around for some time, inspired of course by Hibernate's dialects. The thing was, though, that to begin with I wanted to skip dialect handling completely in order to find out how far one could actually go without having to do any "hacking" in SQL. That worked out quite well, but now that we need to incorporate more advanced, non-standardised features, we will of course need to be able to manipulate the standard output. This is what the query rewriter is for, and the AbstractQueryRewriter in particular helps you do it. I made my first query rewriting "hack" today: using TOP for limiting the result set size, which is a non-standard keyword found in SQL Server (MySQL uses LIMIT for the same purpose).
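To illustrate the principle, here's a minimal Java sketch of dialect-specific rewriting of a row limit. The SimpleQuery, QueryRewriter, BaseQueryRewriter, TopQueryRewriter and LimitQueryRewriter types are hypothetical and far simpler than MetaModel's real IQueryRewriter and AbstractQueryRewriter; the point is only that a base class renders the standard parts and each dialect overrides the hook it needs.

```java
import java.util.List;

// Hypothetical, heavily simplified query: one table, some columns, an optional row limit.
final class SimpleQuery {
    final String table;
    final List<String> columns;
    final Integer maxRows; // null means "no limit"

    SimpleQuery(String table, List<String> columns, Integer maxRows) {
        this.table = table;
        this.columns = columns;
        this.maxRows = maxRows;
    }
}

// Hypothetical rewriter contract (not MetaModel's actual IQueryRewriter signature).
interface QueryRewriter {
    String rewrite(SimpleQuery query);
}

// Renders the standard SQL and exposes hooks for the non-standard parts.
abstract class BaseQueryRewriter implements QueryRewriter {
    @Override
    public String rewrite(SimpleQuery q) {
        return "SELECT " + afterSelect(q) + String.join(", ", q.columns)
                + " FROM " + q.table + atEnd(q);
    }

    // Text inserted right after SELECT; nothing by default.
    protected String afterSelect(SimpleQuery q) {
        return "";
    }

    // Text appended to the end of the query; nothing by default.
    protected String atEnd(SimpleQuery q) {
        return "";
    }
}

// SQL Server style: SELECT TOP n ...
class TopQueryRewriter extends BaseQueryRewriter {
    @Override
    protected String afterSelect(SimpleQuery q) {
        return q.maxRows == null ? "" : "TOP " + q.maxRows + " ";
    }
}

// MySQL style: ... LIMIT n
class LimitQueryRewriter extends BaseQueryRewriter {
    @Override
    protected String atEnd(SimpleQuery q) {
        return q.maxRows == null ? "" : " LIMIT " + q.maxRows;
    }
}
```

For a query on "customers" with columns name and city and a limit of 10, the first rewriter produces SELECT TOP 10 name, city FROM customers and the second SELECT name, city FROM customers LIMIT 10, while the query model itself stays dialect-neutral.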

What we need to do now is expand the Query model API to incorporate type casting. My thoughts are:

We must take an interface-first approach: what is the most natural way to type-cast a select item in a query? I'm thinking that we should add a "castAs(ColumnType type)" method on SelectItem.
Because not all ColumnTypes are supported by all databases, we should consider making a more abstract type enum, containing only a few basic types like String, Integer, Decimal, Date and Boolean.
We should use the query rewriting approach to generate the actual SQL cast syntax. Some databases use the CAST(x AS y) function, others use special-purpose functions like TO_NUMBER(x).
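Here's a rough Java sketch of how these three points could fit together: a reduced BasicType enum, a castAs method that returns a new select item, and a dialect object that chooses between CAST(x AS y) and a special-purpose function. All names (BasicType, CastableSelectItem, CastDialect and the two dialect classes) are hypothetical, and the concrete SQL type names used by the standard dialect, such as DECIMAL(18,4), are assumptions.

```java
// Hypothetical reduced type set, as proposed above (not MetaModel's ColumnType).
enum BasicType { STRING, INTEGER, DECIMAL, DATE, BOOLEAN }

// Decides how a cast is written in a particular SQL dialect.
interface CastDialect {
    String renderCast(String expression, BasicType type);
}

// Hypothetical select item that remembers an optional cast target.
final class CastableSelectItem {
    private final String expression;
    private final BasicType castType; // null means "no cast"

    CastableSelectItem(String expression) {
        this(expression, null);
    }

    private CastableSelectItem(String expression, BasicType castType) {
        this.expression = expression;
        this.castType = castType;
    }

    // Returns a new item instead of mutating this one.
    CastableSelectItem castAs(BasicType type) {
        return new CastableSelectItem(expression, type);
    }

    String toSql(CastDialect dialect) {
        return castType == null ? expression : dialect.renderCast(expression, castType);
    }
}

// Standard SQL: CAST(x AS y). The concrete SQL type names are assumptions.
class AnsiCastDialect implements CastDialect {
    @Override
    public String renderCast(String expression, BasicType type) {
        return "CAST(" + expression + " AS " + sqlTypeName(type) + ")";
    }

    protected String sqlTypeName(BasicType type) {
        switch (type) {
            case STRING: return "VARCHAR(255)";
            case INTEGER: return "INTEGER";
            case DECIMAL: return "DECIMAL(18,4)";
            case DATE: return "DATE";
            case BOOLEAN: return "BOOLEAN";
            default: throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }
}

// A dialect that prefers a conversion function for numbers (TO_NUMBER)
// and falls back to the standard CAST syntax for everything else.
class ToNumberCastDialect extends AnsiCastDialect {
    @Override
    public String renderCast(String expression, BasicType type) {
        if (type == BasicType.INTEGER || type == BasicType.DECIMAL) {
            return "TO_NUMBER(" + expression + ")";
        }
        return super.renderCast(expression, type);
    }
}
```

Having castAs return a new item keeps select items immutable, so the same uncast item can be reused elsewhere in a query.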

Another feature that I want to include in MetaModel is functions for calculating or extracting something from a column value. Take for example the YEAR(x) function (or in some databases the EXTRACT(YEAR FROM x) function).

One would initially think that we should just add this function to the FunctionType enum and take it from there. But it's actually quite a different type of function. While SUM, COUNT etc. are aggregate functions, the YEAR function is a single-value function, i.e. you can't call YEAR on a set of values.
Therefore we should consider renaming FunctionType to AggregateFunction and creating a new enum, CalculationFunction (or maybe we can come up with a better name?).
We can use the same approach as before (query rewriting) to handle different dialects, but we need to make sure that we pick function names that are widely accepted and understandable to the user. Personally I prefer YEAR(x) over EXTRACT(YEAR FROM x), as the syntax is clearer and there are no constants inside the parameter, which is more Java-ish. The downside is that we will then also need MONTH(x), DAY(x), etc. functions, but that's not a biggie, I think.
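To make the distinction concrete, here's a small Java sketch of the proposed split, using the AggregateFunction and CalculationFunction names from above. The FunctionDialect interface and its two implementations are hypothetical: one renders YEAR(x)-style functions (as MySQL and SQL Server support), the other renders the standard EXTRACT(YEAR FROM x) form (as PostgreSQL requires). The apply method is only there to show that a calculation function works on one value at a time.

```java
import java.time.LocalDate;

// Aggregate functions: many values in, one value out.
enum AggregateFunction { SUM, COUNT, AVG, MIN, MAX }

// Calculation functions: one value in, one value out.
enum CalculationFunction {
    YEAR, MONTH, DAY;

    // Evaluates the function on a single date, e.g. if it has to run in Java
    // rather than in the database.
    int apply(LocalDate date) {
        switch (this) {
            case YEAR: return date.getYear();
            case MONTH: return date.getMonthValue();
            default: return date.getDayOfMonth();
        }
    }
}

// Hypothetical dialect hook for rendering both kinds of function.
interface FunctionDialect {
    String renderAggregate(AggregateFunction function, String expression);
    String renderCalculation(CalculationFunction function, String expression);
}

// Dialects with YEAR(x), MONTH(x) and DAY(x) functions.
class NamedFunctionDialect implements FunctionDialect {
    @Override
    public String renderAggregate(AggregateFunction function, String expression) {
        return function.name() + "(" + expression + ")";
    }

    @Override
    public String renderCalculation(CalculationFunction function, String expression) {
        return function.name() + "(" + expression + ")";
    }
}

// Dialects that only support the standard EXTRACT(field FROM x) form.
class ExtractFunctionDialect extends NamedFunctionDialect {
    @Override
    public String renderCalculation(CalculationFunction function, String expression) {
        return "EXTRACT(" + function.name() + " FROM " + expression + ")";
    }
}
```

With this split the user always writes YEAR, and only the rewriter for a given database needs to know whether that becomes YEAR(order_date) or EXTRACT(YEAR FROM order_date).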

One last note: we should also consider whether it's reasonable to keep using enums. Maybe we should switch to interfaces (with constants in the interface to ensure no API changes) for the sake of extensibility.
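A minimal sketch of the interface-with-constants alternative, reusing the CalculationFunction name. SimpleCalculationFunction, QuarterFunction and FunctionRenderer are hypothetical helpers. References such as CalculationFunction.YEAR and calls to function.name() keep compiling, while users become free to add their own implementations.

```java
// The function type as an interface instead of an enum.
interface CalculationFunction {
    // Built-in functions as constants, so references like CalculationFunction.YEAR still compile.
    CalculationFunction YEAR = new SimpleCalculationFunction("YEAR");
    CalculationFunction MONTH = new SimpleCalculationFunction("MONTH");
    CalculationFunction DAY = new SimpleCalculationFunction("DAY");

    // Same method name as Enum.name(), so existing callers are unaffected.
    String name();
}

// Hypothetical default implementation backing the constants.
final class SimpleCalculationFunction implements CalculationFunction {
    private final String name;

    SimpleCalculationFunction(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String toString() {
        return name;
    }
}

// A user-defined function, which a closed enum could not accommodate.
final class QuarterFunction implements CalculationFunction {
    @Override
    public String name() {
        return "QUARTER";
    }
}

// Hypothetical renderer that only relies on the interface.
final class FunctionRenderer {
    static String render(CalculationFunction function, String expression) {
        return function.name() + "(" + expression + ")";
    }
}
```

The trade-off is that switch statements, values() and EnumSet no longer work on the function type, so code that branches on a specific function has to use equality checks or polymorphism instead.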

kasper's source