Querying a CSV file!

Today I'm going to demonstrate some of the functionality of MetaModel 1.1, which provides a datastore-transparent querying API for Java. Actually what I'll show you isn't all that new, we've been able to do most of the querying stuff for CSV files (and other datastores) roughly since MetaModel 1.0, but it seems to me that too few realize what a powerful tool we have with MetaModel, so I'm just going to repeat myself a bit here ;) Also, I'll demonstrate on of the new cool features of MetaModel 1.1 - column type detection, narrowing and value transformation.

For this example I'll use some real data: I've extracted this CSV file from the eobjects.org trac system. It contains a list of all the tickets (issues, bugs, tasks, whatever) that are active at the time of writing... You'll notice if you take a look at the file, that it's not exactly a simple CSV file - a lot of text spans multiple lines and there are quite a lot of different data types.

Ok, so let's get started. I've saved the data to a file: tickets.csv. Now I want to read the file and let MetaModel generate a metadata model based on it. I will also let MetaModel try and detect the column types (since CSV only contains text-types natively) which will automatically transform all values to the most fitting and narrow type that MetaModel can find (this is indicated by the 'true' parameter in the code  below). Here's how we get a hold of the datastore model for the file:

File file = new File("tickets.csv");
DataContext dataContext = DataContextFactory.createCsvDataContext(file, true);

Once we have a DataContext object we are ready to go for our datastore-transparent way of querying. What we do is: We get a hold of the schema of our CSV file and we locate the table of interest. Since CSV is a single-table datastore type, getting the table of interest can be done in two ways:

Table table;

//get table by name
table = schema.getTableByName("tickets");

//get table by index
table = schema.getTables()[0];

Now we can go ahead and investigate the structure of the CSV file. Since we turned on automatic column type narrowing you will see that the 'ticket' column have been converted from a text-based column to an INTEGER type. Also, as MetaModel can verify that 'ticket'-values are never missing, it is asserted that the column is not nullable:

Column ticketColumn = table.getColumnByName("ticket");

//Will print: "Ticket column type: INTEGER"
System.out.println("Ticket column type: " + ticketColumn.getType());

//Will print: "Ticket column nullable: false"
System.out.println("Ticket column nullable: " + ticketColumn.isNullable());

And now for the fun and impressing part... Let's try to make some queries! Here are a couple of examples:

Query q;

//SELECT * FROM tickets ORDER BY ticket
q = new Query().select(table.getColumns()).from(table).orderBy(ticketColumn);

//SELECT SUM(ticket) FROM tickets
q = new Query().select(FunctionType.SUM, ticketColumn).from(table);

//SELECT _reporter AS rep_name, COUNT(*) AS num_tickets FROM tickets GROUP BY rep_name
Column reporterColumn = table.getColumnByName("_reporter");
q = new Query().select(reporterColumn,"rep_name).selectCount().from(table).groupBy(reporterColumn);

To execute the queries is very simple - just ask your DataContext object to execute the query object. MetaModel will then analyze the query and process the result behind the scenes:

DataSet dataSet = dataContext.executeQuery(q);

We can process MetaModel DataSets in quite a few ways. A typical way would be to iterate through it, similar to a ResultSet:

while (dataSet.next()) {
    Row row = dataSet.getRow();
    //Extract values or do something similar with the row
    System.out.println("row: " + row);

... Or we can transform the DataSet into a TableModel object:

TableModel tableModel = dataSet.toTableModel();

... Or we can transform it into a list of Object arrays:

List<Object[]> objectArrays = dataSet.toObjectArrays();

As you can see, with the MetaModel API a lot of things that used to be difficult is now really, really easy!

Click here to display the full example.

No comments: