20081117

Query your Excel spreadsheet with Java

The MetaModel project lets you do wonderful and advanced things like filtering, ordering, grouping etc. when working with otherwise static content in CSV files, Excel spreadsheets, XML files etc. Many times I have knocked my head to the door when trying to get simple summary data out of my excel spreadsheets or simply filtering on single rows in the sheets.

An example: Recently I was working on my tax return statement and throughout the year I had recorded all relevant activities in a simple spreadsheet. I had marked the activities on various accounts such as representation, spendings, earnings whatever. Simplified it looked something like this:

TitleAmountAccount
Asus EEE1.600Spendings
Advisory for Company X4.000Earnings

One thing I think is funny when dealing with Excel is that throughout time you've learned how to set up your spreadsheet for easy hacking instead of dynamic querying. What I mean by this is that if I wanted to get SUM for each account in my spreadsheet I would propbably have fundamentally changed my (otherwise nice and pure) spreadsheet in order to make it easy to perform the SUM function on singular accounts. For example I may have introduced an ammount column for each account. But alas, adding a new Account would be pretty cumbersome.

Enter MetaModel! With MetaModel I can write queries in a SQL-like manner. When I say "SQL-like" you may fret and think "oh but don't learn me another SQL dialect, just give me the real deal!". The answer to this is that what I'll learn you now will effectively replace all SQL dialects because the same model (or MetaModel ;)) is usable for all kinds of datastores: Excel, CSV/TSV files, XML and SQL-databases). Let's have a look at some code. First we'll load the Excel file into MetaModel and get a hold of our schema and table structure and identify our columns of interest:

// Use the factory to create a DataContext and automatically (true) narrow column types
DataContext dataContext = DataContextFactory.createExcelDataContext(new File("my_tax_return_activities.xls"), true);
Schema schema = dataContext.getDefaultSchema();
// A simple way of getting the table object is just to pick the first one (the first sheet in the spreadsheet)
Table table = schema.getTables()[0];
Column amountColumn = table.getColumnByName("Amount");
Column accountColumn = table.getColumnByName("Account");

Now that we have the columns represented and the DataContext which we can use to perform queries, here's how we make a couple of interesting queries using the MetaModel query API:

// SELECT SUM(Amount) FROM sheet WHERE Account = "Spendings"
Query q = new Query().select(new SelectItem(FunctionType.SUM, amountColumn)).from(table).where(accountColumn, OperatorType.EQUALS_TO, "Spendings");

// SELECT Account, SUM(Amount) FROM sheet GROUP BY Account
Query q = new Query().select(accountColumn).select(new SelectItem(FunctionType.SUM, amountColumn)).from(table).groupBy(accountColumn);

Now when we've authored the queries we can let MetaModel execute them and deal with the result as appropriate. If we just want to print out the results to the console we'll do something like this:

DataSet dataSet = dataContext.executeQuery(q);
while (dataSet.next()) {
    System.out.println(dataSet.getRow());
}
dataSet.close();

30 comments:

jorge said...

it helped me a lot. congrats for you exceptional piece of work.

one question. is it possible to load directly one HSSFWorkbook object?

jorge said...
This comment has been removed by the author.
Kasper Sørensen said...

Hi Jorge,
Glad you found it useful! Currently what you are asking is not possible. But if you feel like it -take a look at the source code for MetaModel. I'm sure it will be an easy patch to develop and I will welcome it for sure.

jorge said...

i'll sure do that. i've read your blog entirely and i'm amazed about your work. i've only tried metamodel to query some excel files and didn't notice that metamodel is just the tip of the iceberg. congratulations

Kasper Sørensen said...

Hi Jorge,

I'm honored that you think my work is impressive! Thank you very much.

If you want to join in on the development I recommend that you add your email to this mailing list: http://groups.google.com/group/metamodel

Currently the mailing list is rather dead, but we can change that of course :-) Please email in your idea about the HSSFWorkbook constructor for the ExcelDataContextStrategy and we'll take it from there! Looking forward to having you on board :)

Dimon4ik said...

Hi.
Kasper, can you explain how to use 'where' clause to query data from two worksheets? How to connect them? Is it possible?

Kasper Sørensen said...

Hi Dimon4k,

Actually just yesterday I answered that question at the eobjects forums so I'm just going to direct you with a link :)

Hope it works for you:
http://eobjects.org/trac/discussion/7/76

Entisar said...

Hi Kasper,

Thanks a lot for the useful information.
I tried the same example and I got correct result. However, Im trying to create a web service that get data from Excel, when I deployed my application, I got the error "NoClassDefFoundError" for the DataContext. I have included the jar file in the project. I'm currently using Eclipse and JBoss.. I hope you can advice me.
Thanks a lot
Entisar

Kasper Sørensen said...

Hi entisar,

My only guess would be that you are maybe not including all the needed JARs. If you need to use the excel spreadsheet functionality, you will need:

MetaModel-full, MetaModel-excel, MetaModel-querypostprocessor, MetaModel-core.

It just occurs to me that we should probably distribute an assembled jar as well. But that's probably for you to wait for MetaModel 1.1 for :)

Entisar said...

Thanks a lot for your reply.
i've checked it and kept only these four JARs, but Im getting now

ava.lang.NoClassDefFoundError: dk/eobjects/metamodel/MetaModelException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
at java.lang.Class.privateGetPublicMethods(Unknown Source)
at java.lang.Class.getMethods(Unknown Source)
at org.jboss.ws.metadata.builder.jaxws.JAXWSMetaDataBuilder.processWebMethods(JAXWSMetaDataBuilder.java:885)
at org.jboss.ws.metadata.builder.jaxws.JAXWSWebServiceMetaDataBuilder.buildWebServiceMetaData(JAXWSWebServiceMetaDataBuilder.java:150)
at org.jboss.ws.metadata.builder.jaxws.JAXWSServerMetaDataBuilder.setupProviderOrWebService(JAXWSServerMetaDataBuilder.java:52)
at org.jboss.ws.metadata.builder.jaxws.JAXWSMetaDataBuilderJSE.buildMetaData(JAXWSMetaDataBuilderJSE.java:63)
at org.jboss.wsf.stack.jbws.UnifiedMetaDataDeploymentAspect.create(UnifiedMetaDataDeploymentAspect.java:66)
at org.jboss.wsf.framework.deployment.DeploymentAspectManagerImpl.deploy(DeploymentAspectManagerImpl.java:118)
at org.jboss.wsf.container.jboss42.ArchiveDeployerHook.deploy(ArchiveDeployerHook.java:97)
at org.jboss.wsf.container.jboss42.DeployerInterceptor.start(DeployerInterceptor.java:90)
at org.jboss.deployment.SubDeployerInterceptorSupport$XMBeanInterceptor.start(SubDeployerInterceptorSupport.java:188)
at org.jboss.deployment.SubDeployerInterceptor.invoke(SubDeployerInterceptor.java:95)
at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:210)
at $Proxy45.start(Unknown Source)
at org.jboss.deployment.MainDeployer.start(MainDeployer.java:1025)
at org.jboss.deployment.MainDeployer.deploy(MainDeployer.java:819)
at org.jboss.deployment.MainDeployer.deploy(MainDeployer.java:782)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.jboss.mx.interceptor.ReflectedDispatcher.invoke(ReflectedDispatcher.java:155)
at org.jboss.mx.server.Invocation.dispatch(Invocation.java:94)
at org.jboss.mx.interceptor.AbstractInterceptor.invoke(AbstractInterceptor.java:133)
at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
at org.jboss.mx.interceptor.ModelMBeanOperationInterceptor.invoke(ModelMBeanOperationInterceptor.java:142)
at org.jboss.mx.server.Invocation.invoke(Invocation.java:88)
at org.jboss.mx.server.AbstractMBeanInvoker.invoke(AbstractMBeanInvoker.java:264)
at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659)
at org.jboss.mx.util.MBeanProxyExt.invoke(MBeanProxyExt.java:210)
at $Proxy9.deploy(Unknown Source)
at org.jboss.deployment.scanner.URLDeploymentScanner.deploy(URLDeploymentScanner.java:421)
at org.jboss.deployment.scanner.URLDeploymentScanner.scan(URLDeploymentScanner.java:610)
at org.jboss.deployment.scanner.AbstractDeploymentScanner$ScannerThread.doScan(AbstractDeploymentScanner.java:263)
at org.jboss.deployment.scanner.AbstractDeploymentScanner$ScannerThread.loop(AbstractDeploymentScanner.java:274)
at org.jboss.deployment.scanner.AbstractDeploymentScanner$ScannerThread.run(AbstractDeploymentScanner.java:225)
Caused by: java.lang.ClassNotFoundException: dk.eobjects.metamodel.MetaModelException
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1358)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1204)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)

Im not expert in setting up the environment properly, i dontknow if how do i obtain the assembled JAr.

Entiser

Kasper Sørensen said...

Hi Entisar,

There must be some other explanation (another cause in the stack trace maybe?) because the class that it is missing (MetaModelException) is in the MetaModel-core.jar file.

I'm writing this from a borrowed machine so I can't make a testcase (for the next couple of weeks actually) myself... Are you manually adding the jar's or are you using Maven? If you use Maven it gets so much easier to manage the dependencies, but that's a long story :)

Entisar said...

inglyThanks a lot Casper.
Im addind the jars in eclipse manually.
I dont have any knowledge about the maven.
I tried deleting them and add then again. but still giving errors.

I liked your API it seems clean and staraight forward comparing with other APIs i found for Excel.
but I may try another one.

Thanks again,
Ent.

Kasper Sørensen said...

Hi Entisar,

It just occurred to me, that one thing you might be forgetting is to include the Apache POI jars? As stated on the MetaModel, they are required for using MetaModels excel support. I believe that they are in the lib folder of the distributable zip.

Alberto said...

Hi,
I downloaded latest version of your MetaModel (1.1.5) and tried your example to query an Excel spreadsheet.

The problem is that I can't even compile since there is not a select(FunctionType, Column) method provided by Query object!

I tried downloading some older versions of your project, but the problem remains.

Can you help me? What am I doing wrong?

Thanks!

Kasper Sørensen said...

Hi Alberto,

Oops, you're right ... That reveals that I made the example by hand and not in eclipse ;-)

The fix is:

query.select(new SelectItem(FunctionType, Column))

Sorry for the inconvenience, I will correct the example right away.

Alberto said...

Thanks, that worked ;)

But I'm not done bothering you...

I'm doing some experiments with your MetaModel. I'm trying to query a spreadsheet with a financial report of some sort.

But I'm stuck at opening and reading the XLS file, with this exception which I don't understand:

Exception in thread "main" java.lang.IllegalStateException: Exception reading from file: d:\report.xls
at dk.eobjects.metamodel.ExcelDataContextStrategy.materializeMainSchemaTable(ExcelDataContextStrategy.java:106)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.materializeTable(QueryPostprocessDataContextStrategy.java:286)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.materializeFromItem(QueryPostprocessDataContextStrategy.java:231)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.executeQuery(QueryPostprocessDataContextStrategy.java:107)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.transformColumnTypes(QueryPostprocessDataContextStrategy.java:154)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.transformColumnTypes(QueryPostprocessDataContextStrategy.java:143)
at dk.eobjects.metamodel.QueryPostprocessDataContextStrategy.autoTransformColumnTypes(QueryPostprocessDataContextStrategy.java:193)
at dk.eobjects.metamodel.DataContextFactory.createExcelDataContext(DataContextFactory.java:156)
at it.previnet.bittoa.metamodel.ExcelQuery.main(ExcelQuery.java:17)
Caused by: java.lang.NullPointerException
at dk.eobjects.metamodel.ExcelDataContextStrategy.getCellValue(ExcelDataContextStrategy.java:168)
at dk.eobjects.metamodel.ExcelDataContextStrategy.materializeMainSchemaTable(ExcelDataContextStrategy.java:81)
... 8 more

Can you help me?

Thanks again ;)

Kasper Sørensen said...

I've been looking into this and I'm affraid you've found a "real" bug this time!

I will fix it right away and post a fix-release. Should be available in a couple of hours, so bear with me :)

Kasper Sørensen said...

Alright. MetaModel 1.1.6 is out. I'm uploading to the maven repository as I write this message, but you should be able to download it from google code already if you just need the jar. Let me know how it works.

Alberto said...

It seems to work fine, now!
Thank you very much :)

Tina said...

Hi, i just want to ask how to get formula values of excel columns? when i tried using dataset.getRows(), i only get the formula and not the result value of the formula. thanks.

Kasper Sørensen said...

Hi Tina,

Formula calculation is currently not supported. I will add a feature request for it, because I actually think that POI (the underlying excel api that we use) is able of calculating at least rather simple formulas, so it might be a feature that is quite easy to add.

Tina said...

Thanks for the quick response, I hope you can add this feature on your next release it would be a great help for me.. thanks again :-)

Kasper Sørensen said...

Hi Tina,

Actually I got quite excited when looking into how easy it was to evaluate the formulas so I just implemented the solution today :)

It will be released together with the upcoming MetaModel 1.5 within the next coming weeks. If you can't wait, you can also check it out from SVN yourself and build the latest snapshot. It's available here: http://eobjects.org/svn/MetaModel/trunk

Kasper Sørensen said...

Hi Tina and others,

MetaModel 1.5 is now out :)

Read more and get it at:
http://metamodel.eobjects.org/

vk said...

Hi,

Can you please help me understand, how can I query from a particular worksheet within the Excel workbook.

Kasper Sørensen said...

vk - each sheet is represented as a table. So simply use schema.getTableByName("foobar") and then you have your sheet to query from.

shiva kumar said...

Hi Guys,

See if this freeware helps you. You can download it from the following link. The freeware product name is : FILLO.

http://www.codoid.com/products/view/2/29

-Shiva

ROSHIN said...

Can we have an option for multiple
WHERE condition using metadata..?
Pls Help.

Kasper Sørensen said...

Hi Roshin,

Multiple WHERE items is already possible. If you're missing something, please elaborate some more.

Gaz said...

Here there is a working example: http://stackoverflow.com/questions/38105932/apache-metamodel-bad-performance-querying-spreadsheet :)