I'm excited, because I think we've solved a very common problem in Java applications that have to deal with huge amounts of data. Here's the trouble:
- Even though the JDBC spec defines a way to specify the fetch size when executing queries, some drivers do not implement this feature, which means your program can run out of memory if you query, e.g., a couple of million records.
- Even if your driver works as it is supposed to (a reasonable assumption in most cases), there's still no effective way to speed up the processing of the many records by multithreading, since the data is streamed through a single connection.
Our approach is to split the query into several smaller queries:
- Say we want to split up the query: "SELECT name, email FROM persons"
- We investigate the persons table to find columns that can be used to split the total result set. We might find that an age column is a reasonable choice for this, so the query could be split into:
  - SELECT name, email FROM persons WHERE age < 30 OR age IS NULL
  - SELECT name, email FROM persons WHERE age >= 30
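To illustrate the age-based split above, here is a minimal sketch that builds the partial queries as plain SQL strings. It is not how the QuerySplitter works internally; the class `AgeRangeSplit`, its methods and the choice of boundary values are all hypothetical and only show how range predicates can cover every row exactly once, NULLs included.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: builds non-overlapping
// WHERE clauses over an integer column from a sorted list of boundaries.
class AgeRangeSplit {

    // Returns one predicate per partition. Rows with NULL in the column
    // are assigned to the first partition so that no row is lost.
    static List<String> splitPredicates(String column, int... boundaries) {
        if (boundaries.length == 0) {
            throw new IllegalArgumentException("at least one boundary is required");
        }
        List<String> predicates = new ArrayList<>();
        predicates.add(column + " < " + boundaries[0] + " OR " + column + " IS NULL");
        for (int i = 1; i < boundaries.length; i++) {
            predicates.add(column + " >= " + boundaries[i - 1]
                    + " AND " + column + " < " + boundaries[i]);
        }
        predicates.add(column + " >= " + boundaries[boundaries.length - 1]);
        return predicates;
    }

    // Appends each predicate to a base query that has no WHERE clause yet.
    static List<String> splitQuery(String baseQuery, String column, int... boundaries) {
        List<String> queries = new ArrayList<>();
        for (String predicate : splitPredicates(column, boundaries)) {
            queries.add(baseQuery + " WHERE " + predicate);
        }
        return queries;
    }

    public static void main(String[] args) {
        // One boundary at 30 reproduces the two partial queries from the list above.
        for (String q : splitQuery("SELECT name, email FROM persons", "age", 30)) {
            System.out.println(q);
        }
    }
}
```

Passing more boundaries, for example `splitQuery(base, "age", 20, 40, 60)`, yields four partial queries instead of two, which is one way to control how many records each part returns.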
Using the QuerySplitter looks like this:

DataContext dc = ...
Query q = ...
QuerySplitter qs = new QuerySplitter(dc, q);
List<Query> queries = qs.splitQueries();

I'd love to know what you all think of this. Personally, I think it's a lovely way to optimize memory consumption, and it offers new ways to utilize grid computing by distributing partial queries to different nodes in the grid for remote processing. Also, a lot of databases (MySQL, for example) dedicate only a single thread per query, so by splitting the queries one could further optimize multithreading on the database.
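To make the multithreading point concrete, here is a rough sketch of how a list of partial queries might be processed in parallel with a standard `ExecutorService`. The class `ParallelPartialQueries`, the `runAll` method and the stand-in worker are assumptions for illustration only; in a real application the worker would execute each partial query on its own connection and process the rows it gets back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one worker call per partial query on a fixed
// thread pool and returns the results in the same order as the queries.
class ParallelPartialQueries {

    static <Q, R> List<R> runAll(List<Q> partialQueries, Function<Q, R> worker, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (Q query : partialQueries) {
                tasks.add(() -> worker.apply(query));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and keeps task order.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partial queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partial query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> partials = Arrays.asList(
                "SELECT name, email FROM persons WHERE age < 30 OR age IS NULL",
                "SELECT name, email FROM persons WHERE age >= 30");
        // Stand-in worker: it only measures the query text. A real worker
        // would run the query and return whatever it computed from the rows.
        List<Integer> results = runAll(partials, String::length, 2);
        System.out.println(results.size() + " partial results collected");
    }
}
```

Because each task only holds the rows of its own partial query, memory use per thread stays bounded by the size of the largest part rather than by the full result set.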