Is efficient client-side paging full table scanning impossible with MySQL?

It seems to be impossible to perform client-side paging full table scans within MySQL.

For example, say you want to take a 1GB file and page through it 10MB at a time.

With a flat file I could just read 10MB off disk, read the next 10MB, etc.

This would be amazingly efficient as you could write the data sequentially, without updating indexes, and then read the data back out by byte offset.

You could page through MySQL tables by adding an index to a table:

SELECT * FROM FOO WHERE PTR > 10000 LIMIT 10000;

for example … but I REALLY want to avoid an index step because it is not cheap and only required since MySQL doesn’t support cursors.

This index slows down the import stage and I have to buffer the index data in memory which is just a waste.

I could use LIMIT with OFFSET but this isn’t going to be amazingly efficient because it will either require us to use a temporary on disk table or force us to read each row off disk.

Technically, one could implement offset efficiently with a fixed with table since you can compute the byte offset as row_width*N but I don’t think MySQL implements this optimization.

Further, my tables are are variable width.

If MySQL supported cursors then I could just tell MySQL to perform a table scan and read the data from a cursor.

I’m implementing something similar to a Map Reduce job in MySQL and Java which would work VERY well on Hadoop but is nearly impossible to implement with MySQL efficiently since I can’t page through data without over-indexing my tables.

Thoughts? I would love it if there’s some stupid solution that I’m just missing.

I suspect this will become more and more of a problem as more developers want to just flat out Map Reduce their data.

Update:

I was wrong. The solution is HANDLER. Ryan Thiessen nailed it in the comments.


  1. Chris

    I agree that having cursors would be nice but for the time being you might want to try the glorified file server storage engine couchDB.

    Going out on a limb but even working with old BDB, or any other Key/Value DB might do the trick….

  2. Have you used HANDLER before in MySQL? It’s a little-used command but may be exactly what you need here, to mimic a cursor using HANDLER READ NEXT. It has some serious limitations but you may not care about them.

    http://dev.mysql.com/doc/refman/5.0/en/handler.html

  3. Hey Ryan.

    You rock! HANDLER is exactly what I need!

    I forgot about HANDLER for some reason.

    So now I can create a MyISAM table and page through it without index…. sequential disk access only :)

    This rocks!

  4. Hi Kevin,

    Handler aside, why not just “SELECT * FROM table” and read the results only as quickly as you need them, using a mysql_use_result equivalent API call? This will accomplish pretty much the same thing as HANDLER (i.e. the table is still locked [MyISAM] while you’re reading).

    Regards,

    Jeremy

  1. 1 The Pythian Blog

    Log Buffer #171: a Carnival of the Vanities for DBAs…

    Hello, and welcome to the 171st edition of Log Buffer, the weekly review of database blogs. Let’s get it going this week with . . .  Oracle Uwe Hesse, the Oracle Instructor look at result cache, another brilliant 11g new feat….






%d bloggers like this: