Monday, January 26, 2009

Multi row transaction capabilities

The project in which I wrote a python script to process rows in an Excel spreadsheet interfacing with a MySQLdb and fronted on the web end using django is coming to a close. There is just one more requirement to go. They want rollback capabilities with the ability to completely rollback all actions on a particular excel file if one of the rows in the file buffs up or errors out. This is so that they can correct just that one row in that Excel file then feed the same Excel file to my Python process script without worry about the sanctity of the data of the rest of the rows.

This means that I have to somehow implement a multi-row rollback functionality or application level transaction. Python's MySQLdb library supports rollback but only on each transaction. I need a "all or nothing" like functionality. This is something new for me, so I scouted around on google to get some inspiration or to see if someone else had done the same thing. I found this small transaction class by Harald Hoyer at his site. Planning to try out his stuff to see if they can fulfill my requirements. Instinctively, I feel the final solution that I come up with will be a bit different.

The only other way I can see this happening is to maintain another database table keeping track of all actions that are performed at the row level in the input Excel Forms eg: row1 : insert, row2: update. Then if the script detects an error or Exception while processing the form it will perform the reciprocal or rollback to all of the actions that have been performed on the data on the Excel file. So far whatever I have is still being worked out in the noggin' coming close to being implemented.

Update:
Something that I seem to have overlooked is that since I am using django for the web front maybe I can also use it's ORM at the backend. It all depends on how much work it would be to ORM-ize my existing code base.

12 comments:

Carl said...

I used the Django ORM on the import side for a project with very similar requirements, and implementing the transaction/rollback requirement was completely painless (literally four lines of code using django.db.transaction).

Carl said...

Sorry for the double comment; I don't know what DB backend you're using, but I have a hard time understanding why you'd consider writing custom transaction-like code rather than leveraging a decent database server with proper transaction support built in.

Unknown said...

Carl: I wasn't too sure about normal transactions working as some of the row operations are working on the same data set.

Carl said...

> I wasn't too sure about normal transactions working as some of the row operations are working on the same data set.

Not sure what you mean by that. Generally within a DB transaction, previous updates within that same transaction are visible to you (but possibly to no-one else, depending on the isolation level setting of your DB). You should be able to do multiple updates to the same rows within a single DB transaction without any trouble, and even read back the updated data if you need to. The only pseudo-exception I can think of off-hand is that recent PostgreSQL versions will barf if you try to do schema alterations within the same transaction as data updates, but I doubt you're needing to do schema alterations in a routine data import.

For sure I would at least give it a try with DB transactions and potentially save yourself a lot of unnecessary work.

Jökull Sólberg Auðunsson said...

I have some very similar requirements. Check out the built in transaction decorators. Also I hit a wall with excel files. Processing CSV files with a generator should be a lot faster. The big performance boost was switching from the ORM to raw SQL queries (sqlite3 from standard library).

The client also wanted a list of changes since the last import. For that I used django-reversion.

Carl said...

Jökull: How can you use "raw SQL queries" instead of the ORM and also gain the benefits of something like django-reversion?

Jökull Sólberg Auðunsson said...

Good catch. I'll let you know if I can get it working.

thorley said...

This may be a naive suggestion, but if your application is a single user system (as it sounds to me), you could just copy the table and if any operations fail fallback to the copy.

I've used this approach when prototyping and found that MySQL can copy tables fairly quickly.

lowkster said...

thorley: It's not exactly single user... (2 users actually .. hehe) but it's not meant to be single user but what you said is interesting. Might be applicable to another project I got running. Thanks.

Laurence Rowe said...

Just turn off autocommit and issue a single commit after inserting the last row.

qu1j0t3 said...

As Carl, Lawrence said... database transactions make any number of statements behave atomically (the A in ACID). MySQL InnoDB included.

Anonymous said...

If it's not too late...

Your post seems to leave some confusion on the notion of "transaction" as opposed to "query." A transaction by default consists of a single query, but that can be changed by issuing the "START TRANSACTION" query. In that case, all queries until the next ROLLBACK or COMMIT are part of the same transaction.

SQLAlchemy may help; it allows custom queries, but makes the most common cases (including transactions) easier and more robust.