Wednesday, November 11, 2009

django-evolution, aka ORM pain part 2

I finally bit the bullet and tried out a new application to help me make the changes I needed to the db, and boy am I glad I did. django-evolution is a joy to use: it is easy to install, has no strange dependencies, and did everything out of the box as described. Some things did trip me up at first. In particular, django-evolution's initialization looks only at models.py and pays no attention to what is really in the db. Say models.py defines a field that does not actually exist in the db when you run the first 'python manage.py syncdb' to create the django-evolution table: django-evolution will not notice, and will carry on unaware of the discrepancies between models.py and the actual db.
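As a rough illustration of the kind of check I wish that first syncdb did, here is a minimal sketch in plain Python. It uses SQLite only so that it runs anywhere (my project actually sits on MySQL MyISAM tables), and the function names and the `record` table are hypothetical, not part of django-evolution. It compares the columns a model declares against the columns the table really has.

```python
import sqlite3


def actual_columns(conn, table):
    """Return the set of column names the table really has in the db."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def find_column_mismatches(conn, table, declared):
    """Compare declared field columns (e.g. taken from models.py) with the real table.

    Returns (missing_in_db, extra_in_db) as sorted lists.
    """
    real = actual_columns(conn, table)
    declared = set(declared)
    return sorted(declared - real), sorted(real - declared)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, customer_code TEXT)")
    # Hypothetical model that also declares a customer_id column not yet in the db.
    missing, extra = find_column_mismatches(
        conn, "record", ["id", "customer_code", "customer_id"]
    )
    print("missing in db:", missing)  # missing in db: ['customer_id']
    print("extra in db:", extra)      # extra in db: []
```

A check like this only looks at column names; types, nullability and indexes would need the other fields of the PRAGMA output.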

After I figured that part out, it was easy: add the fields and their attributes, re-run an update script on my data, and everything is done. My selects are now much faster. A select on the 3k-row db initially took 50s; with the caching that select_related() provides, it now comes back in a blazing 4s. Here too I got tripped up at first: after putting in all the relationships, I noticed that select_related() still was not pulling in the related rows. After some digging I found that, by default, select_related() does not follow foreign keys defined with 'null=True'; you have to name such fields explicitly, e.g. select_related('fieldname'). Looking at the actual SQL generated by the Django ORM, it looks proper and very well done. I am still exploring the nooks and crannies of the application, but so far I give django-evolution the thumbs up! I am also reading its code to see whether, during the initial syncdb, django-evolution could peer into the db to check that the definitions are actually kosher before proceeding.
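My guess (an assumption on my part, not something I have confirmed in Django's source) is that nullable relations get special treatment because following them needs an outer join: a plain inner join silently drops every row whose foreign key is NULL. The sketch below shows that SQL behaviour with SQLite and two hypothetical tables, `customer` and `record`.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    # customer_id is nullable, like a ForeignKey declared with null=True.
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany("INSERT INTO record VALUES (?, ?)", [(1, 1), (2, None), (3, 2)])
    return conn


# Inner join: records with a NULL customer_id have no match and disappear.
INNER = """SELECT r.id, c.name FROM record r
           JOIN customer c ON c.id = r.customer_id ORDER BY r.id"""

# Left outer join: every record is kept; missing customers come back as NULL.
OUTER = """SELECT r.id, c.name FROM record r
           LEFT OUTER JOIN customer c ON c.id = r.customer_id ORDER BY r.id"""


if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(INNER).fetchall())  # [(1, 'Acme'), (3, 'Globex')]
    print(conn.execute(OUTER).fetchall())  # [(1, 'Acme'), (2, None), (3, 'Globex')]
```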

Monday, November 9, 2009

When you play outside of Django's ORM... you get pain!

For one of my projects, I forgot to implement the relationship in one of my models. I thought I could get away with it by writing a template tag that queried the db whenever it was fed a string in the report. To my horror, this made the report take about 50 seconds to generate for a paltry 3,000-record db! Tailing query.log, I found that the sheer number of queries was the problem: a single page of the report was hitting the db 3,000+ times. That is when I discovered the good side of Django's select_related(). The problem now is that I do not have that relationship and need to build it into the db, which is not exactly fun when the db is already populated with data. It led me to wonder whether Django's ORM, while great for things like pulling in data from another table, also incurs a performance hit every time data is queried. I have heard of Django being used for high-traffic sites, but are those sites really high traffic?
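The difference between hitting the db once per row and fetching everything in one go can be shown without Django at all. Here is a minimal sketch in plain Python with SQLite (not the MyISAM setup from my project); the `customer`/`record` tables, their columns, and the `CountingDB` wrapper are hypothetical. `report_per_row` mimics a tag that runs one lookup for every string it is fed, while `report_joined` does the kind of single JOIN that select_related() issues when a real relationship exists.

```python
import sqlite3


class CountingDB:
    """Thin wrapper around a sqlite3 connection that counts issued queries."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def make_db(n_records, n_customers=10):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (code TEXT PRIMARY KEY, name TEXT)")
    # No declared relationship: record only carries the customer's code as a string.
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, customer_code TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(f"C{i}", f"Customer {i}") for i in range(n_customers)],
    )
    conn.executemany(
        "INSERT INTO record VALUES (?, ?)",
        [(i, f"C{i % n_customers}") for i in range(n_records)],
    )
    conn.commit()
    return CountingDB(conn)


def report_per_row(db):
    """One query for the records, then one lookup per record (like a per-row tag)."""
    rows = []
    for rec_id, code in db.query("SELECT id, customer_code FROM record ORDER BY id"):
        (name,) = db.query("SELECT name FROM customer WHERE code = ?", (code,))[0]
        rows.append((rec_id, name))
    return rows


def report_joined(db):
    """A single JOIN fetches the records together with the related names."""
    return db.query(
        "SELECT r.id, c.name FROM record r "
        "JOIN customer c ON c.code = r.customer_code ORDER BY r.id"
    )
```

With 3,000 records, `report_per_row` issues 3,001 queries while `report_joined` issues 1, and both return the same rows; that ratio is the pattern I saw in query.log.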

I looked around but could not find a satisfactory answer to this: "What if you want to model data in an existing MyISAM table that does not have the relationship?" It would be nice if Django provided a mechanism to recreate the relationship between MyISAM tables at the middleware level. If I were to go the Django way now, would I have to dump the data, re-implement the tables as 'Django-ic' models, and then reimport my data?
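For what it's worth, one way to avoid a dump and reimport is to add the relationship column in place and backfill it from the existing string with a single UPDATE. The sketch below does this in SQLite with hypothetical `customer`/`record` tables and a hypothetical `customer_id` column; the statements for a MySQL MyISAM table would differ in detail, and this is not necessarily what django-evolution generates.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, code TEXT UNIQUE, name TEXT)")
    # Existing data: the "relationship" is only an informal string code.
    conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, customer_code TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)", [(1, "AC", "Acme"), (2, "GX", "Globex")]
    )
    conn.executemany("INSERT INTO record VALUES (?, ?)", [(1, "AC"), (2, "GX"), (3, "ZZ")])
    conn.commit()
    return conn


def add_relationship(conn):
    """Add a nullable customer_id column and fill it from the existing customer_code."""
    conn.execute("ALTER TABLE record ADD COLUMN customer_id INTEGER")
    # Correlated subquery: codes with no matching customer are left as NULL.
    conn.execute(
        "UPDATE record SET customer_id = "
        "(SELECT c.id FROM customer c WHERE c.code = record.customer_code)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    add_relationship(conn)
    print(conn.execute("SELECT id, customer_id FROM record ORDER BY id").fetchall())
    # [(1, 1), (2, 2), (3, None)]
```

Rows whose string matches nothing stay NULL, which is one reason a column added this way may end up declared with 'null=True'.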

Currently I feel that in order to play in Django ORM's park, I am forced to play by its rules. If I opt to write my SQL raw, the ORM takes back its ball, refuses to let me even venture onto its side of the park, and leaves me to recreate a lot of the convenient functionality provided by the ORM layer. There doesn't seem to be a nice middle ground, and that just sucks.