Manual profiling of python code (especially if you use celery)




English: Official logo of the Université libre...
English: Official logo of the Université libre de Bruxelles (Photo credit: Wikipedia)

Profiling tools like cprofile provide hardly understandable output for my celery based app. So I used a tool called manual profiling by Pierre de Buyl from University of Toronto - Université Libre de Bruxelles (many thanks Pierre for your help :) )
It's a great tool. You import it in the source code that you want to profile:

from manual_profiler import Profiler
import warnings
I import warnings too, so that the print commands do not get lost in celery's arcanes. Then, in my code, I create an instance of the profiler, and register all the functions that I want to monitor: 

  # profiling
pr=Profiler()
correctErrors__=pr.register_function(correctErrors)
findSpellErrors__=pr.register_function(findSpellErrors)

As you can see, the register_function method rewrites the function to add the timing features to it. So I call the rewritten functions in my code.
Then, at the end of my code, I add a command to print the results:

    p.display()

With this configuration I obtain something like this:
[2012-11-27 22:24:35,937: WARNING/PoolWorker-1] Profiling information
--------------------------------------------------------------------------------
Name # of calls total time time per call
--------------------------------------------------------------------------------
[2012-11-27 22:24:35,937: WARNING/PoolWorker-1] correctErrors 41 64.102476 s 1.563475 s
[2012-11-27 22:24:35,938: WARNING/PoolWorker-1] findSpellErrors 1 61.000884 s 61.000884 s
Note that I improved the code a little bit to sort the entries by total time. I added the following line into the display function:

self.timers = sorted(self.timers, key=lambda t:t[1]._time, reverse=True)

This profiler gives information at functional level. Now, that I know that correctErrors is the function I need to improve, I use create the following class to monitor intra function events:
class ProfilingTimers():
""" define a few timers for manual profiling"""
def __init__(self):
"""Instancing the class prints the current time"""
self.start=datetime.now()
self.last=datetime.now()
self.elapsed=self.last - self.start
print strftime("Start time: "+str(self.start)) def elapsedSinceLast(self, eventName="N/A"):
""" Calling this method displays a the elapsed time since the last event"""
self.elapsed=datetime.now() - self.last
self.last=datetime.now()
print strftime("End of "+eventName+" - Time elapsed since last event: "+str(self.elapsed.days)+" days "+str(self.elapsed.seconds)+' s '+str(self.elapsed.microseconds/1000)+" ms")

In the code  at the start of the section of code to be monitored, I add :
p= ProfilingTimers()

And in a few places afterwards:

   p.elapsedSinceLast('image processing')

This command prints the elapsed time since the command was last invoked or since the ProfilingTimers instance was created. In the printed message, it includes the eventname that you have specified (here 'image processing'). With this method I can accurately assess with section of my function's code takes the most time.

With this great method, I could dig into my problem and after a few improvements... I obtained a 3200 times faster code execution!!!!

[2012-11-27 22:51:03,779: WARNING/PoolWorker-2] Profiling information
--------------------------------------------------------------------------------
Name # of calls total time time per call
--------------------------------------------------------------------------------
[2012-11-27 22:51:03,780: WARNING/PoolWorker-2] correctErrors 41 0.027236 s 0.000664 s
[2012-11-27 22:51:03,780: WARNING/PoolWorker-2] findSpellErrors 1 1.653973 s 1.653973 s


The poor man's method would be much less practical... It would involve putting in a few places the following command :

print strftime("time: %H:%M:%S", gmtime())
 
 
Share:

No comments:

Post a Comment