
Django: how to run background tasks in parallel (deferred processing) with Celery

I am implementing a web app with Django. In my App, a user submits tasks, and receives an email when they are done.

The tasks can take a few minutes to complete, and an end user can submit several tasks at once. I want the end user to immediately see a confirmation page telling her that she will receive an email with the results in a few minutes. Now the question is: how do I schedule the processing of her tasks and manage their parallel execution?

There are several technical options:
  1. Schedule the tasks with a cron job. That's not elegant, in my opinion, as I would like to stay in a full Python/Django environment and not rely too much on the host system.
  2. Use multiprocessing with Python. According to the web posts I read, this is incompatible with Django.
  3. Use multithreading with Python. It could work, but I would like more flexibility in scheduling the tasks and, most importantly, to keep control over the number of threads that my code triggers in case of a flash crowd on my web app.
  4. Use Celery and RabbitMQ!
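For comparison, here is a minimal sketch of what option 3 could look like when the number of threads is capped with the standard library's thread pool. The names process_task and MAX_WORKERS are hypothetical placeholders, not part of my app.

```python
from concurrent.futures import ThreadPoolExecutor
import time

MAX_WORKERS = 4  # hypothetical cap on the number of concurrent threads


def process_task(task_id):
    """Hypothetical stand-in for a long-running user task."""
    time.sleep(0.01)  # simulate some work
    return "task %d done" % task_id


# One shared pool bounds the thread count, however many tasks are submitted.
executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

# submit() returns a Future immediately; the work runs in the background.
futures = [executor.submit(process_task, i) for i in range(10)]

# result() blocks until the corresponding task has finished.
results = [f.result() for f in futures]
executor.shutdown()
```

Such a pool lives inside the web process, so tasks that are still queued are lost if that process restarts. That is one more reason to look at a broker-backed queue.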
So, let's talk about Celery and RabbitMQ. (Note: this tutorial relies heavily on the following great blog post.) First, I install Celery and RabbitMQ on my Debian Squeeze machine:

sudo apt-get install gcc python-dev python-setuptools python-simplejson rabbitmq-server
easy_install celery
After installing all these pieces of software, I update the settings.py of my Django website:

BROKER_HOST = "127.0.0.1"
BROKER_PORT = 5672
BROKER_VHOST = "/"
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_URL = "amqp://guest:guest@localhost:5672//"
INSTALLED_APPS = (
...
'celery',
)

I update the database and start celery:

./manage.py syncdb
Now, to start using Celery, I switch to Celery's official documentation. I am never too comfortable with the concepts of multitasking, so let's follow the tutorial step by step before jumping to the actual usage of Celery in my web app.

We first create a mytasks.py file:

from celery import Celery

celery = Celery('mytasks', broker='amqp://guest@localhost//')

@celery.task
def add(x, y):
    return x + y

In the above code, note the Python decorator "@celery.task". It has the same semantics as adding 'add = celery.task(add)' after our function definition.
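To make that equivalence concrete, here is a small sketch that needs no Celery at all. The decorator task below is a hypothetical stand-in that only tags the function. Celery's real decorator instead returns a task object with extra methods such as delay.

```python
def task(func):
    """Hypothetical stand-in decorator: tags the function and returns it."""
    func.is_task = True
    return func


# Form 1: decorator syntax.
@task
def add(x, y):
    return x + y


# Form 2: the same thing written out by hand.
def add_plain(x, y):
    return x + y

add_plain = task(add_plain)

# Both functions are tagged and behave identically.
print(add.is_task, add_plain.is_task)
print(add(4, 4), add_plain(4, 4))
```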

We start a Celery worker for this module (mytasks is our module name, i.e. our filename minus ".py"):

celery -A mytasks worker --loglevel=info

So now we have a function that adds two numbers, and a worker process that waits to handle calls to this function. Let's play with this in a new console tab (in the same virtualenv, of course):

python manage.py shell
>>> from mytasks import add
>>> add.delay(4, 4)

In our first tab, the worker's console output shows that the task completed:

[2012-11-10 11:14:14,087: INFO/MainProcess] Got task from broker: mytasks.add[93cf07f6-3d49-4319-adc4-9227e7639c37]
[2012-11-10 11:14:14,137: INFO/MainProcess] Task mytasks.add[93cf07f6-3d49-4319-adc4-9227e7639c37] succeeded in 0.00116801261902s: 8

OK, nice, but I am interested in the result! Where do I get it? The tutorial says that we have to configure Celery with a result backend if we want to retrieve results. So we go back to the mytasks.py file and change the celery line to:

#celery = Celery('mytasks', broker='amqp://guest@localhost//')
celery = Celery('mytasks', backend='amqp', broker='amqp://')

After restarting the worker so that it picks up the change, we go back to our Python console:

result=add.delay(4, 8)
result.ready()

We see 'True', indicating that the result is ready, and we can call 'result.get()' to obtain that result (12 here).

That's it! With this, we are ready to play.
