Multiple instances of the service - How can I ensure that an object is processed only once?


I have the following setup: there are N instances of an Azure Worker Role deployed. Our desktop application uploads a message to Azure, and then a set of images related to the message is uploaded. The message knows which images it needs.

These 2 activities (message upload and image upload) are independent - images can be uploaded BEFORE the message is even generated by the user (call it caching - but it's more complex), or a few seconds/minutes AFTER the message was uploaded to Azure.

I store messages in an Azure MSSQL database; images are stored in blobs, and their URLs are stored in the database. There is also a MessageToImage table which stores the links between a message and its images. Here is a simplified DB structure (pardon my C#):

class Message
{
    public int Id;
    public string Text;
}

class Image
{
    public int Id;
    public string Name;
    public string BlobUrl;   // Null if the image was not received by the service yet
}

class MessageToImage
{
    public int MessageId;
    public List<int> ImageIds;   // Ids of the images this message needs
}

Once a message has all of its images ready (i.e. all images are uploaded), we need to do something else with it (let's say, post it to Facebook). HERE IS THE QUESTION: how can I guarantee that the message will be processed only once? In the worst-case scenario, I will have N instances receiving N images for the message at the same time - which instance will "choose" to send the message for further processing? And how can I guarantee that this happens only once?

So far I have come up with the following ideas:

  1. Make sure that the "update BlobUrl for Image" database logic is atomic and returns the number of "missing" images for the message. This way I will trigger further processing on only one instance - the one which receives "0" as the result of the database update. BUT: how can I do that at the MSSQL level? And, more complex - how can I do that using Entity Framework?

  2. Have a dedicated worker role whose job is to select messages which have all their images and send them for processing. But that does not scale well... and looks a bit ugly.
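
For what it's worth, here is a rough sketch of how idea 1 might look as a single conditional UPDATE. It is only an illustration under several assumptions: it uses Python with an in-memory SQLite database as a stand-in for MSSQL, it flattens MessageToImage into one row per (MessageId, ImageId) pair, and the `Claimed` column plus the names `record_upload`, `try_claim` and `demo` are hypothetical additions that are not part of the schema above.

```python
import sqlite3

# Hypothetical schema: Claimed is an added flag; MessageToImage is one row per link.
SCHEMA = """
CREATE TABLE Message (Id INTEGER PRIMARY KEY, Text TEXT,
                      Claimed INTEGER NOT NULL DEFAULT 0);
CREATE TABLE Image (Id INTEGER PRIMARY KEY, Name TEXT, BlobUrl TEXT);
CREATE TABLE MessageToImage (MessageId INTEGER, ImageId INTEGER);
"""

# One statement: flips Claimed from 0 to 1 only if no linked image is missing.
CLAIM_SQL = """
UPDATE Message
   SET Claimed = 1
 WHERE Id = :message_id
   AND Claimed = 0
   AND NOT EXISTS (SELECT 1
                     FROM MessageToImage AS mi
                     JOIN Image AS i ON i.Id = mi.ImageId
                    WHERE mi.MessageId = :message_id
                      AND i.BlobUrl IS NULL)
"""


def record_upload(conn, image_id, blob_url):
    """Store the blob URL and commit, so other instances can see it."""
    with conn:
        conn.execute("UPDATE Image SET BlobUrl = ? WHERE Id = ?",
                     (blob_url, image_id))


def try_claim(conn, message_id):
    """Return True only for the caller whose UPDATE changed the row."""
    with conn:
        cur = conn.execute(CLAIM_SQL, {"message_id": message_id})
        return cur.rowcount == 1


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO Message (Id, Text) VALUES (1, 'hello')")
        conn.executemany("INSERT INTO Image (Id, Name) VALUES (?, ?)",
                         [(1, "a.png"), (2, "b.png")])
        conn.executemany("INSERT INTO MessageToImage VALUES (1, ?)",
                         [(1,), (2,)])
    results = []
    # Image 1 arrives, image 2 arrives, then image 2 is reported a second time.
    for image_id in (1, 2, 2):
        record_upload(conn, image_id, f"https://example.invalid/{image_id}")
        results.append(try_claim(conn, 1))
    return results


if __name__ == "__main__":
    print(demo())
```

The instance that gets `True` back would be the one that sends the message on for further processing; everyone else gets `False`. In this sketch the upload is committed before the claim is attempted, so that two instances finishing at the same moment can each see the other's image when they run the claim. The same claim would also have to be attempted when the message itself is uploaded, since the images may arrive first. I assume the equivalent on MSSQL would be the same UPDATE followed by a check of the affected row count, but I have not verified how that behaves under each isolation level.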

Any other ideas/suggestions?


UPDATE1: @Richard and @Rob suggested using a Service Bus Queue. I did look into it. The part I still don't have an answer for is what the piece of code in the WORKER ROLE that decides WHEN to send the Message to the Queue for processing should look like. The Message gets sent to the queue only when all images are present in the database/blobs (i.e. uploaded to the Azure cloud). Here I would still like to point to my corner-case example: I have 10 images being processed simultaneously by 10 Worker Roles. For all instances, processing ends at the same time. Each role updates the database with the uploaded image URL. And THEN I should somehow trigger the final Message processing - meaning one of the instances should get priority. And I'm not clear on HOW I should do this.

Hope this makes my question a bit clearer.

Create an Azure Service Bus Queue and have your client apps post messages to the queue. Then your worker roles can pull messages from the queue and process the messages.

The great thing about Service Bus Queues is that a message can be held by only one receiver at a time: once it is pulled off the queue, it is marked as 'acquired' (locked). If the message is not marked as completed within a (configurable) time period, it is returned to the queue, ready to be pulled by the next worker request.

This means that if your worker role fails mid-way through processing, the message will eventually reappear in the queue for the next worker to pick it up and (hopefully) complete the required work.
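
To make the acquire / complete / timeout behaviour described above concrete, here is a toy in-memory model of it. This is not the Service Bus API: `ToyPeekLockQueue` and its methods are made up for illustration, and time is passed in explicitly so the example stays deterministic.

```python
import itertools


class ToyPeekLockQueue:
    """Toy model: a received message is locked, not removed, until completed."""

    def __init__(self, lock_seconds):
        self.lock_seconds = lock_seconds
        self._messages = {}            # id -> [body, locked_until]
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0.0]
        return msg_id

    def receive(self, now):
        """Return (id, body) of the first unlocked message, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.lock_seconds   # acquire the lock
                return msg_id, entry[0]
        return None

    def complete(self, msg_id):
        """Processing finished: remove the message for good."""
        self._messages.pop(msg_id, None)


def demo():
    queue = ToyPeekLockQueue(lock_seconds=30)
    queue.send("message 1")
    first = queue.receive(now=0)     # worker A acquires the message
    second = queue.receive(now=10)   # worker B sees nothing: it is locked
    third = queue.receive(now=31)    # A never completed, so it reappears
    queue.complete(third[0])
    fourth = queue.receive(now=100)  # completed messages never come back
    return [first, second, third, fourth]


if __name__ == "__main__":
    print(demo())
```

Note what the third call shows: a message whose worker did not complete it in time is delivered again. If a worker dies after doing the work but before marking the message as completed, the work can happen twice, so the processing step should be safe to repeat (or should check whether it has already been done).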

Read this for more information:

How to use Service Bus Queues