Special Case of Producer / Consumer


I'm trying to synchronize a special kind of producer/consumer problem. This is the problem:

I have 2 Queues link_queue, page_queue.

Thread class ProducePages_RequireLinks (call it class A), as the name says, consumes items from the link_queueand puts an arbitrary amount (>=1) of pages from each link into page_queue.

Contrary, the main thread class ProduceLinks_RequirePages (call it class B) consumes pages from the page_queueand enqueues an arbitrary amount (>=0) of links into link_queue.

Now it is possible that class B generates links faster than class A generates pages. On the other hand, the reverse is also possible.

How do I properly synchronize those threads in Ruby 1.9.2?

I tried to use monitors in both but at some point i end up with deadlocks.

(If i failed to be precise, tell me via comments and I post some example classes)

EDIT: A picture of what is going on:


link_queue is initialized with 1 item page_queue is initialized with 0 items

We have 4 threads of class A and 1 thread of class B. Each line will be 1 timestep.

Threads A.1 grabs 1 link (linkQ=0) outputs 1 page (pageQ=1)

Thread B grabs 1 page (pageQ=0) outputs 400 links (linkQ=400)

Thread A.3 grabs 1 link (linkQ=399) outputs 1 page (pageQ=1)

Thread A.2 grabs 1 link (linkQ=398) outputs 1 page (pageQ=2)

Thread B grabs 1 page (pageQ=1) outputs 100 links (linkQ=498)

Thread A.1 grabs 1 link (linkQ=497) outputs 1 page (pageQ=2)

Thread A.4 grabs 1 link (linkQ=496) outputs 1 page (pageQ=3)

Thread B notices that linkQ is too large and waits until linkQ<16

. . . Threads A.* continue work . . . afterwards (linkQ=15) and (pageQ=484)

And now we have the opposite problem. Now Threads A have to wait until pageQ drops below a certain threshold. Otherwise we will run out of memory at some point.


Whether you are using Ruby or any other language, whenever you have a producer-consumer design like you are describing here, no matter whether the producers and consumers are threads or processes, you must never assume that the consumers will be able to "keep up" with the producers. You must always use bounded queues. Even using external queues as you mentioned in your comment does not solve the problem in the general case, because while much larger than RAM, external storage is not infinite.

The Ruby standard library has SizedQueue, which you get with require 'thread'. A SizedQueue is a thread-safe queue which is limited in size. If a producer thread tries to push an item onto the queue when it is already full, the producer will block until a consumer pops an item off the queue (making space for the new one). This will give the consumers a chance to "catch up". Likewise, if a consumer thread tries to pop an item off the queue when it is empty, the consumer will block until an item is available.

If overall throughput is limited by the producers, they will tend to get more CPU time (as the consumers block). On the other hand, if the consumers are the bottleneck, they will tend to get more CPU time. This is better than allowing the producers to burn system resources filling your queue with ever-growing numbers of items, while the consumers could be using those resources to work through the backlog.