Multithreading performance problem with a frame grabber (using DeckLink SDK)


I developed a C# frame grabber using Blackmagic hardware and the DeckLink SDK. My main program runs in MTAThread mode.

For each new frame I get a callback to a function called VideoInputFrameArrived(). From this function I start several parallel tasks, like this:

t1 = Task.Factory.StartNew(() => tempmatch.PictureAnalysis(x1));
t2 = Task.Factory.StartNew(() => tempmatch.PictureAnalysis(x2));
t3 = Task.Factory.StartNew(() => tempmatch.PictureAnalysis(x3));
t4 = Task.Factory.StartNew(() => tempmatch.PictureAnalysis(x4));

Task.WaitAll(t1, t2, t3, t4);

It works fine, but I can't get above 50% CPU usage: each of the 4 cores of my CPU runs at about 50%. I have spent a lot of time trying to understand what is going on, but I haven't found the cause.

First of all: do you expect the individual tasks to be CPU-bound? If you run the analysis on a single thread, does it keep one core at 100%? If not, you may be limited by something else, e.g. I/O. What is the result if you run on two cores instead of four?
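One way to answer those questions is to time the same four work items with 1, 2 and 4 workers. This is only a rough sketch: Analyze is a hypothetical stand-in for tempmatch.PictureAnalysis (pure CPU work, no I/O), and ScalingCheck and Measure are names I made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

static class ScalingCheck
{
    // Hypothetical stand-in for tempmatch.PictureAnalysis:
    // pure arithmetic, no I/O, no locks, no allocation.
    static double Analyze(int seed)
    {
        double acc = seed;
        for (int i = 1; i < 20000000; i++)
            acc += Math.Sqrt(i) / (acc + 1.0);
        return acc;
    }

    // Runs the same four work items with at most `workers` threads
    // and returns the elapsed wall-clock time.
    public static TimeSpan Measure(int workers)
    {
        var results = new double[4];
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };
        var sw = Stopwatch.StartNew();
        Parallel.For(0, 4, options, i => results[i] = Analyze(i));
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        foreach (int workers in new[] { 1, 2, 4 })
            Console.WriteLine("{0} worker(s): {1:F0} ms",
                workers, Measure(workers).TotalMilliseconds);
    }
}
```

If the real analysis is CPU-bound and the tasks are independent, the elapsed time should drop roughly in proportion to the number of workers. If it barely changes between 2 and 4 workers, something other than raw CPU is the limit.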

If you expect the individual tasks to be CPU-bound, find out whether there is any lock contention going on. Are these tasks completely separate, or do they ever contend for locks, for example by storing a result in a shared data structure such as a ConcurrentDictionary somewhere? Some editions of Visual Studio include tools that let you visualize lock contention. Go to Analyze > Start Performance Wizard.
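If the tasks do turn out to write into something shared, one option is to let each task return its own result and combine the results only after all tasks have finished. A minimal sketch, assuming the analysis can return a value; Analyze, NoSharedState and AnalyzeFrame are hypothetical names, not part of your code or of the DeckLink SDK:

```csharp
using System;
using System.Threading.Tasks;

static class NoSharedState
{
    // Hypothetical stand-in for the per-region analysis.
    // It touches only its own locals, so it never takes a lock.
    static int Analyze(int region)
    {
        int sum = 0;
        for (int i = 0; i < 1000; i++)
            sum += region * i;
        return sum;
    }

    public static int[] AnalyzeFrame()
    {
        var tasks = new Task<int>[4];
        for (int i = 0; i < tasks.Length; i++)
        {
            int region = i; // copy, so each lambda captures its own value
            tasks[i] = Task.Run(() => Analyze(region));
        }

        Task.WaitAll(tasks);

        // Results are collected on the calling thread after all tasks are done.
        var results = new int[tasks.Length];
        for (int i = 0; i < tasks.Length; i++)
            results[i] = tasks[i].Result;
        return results;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", AnalyzeFrame()));
    }
}
```

The point of the design is that nothing is shared while the tasks run, so there is nothing for them to contend on.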

A common problem in parallel .NET code is being GC-limited. In the concurrency profiling tools in VS this shows up as threads being paused periodically, waiting for the GC because of an allocation made on another thread. If you see this, your analysis is allocating memory at too high a rate. Try to allocate result structures up front rather than inside the analysis, and allocate as little as possible during the parallel execution. You can also experiment with different GC modes (Server/Workstation) and GC latency modes to reduce GC pauses.
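To get a feel for whether GC is involved, you can count gen 0 collections around a batch of frames and compare the version that allocates inside the analysis with one that reuses buffers. The sketch below shows only the reuse variant. Analyze, Scratch, GcCheck and the buffer size are assumptions for illustration; GC.CollectionCount, GCSettings.IsServerGC and GCSettings.LatencyMode are the standard .NET APIs. Server GC itself is normally switched on in the application configuration (the gcServer element), not in code.

```csharp
using System;
using System.Runtime;
using System.Threading.Tasks;

static class GcCheck
{
    const int Workers = 4;
    const int BufferSize = 64 * 1024; // assumed size, adjust to the real workload

    // One scratch buffer per worker, allocated once and reused for every frame.
    static readonly double[][] Scratch = CreateScratch();

    static double[][] CreateScratch()
    {
        var buffers = new double[Workers][];
        for (int i = 0; i < Workers; i++)
            buffers[i] = new double[BufferSize];
        return buffers;
    }

    // Hypothetical analysis that works only in its preallocated buffer.
    static double Analyze(int worker)
    {
        double[] buf = Scratch[worker];
        double sum = 0;
        for (int i = 0; i < buf.Length; i++)
        {
            buf[i] = i * 0.5;
            sum += buf[i];
        }
        return sum;
    }

    // Returns how many gen 0 collections happened while processing `frames` frames.
    public static int Gen0CollectionsDuring(int frames)
    {
        var results = new double[Workers];
        int before = GC.CollectionCount(0);
        for (int f = 0; f < frames; f++)
            Parallel.For(0, Workers, i => results[i] = Analyze(i));
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);

        // Ask the runtime to avoid blocking collections where it can.
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;

        Console.WriteLine("Gen 0 collections over 200 frames: "
            + Gen0CollectionsDuring(200));
    }
}
```

If the same measurement on your real analysis shows many collections per second, moving allocations out of the per-frame path is likely to help more than changing the GC mode.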