I have a computational intensive task which I used CUDA to implement it and now I want to make it even faster with FPGAs (if possible)
The system I want to implement is a series of computations each similar to matrix multiplication in sense of being parallel. It also has some non-parallel parts in between. It works with big amounts of data.
Although I want it as fast as possible, I have enough time to learn and explore with FPGAs.
here I'm asking for suggestions on how I start my path? Which FPGA to choose and where to learn about it. any website or online class or books? I've decided to do this anyway but your idea of whether this will be faster on FPGA or not would be helpful too.
The big wins from an FPGA over using a GPU come from:
- Using non-standard word widths optimised to your application. This allows denser logic, which allows more parallel processing blocks
- using your knowledge of the required accesses to external RAM to schedule them in hardware more efficiently than a general purpose memory controller can.
The downside is getting data to and from the FPGA. Draw a data-transfer diagram before you start. Even if the FPGA provides infinite speedup, you might still find it's not worth the effort if there's loads of data to be shuffled to and fro!
It's likely you'll be wanting a PCI express based board. Which is (I imagine) a whole new learning-curve before you get to do anything with the FPGA - but if you're up for it, it'll be a very interesting task!
In terms of choosing FPGAs, have a play with the software tools from the various vendors - at the learning stage that's much more important than the chips themselves. You won't find (at this early learning-stage) a show-stopper feature in any of the various chips. Also take into account the availability of boards with your required interfaces on, and any IP-core you might need to do the high-speed interfacing (eg PCIe)