The History, Status, and Future of FPGAs

The History, Status, and Future of FPGAs (queue.acm.org)

170 points by skovorodkin 5 years ago | 158 comments

jcranmer 5 years ago |

As a bit of a counterpoint:

One of my prior projects involved working with a lot of ex-FPGA developers. This is obviously a rather biased group of people, but I heard a lot of feedback that was very negative about FPGAs.

One telling comment is that since the 90s, FPGAs have been seen as the obvious "next big technology" for the HPC market... and then Nvidia came out and pushed CUDA hard, and now GPGPUs have cornered the market. FPGAs are still trying to make inroads (the article here mentions it), but the general sense I have is that success has not been forthcoming.

The issue with FPGAs is you start with a clock rate in the 100s of MHz (exact clock rate is dependent on how long the paths need to be), compared with a few GHz for GPUs and CPUs. Thus you need a 5× performance win from switching to an FPGA just to break even, and you probably need another 2× on top of that to motivate people going through the pain of FPGA programming. Nvidia made GPGPU work by being able to demonstrate meaningful performance gains to make the cost of rewriting code worth it; FPGAs have yet to do that.
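The break-even arithmetic above can be sketched in a few lines. The clock figures (400 MHz for the FPGA, 2 GHz for the CPU/GPU) and the 2× "pain premium" are illustrative assumptions, not measurements of any particular part.

```python
def required_work_per_cycle(fpga_hz, other_hz, pain_premium=1.0):
    """Factor by which the FPGA design must out-do the faster-clocked chip
    in useful work per clock cycle. pain_premium=1.0 means "merely break even"."""
    return (other_hz / fpga_hz) * pain_premium

# Assumed clocks: 400 MHz FPGA fabric vs. a 2 GHz CPU/GPU.
break_even = required_work_per_cycle(400e6, 2e9)
worth_it = required_work_per_cycle(400e6, 2e9, pain_premium=2.0)

print(f"break even: {break_even:.1f}x per cycle")
print(f"worth the rewrite: {worth_it:.1f}x per cycle")
```

With these assumed numbers the FPGA design has to do 5× the work per cycle to tie and 10× to justify the rewrite; a slower fabric clock pushes both figures up.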

Edit: It's worth noting that the programming model of FPGAs has consistently been cited as the thing holding back FPGAs for the past 20 years. The success of GPGPU, despite the need to move to a different programming model to achieve gains there, and the inability of the FPGA community to furnish the necessary magic programming model suggests to me (and my FPGA-skeptic coworkers) that the programming model isn't the actual issue preventing FPGAs from succeeding, but that FPGAs have structural issues (e.g., low clock speeds) that prevent their utility in wider market classes.

kyboren 5 years ago | |

GPUs work great for accelerating many applications, and it's true that that reduces interest in FPGAs. For applications that map well to GPUs, you're absolutely correct that the higher clock speeds (and greater effective logic area) make GPUs superior as accelerators.

However, some applications do not map well to GPUs. Particularly those applications with a great deal of bit-level parallelism can achieve enormous speedups with bespoke hardware. For those applications where it doesn't make sense to tape out an ASIC, FPGAs are beautiful--even if they only operate at a few hundred MHz.
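One small illustration of bit-level parallelism (my example, not one taken from the comment): reversing the bits of a 32-bit word. In hardware a fixed bit permutation is just routing, with no logic at all, while straightforward software has to visit each bit in turn.

```python
def reverse_bits32(x):
    """Reverse the bit order of a 32-bit word, one bit per loop iteration.
    In an FPGA the same permutation is 32 wires crossing over."""
    out = 0
    for i in range(32):
        bit = (x >> i) & 1          # pick out bit i
        out |= bit << (31 - i)      # drop it into the mirrored position
    return out

print(hex(reverse_bits32(0x00000001)))
print(hex(reverse_bits32(0x0000FFFF)))
```

CPUs have tricks and sometimes dedicated instructions for specific cases like this one, but an arbitrary, application-specific shuffle of bits remains cheap in fabric and comparatively awkward in software.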

I think the "programming model" is actually the biggest barrier to wider adoption. Your comment is suffused with what I believe is the source of this disagreement: The idea that one programs an FPGA. One designs hardware that is implemented on an FPGA. The difference may sound pedantic, but it really is not. There is a massively huge difference between software programming and hardware design, and hardware design is downright unnatural for software developers. They are completely different skill sets.

On top of that add all the headaches that come with implementing a physical device with physical constraints (the article complains about P&R times but this is far from the only burden) and it becomes clear that FPGAs are quite frankly a massive pain in the ass compared to software running on CPUs or GPUs.

exmadscientist 5 years ago | | |

Very much this.

(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)

The rebuttal to your objection is always tools like "HLS" (High-Level Synthesis), or in English, "C to HDL". (FPGAs are 'programmed' in one of the two Hardware Description Languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first).) These are not programming languages, they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
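A rough software model of those two ideas, for readers who have not met them. This is a Python sketch of the semantics, not HDL, and the function names are made up for illustration: every register's next value is computed from the current state and all of them update together on the clock edge, and a block computes all of its results while a selector picks which one is "valid".

```python
def clock_edge(state):
    """One clock tick. Every next value is computed from the *current*
    state; nothing sees a neighbour's new value until the next tick."""
    return {
        "a": state["b"],   # a <= b
        "b": state["a"],   # b <= a  (reads the old a, so the pair swaps)
    }

def alu(x, y, op):
    """All three results are computed every time, as they would be in
    hardware; op only selects which one reaches the output (a multiplexer)."""
    results = {
        "add": (x + y) & 0xFF,   # 8-bit adder, carry-out discarded
        "and": x & y,
        "xor": x ^ y,
    }
    return results[op]

print(clock_edge({"a": 1, "b": 2}))
print(alu(0xF0, 0x3C, "xor"))
```

Written as two sequential assignments in an ordinary language, `a = b; b = a` would leave both variables equal. In the clocked model the two registers swap, which is the kind of mismatch that trips up software developers.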

This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.

Thus, FPGAs languish.

Stubb 5 years ago | | |

Another big advantage of FPGAs is low latency and the ability to hit precise timing deadlines. When working with radio hardware, you still need an FPGA for automatic gain control calculations and recording/playing out samples. Similarly, you need to do your CRC and other calculations in an FPGA if you need to immediately respond to incoming signals, such as the RTS->CTS->DATA->ACK exchange in 802.11.
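For a feel of what the CRC work looks like, here is a bit-at-a-time CRC-32 in Python. It uses the standard reflected IEEE 802 CRC-32 polynomial, which as far as I know is the one behind the 802.11 frame check sequence; the payload is made up. The inner loop is a software model of a shift register with XOR taps, which is why hardware can keep up with the incoming bit stream while software has to catch up afterwards.

```python
import zlib

def crc32_bitwise(data: bytes) -> int:
    """CRC-32 (reflected polynomial 0xEDB88320), processed one bit at a time
    the way a hardware linear-feedback shift register would."""
    crc = 0xFFFFFFFF                      # initial register value
    for byte in data:
        crc ^= byte                       # feed the next 8 input bits
        for _ in range(8):
            if crc & 1:                   # bit shifted out is 1: apply the taps
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final inversion

frame = b"hypothetical frame payload"
print(hex(crc32_bitwise(frame)))
print(crc32_bitwise(frame) == zlib.crc32(frame))
```

The comparison against `zlib.crc32` is only a sanity check that the bitwise version computes the usual CRC-32; it says nothing about on-air bit ordering or the turnaround deadlines a real radio has to meet.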

cmrdporcupine 5 years ago | |

See it's funny, I (software guy) have recently started doing a bunch of FPGA stuff on the side for "fun" and I find the programming model to not be the biggest challenge.

The tools, yes, because it seems like hardware engineers have a fetish for all-encompassing painful vendor specific IDEs with half the features that us software developers have, and with a crapload of vendor lock-in... but I digress.

I find working in Verilog to be pretty pleasant. Yes, I can see that with sufficient complexity it wouldn't scale well. But SystemVerilog does give you some pretty good tools for managing modularity.

On the other hand, I've never particularly enjoyed working with GPUs, CUDA, etc.

So I would agree with your statement that the structural issues prevent their utility in wider market classes -- and those really are as you say ... lower clock speeds, cost, but also vendor tooling.

FPGAs could really do with a GCC/LLVM type open, universal, modular tooling. I use fusesoc, which is about as close to that as I will get (declarative build that generates the Vivado project behind the scenes), but it's not perfect, still.

jjoonathan 5 years ago | | |

I don't mean to belittle your exploration, but are you sure it's an apples-to-apples comparison? This suggests to me that it isn't:

> it seems like hardware engineers have a fetish for all-encompassing painful vendor specific IDEs

Hardware engineers feel pain just like you do. The reason why they put up with those awful software suites is because they have features they need that aren't available elsewhere. In particular, they interface with IP blocks and hard blocks, including at a debug + simulation level. Those tend to evolve quickly and last time I looked -- which admittedly was a while ago -- the open source FPGA tooling pretty much completely ignored them, even though they're critical to commercial development.

If you are content to live without gigabit transceivers, PCIe controllers, DRAM controllers, embedded ARM cores, and so on, I suspect it would be relatively easy to use the open source tooling, but you would only be able to address a small fraction of FPGA applications.

tieze 5 years ago | | |

LLVM folks have actually just started on such tooling: CIRCT. With Chris Lattner at the helm, and industry players like Xilinx and Intel seemingly on board.

daxfohl 5 years ago | | |

Agreed. I never thought the mental leap to Verilog was a big hurdle. It's just C-like syntax with some new constructs around signaling and parallelism. I found this interesting rather than forbidding.

The main challenge I had was compilation time. It can sometimes take overnight to compile a simple application if there's a lot of nested looping, only to have it run out of gates. This can be a royal pain.

I'd expect most HPC scenarios would have lots of nested looping, and probably memory accesses, and thus have to spend a lot of time writing state machines to get around gate count limitations and wait for memory responses, at which point you're basically designing a 200 MHz CPU.
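A back-of-the-envelope sketch of that trade-off, under a deliberately crude assumption: the tool either gives every loop iteration its own copy of the logic, or a state machine reuses a single copy for one iteration per clock. The loop bounds, LUT counts, and device size are all hypothetical.

```python
from math import prod

def fully_unrolled_luts(trip_counts, luts_per_iteration):
    """Area if every iteration of the loop nest gets its own copy of the logic."""
    return prod(trip_counts) * luts_per_iteration

def sequential_time_us(trip_counts, clock_hz):
    """Run time if a state machine reuses one copy, one iteration per cycle."""
    return prod(trip_counts) / clock_hz * 1e6

TRIPS = [64, 64, 8]       # hypothetical three-deep loop nest
DEVICE_LUTS = 200_000     # hypothetical device capacity

luts = fully_unrolled_luts(TRIPS, luts_per_iteration=40)
print(luts, "LUTs needed, fits:", luts <= DEVICE_LUTS)
print(f"{sequential_time_us(TRIPS, 200e6):.2f} us at 200 MHz when time-multiplexed")
```

Real tools sit somewhere between these extremes (partial unrolling, pipelining, resource sharing), but the multiplication of trip counts is what makes a design run out of gates.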

So I don't see it as being very useful for general purpose acceleration, but could be a good CPU offload for some very specific use cases that are more bit-banging than computing. Azure accelerates all its networking via FPGA, which seems like the ideal use case.

lnsru 5 years ago | |

It’s not the speed that holds FPGA adoption back. It’s the development process and time. While one can start with a GPU immediately, with an FPGA you first need to develop the whole PCIe infrastructure and efficient data movers. One is done with the GPU while the FPGA developers are just starting on the algorithms. As long as one does not need real-time capability, a GPU is the obvious choice. My 200 MHz design outcompetes every CPU and GPU out there within a very narrow data processing window, but development time is 5x that of regular software.

sfgnilnio 5 years ago | |

You ever work with an FPGA? The programming model and the tooling are a huge part of the problem.

Verilog and VHDL have basically nothing in common with any language you've ever used.

Compilation can take multiple days. This means that debugging happens in simulation, at maybe 1/10000th of the desired speed of the circuit.

If you try to make something too big, it just plain won't fit. There is no graceful degradation in performance; an inefficient design will just not function, come Hell or high water.

The existing compilers will happily build you the wrong thing if you write something ill-defined. There are a ton of things expressible in a hardware description language that don't actually map onto a real circuit (at least not one that can be automatically derived). In any normal language anything you can express is well-defined and can be compiled and executed. Not so in hardware.

Timing problems are a nightmare. Every single logic element acts like its own processor, writing directly into the registers of its neighbours, with no primitives for coordination. Imagine if you had to worry about race conditions inside of a single instruction!
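One slice of the timing problem can be put into numbers: a signal launched on one clock edge has to travel through the logic and settle before the next edge, so the slowest path sets the maximum clock rate. The path delays and setup time below are hypothetical, and clock skew, hold checks, and everything else a real timing analyzer accounts for are ignored.

```python
def fmax_mhz(path_delays_ns, setup_ns):
    """Highest clock rate at which the slowest register-to-register path
    still arrives setup_ns before the next clock edge."""
    critical_ns = max(path_delays_ns) + setup_ns
    return 1000.0 / critical_ns

def slack_ns(clock_mhz, path_delay_ns, setup_ns):
    """Time to spare on one path; negative means the design fails timing."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - (path_delay_ns + setup_ns)

paths = [2.0, 3.5, 4.5]   # hypothetical path delays in ns
print(fmax_mhz(paths, setup_ns=0.5), "MHz max")
print(slack_ns(250.0, 4.5, setup_ns=0.5), "ns slack at 250 MHz")
```

With these assumed delays the design tops out at 200 MHz, and asking for 250 MHz leaves the slowest path 1 ns short, at which point the register may capture a value that has not finished settling.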

Maybe even if all these problems were solved, FPGAs still wouldn't catch on, but let's not pretend the programming model isn't a problem. Hardware is fundamentally hard to design and the tooling is all 50 years out of date.

formerly_proven 5 years ago | | |

> You ever work with an FPGA? The programming model and the tooling are a huge part of the problem.

I'd argue FPGAs aren't programmed and don't have a programming model. Complaints that the programming model of FPGAs holds their adoption back are thus conceptually ill-founded. (The tooling still sucks.)

tyingq 5 years ago | |

There is another traditional FPGA use case where you need real time data capture or signal generation. That seems to be getting eaten from the bottom now that there are really high speed MCUs that are easier to program. It's less efficient, but easier to develop for.

exmadscientist 5 years ago | | |

The other problem with using an FPGA here is that microcontrollers are cheap and have great cheap dev boards. FPGAs, not so much. I've wanted to just "drop in" a small FPGA in several designs, the way you can drop in a microcontroller, but there's no available FPGA that's not a massive headache in that use case. Trust me, I've looked.

The iCE40 series is almost there but not quite. It's a bit pricey (this is sometimes okay, sometimes a dealbreaker) but its care and feeding is too annoying. Who wants to source a separate configuration memory? Sometimes I don't have the space for that crap.

If any company can bring a small, cheap, low power FPGA to the market, preferably with onboard non-volatile configuration memory, a microcontroller-like peripheral mix (UART, I2C, SPI, etc.), easy configuration (re)loading, and with good tool and dev board support, they'll sell a lot of units. They don't even have to be fast!

rogerbinns 5 years ago | | |

A good example of this is XMOS. Their chips are divided into "tiles" which can simultaneously run code, together with multiple interfaces such as USB, I2S, I2C, and GPIO. Latency is very deterministic because the tiles are not using caches, interrupts, shared buses etc.

Their development environment is Eclipse based with numerous libraries such as audio processing, interface management, DFU etc. They use a variant of C (xc) that lets you send data between channels/tiles, and easily parallelize processing.

An example use is in voice assistants where multiple microphones need to be analyzed simultaneously, echo and background noise have to be eliminated, and the speaker isolated into a single audio stream. I've used it for an audio processing product that needed to match hardware timers exactly, provide USB access, keep input and output matched, etc.

borramakot 5 years ago | |

Just to throw in one more complication, I'll assert that the only benefits of FPGAs over ASICs are one time costs and time to market. Those are big benefits, but almost by definition, they aren't as important for workloads that are large scale and stable. So, if you do have a workload that's an excellent match for FPGAs, and if that workload will have lots of long term volume, you should make an ASIC for it.

So, for FPGAs to be the next big thing in HPC, you'd need to find a class of workloads that benefit from the FPGA architecture, for long enough and with high enough volume to be worth the work to move over, and are also unstable or low volume enough that it's not worth making them their own chip.
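The volume argument can be made concrete with a simple break-even calculation: an ASIC carries a large one-time cost but a low unit price, while an FPGA has essentially no one-time silicon cost but a higher unit price. All the dollar figures below are made up for illustration.

```python
def break_even_units(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Volume above which the ASIC's one-time cost is paid back by its
    cheaper units. Assumes the FPGA is the more expensive part per unit."""
    return asic_nre / (fpga_unit_cost - asic_unit_cost)

# Hypothetical figures: $1.8M ASIC one-time cost, $5 per ASIC, $50 per FPGA.
units = break_even_units(asic_nre=1_800_000, asic_unit_cost=5, fpga_unit_cost=50)
print(f"ASIC pays off above {units:,.0f} units")
```

Below that volume, or when the workload is likely to change before the volume is reached, the FPGA stays the cheaper option. This sketch ignores time to market, which is the other benefit named above.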

cbzoiav 5 years ago | | |

That's not entirely true - the flexibility can have its own value. Unlike an ASIC you can handle multiple workloads or update flows.

For example, timing protocols on backbone equipment handling 100-400Gbps. Depending on how it's configured you may need to do different things. Additionally, you probably don't want to replace six-figure hardware every generation.

Another example is test equipment where you can't run the tests in parallel. A single piece of hardware can be far more portable / cost effective.

Unklejoe 5 years ago | | |

> I'll assert that the only benefits of FPGAs over ASICs are one time costs and time to market.

There's one more big one: the ability to update the logic in the field.

rthomas6 5 years ago | |

Take a look at Vitis. Xilinx is aware of this problem and is seeking to capture the market of people who want magic programming solutions to speed up existing software. Who knows if it will be successful, but they are trying more than ever to make FPGAs usable without having to know how to do hardware design and verification.

mdiesel 5 years ago | |

I work with FPGAs, but from LabVIEW. NI have put some effort into making the same language work for everything including FPGAs, and a graphical language is great for this kind of work.

It's so easy that it's quite common to see people pass off work onto the FPGA if it involves some slightly heavier data processing, which is exactly how it should be.

lnsru 5 years ago |

I am working right now on a bare-metal WebSocket implementation on Xilinx 7 Series FPGAs. Currently it’s a Zynq SoC, but the final product will probably have a Kintex-7 inside, so no Linux. The tools make me cry: no examples, and application notes from 2014 with ancient libraries. I hope vendors will fix the tooling. But I see Xilinx has released Vitis, so their focus is elsewhere, with no interest in old crap. Using Git with Vivado is already enough pain, so I keep my text sources in Git and complete zipped projects as releases. Ouch!

d_silin 5 years ago |

I wonder if it is possible to add a (small) FPGA to a personal computer that could accelerate any specific software tasks (video/audio encoding, ML algorithms, compression, extra FPU capabilities) on user demand.

retro_guy 5 years ago |

Maybe you will find this article about Large-Scale Field-Programmable Analog Arrays [FPAAs] interesting as well: https://hasler.ece.gatech.edu/FPAA_IEEEXPlore_2020.pdf

justicezyx 5 years ago |

FPGAs are not good at anything at a scale that can challenge non-configurable silicon...

They are good at a lot of things at smaller scales, like general prototyping/testing/simulation, telecom, special-purpose real-time computing, etc.

The logic behind the scenes is that FPGAs can never make things as flexible as software, and flexible software always offsets the inefficiency of a non-configurable chip. Just comparing FPGAs with CPUs/GPUs will never teach FPGA vendors the reality, or perhaps they choose to ignore it after all...

GeorgeTirebiter 5 years ago | |

I believe you are incorrect. A counterexample to your claim is the increasing use of FPGAs in the datacenter. And various AI engines are FPGA-based. You'll do better for a CPU in Real Silicon; but a full-featured MPU w/standard peripherals + FPGA for unusual & must-be-fast functions is hard to beat.

justicezyx 5 years ago | | |

Tell me how many users are using FPGAs and why Xilinx is just a fraction of Nvidia's market cap. 5 years ago, Nvidia was 2x Xilinx in market cap; now it's 10x.

inaccel 5 years ago |

There are two main challenges to FPGA utilization:

- The first one is FPGA programming. Designing your own accelerators with OpenCL and HLS is now much easier than with VHDL/Verilog.

- The second one is FPGA deployment and integration. Until now it was very difficult to integrate your design with applications, to scale out efficiently and to share it among multiple threads/users. The main reason was the lack of an OS layer (or abstraction layer) that would let you treat FPGAs like any other computing resource (CPU, GPU).

This is why at inaccel we developed a unique vendor-agnostic orchestrator for FPGAs. The orchestrator allows much easier integration, scaling and resource sharing of FPGAs.

That way we have managed to decouple the FPGA designer from the software developer. The FPGA designer creates the bitstream and the software developer just calls the function they want to accelerate. No need to define the bitstream file, no need to define the interface or the memory buffer allocation.

And the best part: it is vendor and platform agnostic. The FPGA designer creates multiple bitstreams for different platforms and the software developer couldn't care less. The developer just calls the function and the inaccel FPGA orchestrator magically configures the right FPGA for the right function.

rwmj 5 years ago |

> Intel, AMD, and many other companies use FPGAs to emulate their chips before manufacturing them.

Really? I'm assuming if this is true it can only be for tiny parts of the design, or they have some gigantic wafer-scale FPGA that they're not telling anyone about :-) Anyway I thought they mainly used software emulation to verify their designs.

andromeduck 5 years ago |

IMO the next big application for FPGAs is going to be to serve as a programmable DMA engine of sorts. Have a bunch of hard logic like ALUs and/or I/Os strewn about. Like for hw-accelerated SQL queries, malloc/free, data-specific compressors and the like.

Koshkin 5 years ago |

I wonder what would be the advantages of using an FPGA to test a CPU design - compared to relying on a (presumably more accurate) computer-based simulation. (I understand the reasons one might want to implement a CPU in an FPGA.)

m3kw9 5 years ago |

The thing with FPGAs is that companies, when faced with a cash and time crunch, will opt to use an FPGA instead of designing an ASIC. The tools suck, but companies will hire someone who will deal with them. FPGAs fit a very particular set of constraints and still solve very specific problems efficiently.

bsder 5 years ago |

The problem that FPGAs have is that they are only good for low-volume solutions that require flexibility and have no power constraints.

That's a really narrow market. Telecom equipment and lab equipment, basically.

If I need volume, I need at least an ASIC. If I need to manage power, I need a full custom design.

GeorgeTirebiter 5 years ago | |

Microsemi (now part of Microchip) makes some low-power FPGAs. Xilinx has made the CoolRunner CPLDs for years, and they are mighty low-power (they're not huge, but are often big enough for some needed extra logic). (Another market that doesn't care too much about power is the military.)

wwarner 5 years ago |

This is really interesting. If a CPU hardware vulnerability like Spectre could be repaired by patching an FPGA on the SoC, that would be incredible. That type of functionality would overtake the entire cloud market in about 3 days.

PanosJee 5 years ago |

inaccel.com is taking lots of steps to bring FPGAs into 2020:

- Spark/k8s integration
- Abstraction of popular cores
- Python APIs
- Serverless deployments
- Etc.