The History, Status, and Future of FPGAs (queue.acm.org)
One of my prior projects involved working with a lot of ex-FPGA developers. This is obviously a rather biased group of people, but I heard a lot of very negative feedback about FPGAs from them.
One comment that's telling is that since the 90s, FPGAs were seen as the obvious "next big technology" for HPC market... and then Nvidia came out and pushed CUDA hard, and now GPGPUs have cornered the market. FPGAs are still trying to make inroads (the article here mentions it), but the general sense I have is that success has not been forthcoming.
The issue with FPGAs is that you start with a clock rate in the hundreds of MHz (the exact clock rate depends on how long the paths are), compared with a few GHz for GPUs and CPUs. Thus you need a 5× performance win from switching to an FPGA just to break even, and you probably need another 2× on top of that to motivate people to go through the pain of FPGA programming. Nvidia made GPGPU work by demonstrating performance gains large enough to make the cost of rewriting code worth it; FPGAs have yet to do that.
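A back-of-envelope version of that arithmetic, as a Python sketch. It assumes the usual static-timing relation: the maximum clock is set by the slowest register-to-register path (clock-to-Q + logic + routing + setup). The individual delay values and the 2.5 GHz CPU clock are hypothetical numbers chosen to reproduce the 5× and 2× figures above, not measurements of any real device.

```python
# Back-of-envelope check of the clock-rate argument.
# All delays and clock figures are illustrative assumptions.

def fmax_mhz(t_clk_to_q_ns, t_logic_ns, t_route_ns, t_setup_ns):
    """Max clock of a register-to-register path: f = 1 / (sum of path delays)."""
    period_ns = t_clk_to_q_ns + t_logic_ns + t_route_ns + t_setup_ns
    return 1000.0 / period_ns  # 1/ns is GHz, so scale by 1000 for MHz


def required_speedup(cpu_clock_mhz, fpga_clock_mhz, motivation_factor=2.0):
    """Extra work per cycle the FPGA must do to break even, and to be worth porting."""
    break_even = cpu_clock_mhz / fpga_clock_mhz
    return break_even, break_even * motivation_factor


# Hypothetical critical path: 0.3 ns clock-to-Q, 0.8 ns LUT logic,
# 0.7 ns routing, 0.2 ns setup, i.e. a ~2.0 ns period.
fpga_mhz = fmax_mhz(0.3, 0.8, 0.7, 0.2)
break_even, worth_it = required_speedup(cpu_clock_mhz=2500, fpga_clock_mhz=fpga_mhz)
print(f"FPGA clock ~{fpga_mhz:.0f} MHz")
print(f"break-even: {break_even:.1f}x more work per cycle; worth porting: {worth_it:.1f}x")
```

Longer paths (more routing, deeper logic) lengthen the period and push the break-even factor up, which is why the achievable clock is design-dependent.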
Edit: It's worth noting that the programming model of FPGAs has consistently been cited as the thing holding FPGAs back for the past 20 years. GPGPU succeeded despite requiring a move to a different programming model, while the FPGA community has been unable to furnish the necessary magic programming model. That suggests to me (and my FPGA-skeptic coworkers) that the programming model isn't the actual issue preventing FPGAs from succeeding, but that FPGAs have structural issues (e.g., low clock speeds) that prevent their utility in wider market classes.
However, some applications do not map well to GPUs. In particular, applications with a great deal of bit-level parallelism can achieve enormous speedups with bespoke hardware. For applications where it doesn't make sense to tape out an ASIC, FPGAs are beautiful, even if they only operate at a few hundred MHz.
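CRC computation is one example of this kind of bit-level work (our choice of example; the comment names none). In software, each byte costs eight dependent shift/XOR steps. Because a CRC update is linear over GF(2), those steps collapse into one fixed XOR per output bit, which is roughly the network a synthesizer builds and evaluates in a single clock. A Python sketch, assuming the common CRC-8 polynomial 0x07 with zero init and no reflection:

```python
POLY = 0x07  # x^8 + x^2 + x + 1 (CRC-8, init 0, no reflection, no final XOR)


def crc8_byte(crc: int, byte: int) -> int:
    """Software view: eight dependent shift/XOR steps per input byte."""
    crc ^= byte
    for _ in range(8):
        if crc & 0x80:
            crc = ((crc << 1) & 0xFF) ^ POLY
        else:
            crc = (crc << 1) & 0xFF
    return crc


def crc8(data: bytes) -> int:
    crc = 0
    for b in data:
        crc = crc8_byte(crc, b)
    return crc


def xor_equations():
    """Hardware view: next[k] is a fixed XOR of some bits of d = crc ^ byte.
    Because the update is linear over GF(2), probing it with one-hot inputs
    recovers which bits of d feed each output bit."""
    return [
        [i for i in range(8) if (crc8_byte(0, 1 << i) >> k) & 1]
        for k in range(8)
    ]


# Print the eight independent XOR equations that replace the serial loop.
for k, taps in enumerate(xor_equations()):
    print(f"next[{k}] = " + " ^ ".join(f"d[{i}]" for i in taps))
```

On a CPU the loop is a chain of dependent instructions; in fabric all eight equations are just small XOR trees that settle in parallel.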
I think the "programming model" is actually the biggest barrier to wider adoption. Your comment is suffused with what I believe is the source of this disagreement: the idea that one programs an FPGA. One designs hardware that is implemented on an FPGA. The difference may sound pedantic, but it really is not. There is a huge difference between software programming and hardware design, and hardware design is downright unnatural for software developers. They are completely different skill sets.
On top of that add all the headaches that come with implementing a physical device with physical constraints (the article complains about P&R times but this is far from the only burden) and it becomes clear that FPGAs are quite frankly a massive pain in the ass compared to software running on CPUs or GPUs.
(Also, in general, FPGA tools are just some of the lowest quality garbage out there... and that is saying something. They're that bad. This is a completely unnecessary speedbump.)
The rebuttal to your objection is always tools like HLS (high-level synthesis), or in plain English, "C to HDL". FPGAs are 'programmed' in one of two hardware description languages: VHDL (bad) or Verilog (worse, but manageable if you learn VHDL first). These are not programming languages; they are hardware description languages. That means things like "everything in a block always executes in parallel". (Take that, Erlang?) In fact, everything on the chip always executes in parallel, all the time, no exceptions; you "just" select which output is valid. That's because this is how hardware works.
This model maps very, very poorly to traditional programming languages. This makes FPGAs hard to learn for engineers and hard to target for HLS tools. The tools can give you decent enough output to meet low- to mid-performance needs, but if you need high performance -- and if not, why are you going through this masochism? -- you're going to need to write some HDL yourself, which is hard and makes you use the industry's worst tools.
Thus, FPGAs languish.
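The "everything executes in parallel" point is easiest to see in how clocked registers update. Below is a minimal Python model (our own illustration, not any simulator's API) of Verilog-style nonblocking assignment: every register's next value is computed from the state before the clock edge, and then all of them update together.

```python
def clock_edge(state, next_state_fns):
    """Evaluate every register's next value from the OLD state, then commit all at once."""
    return {reg: fn(state) for reg, fn in next_state_fns.items()}


# In HDL, "a <= b; b <= a;" swaps the registers, because both right-hand
# sides read the values from before the edge.
swap = {"a": lambda s: s["b"], "b": lambda s: s["a"]}
print(clock_edge({"a": 1, "b": 2}, swap))  # {'a': 2, 'b': 1}

# A 3-stage shift register: every stage moves on the same edge,
# regardless of the order in which the assignments are written.
shift = {
    "q2": lambda s: s["q1"],
    "q1": lambda s: s["q0"],
    "q0": lambda s: s["din"],
    "din": lambda s: s["din"],  # input held constant for this demo
}
state = {"din": 1, "q0": 0, "q1": 0, "q2": 0}
for cycle in range(3):
    state = clock_edge(state, shift)
    print(cycle, state)
```

The same two statements written as sequential software (`a = b; b = a`) leave both variables equal to 2, which is exactly the mismatch software developers trip over.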
The tools, yes, because it seems like hardware engineers have a fetish for all-encompassing, painful, vendor-specific IDEs with half the features we software developers have, and with a crapload of vendor lock-in... but I digress.
I find working in Verilog to be pretty pleasant. Yes, I can see that with sufficient complexity it wouldn't scale out well. But SystemVerilog does give you some pretty good tools for managing modularity.
On the other hand, I've never particularly enjoyed working with GPUs, CUDA, etc.
So I would agree with your statement that structural issues prevent their utility in wider market classes, and those really are, as you say, lower clock speeds and cost, but also vendor tooling.
FPGAs could really do with GCC/LLVM-style open, universal, modular tooling. I use FuseSoC, which is about as close to that as I will get (a declarative build that generates the Vivado project behind the scenes), but it's still not perfect.
> it seems like hardware engineers have a fetish for all-encompassing painful vendor specific IDEs
Hardware engineers feel pain just like you do. The reason they put up with those awful software suites is that they have features they need that aren't available elsewhere. In particular, they interface with IP blocks and hard blocks, including at the debug and simulation level. Those tend to evolve quickly and, last time I looked (which admittedly was a while ago), the open source FPGA tooling pretty much completely ignored them, even though they're critical to commercial development.
If you are content to live without gigabit transceivers, PCIe controllers, DRAM controllers, embedded ARM cores, and so on, I suspect it would be relatively easy to use the open source tooling, but you would only be able to address a small fraction of FPGA applications.
The main challenge I had was compilation time. Compiling a simple application can sometimes take all night if there's a lot of nested looping, only for it to run out of gates. This can be a royal pain.
I'd expect most HPC scenarios would have lots of nested looping, and probably memory accesses, and thus have to spend a lot of time writing state machines to get around gate count limitations and wait for memory responses, at which point you're basically designing a 200 MHz CPU.
So I don't see it as being very useful for general purpose acceleration, but could be a good CPU offload for some very specific use cases that are more bit-banging than computing. Azure accelerates all its networking via FPGA, which seems like the ideal use case.
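To put rough numbers on "running out of gates", here is a Python sketch comparing a fully unrolled loop nest with a time-multiplexed one. It assumes one hard multiplier (DSP block) per multiply, perfect pipelining, and no memory-bandwidth limit, all simplifications; the 64×64×64 loop nest, 2,000-DSP budget, and 250 MHz clock are hypothetical.

```python
import math


def unroll_tradeoff(trip_counts, mults_per_iter, dsp_budget, clock_mhz):
    """Return (DSPs for full unroll, whether it fits, cycles, microseconds)."""
    iterations = math.prod(trip_counts)
    fully_unrolled_dsps = iterations * mults_per_iter
    fits = fully_unrolled_dsps <= dsp_budget
    # Time-multiplex: run as many iterations per cycle as the DSP budget allows,
    # assuming each DSP accepts a new operand pair every cycle.
    parallel_iters = max(1, dsp_budget // mults_per_iter)
    cycles = -(-iterations // parallel_iters)  # ceiling division
    return fully_unrolled_dsps, fits, cycles, cycles / clock_mhz


# Hypothetical 64x64x64 matrix multiply: one multiply per innermost iteration.
dsps, fits, cycles, usec = unroll_tradeoff([64, 64, 64], 1, 2000, 250)
print(f"full unroll needs {dsps} DSPs (fits: {fits}); "
      f"time-multiplexed: {cycles} cycles = {usec:.3f} us")
```

The time-multiplexed version is where the extra control logic comes in: something has to sequence operands through the shared multipliers and wait on memory, which is the state-machine work described above.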
Verilog and VHDL have basically nothing in common with any language you've ever used.
Compilation can take multiple days. This means that debugging happens in simulation, at maybe 1/10000th of the desired speed of the circuit.
If you try to make something too big, it just plain won't fit. There is no graceful degradation in performance; an inefficient design will just not function, come Hell or high water.
The existing compilers will happily build you the wrong thing if you write something ill-defined. There are a ton of things expressible in a hardware description language that don't actually map onto a real circuit (at least not one that can be automatically derived). In any normal language, anything you can express is well-defined and can be compiled and executed. Not so in hardware.
Timing problems are a nightmare. Every single logic element acts like its own processor, writing directly into the registers of its neighbours, with no primitives for coordination. Imagine if you had to worry about race conditions inside of a single instruction!
Maybe if all these problems are solved FPGAs still wouldn't catch on, but let's not pretend the programming model isn't a problem. Hardware is fundamentally hard to design and the tooling is all 50 years out of date.
I'd argue FPGAs aren't programmed and don't have a programming model. Complaints that the programming model of FPGAs holds their adoption back are thus conceptually ill-founded. (The tooling still sucks).
The iCE40 series is almost there but not quite. It's a bit pricey (this is sometimes okay, sometimes a dealbreaker) but its care and feeding is too annoying. Who wants to source a separate configuration memory? Sometimes I don't have the space for that crap.
If any company can bring a small, cheap, low power FPGA to the market, preferably with onboard non-volatile configuration memory, a microcontroller-like peripheral mix (UART, I2C, SPI, etc.), easy configuration (re)loading, and with good tool and dev board support, they'll sell a lot of units. They don't even have to be fast!
Their development environment is Eclipse based with numerous libraries such as audio processing, interface management, DFU etc. They use a variant of C (xc) that lets you send data between channels/tiles, and easily parallelize processing.
An example use is in voice assistants, where multiple microphones need to be analyzed simultaneously, echo and background noise have to be eliminated, and the speaker has to be isolated into a single audio stream. I've used it for an audio processing product that needed to match hardware timers exactly, provide USB access, have matched input and output, etc.
So, for FPGAs to be the next big thing in HPC, you'd need to find a class of workloads that benefit from the FPGA architecture, for long enough and with high enough volume to be worth the work to move over, and are also unstable or low volume enough that it's not worth making them their own chip.
For example, timing protocols on backbone equipment handling 100-400 Gbps. Depending on how it's configured, you may need to do different things. Additionally, you probably don't want to replace six-figure hardware every generation.
Another example is test equipment where you can't run the tests in parallel. A single piece of hardware can be far more portable / cost effective.
There's one more big one: the ability to update the logic in the field.
It's so easy that it's quite common to see people pass work off to the FPGA if it involves some slightly heavier data processing, which is exactly how it should be.
They are good at a lot of things at smaller scales, like general prototyping/testing/simulation, telecom, special-purpose real-time computing, etc.
The logic behind the scenes is that FPGAs can never make things as flexible as software, and flexible software always offsets the inefficiency of non-configurable chips. Just comparing FPGAs with CPUs/GPUs will never teach FPGA vendors the reality, or maybe they choose to ignore it after all...
- The first one is FPGA programming. Now, using OpenCL and HLS to design your own accelerators is much easier than using VHDL/Verilog.
- The second one is FPGA deployment and integration. Until now it was very difficult to integrate your design with applications, to scale out efficiently, and to share it among multiple threads/users. The main reason was the lack of an OS layer (or abstraction layer) that would let you treat FPGAs like any other computing resource (CPU, GPU).
This is why at inaccel we developed a unique vendor-agnostic orchestrator for FPGAs. The orchestrator allows much easier integration, scaling and resource sharing of FPGAs.
That way we have managed to decouple the FPGA designer from the software developer. The FPGA designer creates the bitstream and the software developer just calls the function they want to accelerate. No need to define the bitstream file, the interface, or the memory buffer allocation.
And the best part: it is vendor and platform agnostic. The FPGA designer creates multiple bitstreams for different platforms and the software developer couldn't care less. The developer just calls the function and the inaccel FPGA orchestrator magically configures the right FPGA for the right function.
Really? I'm assuming if this is true it can only be for tiny parts of the design, or they have some gigantic wafer-scale FPGA that they're not telling anyone about :-) Anyway I thought they mainly used software emulation to verify their designs.
That's a really narrow market. Telecom equipment and lab equipment, basically.
If I need volume, I need at least an ASIC. If I need to manage power, I need a full custom design.
- Spark/k8s integration
- Abstraction of popular cores
- Python APIs
- Serverless deployments
- Etc.
https://github.com/xupgit/FPGA-Design-Flow-using-Vivado/tree...
https://www.xilinx.com/support/university.html
https://www.xilinx.com/video/hardware/getting-started-with-t...
There are others that cover the SDK side of things, but the HW side/Vivado is well documented.
If some FPGA company comes along and throws out conventional market wisdom (the old Henry Ford quote seems pertinent: "If I'd asked customers what they wanted, they would have said 'a faster horse'") and makes an FPGA with software tools that are fast, non-buggy, and have good UI/UX, I think they could steal significant market share. Early FPGA patents should be expiring by now...
I guess the one place where GPGPU-based solutions wouldn't work, is when the code you want to accelerate is necessarily acting as some kind of Turing machine (i.e. emulation for some other architecture.) However, I can't think of a situation where an FPGA programmed with the netlist for arch A, running alongside a CPU running arch B, would make more sense than just getting the arch-B CPU to emulate arch A; unless, perhaps, the instructions in arch-A are very, very CISC, perhaps with analogue components (e.g. RF logic, like a cellular baseband modem.)
You saw correctly, work is indeed being done to build "shells" that can accept workloads without the user having to go through the FPGA tooling/build process.
So it's unlikely ever to gain broad acceptance because the software vendors would have to support such a high number of permutations and the return can be questionable. This is why you see far more accelerators based on ASICs that have higher clock speeds and baked-in circuitry for specific tasks, with standardized APIs.
But sure, there's nothing preventing you from buying an FPGA board, hooking it up to your PC, creating a few images that do the accelerations you want, and writing software that uses them, swapping the image in when your program loads. You could even write a smart driver that swaps the image only if it's not in use by another app, or whatever. It's just unlikely you'll ever find a bunch of third-party software that supports it.
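The "smart driver" idea can be sketched in a few lines of Python. Everything here is hypothetical: `FpgaSlot` and the `load_bitstream` callback stand in for whatever vendor driver call actually programs the board, and the sketch is single-threaded (a real one would need locking and probably per-process tracking).

```python
class FpgaSlot:
    """Hypothetical manager: reconfigure the FPGA only when nobody is using it."""

    def __init__(self, load_bitstream):
        self._load = load_bitstream  # stand-in for the real "program the FPGA" call
        self.current = None          # name of the image currently loaded
        self.users = 0               # apps currently relying on that image

    def acquire(self, image):
        if image == self.current:
            self.users += 1          # already loaded: just share it
            return True
        if self.users == 0:
            self._load(image)        # safe to swap: the old image is idle
            self.current = image
            self.users = 1
            return True
        return False                 # busy with another image; caller falls back to CPU

    def release(self):
        if self.users == 0:
            raise RuntimeError("release() without matching acquire()")
        self.users -= 1


loads = []
slot = FpgaSlot(load_bitstream=loads.append)
print(slot.acquire("aes"))  # True: FPGA was idle, image loaded
print(slot.acquire("aes"))  # True: already loaded, shared
print(slot.acquire("fft"))  # False: another app still uses "aes"
slot.release()
slot.release()
print(slot.acquire("fft"))  # True: idle again, so "fft" is swapped in
print(loads)                # ['aes', 'fft']
```

Returning `False` instead of blocking leaves the policy (wait, or run the software path) to the application, which matters when the accelerated function also has a CPU implementation.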
I could imagine that Apple will include something like this in their Apple Silicon SOC for ARM macs.
The Afterburner card is not user programmable, but maybe it will be in the future and this was just a first attempt to get the hardware into the field.
1. It's not just a single FPGA but a large box full of them. For example: https://www.synopsys.com/verification/emulation/zebu-server....
2. Software models are employed for parts of the system (For example, the southbridge and all the peripherals connected to it are generally a software model which communicates with the hardware emulated portion in the FPGA via a PCIe model which is partly in hardware and partly in software.) This saves a lot of gates in the FPGA - those parts have already been well tested anyway so no need to put them into the hardware emulation.
- modern FPGAs are huge.
- when an ASIC design won't fit in a single FPGA, it's usually possible to partition the design across multiple FPGAs
- software emulation/simulation is not guaranteed to be "more accurate". FPGAs can interact with a real-world environment in ways that simulation simply cannot
- simulations run 1000s of times slower than FPGAs. Months of simulation time can be covered in minutes on the FPGA
Edit: to be clear, they all use simulation too, but FPGAs are used to accelerate the verification process
We had 10 such boards, worth millions of dollars in hardware, and a small team to keep them running.
These platforms were mostly used by the firmware team to develop everything before real silicon came back. They could run the full design at ~1 to 10 MHz vs 500+ MHz on silicon or 10 kHz in simulation.
After running for a while, that FPGA platform crashed on a case where a FIFO in a memory controller overflowed.
Our VP of engineering said that finding this one bug was sufficient to justify the whole FPGA emulation investment.
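Putting the quoted rates side by side shows why such a platform can pay off. A Python sketch; the rates are the approximate figures from the comment, and the workload (500 million cycles, i.e. one second of run time on 500 MHz silicon) is an assumption chosen for illustration.

```python
# Approximate execution rates quoted above, in design clock cycles per second.
RATES_HZ = {
    "RTL simulation": 10e3,
    "FPGA emulation (slow)": 1e6,
    "FPGA emulation (fast)": 10e6,
    "silicon": 500e6,
}


def wall_clock_seconds(cycles, rate_hz):
    """Wall-clock time to execute `cycles` design cycles at `rate_hz`."""
    return cycles / rate_hz


cycles = 500e6  # assumed workload: one second of run time on 500 MHz silicon
for name, rate in RATES_HZ.items():
    secs = wall_clock_seconds(cycles, rate)
    print(f"{name:>22}: {secs:>10.1f} s ({secs / 3600:.2f} h)")
```

That is roughly 14 hours in simulation versus somewhere between under a minute and about eight minutes on the FPGA platform, which is the gap that makes long firmware runs (and rare corner cases like a FIFO overflow) reachable before silicon is available.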
One of the nicer stories about the first ARM chip is that they built a software simulator to verify the design and as a result they found plenty of bugs in the hardware before committing to silicon. The first delivered chips worked right away.
Also, there are prototyping boards built specifically for emulation that integrate multiple FPGAs, although this does introduce a partitioning problem that has to be solved either manually or via dedicated emulator software.
First off, mapping an entire CPU to an FPGA cluster is a design challenge in itself. Assuming you can build an FPGA cluster large enough to hold your CPU, and reliable enough to get work done on it, you have the problem of partitioning your design across the FPGAs. Second problem: observability. In a simulator, you can probe anywhere trivially; with an FPGA cluster, you must route the probed signal to something you can observe. (I am not even going to talk about getting stimulus in and results out, since you have that problem with either an FPGA or a simulator; it is just different mechanics.)
The big problem is that an FPGA models each signal with two states: 1 and 0. A logic simulator can use more states, in particular U or "unknown". All latches should come up U, and getting out of reset (a non-trivial problem), to grossly oversimplify, is "chasing the U's away". An FPGA model could, in theory, model signals with more than two states. The model size will grow quickly.
Source: Once upon a time I was pre-silicon validation manager for a CPU you have heard of, and maybe used. Once upon a time I was architect of a hardware-implemented logic simulator that used 192 states (not 2) to model the various vagaries of wired-net resolution. Once upon a time I watched several cube-neighbors wrestle with the FPGA model of another CPU you have heard of, and maybe used.
Note: What would 3 state truth tables look like, with states 0,1,U? 0 and 1 is 0. 0 and U is 0. 1 and U is U -- etc. You can work out the rest with that hint, I think.
Edit to add: Why are U's important? They uncover a large class of reset bugs and bus-clash bugs. I once worked on a mainframe CPU where we simulated the design using a two-state simulator. Most of the bugs in bring-up were getting out of reset. Once we could do load-add-store-jump, the rest just mostly worked. Reset bugs suck.
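The three-state rules from the note can be written out as a short Python sketch. The `U` handling follows the hint (0 AND U = 0, 1 AND U = U) and the usual dual for OR; the toggle register with synchronous reset is our own example of how a two-state model hides a reset bug. (Real HDL simulators use richer value sets; VHDL's `std_logic`, for instance, is nine-valued.)

```python
U = "U"  # unknown


def not3(a):
    return U if a == U else 1 - a


def and3(a, b):
    if a == 0 or b == 0:
        return 0  # a 0 forces the output regardless of the other input
    if a == 1 and b == 1:
        return 1
    return U


def or3(a, b):
    if a == 1 or b == 1:
        return 1  # a 1 forces the output regardless of the other input
    if a == 0 and b == 0:
        return 0
    return U


def xor3(a, b):
    return U if U in (a, b) else a ^ b  # XOR is never forced by one input


for a in (0, 1, U):
    print("AND", a, [and3(a, b) for b in (0, 1, U)])


def toggle_next(q, en, rst):
    # Next state of a toggle flip-flop with synchronous, active-high reset:
    # q_next = (NOT rst) AND (q XOR en)
    return and3(not3(rst), xor3(q, en))


q = U                            # every flop powers up unknown
q = toggle_next(q, en=1, rst=0)  # no reset yet: the U persists
print("without reset:", q)       # without reset: U
q = toggle_next(q, en=1, rst=1)  # reset asserted: the U is chased away
print("after reset:", q)         # after reset: 0
q = toggle_next(q, en=1, rst=0)
print("toggling:", q)            # toggling: 1
```

In a two-state model the flop would simply power up as 0 and appear to toggle correctly even if reset were never wired up, which is the class of bug described above.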
Indeed they do. And even if you have working chips, you get the next stage: board-level reset bugs. An MC68K board I helped develop didn't want to boot; some nasty side effect of a reset line that didn't stay at the same level long enough stopped the CPU from resetting reliably while everything else reset just fine. That took a while to debug.
Or you might imagine a chip that has an FPGA on the side (I expected Intel would ship this after acquiring Altera, but it never happened). But the FPGA would somehow have to have access to the paths that caused the vulnerability, which is highly unlikely, and would also be really slow compared to what they actually do which is hacking around it by microcode changes.
They did: https://www.anandtech.com/show/12773/intel-shows-xeon-scalab...
But I get the sense this part was aimed at a few very specific customers. It required some PCB-level power delivery changes, so you couldn't even drop it into a standard server motherboard.
I don't think they are very popular though. Maybe they are used sometimes for machine learning?
But my point is that for FPGAs to come to prominence as a major computation paradigm, it probably won't be because it outperforms GPU on one really big workload like bitcoin or genetic analysis or something. It'll have to be a moderately large number of medium scale workloads.
HLS notwithstanding, you don't use traditional control structures to tell an FPGA what to do. You use clocked FSMs and asynchronous expressions to tell it what to be.
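A Python sketch of the "what to be" view: a clocked FSM is just a next-state table plus an output table, and each clock edge applies the table once. The "101" sequence detector is our own example; the `run` loop plays the role of a testbench feeding one input bit per clock, not of the hardware itself.

```python
# Moore FSM that detects the bit pattern 1-0-1 (overlaps allowed).
# (state, input bit) -> next state
NEXT = {
    ("IDLE", 0): "IDLE",     ("IDLE", 1): "GOT1",
    ("GOT1", 0): "GOT10",    ("GOT1", 1): "GOT1",
    ("GOT10", 0): "IDLE",    ("GOT10", 1): "GOT101",
    ("GOT101", 0): "GOT10",  ("GOT101", 1): "GOT1",
}
# Moore machine: the output depends only on the current state.
OUTPUT = {"IDLE": 0, "GOT1": 0, "GOT10": 0, "GOT101": 1}


def run(bits, state="IDLE"):
    """Testbench: apply one input bit per clock edge and record the output."""
    outs = []
    for b in bits:
        state = NEXT[(state, b)]
        outs.append(OUTPUT[state])
    return outs


print(run([1, 0, 1, 0, 1, 1]))  # [0, 0, 1, 0, 1, 0]
```

The two tables are the design; in HDL they become a state register plus combinational next-state and output logic.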
It froze the IDE for 45 minutes before I could do anything else.
This was on a beefy machine at AWS too, not some cheap home desktop thing.
That wasn't compiling, no synthesis, P&R, nothing.
There was no giant netlist I'd been working on either. Most of the FPGA was empty.
That was literally just adding a small source file which the IDE auto-indexed so you could browse the contents.
In Verilator, an open source Verilog simulator, that same source file loaded, completed its simulation and checked test results in less than a second. So it wasn't that hard to compile and expand its contents.
Vivado is excellent for some things, but the excellence is not uniform, unfortunately. On that project, I had to do most of the Verilog development outside Vivado because it was vastly faster outside, only importing modules when they were pretty much ready to use and behaviorally validated.
Okay, bit-banging is another advantage of FPGA that GPU doesn't do as well. There are a few things.
If you want to compare apples-to-apples, we have done a comparison with realistic (and not synthetic) data regarding the performance of GPUs and FPGAs.
https://medium.com/@inaccel/faster-inference-real-benchmarks...
What we really need is a lightweight, open source toolchain for FPGAs and one or more "higher level" synthesis languages. I've always wondered whether a DSL embedded in a higher-level language like Python might be a better way to do this. Rather than trying to transpile an entire language, just provide building blocks and interfaces that can then be used to generate Verilog/VHDL.
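To make the "building blocks, not a transpiler" idea concrete, here is a deliberately tiny Python sketch: Python objects describe registers and ports, and a generator emits Verilog text. The `Register`/`Module` classes are hypothetical and are not the API of nMigen, MyHDL, or any other real project; a real DSL would also need expressions as objects (not strings), combinational logic, and hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    name: str
    width: int
    next_expr: str  # Verilog expression for the value after each clock edge
    reset: int = 0  # value loaded while rst is high


@dataclass
class Module:
    name: str
    inputs: dict = field(default_factory=dict)   # port name -> width in bits
    outputs: dict = field(default_factory=dict)  # port name -> Register driving it
    regs: list = field(default_factory=list)

    def to_verilog(self) -> str:
        ports = ["input wire clk", "input wire rst"]
        ports += [f"input wire [{w - 1}:0] {n}" for n, w in self.inputs.items()]
        ports += [f"output wire [{r.width - 1}:0] {n}" for n, r in self.outputs.items()]
        out = [f"module {self.name} (", ",\n".join(f"    {p}" for p in ports), ");"]
        out += [f"    reg [{r.width - 1}:0] {r.name};" for r in self.regs]
        out += [f"    assign {n} = {r.name};" for n, r in self.outputs.items()]
        out.append("    always @(posedge clk) begin")
        out.append("        if (rst) begin")
        out += [f"            {r.name} <= {r.width}'d{r.reset};" for r in self.regs]
        out.append("        end else begin")
        out += [f"            {r.name} <= {r.next_expr};" for r in self.regs]
        out += ["        end", "    end", "endmodule"]
        return "\n".join(out)


# An 8-bit counter with an enable input, described as data.
count = Register("count", 8, next_expr="en ? count + 8'd1 : count")
m = Module("counter", inputs={"en": 1}, outputs={"value": count}, regs=[count])
print(m.to_verilog())
```

Because the description is ordinary Python data, loops, functions, and parameters in the host language do the metaprogramming, and only the final output has to be Verilog.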
nMigen: Python-based DSL to Verilog translator
LiteX: open source gateware
SymbiFlow: open source Verilog compiler + PnR tooling.
There's a Linux kernel running on LiteX and a RISC-V core running on an ECP5 out on the internet.
A MicroPython port running on a RISC-V core and Migen (the earlier version of nMigen) can also be found here: https://fupy.github.io/
Like this? http://www.myhdl.org/
Also, Dini got acquired by Synopsys a few years ago.
"Although no specific testing was performed, sophisticated statistical finite element models and back of the envelope calculations are showing the number of status LEDs to be bright enough to execute dermatological procedures normally done with CO2 lasers. Contact the factory for more information about this sophisticated feature and make sure an adult is present during operation. These LEDs are user controllable from the FPGAs so can be used as visual feedback in addition to burning skin."
"As with all of our FPGA-based products boards, the DNVUPF4A is loaded with LEDs. The LEDs are stuffed in several different colors (red, green, blue, orange et al.). There are enough LEDs here to melt cheese. Please don't melt cheese without adult supervision. These LEDs are user controllable from the FPGAs so can be used as visual feedback in addition to the gratifying task of creating gooey messes."
Huge PCBs, ~2ft by 2ft.
Is board to board connection with high speed connectors feasible? This was what I heard from verification folks.
I wasn't talking about open sourcing. I accept we won't have open source DRAM controllers and the like from them. I understand the licensing restrictions. I just don't like how they force all this stuff to be gatewayed through their baroque and overcomplicated GUI tools.
I prefer tools that are scriptable, that can work with the build system of my choice, that work properly with source control (imagine that!), where you have your choice of editor rather than having their garbage one rammed down your throat, and where there are whiz-bang features like reformatting and auto-indentation... Hell, even refactoring.
Vivado and Quartus just get in the way. There's no reason to tie all the stuff you're talking about into an integrated tool. They could just ship libraries.
Fusesoc does in fact try to make them behave this way. But you can tell it's a bit of a war to make it happen.
I'm glad Fusesoc is fighting the good fight and I'm glad you're fighting the good fight, but as you point out, it's definitely a fight. It was hardly fair to call the desire to avoid said fight a "fetish."
So many things that could be done in a programmatic, testable, declarative, scriptable, repeatable way are done with futzy GUI tools in hardware land. Schematic design _could_ be a matter of declaring components, buses, etc. and letting the tool produce something (and then manually manipulating the visual layout if necessary); I mean, you could literally describe your board using something similar to Verilog and get the tool to produce the schematic for you. We have these kinds of powers in the 21st century. Instead, it's futzing with tools that are vaguely Illustrator-esque and finding that half your connection points are not actually connected, etc. Why do people want to suffer like this?
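A small Python sketch of "describe your board as code": components and nets are declared as data, and the connectivity check that a GUI leaves to your eyes becomes a function. The `Board` class, the part names, and the pin lists are all hypothetical; a real flow would also need symbols, footprints, and an export format.

```python
from collections import defaultdict


class Board:
    def __init__(self):
        self.components = {}          # reference designator -> list of pin names
        self.nets = defaultdict(set)  # net name -> {(refdes, pin), ...}

    def add(self, refdes, pins):
        self.components[refdes] = list(pins)

    def connect(self, net, *endpoints):
        for refdes, pin in endpoints:
            if pin not in self.components[refdes]:
                raise ValueError(f"{refdes} has no pin {pin}")
            self.nets[net].add((refdes, pin))

    def unconnected(self):
        """Pins that were declared but never attached to any net."""
        used = {ep for eps in self.nets.values() for ep in eps}
        return sorted((r, p) for r, pins in self.components.items()
                      for p in pins if (r, p) not in used)


board = Board()
board.add("U1", ["VCC", "GND", "TX", "RX"])  # hypothetical UART chip
board.add("J1", ["1", "2", "3"])             # hypothetical 3-pin header
board.connect("GND", ("U1", "GND"), ("J1", "1"))
board.connect("UART_TX", ("U1", "TX"), ("J1", "2"))
board.connect("UART_RX", ("U1", "RX"), ("J1", "3"))
print(board.unconnected())  # [('U1', 'VCC')]: the supply pin was forgotten
```

Because the board is plain text, it diffs cleanly in source control and the check can run in CI, which is the repeatability the comment is asking for.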
Want to use a DRAM controller in Vivado? Find the wizard, fill in 10 text boxes... and, if you're lucky, you can find the Tcl scripts it generated and in the future just write your own Tcl script... but they certainly won't make it easy.
Vivado project in source control? You're going to jump through hoops for that.
I want hardware engineers to demand better.
"ignored" as in the vendors aren't cooperating with the developers of the open source tools? What the opensource tools are doing is hard enough as is. When you consider how fragmented FPGA chips are it's difficult to support a wide variety of them even if you wanted.
Fortunately there is (finally!) an open source PCB design program that doesn't suck: Horizon EDA. I've only made one PCB with it but honestly it was pretty great and the author fixed every usability bug I reported in a matter of hours, which is an insane difference from KiCad's "you're holding it wrong".
The only thing I don't like about it is its unnecessarily powerful and confusing component system (there are modules, entities, gates, etc.). But really it is the best by far.
Anyway, on FPGAs, I think the tools are only vaguely mature for iCE40 and even then you basically need to already be an expert unfortunately.
I've only recently started designing PCBs and I started with KiCad, but I've found it very easy to use after watching one video of someone going through a simple board design.
I didn't need a video to figure out how to use Horizon.
I have yet to design a PCB where I didn't have to create basically all of the components myself.
I am curious to know if you are using QEMU by any chance to prototype your hardware. I am doing some work on QEMU to make prototyping custom hardware easier and would love to hear about the pain points.
http://zedboard.org/product/minized
https://www.avnet.com/shop/us/products/avnet-engineering-ser...
Oh, and Vivado (the FPGA development IDE) is free (as in beer) for that FPGA as well as Xilinx' other mid to low end FPGAs.
Zynq chips are beautiful parts. But they are not "low-cost drop-in" anything. They are chips that you can architect an entire system around and replace a dozen other chips with. I know; I've done it. (But they didn't bite on our proposal, so my sketched architecture remained just a detailed sketch.)
Not to mention there are many FPGA applications where one purpose of the FPGA is to avoid having software in the path. If software is only responsible for configuration load, it's better, but still can be a problem.
It's ridiculous for anybody to insist that programming an FPGA isn't writing software. By definition, anything you can put in a text file that ends up controlling what some piece of hardware does is software. Probably almost all of what is wrong with FPGA ecosystems comes from failure to treat it like software.
It's not much like your typical C program, but that's a very parochial viewpoint. The languages available to program FPGAs in are abysmal, a poor match to the hardware: actually too much like ordinary programming languages, to their detriment. A person who makes an FPGA do something is going to be an engineer, and to an engineer any microprocessor and any FPGA are just two different state machines. Somebody who studied "computer science" will be disoriented, but that is just because the field has narrowed, as network effects pared down the field of computing substrates until practically nothing is left.
FPGAs emulating ASICs or von Neumann CPUs is the greatest waste of potential anywhere. If the architecture of (some) FPGAs could be elucidated, it could fuel a renaissance of programming formalisms. We could begin to program them in a language actually well-suited to the task, and vary their configuration in real time according to the instantaneous task at hand.
And their internal architecture is pretty well documented. See, for example, the Spartan-6 slices: https://www.xilinx.com/support/documentation/user_guides/ug3...
What's less well documented, at least publicly, is the routing, but on some level that's less interesting since it's "just" how you get the electrons from point A to point B, not about choosing A or B. But even the routing is decently well described, though you have to look in some fairly obscure places (like the device floorplan viewer).
I'm not sure why you think FPGAs emulating ASICs is a "waste of potential". By definition, ASICs are strictly more capable and more powerful than FPGAs, so you're climbing up the potential ladder, not down!
If you think an FPGA is not inherently and necessarily a state machine, no matter how it is programmed (provided power and clock are in specified bounds), that only means you don't know what a state machine is. All clocked digital devices are state machines, and can never be anything other than state machines.
(There is an argument to be made that an FPGA is, itself, an ASIC: an IC whose Specific Application is to be an FPGA. But such an argument would be transparent sophistry.)
Maybe that's what most hobbyists need?
Those things are fantastic for hobbyists and can be nice for low-volume production. But they're kind of crap for higher volume work:
* Expensive
* Physically fragile/easy to kill: personal experience suggests they are noticeably more fragile than their competition; ALWAYS add pull resistors and ESD diodes to their JTAG/SWD pins and use a real voltage supervisor, not the internal PoR/brownout, no matter what the datasheet says because it does not speak the truth
* Actually, just add external ESD diodes to anything even the least bit sketchy
* On-chip analog not good enough for serious applications or stupidly limited (just give me two of those please? no?)
* On-chip routing is very, very limiting
* Weak MCU cores
* Few large parts (high GPIO, fast core, ...); the 5LP is better but needs a refresh with bigger, better, cheaper flagships
* More digital blocks (UDBs). They use a crappy old macrocell architecture, which wouldn't be a problem except they only give you TWO of them!
I've actually whined about the last one to the Cypress FAE (great guy!) and he just started laughing. Turns out, he's repeatedly said that to their higher-ups and gotten shot down... only to have customers like me ask for it again, over and over....
Hopefully under Infineon the PSoC line will be better managed. It could be a huge powerhouse, but right now it just does not have a good enough lineup of sane models.
(Small install, no need for licences and license renewal, work reasonably well on a cheap laptop)
But that's PSoC Creator, used for their PSoC 4 and 5 lines. (Avoid the 3 and older -- they're really old.) The newer 6 requires Modus Toolbox, which I think doesn't support the 4 or 5 lines (STUPID). I have no experience with that one. It's Eclipse based, so who knows.
Of course everything is analog, and ultimately quantum-electrodynamic, but the languages FPGAs are programmed in don't provide access to those domains.
A weak spot of high-end commercial HLS tools (Catapult, Stratus) is interfacing with the rest of the hardware world, and how the clock is handled: either explicitly (SystemC: you handle it yourself) or kind of vaguely (Catapult's ac_channel). Getting HLS to deal with pipeline scheduling is great, but sometimes you want to break through and do something with the clock. Want to write a memory DMA in HLS? Talk AXI? Build a NoC in HLS? Build even something like a CPU in HLS? Interface with "legacy" RTL blocks, whether combinational, straight pipelines, blocks with ready/valid interfaces, or whatever? These things are sort of/just barely feasible with current commercial HLS tools, but very, very hard (I've tried it).
If they want to stick with it, I think C++11 could provide a superior type-safe metaprogramming facility for building hardware, compared both to SystemVerilog (with its extremely primitive metaprogramming and lack of type-safety notions) and to generators such as Chisel or the hand-written Perl/Python/TCL/whatever ones in use at most companies. But sometimes you need to break down and do something with the clock, or interface with things that care about a clock, much as one would put inline asm statements in code. I want to be able to do that, but not have to deal with the clock the 95% of the time when I don't really need to (let the tool determine the schedule most of the time), which is where the generators fail. HLS needs to sit between the two: not a generator (glorified RTL), but not "pretend you write untimed C++ all the time" (not hardware at all).
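To make the ready/valid case concrete, here is a minimal cycle-level Python sketch of that handshake, the kind of clock-aware behaviour that untimed C++ hides: a word moves from producer to consumer only on a cycle where both `valid` and `ready` are high, and the producer must hold its word until then. The function name and the back-pressure pattern are assumptions for illustration, not how any particular HLS tool models interfaces.

```python
from typing import List, Sequence, Tuple


def simulate_ready_valid(
    data: Sequence[int], ready_pattern: Sequence[bool]
) -> List[Tuple[int, int]]:
    """Cycle-level model of a valid/ready handshake.

    The producer drives `valid` high while it still has words to send
    and keeps the same word on the bus until it is accepted. A transfer
    happens only on a cycle where `valid` and `ready` are both high.
    Returns the (cycle, word) pairs that were transferred.
    """
    transfers: List[Tuple[int, int]] = []
    index = 0  # next word the producer is offering
    for cycle, ready in enumerate(ready_pattern):
        valid = index < len(data)
        if valid and ready:
            transfers.append((cycle, data[index]))  # handshake completes
            index += 1  # producer advances only after acceptance
        # valid and not ready: producer stalls, holding data[index]
    return transfers


# Hypothetical back-pressure: consumer is ready on cycles 0, 2, 3, 5 and 6.
pattern = [True, False, True, True, False, True, True]
print(simulate_ready_valid([10, 20, 30, 40], pattern))
# [(0, 10), (2, 20), (3, 30), (5, 40)]
```

Cycle 6 has `ready` high but no transfer, because the producer has run out of data and drops `valid`.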
I worked on hardware for something akin to an FPGA at a much coarser granularity (kind of like coarse-grained reconfigurable arrays)--close enough that you have to adapt tools like place-and-route to compile to the hardware. The programming for this was mostly done in pretty vanilla C++, with some extra intrinsics thrown in. This C++ came close enough to hand-coded performance that many people didn't even bother trying to tune their applications by resorting to hand-coding in the assembly-ish syntax.
This helped bolster my opinion that FPGAs aren't really the answer that most people are looking for, and that there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU.
> [...] there are useful nearby technologies that can leverage the benefits of FPGAs while having programming models that are on par with (say) GPGPU
I think CGRAs are really cool but they're even more niche, and I suspect your original point about GPUs eating everyone's lunch applies particularly strongly to CGRAs. The point is well taken, though, and I don't necessarily disagree.
I think things are about to change thanks to yosys and other open source tools.
> VHDL (bad) or Verilog (worse,
VHDL and its software counterpart, Ada, are very well thought out and great to use once you get to know them (and understand why they are the way they are). Yeah, they are a bit verbose, but I prefer a strong base to syntactic sugar.
As a professional FPGA developer: VHDL (and Verilog even more so) are bad [1] at what they're used for today: implementing and verifying digital hardware designs. In fact, they're at most moderately tolerable at what they were originally intended for: describing hardware.
[1] They're not completely terrible – a completely terrible idea would be to start with C and try to bend it so that you can design FPGAs with it...
Yup. I know HLS has gotten a lot better recently but my impression is that, somewhat like fusion, HLS as a first-class design paradigm is always a decade away.
> FPGA tools are just some of the lowest quality garbage out there
Absolutely. I think the problem is vendors see FPGA tooling as a cost center and a necessary evil in order to use their real products, the chips themselves. Users are also highly technical and traditionally have no alternative, so (mostly) working but poor-quality software is simply pushed out the door. "They'll figure it out".
Finally, to expand on the difficulties imposed by physical constraints, I think another huge blocker to wide adoption is that FPGAs are physically incompatible. I cannot take a bitstream compiled for one FPGA and program it to any other FPGA. Hell, I can't even take a bitstream compiled for one FPGA and use that bitstream for any other device in the same device family. Without some kind of standardized portability, FPGAs will remain niche devices used only for very specific applications.
There's a subtle point in that Verilog/SystemVerilog and VHDL are also just not powerful languages. While parametric, they lack polymorphism, object oriented programming (excluding SV simulation-only constructs), functional programming, etc.
Your point about the abstraction being different is well taken---hardware description languages describe circuits and programming languages describe programs. However, it's exceedingly unfortunate that the industry is stuck in a rut of such weak languages and trying to explain that weakness to hardware engineers, who haven't seen anything else, runs into the "Blub paradox" (e.g., a programmer who only knows assembly can't evaluate the benefits of C++). [^1]
Edit: Disclaimer, I'm well aware of the pros and cons of these paradigms in software development and use them plenty
Polymorphism makes it way easier to build hardware that can handle any possible data type. Things like queues and arbiters beg for type parameters (you should be able to enqueue any data). Without polymorphism you can make something parameterized by data width (and then flatten/reconstruct the data), but it's janky and you lose any concept of type safety (as you're "casting" to a collection of bits and then back).
There was some interesting work out of the University of Washington [^1] to build a "standard template library" using SystemVerilog. Polymorphism was identified as one of the shortcomings that made this difficult (Section 5: "A Wishlist for SystemVerilog"). [^2]
[^1]: https://github.com/bespoke-silicon-group/basejump_stl
[^2]: http://cseweb.ucsd.edu/~mbtaylor/papers/BaseJump_STL_DAC_Sli...
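The queue argument above can be sketched in Python as a hypothetical illustration (Python's type parameters are only checked by a static checker such as mypy, not at runtime). `TypedFifo[T]` is parameterized by element type. The width-parameterized workaround flattens a hypothetical `Packet` into a 16-bit word and rebuilds it on the other side, and nothing in the bits stops producer and consumer from disagreeing on the layout.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class TypedFifo(Generic[T]):
    """Bounded FIFO parameterized by element type."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.items: Deque[T] = deque()

    def enq(self, item: T) -> bool:
        if len(self.items) >= self.depth:
            return False  # full: the producer must retry later
        self.items.append(item)
        return True

    def deq(self) -> Optional[T]:
        return self.items.popleft() if self.items else None


@dataclass
class Packet:
    dest: int     # hypothetical 4-bit field
    payload: int  # hypothetical 12-bit field


# Width-parameterized workaround: flatten to a 16-bit word and rebuild.
def pack(p: Packet) -> int:
    return ((p.dest & 0xF) << 12) | (p.payload & 0xFFF)


def unpack(word: int) -> Packet:
    return Packet(dest=(word >> 12) & 0xF, payload=word & 0xFFF)


def unpack_wrong(word: int) -> Packet:
    # Consumer assumes the opposite field order; the 16-bit word
    # carries no type information, so nothing flags the mismatch.
    return Packet(dest=word & 0xF, payload=(word >> 4) & 0xFFF)


typed: TypedFifo[Packet] = TypedFifo(depth=2)
typed.enq(Packet(dest=3, payload=0xAB))
print(typed.deq())  # Packet(dest=3, payload=171)

bits: TypedFifo[int] = TypedFifo(depth=2)
bits.enq(pack(Packet(dest=3, payload=0xAB)))
word = bits.deq()
assert word is not None
print(unpack(word))        # Packet(dest=3, payload=171)
print(unpack_wrong(word))  # Packet(dest=11, payload=778)
```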
It's like dumping the memory contents of one PC, reinjecting them into another with a different RAM layout and devices, and complaining that the OS and programs can't continue running. Is that a sane expectation?
There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.
Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?
Alternatively, would building whole images for many families of FPGA make sense? It feels like distributing programs as binaries for p OS variants times q hardware architectures, each combination producing a different binary... Random example: https://github.com/krallin/tini/releases/tag/v0.19.0 has 114 assets.
No. Bitstream formats are not in any way compatible across devices. Because timing is a factor, even if you had the same physical layout of LUTs and routing, it's unlikely that your design would work.
(From parent)
> use that bitstream for any other device in the same device family
Not at the bitstream level. However, you can take a placed-and-routed chunk of logic and treat it as a unit. You can replicate it (without repeating P&R), move it around, and copy it onto other devices in the same family. This is super useful, as most FPGA applications have large repeating structures, but P&R doesn't know that they're factorable units. It'll repeat P&R for each instance, and you'll get unpredictable timing characteristics.
> Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?
> would building whole images for many families of FPGA make sense
You can license libraries that are a P&R'd blob and drop them into your design. There's no easy way to make this generalizable across devices without shipping the original RTL, and conversion from RTL->bitstream is where most of the pain lies.
Even worse: it's more like that, plus extracting the raw microarchitectural state of a CPU, serializing it in a somewhat arbitrary way, trying to shove that blob into a different CPU, and still expecting everything to continue running.
I'm not necessarily complaining, just pointing out this significant difference WRT software programs running on CPUs.
> There are upstream formats targeting FPGAs that can be shared, although yes redoing place and route is slow.
Can you show me an example? I'd like to see this. You do not mean FPGA overlays, correct?
> Should manufacturers provide new formats closer to final form yet would allow binaries that can be adjusted, kind of like .a .so or even llvm?
Like you say, at the very least you will need to re-do place and route. But actually the problem is much worse than this. Different FPGAs have different physical resources. Not just differing amounts of logic area, but different amounts of block RAM, different DSP blocks and in varying numbers, high-speed transceivers, etc. This necessitates making different design trade-offs. Simply shoehorning the same design into different FPGAs, even if it were kind of possible, will not work well.
> Alternatively, would building whole images for many families of FPGA make sense?
Currently I think that's the only real option. But the extreme overhead, duplication of effort and maintenance burden make it very unattractive.
My napkin sketch is some sort of generalized array of partial reconfiguration regions with standardized resources in each region. Accelerator applications can distribute versions targeting different numbers of regions (e.g. one version for FPGAs supporting up to 8 regions, one for FPGAs supporting up to 16 regions, etc.). The FPGA gets loaded with a bitstream supporting a PCIe endpoint and management engine, and some sort of crossbar between regions. At accelerator load time, previously mapped, placed, and routed logical regions used in the application are placed onto actual partial reconfiguration regions and connections between regions are routed appropriately. The idea is to pre-compute as much of the work as possible, leaving a lower dimension problem to solve for final implementation. Timing closure and clock management are left as exercises for the reader :P.
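The load-time step of that napkin sketch might look roughly like this in Python. Everything here is hypothetical: the variant names, the assumption that every region has identical standardized resources (so any free region will do), and the naive in-order placement. It only shows how small the remaining problem becomes once P&R inside each region has been done ahead of time; timing closure and clocks are, as the comment says, left out.

```python
from typing import Dict, List, Optional, Tuple


def pick_variant(variants: Dict[int, str], device_regions: int) -> Optional[str]:
    """Pick the pre-built variant using the most regions the device has.

    `variants` maps "number of partial-reconfiguration regions the
    build was placed and routed for" to a build identifier.
    Returns None if the device is too small for every variant.
    """
    fitting = [n for n in variants if n <= device_regions]
    return variants[max(fitting)] if fitting else None


def place_regions(logical: List[str], free_slots: List[int]) -> Dict[str, int]:
    """Assign each pre-routed logical region to a physical region.

    Assumes every physical region has identical, standardized resources,
    so any free slot fits; this naive version takes slots in order.
    """
    if len(logical) > len(free_slots):
        raise ValueError("not enough free reconfiguration regions")
    return dict(zip(logical, free_slots))


def crossbar_routes(
    links: List[Tuple[str, str]], placement: Dict[str, int]
) -> List[Tuple[int, int]]:
    """Translate logical links into crossbar connections between slots."""
    return [(placement[src], placement[dst]) for src, dst in links]


# Hypothetical accelerator shipped in 8- and 16-region variants.
builds = {8: "accel_8_regions", 16: "accel_16_regions"}
print(pick_variant(builds, 12))  # accel_8_regions
print(pick_variant(builds, 4))   # None

placement = place_regions(["fft", "filter", "dma"], [5, 2, 7])
print(placement)  # {'fft': 5, 'filter': 2, 'dma': 7}
print(crossbar_routes([("dma", "fft"), ("fft", "filter")], placement))
# [(7, 5), (5, 2)]
```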
Yes to a degree, but another part of the problem is the "physical constraints" you mention. FPGA tooling has to solve multiple hard problems, on the fly, at large scale (some of the latest chips are edging up to 10M logic elements). Unfortunately for the FPGA industry, I think that this is unavoidable - though a lot of interesting work is being done around partial reconfiguration, which should allow for users to work with smaller designs on a large chip.
I think partial reconfiguration is really sexy, but it's been around for a long time. What's new and exciting there? Genuinely curious.
Some of the coolest work to come out of the Chisel project is their intermediate representation FIRRTL.
I'm a casual in that space but I thought Chisel was an HDL that could be used to support HLS.
You could build something HLS like using Scala/JVM and Chisel, but Chisel itself is much closer to traditional HDLs.
VHDL is bad because it's bad at prototyping and implementing digital hardware [1]. One reason why it's bad at that task is the mismatch between the hardware you want and the way you have to describe it in the language. For example: You want a 32-bit register x which is assigned the value of a plus b whenever c is 0, and you want its reset value to be 25. VHDL code:
  signal x: unsigned(31 downto 0);
  ...
  process (clk, rst)
  begin
    if rst then
      x <= to_unsigned(25, x'length);
    elsif rising_edge(clk) then
      if c = '0' then
        x <= a + b;
      end if;
    end if;
  end process;
The synthesis software has to interpret the constructs you use according to some quasi-standard conventions, and will hopefully emit the hardware primitives you intended. I say "hopefully" because of the many, many footguns arising from those two translation steps.
[1] Okay, I concede that in theory there might be a use case for which VHDL is perfectly suited, which would make VHDL a not-bad language. But designing digital hardware is not such a use case.
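For comparison with the VHDL above, here is a behavioural Python model of the register described in the comment's one-sentence spec (a sketch for simulation only, not synthesizable and not generated by any tool). It assumes `a` and `b` are 32-bit unsigned values, so the sum wraps modulo 2^32, and models the state before the first reset as `None`, loosely like 'U' in a VHDL simulator. `RegisterX` and `MASK32` are names chosen for this sketch.

```python
from typing import Optional

MASK32 = (1 << 32) - 1


class RegisterX:
    """Behavioural model of register x from the VHDL example.

    - rst high: x is forced to 25 immediately (asynchronous reset)
    - rising clk edge with rst low and c == 0: x takes a + b, wrapped
      to 32 bits (assuming a and b are 32-bit unsigned)
    - otherwise x keeps its value
    """

    def __init__(self) -> None:
        self.x: Optional[int] = None  # unknown until reset, like 'U'
        self._prev_clk = 0

    def update(self, clk: int, rst: int, a: int, b: int, c: int) -> Optional[int]:
        rising = self._prev_clk == 0 and clk == 1
        self._prev_clk = clk
        if rst:
            self.x = 25
        elif rising and c == 0:
            self.x = (a + b) & MASK32
        return self.x


r = RegisterX()
print(r.update(clk=0, rst=1, a=0, b=0, c=0))  # 25 (reset, no clock edge needed)
print(r.update(clk=1, rst=0, a=1, b=2, c=0))  # 3
print(r.update(clk=0, rst=0, a=5, b=5, c=0))  # 3 (no rising edge)
print(r.update(clk=1, rst=0, a=5, b=5, c=1))  # 3 (c is 1, so x holds)
```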
---
There are some minor issues with your code that show you are probably a Verilog/SV guy and not an experienced VHDL guy.
Please read Andrew Rushton's "VHDL for Logic Synthesis". I also recommend you read up on VHDL's 9-valued logic and why it was designed this way, and how it differs from Verilog's bit.
Wrong on both counts.
Please enlighten me: what's wrong with my code? Note that it's in VHDL-2008, and the asynchronous reset is intentional.
> I also recommend you read up on VHDL's 9-valued logic and why it was designed this way
My main issue with VHDL is not the IEEE 1164 std_(u)logic, although it really doesn't help that this de-facto standard type for bitvectors and numbers (via the signed/unsigned types) is just a second-class citizen in the language – as opposed to bit and integer, which are fully supported syntactically and semantically, but which have serious shortcomings.
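Since VHDL's nine-valued logic came up, here is a minimal Python sketch of the IEEE 1164 resolution function for `std_logic`: it determines the value of a signal that several drivers assign at the same time. Strong '0'/'1' beat weak 'L'/'H', 'Z' yields to any driver, and conflicts produce 'X' (strong) or 'W' (weak). The table is transcribed from the `std_logic_1164` package; `VALUES`, `TABLE`, `resolve2` and `resolve` are names chosen for this sketch.

```python
from functools import reduce

# The nine std_ulogic values, in the order IEEE 1164 declares them.
VALUES = "UX01ZWLH-"

# Resolution table from the std_logic_1164 package.
# Row: value of one driver; column: value of another driver.
TABLE = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve2(a: str, b: str) -> str:
    """Resolve two simultaneous drivers of one std_logic signal."""
    return TABLE[VALUES.index(a)][VALUES.index(b)]


def resolve(drivers: str) -> str:
    """Resolve any number of drivers; no drivers at all gives 'Z'."""
    if not drivers:
        return "Z"
    return reduce(resolve2, drivers)


print(resolve("0Z"))  # 0  (a driven 0 beats high impedance)
print(resolve("01"))  # X  (strong conflict)
print(resolve("LH"))  # W  (weak conflict)
print(resolve("1L"))  # 1  (strong beats weak pull-down)
```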
https://github.com/ghdl/ghdl-language-server
Edit: typo in url
Nothing major, but in my book this is the difference between a Jr and a Sr designer. Nitpicking, yes. But the hardware business is like that.
Do you mean this: "x <= to_unsigned(25, x'length);" ? Some tools, like Synopsys, allow "x <= 25;" here, but other tools, like ModelSim, do not. The VHDL-2008 standard does not allow "x <= 25;".
> Inconsistent Boolean expressions
Do you mean because I wrote "if rst ..." but later "if c = '0'..."? Come on, you're not nitpicking, you're trying to find issues where there are none. Fixating on such anal-retentive details does not make you a "Sr designer", it makes you a bad engineer.