Libre-SoC 180nm Power ISA v3.0 ASIC Submitted to IMEC MPW(openpowerfoundation.org) |
GCC's impact was possible because it was (with GAS - the assembler) 100% feasible to have an open source toolchain. Yes more software was necessary for a complete system (linker, libc, etc), but GCC made it possible to build from the ground floor up.
Also, yes, the initial GCC was worse than any proprietary decent tool chain at the time, but it got better and better because each improvement built on all the earlier open sourced efforts.
Think about how hard Linux kernel development would have been if it had to rely on different proprietary tool chains for every target architecture (and possibly chip version).
Hardware definition languages (Verilog/VHDL, etc) enable high level chip design like high level programming languages, but making the physical chip requires a PDK (process design kit) that encodes how each critical silicon feature is built.
So a chip built for TSMC 28nm contains TSMC proprietary material and is essentially unportable. It can take several years to move a major chip from one foundry to another (or even a shrink at the same foundry), and the proprietary tool chains preclude a development process that can incrementally improve portability.
This announcement is a major step toward a similar foundation being available for silicon design. It is very important that it is a large complex chip, rather than just a research development vehicle.
[disclaimer - past life as OpenPOWER participant]
we've developed a dynamically SIMD-partitionable-maskable set of "base primitives" for example, so you set a "mask" and it automatically subdivides the 64-bit adder into two halves. but we didn't leave it there, we did shift, multiply, less-than, greater-than - everything.
https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee... https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/iee...
can you imagine doing that in VHDL or Verilog? tens of engineers needed, or some sort of macro-auto-generated code (treating VHDL / Verilog as a machine-code compiler target).
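To make the "set a mask and the 64-bit adder splits" idea concrete, here is a minimal behavioural sketch in plain Python. It is not the project's HDL: the function name `partitioned_add`, the 8-bit lane size and the mask encoding (bit i set means "cut the carry chain above lane i") are all assumptions made for illustration.

```python
def partitioned_add(a, b, partition_mask, width=64, lane=8):
    """Add two `width`-bit values built from `lane`-bit lanes.

    Assumed encoding (illustrative only): if bit i of `partition_mask`
    is set, the carry out of lane i is NOT passed to lane i+1, so the
    adder behaves as several independent narrower adders.
    """
    lane_mask = (1 << lane) - 1
    result = 0
    carry = 0
    for i in range(width // lane):
        x = (a >> (i * lane)) & lane_mask
        y = (b >> (i * lane)) & lane_mask
        s = x + y + carry
        result |= (s & lane_mask) << (i * lane)
        carry = s >> lane
        if (partition_mask >> i) & 1:
            carry = 0  # partition boundary: drop the carry here
    return result


a = 0x00000001_FFFFFFFF
b = 0x00000001_00000001
full = partitioned_add(a, b, partition_mask=0)           # one 64-bit add
halves = partitioned_add(a, b, partition_mask=1 << 3)    # two 32-bit adds
print(hex(full), hex(halves))
```

With no mask bits set the carry ripples across all eight lanes; with bit 3 set the low and high 32-bit halves are added independently, which is the behaviour described above for the adder (the real design also covers shift, multiply and comparisons, which this sketch does not attempt).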
the reason for doing this - planning it well in advance - is because we're doing Cray-style Vectors (Draft SVP64) with polymorphic element-width over-rides. yes, really. the "base" operation is 64-bit, but you can over-ride the source and destination operation width.
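A loose model of what "over-riding the source and destination width" can mean, in plain Python. This is only a sketch under stated assumptions and is not taken from the Draft SVP64 specification: the register file is modelled as a little-endian byte array with 8-byte registers, elements are packed contiguously, and the helper names (`read_elem`, `write_elem`, `vector_add`) are hypothetical.

```python
REG_BYTES = 8  # assumption: 64-bit registers, viewed as a flat byte array


def read_elem(regs, reg, idx, bits):
    """Read element `idx` of width `bits`, starting at register `reg`."""
    nbytes = bits // 8
    off = reg * REG_BYTES + idx * nbytes
    return int.from_bytes(regs[off:off + nbytes], "little")


def write_elem(regs, reg, idx, bits, value):
    """Write element `idx` of width `bits`, truncating to that width."""
    nbytes = bits // 8
    off = reg * REG_BYTES + idx * nbytes
    regs[off:off + nbytes] = (value & ((1 << bits) - 1)).to_bytes(nbytes, "little")


def vector_add(regs, rt, ra, rb, vl, src_bits=64, dest_bits=64):
    """Element-wise add of `vl` elements with independent source/dest widths."""
    for i in range(vl):
        a = read_elem(regs, ra, i, src_bits)
        b = read_elem(regs, rb, i, src_bits)
        write_elem(regs, rt, i, dest_bits, a + b)


regs = bytearray(8 * REG_BYTES)
for i, v in enumerate([0xFFFF, 1, 2, 3]):
    write_elem(regs, 1, i, 16, v)       # four 16-bit elements in r1
    write_elem(regs, 2, i, 16, 1)       # four 16-bit ones in r2
vector_add(regs, 4, 1, 2, vl=4, src_bits=16, dest_bits=32)
print([read_elem(regs, 4, i, 32) for i in range(4)])
```

In this model the same "add" loop produces widened 32-bit results from 16-bit inputs simply by changing two parameters, which is the reason the underlying ALU primitives need to be partitionable in the first place.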
the reason why we're using our own Cell Library is actually down to transparency. we want customers to be able to compile the GDS-II files themselves, fully automated, no involvement from us, no manual intervention.
ironically, as an aside: Staf's Cells are 30% smaller (by area) than the Foundry equivalents.
There is a huge amount of great stuff going on in this area.
Tim Ansell - Skywater PDK: Fully open source manufacturable PDK for a 130nm process
Staf actually developed actual IOpad Cells (from scratch), actual Standard Cells and a 4k SRAM block: we did not use the NDA'd TSMC Cell Libraries, here.
if we had used Skywater 130nm we would have been forced to ditch LIP6.fr (i cannot express enough how hard Jean-Paul Chaput has worked on coriolis2 for the past 18 months), we would not have been able to test the IOpads that Staf developed... yeah.
bottom line is we used a complete independent VLSI toolchain - fully automated - that has nothing to do with the USA or DARPA Military funding - and was developed with European expertise.
It may be 180nm (1999-era technology), but that's still hugely important. The world of semiconductor design is incredibly closed source and secretive.
In other words, this chip isn't even remotely open-source.
What they sent to the foundry isn't the "ghost cells" (which don't have transistors in them and therefore don't work).
This fails the most basic requirements of being open source.
Coriolis2 source code: http://coriolis.lip6.fr/
Chips4Makers FlexLib Cell Library based on FreePDK45: https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases
Automated Layout scripts for generation of GDS-II Files: https://git.libre-soc.org/?p=soclayout.git;a=summary
please do try to get your facts right and not mislead people by making false claims, eh?
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/...
i'm currently in the middle of a rabbit-hole exploration of being able to do in-place RADIX-2 FFT, DCT and DFT butterflys, the target is a general purpose function to cover each of those, in around 25 Vector instructions.
not 2,000 optimised loop-unrolled instructions specifically crafted for RADIX-8, another for RADIX-16, another for RADIX-32 ..... RADIX-4096 (as is the case in ffmpeg): 25 instructions FOR ANY 2^N FFT.
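For readers who have not seen one, this is roughly what a single general-purpose in-place radix-2 FFT looks like in scalar Python. It is the textbook iterative Cooley-Tukey form, offered only as a reference point for "one routine for any 2^N": it is not the SVP64 assembler and says nothing about how the 25-instruction version is structured.

```python
import cmath


def fft_inplace(x):
    """In-place iterative radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # log2(n) passes of butterflies, doubling the span each pass.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = x[start + k]
                t = w * x[start + k + half]
                x[start + k] = u + t          # butterfly: sum
                x[start + k + half] = u - t   # butterfly: difference
                w *= w_step
        size *= 2
    return x


print(fft_inplace([1 + 0j, 0j, 0j, 0j]))
```

The same three nested loops handle every power-of-two length; the per-size unrolled variants mentioned above trade that generality for speed on scalar and fixed-width SIMD hardware.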
btw if you're interested in "real-world" SVP64 Vector Assembler we have the beginnings of an ffmpeg MP3 CODEC inner loop:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=medi...
that's under 100 instructions, more than 4x less assembler for the same job in PPC64. and 6.5 times less assembler than ffmpeg's optimised x86 apply_window_float.S
you will no doubt be aware of the huge power savings that brings due to reduced L1 cache usage.
it's 64-bit, LE/BE, and it's implementing a "Finite State Machine" (similar technique to picorv32, if you know that design). this because we wanted to keep it REALLY basic, and also very clear as a Reference Design, none of the "optimised pipelined decoders and issuers" that you normally find, which make it really, really difficult to see what the hell is going on.
bear in mind this includes SVP64: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple...
if you go back several revisions, the non-Vectorised version is like... 400 lines?
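As a rough picture of the "Finite State Machine" style of core (one instruction walks through fetch, decode, execute before the next one starts), here is a toy Python model. The three-instruction mini-ISA (`li`, `add`, `halt`), the class name `ToyCore` and the one-cycle-per-state timing are invented for illustration; none of it is taken from the Libre-SOC or picorv32 sources.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()
    EXECUTE = auto()
    HALTED = auto()


class ToyCore:
    """Non-pipelined core: exactly one instruction in flight at a time."""

    def __init__(self, program):
        self.program = program      # list of (op, rd, operand) tuples
        self.regs = [0, 0, 0, 0]
        self.pc = 0
        self.state = State.FETCH
        self.insn = None
        self.op = self.rd = self.operand = None
        self.cycles = 0

    def step(self):
        """Advance the state machine by one clock cycle."""
        if self.state is State.HALTED:
            return
        self.cycles += 1
        if self.state is State.FETCH:
            self.insn = self.program[self.pc]
            self.state = State.DECODE
        elif self.state is State.DECODE:
            self.op, self.rd, self.operand = self.insn
            self.state = State.EXECUTE
        else:  # State.EXECUTE
            if self.op == "li":       # load immediate
                self.regs[self.rd] = self.operand
            elif self.op == "add":    # rd += regs[operand]
                self.regs[self.rd] += self.regs[self.operand]
            elif self.op == "halt":
                self.state = State.HALTED
                return
            self.pc += 1
            self.state = State.FETCH

    def run(self):
        while self.state is not State.HALTED:
            self.step()
        return self.regs


core = ToyCore([("li", 0, 5), ("li", 1, 7), ("add", 0, 1), ("halt", 0, 0)])
print(core.run(), core.cycles)
```

Every instruction costs three cycles here, which is slow, but at any moment there is exactly one place to look to see what the core is doing. That is the readability trade-off described above.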
* In a few years (maybe 5?), it might be possible to build a computer that you can trust has no intentional back doors in the CPU, but is modern enough to run software from within the last decade.
* If this catches on, and is used by enough people, economies of scale might kick in, and bring costs for advanced custom chips down by an order of magnitude (if the cpu is small enough, and if more fab capacity is built). Not Intel/AMD/ARM parts - those prices will remain stable, at first.
* Maybe we can have another decent consumer-grade router? No, this is a pipe-dream.
* Our Amiga accelerator boards will become SMOKING fast.
Staf will also "protect" you from the Foundry NDAs. you develop with a "symbolic" version of the Cell Library, he runs the "Real" one and sends it to IMEC on your behalf. here's Staf's "symbolic" Cell Library, it's based on FreePDK45 https://gitlab.com/Chips4Makers/c4m-pdk-freepdk45/-/releases
Coriolis2 - http://coriolis.lip6.fr/ - is entirely Libre-Licensed. it's fully automated, you don't have to do any "hand-editing", it has unit tests (so you have demos you can look at and also check you installed everything right). we have some automated setup scripts for it if you're interested: https://git.libre-soc.org/?p=dev-env-setup.git;a=blob;f=cori...
LIP6 have a Silicon-proven ENTIRELY Libre Cell Library called nsxlib, if you really want to go that route. it's Silicon-proven in 360nm and 180nm.
Also, LIP6 have a relationship with a small town in Japan, they have 2 micron fab which is used for "training" of employees of the town. submission for that is entirely free. i know this exists but have not used it, and don't know more details, but i can probably put you in touch with Sorbonne University if you're serious.
and if you really really want to do "at home" stuff, Libre-Silicon is developing a 2in wafer fab, using Ultra-Violet DLPs and high-accuracy stepper motors, that you'll be able to buy and operate from your garage or lab. think "3D printing", i think they're aiming for 2000 nm or something (20 micron)? really big, but proves the concept.
Neither one has published an easily-replicable process, meaning I can't really repeat what they've done. IMO what this space needs is an open source build plan/BoM, with a cottage industry of people selling DiY and pre-assembled kits. Once the 3d printing community got there, that's when things took off -- before kits or at least build guides with proper BoMs, it was just disparate individuals doing their own thing.
Connect me with anyone who's got a good approach to building some sort of replicable open-source fab though, and I'll quit my job and join the project full-time (that's not a joke: I'm serious).
[1] http://sam.zeloof.xyz/category/semiconductor/
[2] https://libresilicon.com/
180nm is still by far and above the world's most heavily-used geometry, because the price-performance (bang per buck, however you want to put it) is so extremely high.
an 8in wafer is USD 600 and that's extremely low. any power MOSFET, power transistor, diode or other high current semiconductor you absolutely don't want small "things" (detailed tiny tracks) you want MASSIVE ones.
why on earth would you waste money on tiny features, it's like using the latest 0.15mm 3D printing nozzles to 3D print a massive 300x300x300 mm cube that's going to be used for nothing more than a foot-stool. you want a 1.2mm nozzle for that!
then any processor below 300 mhz, you can get away with 180nm. need only an 8 mhz 8-bit or 4-bit washing machine or microwave processor, or something to go in a cheap digital watch? 180nm is your best bet: you'll get tens of thousands of < 1 mm^2 ASICs on a single wafer which means you're well below $0.05 per individual die.
a 28nm 8in wafer would be about... 10x that cost, you'd end up with exactly the same transistor (or 8 mhz 8-bit processor), why would you pay more money for what you don't need?
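The per-die arithmetic above can be checked with a few lines of Python. The USD 600 wafer price and the roughly 10x figure come from the comment; the 200 mm diameter for an "8in" wafer and the 90% usable-area fraction (a crude allowance for edge loss and scribe lines) are assumptions, and yield is ignored entirely.

```python
import math


def cost_per_die(wafer_cost_usd, die_area_mm2,
                 wafer_diameter_mm=200.0, usable_fraction=0.9):
    """Return (dies per wafer, cost per die) for a simple area-based estimate.

    usable_fraction is an assumed fudge factor, not a foundry figure.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    dies = int(wafer_area * usable_fraction / die_area_mm2)
    return dies, wafer_cost_usd / dies


dies, cost_180 = cost_per_die(600, die_area_mm2=1.0)
_, cost_28 = cost_per_die(6000, die_area_mm2=1.0)   # "about 10x that cost"
print(dies, round(cost_180, 4), round(cost_28, 4))
```

Under these assumptions a 1 mm^2 die comes out at tens of thousands per wafer and around two cents each at 180nm, consistent with the "well below $0.05" claim, and ten times that if the same die is put on the more expensive wafer.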
btw the real reason why there's a chip shortage: the Automotive industry, who are cheap bar-stewards, wanted even lower than $600 per 8in wafer so they went with 360nm and cruder geometry. that's equipment that's even older than the 1990s, like 40+ years in some cases.
so then the stupidity hit, and they stopped ordering. then 18 months later they phone up these old Foundries and say, "ok, we're ready to start ordering again". and the Foundries say, "oh, we switched off the equipment, and it cooled down and got damaged (just like that massive Electric plant in S. Australia that was de-commissioned, the concrete cracked when they switched it off, and it's completely unsafe to start up again). you were our only customer for the past 30 years, so we scrapped it all. you'll have to now compete with the consumer-grade smaller geometry Fabs like everyone else".
which is something that none of the Automotive companies have told their Governments, because then they can't go crying "boo hoo hoo, we can't make chips any more at the price that we demand, waaa, waaaa, i wannnt myyy monneeeeey"
and now of course they can't use the old masks, because those were designed for 360nm and cruder geometries, they have to redesign the entire ASIC for 180nm and that's why you can't now get onto 180nm and other MPW Programmes because the frickin Automotive Industry has jammed them all to hell.
In my opinion, an area of interest going forward into the next decade of more safety-critical software written by smaller and smaller orgs (e.g. eVTOL companies, sensor companies, etc) is continuing to push forward which objectives can be accomplished by formal means instead of primarily through testing.
An NXP or IBM processor might be great, and might be mature, and might be very well tested -- but I, as a safety-critical software developer, have little way of demonstrating that to certification authorities. The availability of open-source processor designs and, in the future, traceable and accountable conversion from those HDL designs to RTL, to masks, and then to silicon, gives a path to showing that portions of a processor are correct-by-design, and thus a path to the goal of showing that my machine-code-as-authored(-by-an-assembler) and machine-code-as-executed(-by-a-processor) semantics match.
The Talos is currently the only fully libre computer available for high-perf computing, and it uses POWER9 CPUs. If you want a fully free CPU, your choices are either very dated CPUs or POWER.
Many distros (inc. Debian, and most source-based ones) support ppc64/POWER officially quite well and go out of their way to ensure a high degree of portability.
The fact that the POWER architecture may be niche is not a problem since so much software can be compiled for it. See the Talos workstations: https://www.raptorcs.com/TALOSII/ and the powerpc notebook: https://www.powerpc-notebook.org/en/
For people who are willing to use niche hardware for more control on what is running, this seems like a very important step.
What's good about this is that the source is available and can be verified to some degree against the hardware (by decapping it.) That puts a lot of constraints on what kinds of secret back doors people can build that we didn't have before.
Off topic: where did you get this rule?
the problem with this particular irate individual is that he's assumed that because TSMC's DRC rules are only accessible under NDA that automatically absof*** everything was also "fake open source".
idiot.
sigh.
clearly didn't read the article.
whilst both Staf Verhaegen and LIP6.fr signed the TSMC Foundry NDA, we in the Libre-SOC team did not. we therefore worked entirely in the Libre world, honoured our commitment to full transparency, whilst Staf and Jean-Paul and the rest of the team from LIP6 worked extremely hard "in parallel".
the ASIC can therefore be compiled with three different Cell Libraries:
* LIP6.fr's 180nm "nsxlib" - this is a silicon-proven 180nm Cell Library
* Staf's FreePDK45 "symbolic" cell library using FlexLib (as the name says, it uses the Academic FreePDK45 DRC)
* the NDA'd TSMC 180nm "real" variant of Staf's FlexLib
i was therefore able to "prepare" work for Jean-Paul, via the parallel track, commit it to the PUBLIC REPOSITORY (the one that's open, that our resident idiot didn't bother to check existed or even ask where it is), which saved Jean-Paul time whilst he focussed on fixing issues in coriolis2.
it was a LOT of work.
on top of that, because it's an entirely separate processor, to get it to do anything you actually have to have a Remote Procedure Call system, operating over Shared Memory!
oink.
so the process for running a GPU shader binary is as follows:
step 1: fire up a compiler (in userspace)
step 2: compiler takes the shader IR and turns it into GPU assembler
step 3: the userspace program (game, blender, whatever) triggers the linux kernel (or windows kernel) to upload that GPU binary to the GPU
step 4: the kernel copies that GPU binary over Shared Memory Bus (usually PCIe)
step 5: now we unwind back to userspace (with a context-switch) and want to actually run something (OpenGL call)
step 6: the OpenGL call (or Vulkan) gets some function call parameters and some data
step 7: the userspace library (MESA) "packs" (marshalls) those function call parameters into serialised data
step 8: the userspace library triggers the linux (windows) kernel to "upload" the serialised function call parameters - again over Shared Memory Bus
step 9: the kernel waits for that to happen
step 10: the userspace proceeds (after a context-switch) and waits for notification that the function call has completed...
... i'm not going to bother filling in the rest of the details, you get the general idea that this is completely insane and goes a long way towards explaining why GPU Cards are so expensive and why it takes YEARS to reverse-engineer GPU drivers.
in the Libre-SOC architecture - which is termed a "Hybrid" one, the following happens:
step 1: the compiler is fired up (in userspace, just like above)
step 2: compiler takes the shader IR and turns it into *NATIVE* (Power ISA with Cray-style Vectors and some custom opcodes) assembler
step 3: userspace program JIT EXECUTES THAT BINARY NATIVELY RIGHT THERE RIGHT THEN
done.
did you see any kernel context-switches in that simple 3-step process? that's because there aren't any needed.
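The difference between the two call paths can be caricatured in a few lines of Python. This is a sketch only: the `shade` function stands in for a compiled shader, the opcode table and wire format (a 32-bit opcode followed by doubles) are invented, and the kernel uploads, bus transfers and context switches of the discrete-GPU path are deliberately not modelled.

```python
import struct


def shade(x, y, z):
    """Stand-in for a compiled shader routine (hypothetical)."""
    return x * y + z


OPCODES = {1: shade}  # invented opcode table for the "remote" side


def marshal_call(opcode, *args):
    """Pack a call into bytes: little-endian u32 opcode, then f64 arguments."""
    return struct.pack("<I%dd" % len(args), opcode, *args)


def remote_execute(blob):
    """What the separate processor does: unpack the call, then run it."""
    (opcode,) = struct.unpack_from("<I", blob, 0)
    nargs = (len(blob) - 4) // 8
    args = struct.unpack_from("<%dd" % nargs, blob, 4)
    return OPCODES[opcode](*args)


def call_discrete_gpu(opcode, *args):
    blob = marshal_call(opcode, *args)   # the marshalling step
    # the upload and wait steps (kernel, shared memory bus) would sit here
    return remote_execute(blob)


def call_hybrid(fn, *args):
    return fn(*args)                     # just a native function call


print(call_discrete_gpu(1, 2.0, 3.0, 4.0), call_hybrid(shade, 2.0, 3.0, 4.0))
```

Both paths compute the same value; the first has to serialise the arguments and hand them to another processor, the second is an ordinary call in the same address space.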
now, the thing is - answering your question a bit more - that "just having vector capabilities" is nowhere near enough. the lesson has been learned from Nyuzi, Larrabee, and others: if you simply create a high-performance general-purpose Vector ISA, you have successfully created something that absolutely sucks at GPU workloads: about TWENTY FIVE PERCENT (one quarter) of the capability of a modern GPU for the same power consumption.
therefore, you need to add SIN, COS, ATAN2, LOG2, and other opcodes, but you need to add them with "reduced accuracy" (like, only 12 bit or so) because that's all that's needed for 3D.
you need to add Texture caches, and Texture interpolation opcodes (takes 4 pixels @ 00 01 10 11 square coordinates, plus two FP XY numbers between 0.0 and 1.0, and interpolates the pixels in 2D).
you need to add YUV2RGB and other pixel-format-conversion opcodes that are in the Vulkan Specification...
and many more.
but, we first had to actually, like, y'know, have a core that can actually execute instructions at all? :) and that's what this first Test ASIC is: a first step.
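The texture interpolation operation mentioned above (four corner pixels plus two fractional coordinates between 0.0 and 1.0) is ordinary bilinear interpolation. A minimal Python sketch follows; the naming convention is an assumption (`pXY` is the pixel at x=X, y=Y), pixels are plain tuples of channel values, and nothing here reflects how a hardware opcode would encode its operands.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0.0, 1.0]."""
    return a + (b - a) * t


def bilinear(p00, p10, p01, p11, x, y):
    """Interpolate four corner pixels at fractional position (x, y).

    Assumed layout: p00 at (0,0), p10 at (1,0), p01 at (0,1), p11 at (1,1).
    """
    top = [lerp(a, b, x) for a, b in zip(p00, p10)]        # along x at y=0
    bottom = [lerp(a, b, x) for a, b in zip(p01, p11)]     # along x at y=1
    return tuple(lerp(t, b, y) for t, b in zip(top, bottom))


black, red = (0, 0, 0), (255, 0, 0)
green, yellow = (0, 255, 0), (255, 255, 0)
print(bilinear(black, red, green, yellow, 0.5, 0.5))
```

Done in software this is three interpolations per channel for every texel fetch, which gives some feel for why a dedicated opcode is on the list.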
half-float we'd like to do by using a dynamic SIMD-aware 64-bit ALU that has auto-partitioning. we do however already have an actual FP16 implementation https://git.libre-soc.org/?p=ieee754fpu.git;a=tree;f=src/iee...
or more to the point, one that is compile-time configurable with one parameter (bit-width), so the same HDL does FP16, FP32 and FP64. i'd like to make that dynamically-SIMD-configurable but it'll take some base work in nmigen to do without massive code-explosions.
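To illustrate what "one parameter selects FP16, FP32 or FP64" can look like, here is a small software analogue in Python: a single decoder whose exponent and mantissa sizes are looked up from the bit-width. The field sizes are the standard IEEE 754 binary16/32/64 ones; the function name and table are made up for this sketch and have no relation to the ieee754fpu source code.

```python
# IEEE 754 field sizes: width -> (exponent bits, mantissa bits)
FORMATS = {16: (5, 10), 32: (8, 23), 64: (11, 52)}


def decode_float(bits, width):
    """Decode a raw IEEE 754 bit pattern of the given width into a Python float."""
    exp_bits, man_bits = FORMATS[width]
    bias = (1 << (exp_bits - 1)) - 1
    sign = (bits >> (width - 1)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)

    if exp == (1 << exp_bits) - 1:            # all-ones exponent: inf or NaN
        value = float("inf") if man == 0 else float("nan")
    elif exp == 0:                            # zero or subnormal
        value = man * 2.0 ** (1 - bias - man_bits)
    else:                                     # normal number, implicit leading 1
        value = (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)
    return -value if sign else value


print(decode_float(0x3C00, 16), decode_float(0x3F800000, 32),
      decode_float(0x3FF8000000000000, 64))
```

Everything format-specific is derived from `width`, so the body of the decoder is written once. Doing the equivalent in HDL, and then making the width switchable at run time, is the harder problem described above.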
However, one thing to keep in mind is that below 500nm a lot of the chemicals are extremely toxic and not the kind of thing that garage hackers are qualified to handle in an environmentally safe manner.
Arsenic, phosphine gas, hydrogen fluoride, nasty solvents. I build a lot of crazy stuff in my shop, but I don't even trust myself to dispose of these correctly. If makers like myself get involved in this we're going to end up with a lot of new superfund sites. In residential neighborhoods.
And then of course there's the ion implanter, which none of the fab employees want to spend much time around...
Or maybe we adopt a 500 nm node and stop there :-)
https://git.libre-soc.org/?p=power-instruction-analyzer.git;...
before you even get to that stage, you run the Formal Correctness Proofs and unit tests on the HDL, so that YOU have confidence that the HDL which you're about to generate the GDS-II files from is actually correct and does the damn job.
example of a Formal Correctness Proof for the fixed arithmetic Power ISA pipeline:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu...
runs with symbiyosys, so you end up running SAT Solvers like yices2 and z3.
basically we absolutely do not want to be the people you come to and say, "can we trust your ASIC?" and like Intel they lie to you and say "of course!", we want to say, "don't bloody well ask us, go run the damn tools yourself! oh, btw, if you want help with that we charge USD 5k per hour"
so the VLSI tool can still Place-and-Route them, you can still create GDS-II Files, but if you send them to the Foundry, the Foundry will look at you like you have two heads or something and won't talk to you again.
that said: some Foundries have their own Symbolic ("ghost") Cell Libraries, which they send you. you run the VLSI tools with those, then when they get the GDS-II files they SUBSTITUTE the REAL cells for the ghost Cells... and then put that into the Fab.
they do this because they're so paranoid they don't even want you to know what's inside their "Symbolic" (ghost) Cells.
Foundry Symbolic Cells are invariably available only under NDA.
sigh.
which begs the question, how the hell is any information going to leak out from a completely empty Cell, and unfortunately the answer is: quite a lot. number of layers, what the "stack" is of those layers, distance between tracks, width of tracks, and so on, and the PDK also has to include via sizes and so on anyway.
this starts to give you some idea of the levels of insanity we had to work around, to meet our Audit and Transparency objectives.
bottom line is until we can bust through these final layers of NDAs, customers who really want to verify the complete GDS-II Files are also going to have to sign a Foundry NDA.
and in the Mars Rover, which is a radiation-hardened 133mhz 32-bit Power ISA system.
Any process can be used well or poorly, and DO-178C isn't really a process, it's a set of objectives that a process must accomplish. When used in good faith, I believe it can lead to software of higher quality than almost any other approach (although, to be fair, at higher software development cost than almost any other approach). That doesn't mean that chanting the document name and using hand-me-down rituals is sufficient to achieve high quality software, of course :-).
So many people asked about this that the foundry had to make a FAQ about it:
https://www.skywatertechnology.com/ufaqs/can-i-customize-the...
Your comment glosses over a ton of critical details.
The most important of them being that even on such an old technology generation (180nm-110nm) no two fabs are so compatible that you can send a GDS designed for one of them to the other unless (a) one of them licensed their process from the other, like IBM/GloFo/Samsung back in the 2010s or (b) you planned for this in advance and designed a custom "least common denominator" process (like MOSIS SCMOS) to target which means making very large performance sacrifices. The (b) approach is much harder than it looks; I know of no examples other than MOSIS SCMOS, and in spite of being the pioneer experts at doing this they had a hard time at 180nm and failed on the following (90nm) generation.
The other, lesser, problem is that no foundry will let you even submit a GDS without signing their NDA. Even if you swear to them that you don't need their design rules for some reason. They don't care. NDA or no chips, not up for discussion. In fact, technically SkyWater still works this way -- to avoid the NDA you must submit through eFabless, not directly to the foundry (maybe this will change someday) and eFabless signed their NDA, then (obviously) negotiated a waiver. So saying "you can work around this problem that the only no-NDA foundry has by just going to another foundry" doesn't hold up, because there are no other no-NDA foundries, nor are there any on the horizon.
That's because they used TSMC, not SkyWater.
I think you're deliberately creating confusion here.
Also, as the webpage states, they signed TSMC's NDA:
> LIP6 were able to create the GDS-II tape-out under NDA
Sure, if you sign the NDAs, you can use whatever toolflow you want.
Look, I don't mean to in any way denigrate your technical achievement here, and I have no beef with your project. But the absence of no-NDA foundry access is a huge, massive obstacle to a truly public and free open-source ecosystem, and lately there have been a lot of people and organizations papering over that problem and bamboozling software folk who aren't aware of the issue and its details. Hiding the problem isn't going to get it fixed.
FALSE. again. i do not work for LIP6. i do not work for Chips4Makers. i am an independent *LIBRE* Developer. i have NEVVERRRR signed a Foundry NDA and, having a background involving security analysis and Reverse-Engineering, it would be suicidally and monumentally stupid and counter-productive for me, personally, to do so.
please try to not conflate matters (twice in succession) that you haven't checked or read properly. the best thing to do is to ask questions, such as:
"You're a Libre Project. that has significant implications that everything is entirely Libre. I notice however that you say that someone signed a Foundry NDA? what impact did this have for you? did it stop you from releasing any source code as per obligations of LIBRE Licenses?"
and then i can answer positively and in a friendly way rather than having to publicly waste both my time and that of readers in first unpicking the mistakes, embarrassing you in the process (which risks a public confrontation that annoys everybody even more), and it all goes to hell pretty quickly after that.
answering the question above that you didn't ask: as you know there are about five layers of NDAs in the Silicon Industry.
we've managed to bust through three of those, and so have managed - as a LIBRE Team - to fulfil our obligations both to our funding body, NLnet, under their Privacy and Enhanced Trust Programme, and to Libre/Open Hardware developers by releasing all HDL under LGPLv3 Licenses
https://git.libre-soc.org
and using Libre-Licensed VLSI tools
and using Libre-Licensed Cell Libraries
now, the TEAM THAT DEVELOPED the VLSI tool - signed a TSMC NDA.
NOBODY ON THE LIBRE-SOC TEAM SIGNED THAT NDA.
also, Chips4Makers - the developers of FlexLib - signed a TSMC NDA CHIPS4MAKERS != Libre-SOC
we are three separate and INDEPENDENT teams, working together, to tackle an insane situation, at different levels. i'll say it again: LIBRE-SOC HAS NOT SIGNED AAAANNNYYYYY FOUNDRY NDAs.
are we clear about that, now?
there happens also to be another team, Libre-Silicon, also funded by NLnet, who are developing an actual Libre VLSI process and actually developing a mini home-grown Fab.
then there is another NLnet-sponsored project, working with the Libre Silicon team, to develop another Libre-Licensed Standard Cell Library, that is targetted at Libre-Silicon's PDK (when it's available)
https://nlnet.nl/project/LibreSiliconStandardCellLibrary/
however neither of these are ready, so we went with the pragmatic route, after exhausting all other options: the parallel track.
It's hard to come up with a good analogy ... it's like you need to write your own serial driver for your new open-source programming language to do any I/O, because you can't call any libraries or OS syscalls because they're all NDA, even on a 15+ year old computer/OS.
LIP6 does actually have a fully NDA-free silicon-proven Cell Library, called nsxlib, it's been used in 360nm and 180nm, the 180nm was done by a Japanese University. i think i may have mentioned this already, it's a small town with a 2(?) micron foundry, they make it available to people anywhere in the world entirely for free, it's for training the employees of the town, because it's so old and basic it's hard to mess it up. so they want people to submit designs that the trainees can learn how to fab, before they move on to the more expensive equipment.
but, really, use Chips4Makers, he has 360nm available, EUR 1750 for 20 MPW chips in QFP, i believe.
Yes, you can just install most of Debian, Fedora and RHEL at least, though it needs an "alt" kernel on RHEL7 P9. There are a few things which haven't been ported, mainly due to assembler, I guess. (PRoot and DMTCP are two I know.) Even x86 SIMD intrinsics will largely work, if not necessarily very efficiently.
The availability of open design rules improves the feasibility of either approach. Any other foundry implementing a 130nm process will be able to refer to the open rules quite directly, with no strings attached - and they will also have very clear incentives to push in that direction. So no, this is not a full fab process, but it's getting remarkably close compared to what we had before.