Tesla Transport Protocol over Ethernet (TTPoE) (github.com)

167 points by super_linear 1 year ago | 92 comments

throw0101a 1 year ago |

Previous discussion from about a month ago, "Tesla’s TTPoE at Hot Chips 2024: Replacing TCP for Low Latency Applications":

* https://news.ycombinator.com/item?id=41374663

* https://chipsandcheese.com/2024/08/27/teslas-ttpoe-at-hot-ch...

nimbius 1 year ago |

Tesla seeks to work to standardize a new high-speed/low-latency fabric (be that TTPoE or otherwise) for AI/ML/Datacenters; however, there's nothing inherently abject about TCP as it exists today. RDMA over Converged Ethernet suffices perfectly well for whatever an "AI/ML/Datacenter" is, and if we're being fair, the lackadaisical approach to the documentation suggests that they may not be taking it as seriously as they could anyway.

If Tesla were really seeking to shake things up, they wouldn't have picked IPv4 to do it when the newest release has been around for nearly 30 years and has latency reduction baked in.

This smacks of a pandersome attempt from a company that sees the quite mandarin writing on the walls and has decided (in true Muskovite fashion) they too are just a misunderstood font of futurism.

pclmulqdq 1 year ago | |

RoCE sends huge packets down the wire.

TCP has the wrong abstraction for truly high performance.

I wouldn't necessarily standardize what Tesla does here, but most of the big companies have their own layer 3 transport protocol for things that need truly high speed and are operating within a datacenter.

Cray/HPE has their own Ethernet-based protocol (Slingshot was an earlier version of it - not sure what its name is now) which seems to be better than whatever Tesla has, but is not necessarily published.

digitallis42 1 year ago |

I did a skim and didn't see any explanation of why one would want it over TCP. Did I miss it, or is it non-obvious?

lloeki 1 year ago | |

From a cursory look:

- looks dead simple

- no IP layer (there's a ttpip folder in that repo though)

- distributed congestion control (TCP has a "window" field + a bunch of tentative RFCs, this has a purposeful "congestion")

- 100% implementable in hardware (TCP can, but it's complex)

Not a general TCP replacement, but the README properly highlights a "many endpoints local link" use case:

> the protocol executed entirely in hardware and deployed to a very large multi-ExaFlops (fp16) supercomputer with over 10s of thousands of concurrent endpoints. This protocol does not need a CPU or OS to be involved in any way to link and execute.
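To make the "no IP layer" point concrete, here is a minimal sketch of a transport header carried directly in an Ethernet II frame. Everything protocol-specific in it is an assumption for illustration: the header fields (`opcode`, `length`, `seq`, `ack`), the `OP_DATA` value, and the choice of EtherType 0x88B5 (one of the IEEE "local experimental" values) are made up and are not the actual TTPoE wire format.

```python
import struct

# Hypothetical values for illustration only; NOT the real TTPoE wire format.
ETHERTYPE_EXPERIMENTAL = 0x88B5  # IEEE 802 "local experimental" EtherType
OP_DATA = 1                      # made-up opcode

ETH_FMT = "!6s6sH"   # dst MAC, src MAC, EtherType (14 bytes)
HDR_FMT = "!BxHII"   # opcode, pad byte, payload length, seq, ack (12 bytes)
ETH_LEN = struct.calcsize(ETH_FMT)
HDR_LEN = struct.calcsize(HDR_FMT)


def build_frame(dst_mac: bytes, src_mac: bytes, opcode: int,
                seq: int, ack: int, payload: bytes) -> bytes:
    """Pack an Ethernet II frame whose payload is a transport header plus data.

    There is no IP header: the transport header follows the EtherType directly.
    """
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    eth = struct.pack(ETH_FMT, dst_mac, src_mac, ETHERTYPE_EXPERIMENTAL)
    hdr = struct.pack(HDR_FMT, opcode, len(payload), seq, ack)
    return eth + hdr + payload


def parse_frame(frame: bytes) -> dict:
    """Unpack a frame produced by build_frame back into its fields."""
    dst, src, ethertype = struct.unpack_from(ETH_FMT, frame, 0)
    opcode, length, seq, ack = struct.unpack_from(HDR_FMT, frame, ETH_LEN)
    start = ETH_LEN + HDR_LEN
    return {
        "dst": dst, "src": src, "ethertype": ethertype,
        "opcode": opcode, "seq": seq, "ack": ack,
        "payload": frame[start:start + length],
    }


if __name__ == "__main__":
    frame = build_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01",
                        OP_DATA, seq=7, ack=3, payload=b"hello")
    print(len(frame), parse_frame(frame))
```

On Linux, a frame like this could in principle be sent through an `AF_PACKET` raw socket (which needs elevated privileges), with addressing done purely by MAC address; that is roughly what "no IP layer" buys in simplicity and costs in routability.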

cyp0633 1 year ago | | |

In Tesla's presentation slides, "Tesla Transport Protocol Over Ethernet (TTPoE): A New Lossy, Exa-Scale Fabric for the Dojo AI Supercomputer", they mentioned that the network layer is optional (but not removed)

dboreham 1 year ago | | |

Resume-driven engineering.

Beretta_Vexee 1 year ago | |

I think it's better to think of it as a fibre channel protocol rather than TCP. It's intended for use on managed internal data centre networks. It skips OSI layers to gain speed and probably does 100% hardware routing with FPGAs.

It's of no interest on the internet or any small scale network.

KaiserPro 1 year ago | | |

> fibre channel protocol

Except that FC is explicitly lossless and ordered.

delfinom 1 year ago | | |

Elon just doesn't want to pay Nvidia for Infiniband. Lol

kardianos 1 year ago |

There was a talk about this prior. This was used in place of TCP, but where TCP is designed to run over unreliable networks, this protocol achieves speed and latency figures comparable to others, while still being able to retain commodity IP switches in the cluster. By having a fixed buffer, no lingers, faster opens, they increase the speed and latency, without going to dedicated vendors or other stacks.

vardump 1 year ago | |

> they increase the speed and latency

I suppose you mean "increase the speed and decrease the latency"?

kardianos 1 year ago | | |

Yes. Typo.

FuriouslyAdrift 1 year ago |

Be interesting to see how this stacks up to the dominant protocol in supercomputers/ai clusters : Infiniband.

nine_k 1 year ago | |

AFAICT this is very much about handling unreliable links and congestion control.

Infiniband instead makes the sides bargain to avoid packet loss, while the medium is supposed to be reliable.

throw0101b 1 year ago | |

> Be interesting to see how this stacks up to the dominant protocol in supercomputers/ai clusters : Infiniband.

As mentioned in README, this was submitted to the larger Ultra Ethernet consortium for consideration:

> Deliver an Ethernet based open, interoperable, high performance, full-communications stack architecture to meet the growing network demands of AI & HPC at scale

* https://ultraethernet.org

glzone1 1 year ago | |

I thought infiniband was more expensive and that even AI where bandwidth is super important was trying to get away from it towards cheaper options.

electronbeam 1 year ago |

Is it missing a license?

pnw 1 year ago | |

The header files all say it's GPL 2. But yes, they should have a license file at the top level.

iamleppert 1 year ago |

How is this better than UDP? Or for that matter, just plain old Ethernet MAC addressing? You can achieve lower latency and higher speed (than this) if you don't care about reliability in your transport layer.

This reeks of NIH.

mannyv 1 year ago | |

I worked with a company that wrote its own protocol for Ethernet and got almost wire speed. It was worth it at 10 Mbps, but not worth it at 100 Mbps.

You can always beat general purpose solutions like the TCP/IP/UDP stack if you try. For most it isn’t worth it.

leetharris 1 year ago | |

Did you even try reading the README?

- TTPoE is designed to be implemented at hardware level unlike UDP

- UDP cannot guarantee transmission whereas this does

- TTPoE is built for distributed resilience

bilekas 1 year ago |

> Some variables may have changed slightly without documentation updates, but we're sure you can figure it out

I hope they're not hoping for mass adoption with an attitude like that. Not exactly inspiring confidence in the longevity and maintainability.

Sysreq2 1 year ago | |

I don’t think mass adoption is their goal. They had a problem. They solved that problem. They shared how they solved said problem.

Every engineering company releases stuff like this. It’s not meant to change the world. It’s marketing to recruit other engineers who would find that problem interesting.

bilekas 1 year ago | | |

> I don’t think mass adoption is their goal.

I'm not so sure about that. From the repo:

> Tesla also announced joining the Ultra Ethernet Consortium (UEC) to share this protocol and work to standardize a new high-speed/low-latency fabric (be that TTPoE or otherwise) for AI/ML/Datacenters

Also, it's a protocol; personally, I will only use a protocol that's fully spec'd. It's sometimes a pain to reach consensus among all contributors, but it's valuable.

> edit : I will only use a protocol that's fully spec'd IN PROD

RajT88 1 year ago | |

At least they're honest about it.

This is currently the state of much modern documentation from huge tech companies.

serf 1 year ago | |

it feels more like a way to sweep liability away rather than a real warning..

..which also does not inspire confidence.

glzone1 1 year ago | | |

Why does this not inspire confidence in Tesla? Their internal software stack is available to their own developers who can review what is actually there.

Why does it have to be perfectly documented in a public GitHub repo? Are all other car companies "properly" publicly documenting things on GitHub?

Does it inspire more confidence in VW's software stack if they don't share it? Is VW's confidential stack some big competitive advantage? I've used a VW ID electric vehicle. I did not come away that impressed.

elcritch 1 year ago |

Twice now I’ve been excited that this was for realtime Ethernet used in Tesla's vehicles. Alas, it is not.

sgu999 1 year ago | |

Any reason to believe they don't use one of the standard industrial protocols like the poorly named EtherNet/IP?

kvmet 1 year ago | | |

Licensing probably?

CAN (or one of its more modern variants) is historically more common in automotive. However, with 2-wire Ethernet connections becoming more commonplace, I do think you're right that more and more cars will be moving to an Ethernet fieldbus.

EtherNet/IP is not as robust for many applications as its competitors (PROFINET, EtherCAT) since it is not fully deterministic. EtherCAT is my personal favorite.

MisterTea 1 year ago | | |

Please no EIP, it's utter crap and designed by an OOP huffing committee. The only serious protocol is EtherCAT with honorable mentions for Sercos 3 and Ethernet Powerlink (CANopen over Ethernet).

LeifCarrotson 1 year ago | | |

Of all the (current) industrial protocols they could have picked, EtherNet/IP would be the worst.

Its only advantage is that it can coexist with other TCP traffic and run over standard switches, but that just results in unreliable fieldbus performance.

high_na_euv 1 year ago |

Really interesting

thelastparadise 1 year ago | |

Why?

high_na_euv 1 year ago | | |

Recreating foundational infra doesn't seem so common, especially for a car company.

KeepOnTruckin1 1 year ago |

[flagged]

dang 1 year ago | |

Can you please not post like this? Regardless of who you're talking about or how you feel about them, it's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.