Ethernet History Deepdive – Why Do We Have Different Frame Types? (lostintransit.se)
IMHO Ethernet is one of the great examples of backwards compatibility in the computing world. Even the wireless standards present frames to the upper layers like they're Ethernet. It's also a counterexample to the bureaucracy of standards bodies --- the standard that actually became widely used was the one that got released first. The other example that comes to mind is OSI vs DoD (TCP/IP).
Sounds like a cautionary tale: whatever gets released first will stick. If you make a blunder, generations will have to live with it (like IPv4).
Likewise then, IS-IS is the protocol that OSI standardized, but we're not using it as part of an OSI system.
We'd get rid of the idea of a broadcast domain. We'd get rid of MAC addresses and ARP. Switches and routers would become the same device. We'd just use IP addresses for routing, and the 'next hop' would always be the opposite end of the link you sent a packet over.
The world would be a simpler place, and no functionality would have been lost.
What you are describing is more in line with MPLS or Infiniband.
I agree with your frustration. I prefer to design networks that start routing right at the access port or even using an agent, virtual network port, or VPN endpoint at the client or application (like QUIC), but that is very expensive from a resource standpoint.
IPv6 is also another way to get closer to what you are describing.
In my perfect world, we'd move to something like a mashup of MPLS and HIP (https://en.wikipedia.org/wiki/Host_Identity_Protocol)
If you want to study something more "routed" and more point to point, look at private mobile networks (5G).
What we don't want is more layers of abstraction... that's making everything slow, brittle and impossible to troubleshoot.
So all remnants of the original Ethernet could be removed from wired Ethernet, which does not need layer 2 protocols, while keeping adequate layer 2 protocols for wireless communications. Besides WiFi, there are also long-range point-to-point wireless links, where directive antennas are used at both ends. For these, there is no difference from wired links, so they do not need layer 2 protocols.
huh? where "segment" means where you are using a hub not a switch? cuz that was a long time ago
10base-T1S is just beginning to ramp up in the automotive industry, which modifies the super-successful 100base-T1 to be cheaper by (a) allowing cheaper PHYs; (b) allowing cheaper endpoints due to the lower data rate to handle; (c) allowing lower-spec single twisted pair wiring; and ... (d) allowing multi-drop. This is intended to allow Ethernet to push down into the space that CAN-FD is currently occupying, and looks likely to succeed, at least in some niches.
I think that standard is a huge mistake... 10 Mbit/s isn't enough for a modern vehicle (it rules out cameras, radars, screens, etc.). Many sensors alone can push megabits, and in the modern world engineers want to send their data as JSON, not as bitfields.
Instead they should have used a CDMA-like design with the physical layer being a 2 cent microcontroller for things like bulbs and micro switches. Then, for things like cameras which require more megabits, use a 30 cent microcontroller with a higher chip rate, all transmitting on the same bus and using code division to avoid needing to worry about scheduling.
It talks about how when IPv6 was being designed, they wanted to do exactly that: drop most of the layer 2 stuff, abandon the idea of a bus network, make everything point-to-point, all switches would be L3 routers, etc.
Search for “What if instead the world worked like this?” for the relevant part.
My question though, is how would IP assignment work for each of the intermediary devices between me and (say) my ISP’s gateway? My computer is plugged into a switch right now, which is plugged into another switch, which is plugged into my router, which has a point-to-point link to the ISP gateway. Would my router get a /64, then delegate a /68 to the next “router” (ie. The physical thing I currently call a switch), which would delegate a /72 to the next one, etc? How would it determine the optimal IP allocation? What if there’s a cycle? Aren’t we sorta reinventing spanning tree at this point? (I’m genuinely curious about this, because I don’t really grok all of the implications of an “everything is L3” world like this.)
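The nested delegation in that question (/64, then /68, then /72) can be sketched with Python's standard `ipaddress` module. The helper name `delegate`, the documentation prefix `2001:db8:0:1::/64`, and the child indices are all made up for illustration. One caveat worth stating as an assumption: SLAAC as commonly deployed expects a /64 on each link, so prefixes longer than /64 would probably need some other assignment mechanism.

```python
import ipaddress

def delegate(parent, new_prefixlen: int, index: int) -> ipaddress.IPv6Network:
    """Return the index-th child prefix of `parent` with length `new_prefixlen`.

    Hypothetical helper: each downstream box would be handed one child prefix.
    """
    net = ipaddress.ip_network(parent)
    extra_bits = new_prefixlen - net.prefixlen
    if extra_bits <= 0 or not 0 <= index < 2 ** extra_bits:
        raise ValueError("child prefix does not fit inside the parent")
    # Place the child index in the bits between the parent and child lengths.
    base = int(net.network_address) + (index << (128 - new_prefixlen))
    return ipaddress.IPv6Network((base, new_prefixlen))

router = ipaddress.ip_network("2001:db8:0:1::/64")  # what the ISP hands out
first_switch = delegate(router, 68, 1)               # 2001:db8:0:1:1000::/68
second_switch = delegate(first_switch, 72, 2)        # 2001:db8:0:1:1200::/72
```

With four bits per level there are 16 children per box and at most 16 levels below a /64, which hints at why a fixed nibble-per-hop scheme gets awkward on deep or cyclic topologies.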
You use link-local autoconfiguration, and use appropriately-scoped multicast addresses to ask "all-nodes" or "all-routers", making autoconfiguration a breeze compared to v4 world. In v4 world a similar setup is also possible, though specific details of the setup differ, and you have to set up addresses manually for each p2p link.
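To make the link-local step concrete, here is a minimal Python sketch of the classic modified-EUI-64 derivation of an fe80::/64 address from a MAC address, along with the two link-scope multicast groups for "all-nodes" and "all-routers". The function name `eui64_link_local` and the sample MAC are made up for illustration. Many current hosts use random or stable-privacy interface identifiers instead, so treat this as one possible scheme, not what any particular stack does.

```python
import ipaddress

ALL_NODES = ipaddress.IPv6Address("ff02::1")    # link-scope all-nodes group
ALL_ROUTERS = ipaddress.IPv6Address("ff02::2")  # link-scope all-routers group

def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Build fe80::/64 + modified EUI-64 interface ID from a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle to stretch 48 bits to 64.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    prefix = bytes.fromhex("fe80" + "00" * 6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + interface_id)

print(eui64_link_local("00:11:22:33:44:55"))  # fe80::211:22ff:fe33:4455
```

No server or manual configuration is involved: the address comes from information the interface already has, and neighbors are found by sending to the multicast groups.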
This is another weird thing about networking. As far as I've been able to learn, a "router" is a device with two ports that handles transmission of data between those ports, whereas a "switch" is a device with more than two ports that handles transmission of data between those ports.
But nobody would ever care about that distinction.
I have redundant internet/nat routers at home (overkill!), and they communicate amongst each other to decide which is active and which isn't, but either way, the active one ARPs for the router address with 00-00-5E-00-01-01 as the mac address. The rest of the network just sends off-network packets to 00-00-5E-00-01-01, and failover happens because switches figure out what port is currently using that address.
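The address quoted there matches the VRRP virtual-router MAC format for IPv4, where the last octet is the virtual router ID. Assuming that is the protocol in use (the comment does not name it, and CARP uses the same range), the mapping can be sketched as follows. The function name `vrrp_virtual_mac` is made up for illustration.

```python
def vrrp_virtual_mac(vrid: int, ipv6: bool = False) -> str:
    """Return the VRRP virtual MAC for a virtual router ID (1..255).

    IPv4 virtual routers use 00-00-5E-00-01-{VRID};
    IPv6 virtual routers use 00-00-5E-00-02-{VRID}.
    """
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    family_octet = 0x02 if ipv6 else 0x01
    octets = (0x00, 0x00, 0x5E, 0x00, family_octet, vrid)
    return "-".join(f"{octet:02X}" for octet in octets)

print(vrrp_virtual_mac(1))  # 00-00-5E-00-01-01
```

Because the MAC depends only on the VRID and not on which box is currently active, hosts never need to relearn anything on failover; only the switches' port mapping for that MAC changes.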
I share a different mac address for the upstream connection, which is PPPoE (sadly), but same deal --- when failover happens, the new computer starts using the address and everything figures it out, because stations are allowed to move to different ports by design.
Things tend to get a little messier than people expect in figuring out the "what values do I use for the point to point links and how do they get assigned" step of things, though there are some clever answers there too.
Of course that changed when switches came out. I have a 10/100 hub in a closet somewhere for debugging, since it's nice to not have to remember how to get into a switch and set the monitor port.
Token ring equivalents are still used in lots of places. From what I remember cable modem data is basically token ring off of channel 0 (though that may not be accurate anymore).
It's actually much simpler than that. When you transmit you also listen. If what you hear is not what you sent, there is a collision, and you back off.
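A rough Python model of that listen-while-transmitting rule, together with the truncated binary exponential backoff that classic half-duplex Ethernet uses after a collision. The function names are made up for illustration, and real collision detection happens in the PHY on the analog signal, not by comparing byte strings, so this is only a sketch of the logic.

```python
import random

SLOT_TIME_BITS = 512   # one slot time, in bit times (10/100 Mbit/s Ethernet)
MAX_ATTEMPTS = 16      # the frame is dropped after this many collisions
BACKOFF_LIMIT = 10     # the exponent stops growing after 10 collisions

def collision_detected(sent: bytes, heard: bytes) -> bool:
    """Model of 'what you hear is not what you sent'."""
    return sent != heard

def backoff_slots(collision_count: int, rng=random) -> int:
    """Pick how many slot times to wait after the n-th collision in a row."""
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions, giving up on this frame")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1

def backoff_bit_times(collision_count: int, rng=random) -> int:
    """Same wait, expressed in bit times."""
    return backoff_slots(collision_count, rng) * SLOT_TIME_BITS
```

The random wait is what keeps two colliding stations from retrying in lockstep, and doubling the range on each repeat spreads retries out as the segment gets busier.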
We're at the point where we can have millions of packets per second going through a network interface, and it starts to get very silly.
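For a sense of scale, here is a back-of-the-envelope sketch of how many frames per second a link can carry. It assumes standard Ethernet framing on the wire (8 bytes of preamble and start delimiter plus a 12-byte interpacket gap around every frame). The function names are made up for illustration.

```python
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
IPG_BYTES = 12           # minimum interpacket gap (96 bit times)

def wire_bits(frame_bytes: int) -> int:
    """Bits a frame occupies on the wire, including preamble and gap."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + IPG_BYTES) * 8

def max_frames_per_second(link_bps: int, frame_bytes: int) -> float:
    """Line-rate frame count for back-to-back frames of one size."""
    return link_bps / wire_bits(frame_bytes)

def ns_per_frame(link_bps: int, frame_bytes: int) -> float:
    """Time budget per frame at line rate, in nanoseconds."""
    return wire_bits(frame_bytes) / link_bps * 1e9

TEN_GIG = 10_000_000_000
print(int(max_frames_per_second(TEN_GIG, 64)))    # 14880952 (minimum-size frames)
print(int(max_frames_per_second(TEN_GIG, 1518)))  # 812743 (full-size frames)
```

At minimum frame size that leaves roughly 67 ns per packet on a 10G link, which is why per-packet work in software becomes the limiting factor.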
It's at the point where even a 10G connection requires some thought to actually perform properly. I've managed to get bottlenecked on high end hardware requiring a whole detour into SR-IOV just to get back to decent speeds.
Looks like it took some years for one standard to prevail. Also, TCP/IP was not a clear winner in the early days.
The clock precision (100s of ppm) of the NIC oscillators on either side of a network connection gives a physical upper limit on the Ethernet packet size. The space between the packets lets the slower side "catch up". See https://en.wikipedia.org/wiki/Interpacket_gap for more info.
We could use more precise oscillators to allow longer packets, but at a higher cost.
Ethernet specs the IPG as a fixed number of bits, but it could easily be proportional to the size of the previous packet.
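The clock-drift argument can be put into rough numbers. The sketch below assumes each end is off by up to 100 ppm in opposite directions (the comment above only says "100s of ppm") and that the whole 96-bit gap is available to absorb drift. Real links spend part of the gap on other things, so the result is an upper bound for this toy model, not a statement about the standard. All function names are made up for illustration.

```python
import math

IPG_BITS = 96            # fixed interpacket gap, in bit times
PREAMBLE_SFD_BYTES = 8   # sent with every frame, so it drifts too

def drift_bits(frame_bytes: int, ppm_per_side: int = 100) -> float:
    """Worst-case bit slip between sender and receiver over one frame."""
    bits_on_wire = (frame_bytes + PREAMBLE_SFD_BYTES) * 8
    return bits_on_wire * (2 * ppm_per_side) / 1_000_000

def max_frame_bytes(ipg_bits: int = IPG_BITS, ppm_per_side: int = 100) -> int:
    """Largest frame whose drift still fits in the gap, in this model."""
    max_bits = ipg_bits * 1_000_000 // (2 * ppm_per_side)
    return max_bits // 8 - PREAMBLE_SFD_BYTES

def proportional_ipg_bits(frame_bytes: int, ppm_per_side: int = 100) -> int:
    """Gap sized to the previous frame, as suggested above."""
    return math.ceil(drift_bits(frame_bytes, ppm_per_side))

print(drift_bits(1518))             # about 2.44 bits for a full-size frame
print(max_frame_bytes())            # 59992 bytes under these assumptions
print(proportional_ipg_bits(1518))  # 3
```

Under these assumptions a standard frame uses only a few bits of the gap, so drift alone would not seem to force the 1500-byte limit; a proportional gap would mostly save bandwidth on small frames.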
I just now ran iperf3 from my Mac to my Synology without jumbo frames:
[ ID] Interval Transfer Bitrate Retr
[ 7] 0.00-10.00 sec 10.0 GBytes 8.61 Gbits/sec 0 sender
[ 7] 0.00-10.00 sec 10.0 GBytes 8.60 Gbits/sec receiver
Given how rarely I actually care to saturate the 10Gbit link, I'd rather use the slightly hypothetically slower default settings that are highly likely to work in all scenarios.
Changing the MTU is awful because parameters like MTU get baked into hardware in the form of buffer sizes limited by actual RAM limits. Like everything else on a network, once the network is deployed, changing it is very hard because you have to change everything along the path. Networks are limited by the lowest common denominator.
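How much jumbo frames could buy in the best case is easy to estimate. The sketch below assumes plain TCP over IPv4 with no options (20 + 20 bytes of headers) and 38 bytes of Ethernet overhead per frame on the wire. The function name `tcp_goodput_fraction` is made up for illustration.

```python
# Per-frame Ethernet cost on the wire: header 14 + FCS 4 + preamble/SFD 8 + gap 12.
ETHERNET_OVERHEAD_BYTES = 14 + 4 + 8 + 12

def tcp_goodput_fraction(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> float:
    """Fraction of raw link bandwidth left for TCP payload at a given MTU."""
    payload = mtu - ip_header - tcp_header
    if payload <= 0:
        raise ValueError("MTU too small for the assumed headers")
    return payload / (mtu + ETHERNET_OVERHEAD_BYTES)

for mtu in (1280, 1500, 9000):
    print(mtu, f"{tcp_goodput_fraction(mtu):.1%}")
```

With these assumptions the ceiling is roughly 94.9% at MTU 1500 and 99.1% at MTU 9000, so framing overhead alone is worth only about four percentage points. Any larger gain from jumbo frames would have to come from reduced per-packet processing instead.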
This kind of thing is one of the downsides of packet switching networks like IP. The OSI folks envisioned a network that presented a higher level interface where you'd open channels or send messages and the underlying network would handle all the details for you. This would be more like the classic analog phone network where you make a phone call and a channel is opened and all details are invisible.
It's very loosely analogous to CISC vs RISC where the OSI approach is more akin to CISC. In networking RISC won out for numerous reasons, but its simplicity causes a lot of deep platform details to leak into upper application layers. High-level applications should arguably not even have to think about things like MTU, but they do.
When higher level applications have to think about things like NAT and stateful firewall traversal, IPv4 vs IPv6, port remapping, etc. is where it gets very ugly.
The downside of the OSI approach is that innovation would require the cooperation of telecoms. Every type of connection, etc., would be a product offered by the underlying network. It would also give telecoms a ton of power to nickel and dime, censor, conduct surveillance, etc. and would make anonymity and privacy very hard. It would be a much more managed Internet as opposed to the packet switching Wild West that we got.
As far as most applications are concerned, the hostname is just a string that is interpreted by the network layer. Be that through a DNS lookup, or parsing as an address native to the underlying network protocol.
A minority of applications get fancy and request a datagram oriented link, which the network layer also provides (with an admittedly small limit of 65KB that leaks over from that).
Few applications ever go deeper than that.
That's why I'm in full support of a world ending apocalypse that allows society to restart from scratch. We've made so many bad decisions this time around, with packet sizes being some of the worst.
All my hardware at the time was Xeon 2690, v1-4. NICs were Intel X520/X540 or similar (whatever SuperMicro was using back then). IIRC, v1 could do 10G easy without TLS, 8-9G with TLS, v3 improved AES acceleration and we could push 2x10G. When I turned off NIC packetization acceleration, I didn't notice much change in CPU or throughput, but if packetization was a bottleneck it should have been significant.
At home, with a similar-age desktop processor, a dual core Pentium G3470 (Haswell, same gen as a 2690v3), I can't quite hit 10G in iperf, but it's closeish, another two cores would probably do it.
In some cases, you can get some big gains in efficiency by lining up the user space cpu with the kernel cpu that handles the rx/tx queues that the NIC hashes the connection to, though.
That's not a good thing for a server that runs a bunch of VMs.
Fortunately SR-IOV exists, but it seems a tad silly to me that I have to do all this weird PCIe passthrough stuff just for this. It's nice, don't get me wrong, but a bit too exotic for what should be a simple setup.
You have to start somewhere. They're going for 25 Gbps:
> In addition to the more computer-oriented two and four-pair variants, the 10BASE-T1,[20] 100BASE-T1[21] and 1000BASE-T1[22] single-pair Ethernet (SPE) physical layers are intended for industrial and automotive applications[23] or as optional data channels in other interconnect applications.[24] The distances that single pair operates at full duplex depends on the speed: 1000m (1km) with 802.3cg-2019 10BASE-T1L; 15 m or 49 ft with 100BASE-T1 (link segment type A); up to 40 m or 130 ft using 1000BASE-T1 link segment type B with up to four in-line connectors. Both physical layers require a balanced twisted pair with an impedance of 100 Ω. The cable must be capable of transmitting 600 MHz for 1000BASE-T1 and 66 MHz for 100BASE-T1. 2.5 Gb/s, 5 Gb/s, and 10 Gb/s over a 15 m single pair is standardized in 802.3ch-2020.[25] In June 2023, 802.3cy added 25 Gb/s speeds at lengths up to 11 m.[26]
* https://en.wikipedia.org/wiki/Ethernet_over_twisted_pair#Sin...
In reality, the wiring harness is one of the more expensive and complex components in a modern vehicle. The majority of the data being carried is low speed and low risk signalling: climate system controls, entertainment system, lighting control, etc. Obviously braking, steering, throttle controls and things like that are a different class.
Look at a modern luxury vehicle and how many things are in one door alone. Accent lights, window controls and motors, locks and lock controls, speakers (yeah, often plural), side mirrors and controls, side-looking cameras, etc. The wiring harness into a door needs consolidation or else it can become a giant heavy thing, the multi-drop approach makes a lot of sense here.
(Wildly oversimplifying here, there are always exceptions, YMMV, no warranty expressed or implied, may cause blurred vision or a rash)
A switch is a device that handles things like "I need to get a packet from my desktop PC to the printer down the hall". It has lots of ports because, usually, there are a lot of things local to you that you might want to talk to, and you want that traffic as fast as possible without the expense and overhead of 'routing'. If those things aren't on the same L3 network, a switch won't be able to get there[1].
A router is something that maintains a table that maps non-local (not the same L2 network, collision domain, VLAN, whatever) destinations to a 'next hop' based on various metrics[2]. In the general case, routers are concerned with questions like "I'm in Atlanta and I need to get this packet to Tokyo so is the best way to send it via my connection to Comcast or Level-3?".
The degenerate case for routers is a two-port box that does nothing but move packets not destined for something in your local network to your ISP or other upstream network for forwarding to a non-local destination. Since that's the use case most folks see, it's easy to misunderstand the bigger pix. Much is done via various kinds of virtual interfaces now, but I have in my career worked on routers with hundreds of physical ports.
[1] Yes, some switches have a router in them. Stop overthinking. [2] Yes, it's more complicated than that. Stop overthinking.
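In the same oversimplifying spirit, the two jobs can be sketched in a few lines of Python: a switch learns which port each source MAC lives on and floods when it does not know the destination, while a router picks the most specific matching prefix from a table. The class names, MAC strings, and next-hop labels are made up for illustration, and the route list is scanned linearly where real routers use specialized lookup structures.

```python
import ipaddress

class LearningSwitch:
    """L2: remember which port each source MAC was last seen on."""
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}

    def receive(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port      # learn (or move) the sender
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:                   # unknown destination: flood
            return [p for p in self.ports if p != in_port]
        return [] if out_port == in_port else [out_port]

class Router:
    """L3: choose a next hop by longest-prefix match."""
    def __init__(self):
        self.routes = []

    def add_route(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination):
        address = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.routes if address in net]
        if not matches:
            return None
        return max(matches, key=lambda match: match[0].prefixlen)[1]

router = Router()
router.add_route("0.0.0.0/0", "isp")
router.add_route("10.0.0.0/8", "core")
router.add_route("10.1.0.0/16", "branch")
print(router.lookup("10.1.2.3"))  # branch
print(router.lookup("8.8.8.8"))   # isp
```

The switch needs no configuration because its table is populated from traffic it sees; the router's table has to come from somewhere else, either an administrator or a routing protocol.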
2013 me: slap the app on a Dell. It'll be fine.
2017 me: aw crap, the nic doesnt support SR-IOV. What do you mean, I need a special driver? Oh lordy, I'm pinning a whole damn CPU just so DPDK can pull packets off the wire?
No SR-IOV on my board, but that's ok.
The previously mentioned issue is that to never buffer packets in a reclocking repeater on a link, you _need_ the incoming packet rate to never be higher than the rate at which you can send them back out, or else you'd fill up/buffer.
If your repeaters are actually switches, this manifests as whether you occasionally drop packets on a full link with uncongested switching fabric. Think two switches with a 10G port and 8 1G ports each used to compress 8 long cables into one (say, via vlan-tagging based on which of the 8 ports).
If you want to use jumbo packets on the internet at large, you need to have working path MTU detection, which realistically means at least probing, but you really should have that at 1500 too, because there's still plenty of broken networks out there. My guess is you won't have many connections with an effective mtu above 1500, but you might have some.
I'd be willing to bet my lunch that 10x more places have been moving down to assuming 1280 byte connections (since IPv6 guarantees it) than have been peering on the internet at >1500 (not counting 1504 for VLAN tags and whathaveyou).
Was developing OSI applications in 1989, you could order an X.25 circuit from your PTT and run OSI over that using the ISODE [1] toolkit.
The earliest ISP in my country didn't start until 1992.
[1] https://en.wikipedia.org/wiki/ISO_Development_Environment
before commercialization happened, IP was mostly the realm of government and education.
Say I have a global unicast address on my desktop, 2 hops from my router, and I want to allow traffic to it. My router gets a packet sent to it over its link-local address, with a destination header of my desktop’s IP. Say it has 4 ports (each going to another router/switch, each with its own link-local address.)
How would it know which port to use as the next hop? It would need a routing table, and that would need to be configured automatically if we want to work as well as switches do today. What would be the protocol for this auto configuration? BGP or something like it? How do the routers know the available address space? Or are we just stipulating that we’d invent a protocol for this, if it had ever happened?
In ethernet we have the Spanning Tree Protocol for this, to discover the topology of an Ethernet network and know which links to use for which MAC addresses (including the ability to detect cycles.) I feel like something like spanning tree would still need to exist in an all-unicast, no-ethernet, L3-only world. Does such a thing exist already, or would we need to invent it in this counterfactual universe?
For IP, OSI, and routed ethernet alike, you run a complete routing protocol (IS-IS actually will handle all of OSI, IP, and routed ethernet) where routers tend to send "hello" packets to announce themselves on the links to establish communication. This is how IP routing has worked since some of the oldest routing setups. On IP-bound protocols like OSPFv2/v3 or BGP you have to configure connections between the routers, then tell them to peer with the router at a given IP; with IS-IS it has its own independent low level mechanism, so you connect to a link and tell it which router ids to trust/peer with.
I guess since IPv6 solves the address allocation problem natively (with SLAAC), it’s not allocation you need to worry about so much as “which IPs does this router see vs that router, recursively”, which it sounds like IS-IS can provide.
This would mean each router box in my network would grab its own random (SLAAC) address within the /64 advertised by my “main” router (found via multicast all-routers), and would then each (through IS-IS or something similar) announce and forward individual /128 routes for each host plugged into them. The network would converge such that each individual box would have a complete routing table of what next-hop to use for each IP address in the tree.
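A toy version of that convergence, assuming every router already knows the full topology (which is what a link-state protocol's flooding is meant to provide) and that all links cost the same. Real IS-IS runs a shortest-path computation with per-link metrics; the breadth-first search here is a stand-in for that, and the router names and host addresses are made up for illustration.

```python
from collections import deque

def next_hop_table(links, attached_hosts, source):
    """Compute next hops from `source` to every host /128.

    links: router -> list of neighboring routers
    attached_hosts: router -> list of host addresses plugged into it
    """
    first_hop = {source: None}          # the visited set also breaks cycles
    queue = deque([source])
    while queue:
        router = queue.popleft()
        for neighbor in links[router]:
            if neighbor not in first_hop:
                # A direct neighbor is its own first hop; others inherit it.
                first_hop[neighbor] = neighbor if router == source else first_hop[router]
                queue.append(neighbor)
    table = {}
    for router, hosts in attached_hosts.items():
        if router in first_hop:
            for host in hosts:
                table[f"{host}/128"] = first_hop[router] or "local"
    return table

links = {"r1": ["r2", "r3"], "r2": ["r1", "r3"], "r3": ["r1", "r2", "r4"], "r4": ["r3"]}
hosts = {"r1": ["2001:db8::a"], "r2": ["2001:db8::c"], "r4": ["2001:db8::b"]}
print(next_hop_table(links, hosts, "r1"))
```

The r1/r2/r3 triangle is a cycle, yet no link has to be blocked: each router computes its own loop-free shortest-path tree, which is the role spanning tree plays in a bridged network.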
I can see this being a preferable setup if it allows you to completely eliminate layer 2 from the equation… alas this never happened so it’s all a thought exercise.
It was popular primarily because OSPF didn't scale well with hundreds to thousands of routers with the minimal CPU power even large core routers had.
But for a telco, the fact that you didn't have to set up IP connectivity in order for routers to see each other was also a crucial ability.
but yeah, I saw one piece of SONET crap hanging off a ctunnel that spoke CLNP back in maybe 2012 or 2013? haven't seen much of it, but still learned the ctunnel stuff back in '09 because it still could occasionally rear its head.
While X.500 is a precursor to both of these things, and influenced both of these things, and both of these things interoperated with X.500, they are not X.500. X.500 is for all intents and purposes pretty much dead in 2024, although I did deploy an X.500 based directory service in 2012 and it's probably still alive and running.
The semantics are the same for the accessed store, IIRC. The protocol definition explicitly talks about being used to access X.500 data stores as a complementary interface to DAP (which requires the lower layers of the OSI stack, vs LDAP's raw stream of bytes).
This also meant autoconfiguration or supporting non-ip protocols (like routed ethernet) is simpler, though I will admit that OSPFv3 adding default use of link local V6 for router interfaces made the gap smaller.