NATS report into air traffic control incident details root cause and solution

NATS report into air traffic control incident details root cause and solution (nats.aero)

22 points by bigjump 2 years ago | 19 comments

bigjump 2 years ago

Full report link: https://publicapps.caa.co.uk/docs/33/NERL%20Major%20Incident...

spuz 2 years ago

Thanks. One thing I don't quite understand is how new waypoints get added when a flight plan is converted from ICAO 4444 to ADEXP format. Does it do some kind of interpolation?

Also, it appears the error was caused by the algorithm selecting two waypoints with the same identifier as the entry and exit points into UK airspace for this particular flight plan. But it also says non-unique waypoints should be at least 4000nm apart from each other so they can be disambiguated. Since UK airspace isn't that big, shouldn't the algorithm have chosen entry and exit waypoints closer to the borders?

Edit: actually it looks like UK airspace extends a few thousand km to the west of the coastline, which makes it more plausible that it covers duplicate waypoints.
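To make the "far enough apart to be disambiguated" idea concrete, here is a minimal sketch of nearest-candidate resolution. It is only an illustration of the general technique, not what FPRSA-R does: the identifier `DUPID`, the coordinates, and the `resolve` helper are all made up.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * asin(sqrt(h))


# Placeholder database: one identifier shared by two made-up positions.
WAYPOINTS = {"DUPID": [(50.0, 0.0), (48.0, -99.0)]}


def resolve(identifier, previous_position):
    """Pick the candidate closest to the last known position on the route."""
    candidates = WAYPOINTS[identifier]
    return min(candidates, key=lambda pos: distance_nm(previous_position, pos))
```

If same-named waypoints are guaranteed to be thousands of nautical miles apart, "closest to where the route already is" picks the right one. The guarantee is what makes the shortcut safe; a lookup by name alone has nothing to fall back on.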

macguillicuddy 2 years ago

I'm just speculating as to what points are actually added but typically the flight plan includes 'airways' in addition to waypoints. Here's an example of a plan for London Gatwick (EGKK) to Edinburgh (EGPH):

``` EGKK/08R LAM1Z LAM L10 BPK UN601 INPIP INPI1E EGPH/24 ```

And to take a section out:

``` BPK UN601 INPIP ```

In this case, BPK and INPIP are two waypoints (one in the south of England and one in the north). UN601 is an airway that connects them. The airway stands in for many other waypoints between BPK and INPIP that you don't need to specify manually in the flight plan. I suspect the additional waypoints come from expanding the airway into its underlying waypoints, but as I say, it's speculation :-)
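If the speculation above is right, the expansion could look roughly like this. It is a sketch only: the `AIRWAYS` table and the `FIX01`..`FIX03` names are placeholders, not the real fixes on UN601, and `expand_route` is a hypothetical helper.

```python
# Hypothetical airway table. The intermediate fix names are placeholders,
# NOT the real waypoints along UN601.
AIRWAYS = {
    "UN601": ["BPK", "FIX01", "FIX02", "FIX03", "INPIP"],
}


def expand_route(tokens):
    """Replace each airway token with the fixes lying between its neighbours."""
    expanded = []
    for i, token in enumerate(tokens):
        if token in AIRWAYS and expanded and i + 1 < len(tokens):
            fixes = AIRWAYS[token]
            start = fixes.index(expanded[-1])   # fix where we join the airway
            end = fixes.index(tokens[i + 1])    # fix where we leave it
            step = 1 if end >= start else -1    # airways can be flown either way
            # Emit only the fixes strictly between join and leave points;
            # the leave point itself is appended on the next iteration.
            expanded.extend(fixes[start + step:end:step])
        else:
            expanded.append(token)
    return expanded


print(expand_route("BPK UN601 INPIP".split()))
```

So three tokens in the filed plan turn into five waypoints after expansion, which would explain where the "new" waypoints come from.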

Rochus 2 years ago

"An FPRSA sub-system has existed in NATS for many years and in 2018 the previous FPRSA subsystem was replaced with new hardware and software manufactured by Frequentis AG, one of the leading global ATC System providers."

"At this point with both the primary and backup FPRSA-R sub-systems having failed safely the FPRSA-R was no longer able to automatically process flight plans. It required restoration to normal service through manual intervention."

How can a primary AND its backup system fail safely??? Who specified this?

"The actions already undertaken or in progress are as follows: 3) A permanent software change by the manufacturer within the FPRSA-R sub-system which will prevent the critical exception from recurring for any flight plan that triggers the conditions that led to the incident."

Means: now they catch the (Java) exception. Great.

jonp888 2 years ago

> How can a primary AND its backup system fail safely??? Who specified this?

All safety-critical systems are specified to halt rather than enter undefined behaviour if they encounter something that cannot be processed. An unsafe failure would be entering undefined behaviour. What would you have specified differently that would be safer?

A backup is primarily there in case of hardware failures or for maintenance. If it behaves differently to the primary then something is wrong. Can you explain how and why you would expect a backup system running identical software to behave differently?
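A toy model of that point, with entirely made-up names (`Node`, `process`, `submit`) and no claim about how FPRSA-R is actually built: if failover simply hands the same input to a node running the same code, the backup halts for the same reason the primary did.

```python
class CriticalException(Exception):
    """Input that the software cannot process safely."""


def process(plan):
    # Identical logic runs on every node (hypothetical stand-in for the real system).
    if plan.get("unprocessable"):
        raise CriticalException(plan["id"])
    return f"processed {plan['id']}"


class Node:
    def __init__(self, name):
        self.name = name
        self.halted = False

    def handle(self, plan):
        try:
            return process(plan)
        except CriticalException:
            self.halted = True  # fail safe: stop instead of guessing
            raise


def submit(plan, nodes):
    """Try each live node in turn; None means manual intervention is needed."""
    for node in nodes:
        if node.halted:
            continue
        try:
            return node.handle(plan)
        except CriticalException:
            continue  # fail over, but with the SAME plan
    return None
```

A hardware fault on the primary would leave the backup healthy. A deterministic software fault triggered by the input takes out both, because nothing about the backup is different.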

CaliforniaKarl 2 years ago

Pages 8 & 9 of the full report have the details of what happened.

> … it was found to have encountered an extremely rare set of circumstances presented by a flight plan that included two identically named, but separate waypoint markers outside of UK airspace.

> This led to a ‘critical exception’ whereby both the primary system and its backup entered a fail-safe mode. The report details how, in these circumstances, the system could not reject the flight plan without a clear understanding of what possible impact it may have had. Nor could it be allowed through and risk presenting air traffic controllers with incorrect safety critical information.

A flight plan came in that had a duplicated waypoint ID at either end of the route. The flight-plan software, when trying to extract the UK portion of an overflight (origin & destination outside UK airspace), ended up focusing on both of those (identically-named but geographically-distinct) waypoints. Software thought they were duplicates, couldn't figure out what the UK portion of the flight plan was, and intentionally crashed. It did so, rather than reject the flight plan for an aircraft that may already be in the air.
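Here is one plausible way a lookup by identifier can go wrong, sketched with invented names (`DUPID`, `UKENT`, `UKMID`, and both functions are hypothetical; the report, not this snippet, is the authority on what the real code did):

```python
class CriticalException(Exception):
    """The extracted segment makes no geographic sense."""


def extract_segment(route, entry_id, exit_id):
    """Naive version: locate both ends purely by identifier."""
    entry_idx = route.index(entry_id)
    exit_idx = route.index(exit_id)  # list.index returns the FIRST match
    if exit_idx <= entry_idx:
        raise CriticalException("exit point found before entry point")
    return route[entry_idx:exit_idx + 1]


def extract_segment_forward(route, entry_id, exit_id):
    """Variant that only searches for the exit AFTER the entry point."""
    entry_idx = route.index(entry_id)
    exit_idx = route.index(exit_id, entry_idx + 1)
    return route[entry_idx:exit_idx + 1]


# The same identifier appears near the start and near the end of the route.
route = ["DUPID", "AAAAA", "UKENT", "UKMID", "DUPID", "ZZZZZ"]

try:
    extract_segment(route, "UKENT", "DUPID")
except CriticalException as exc:
    print("naive lookup failed:", exc)

print(extract_segment_forward(route, "UKENT", "DUPID"))
```

The naive version matches the first `DUPID`, which sits before the entry point, so the segment is nonsense and the only safe option left is to stop. Constraining the search to the part of the route after the entry point is one way to avoid that, assuming the duplicate really does lie behind the aircraft.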

Rochus 2 years ago

And they shut down the entire system because of one incoherent plan? What a great example of an ingenious high-availability architecture.

CaliforniaKarl 2 years ago

As an example of duplicate airspace waypoints ("fixes"): Head over to https://opennav.com/, and search "PINTO". You'll find the identifier being used for a waypoint in the United States, in Colombia, and in Chile.

In general: waypoints are five letters, VORs & similar are three letters, and NDBs are two or three letters.

This is an example of how older forms of identification come under stress in a modern world. It never mattered if you had a duplicate-named waypoint many countries away; waypoints were defined by intersecting lines (typically relative to two VORs), or by a set distance from a reference point (such as a VOR/DME). Plotting a route would make it obvious how the different waypoints fit in, relative to the start/end and intermediate navigational aids (VORs etc.).

But then waypoints started getting GPS coördinates, and were collected into large databases. It's a problem that has been known since it became a problem, but it still causes issues (like leap seconds!).

darkclouds 2 years ago

> This is the root cause of the incident. We can therefore rule out any cyber related contribution to this incident.

They were hacked. Obviously this is the first time this flight path has been filed, otherwise it would have crashed earlier.

Phone Phreaking, whilst not technically a cyber attack, is still a form of hacking of phone systems.

And email bombs exist which take out email servers and readers, and zip bombs https://en.wikipedia.org/wiki/Zip_bomb

So was this the first instance of a flight path being used as a denial of service, with the "Blitish" of course playing down its significance because it doesn't want to offend anyone due to its current isolated, precarious state?

pixelpanic360 2 years ago

Why take these baseless conspiracy stances?

pja 2 years ago

Because this post has fallen off the front page, so almost no one is reading it to downvote nonsense like this.