Also, it appears the error was caused by the algorithm selecting two waypoints with the same identifier as the entry and exit points into UK airspace for this particular flight plan. But it also says non-unique waypoints should be at least 4000nm apart from each other so they can be disambiguated. Since UK airspace isn't that big, shouldn't the algorithm have chosen entry and exit waypoints closer to the borders?
Edit: actually, it looks like UK airspace extends a few thousand km to the west of the coastline, which makes it more plausible that it covers duplicate waypoints.
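To make the disambiguation idea concrete, here is a minimal sketch of resolving a non-unique identifier by picking the candidate closest to the previous fix on the route. The identifier `DUPEE`, the coordinates, and the function names are all made up for illustration; nothing here is taken from the report or from the real system.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def distance_nm(a, b):
    """Great-circle distance between two (lat, lon) points in degrees (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(h))

# Hypothetical database: one identifier, two geographically distinct fixes.
CANDIDATES = {
    "DUPEE": [(50.0, 2.0), (45.0, -95.0)],
}

def resolve(ident, previous_fix):
    """Pick the candidate position nearest to the previous fix on the route."""
    return min(CANDIDATES[ident], key=lambda pos: distance_nm(pos, previous_fix))

print(resolve("DUPEE", (51.0, 0.0)))
```

This only works if the duplicates are far apart relative to the leg lengths, which is presumably the point of the minimum-separation rule.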
```
EGKK/08R LAM1Z LAM L10 BPK UN601 INPIP INPI1E EGPH/24
```
And to take a section out:
```
BPK UN601 INPIP
```
In this case, BPK and INPIP are two waypoints (one in the south of England and one in the north). UN601 is an airway that connects them. The airway stands in for a ton of other waypoints between BPK and INPIP that you don't need to specify manually in the flight plan. I suspect the additional waypoints added come from splitting the airway into its underlying waypoints - but as I say, it's speculation :-)
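For what it's worth, the kind of expansion I'm speculating about could look roughly like the sketch below. The airway table and the intermediate names (`FIX01` etc.) are placeholders I invented; they are not the real fixes on UN601, and this is not how the actual system is known to work.

```python
# Hypothetical airway table: each airway is an ordered list of fixes.
# FIX01..FIX03 are placeholder names, not real waypoints on UN601.
AIRWAYS = {
    "UN601": ["BPK", "FIX01", "FIX02", "FIX03", "INPIP"],
}

def expand_segment(entry, airway, exit_):
    """Return every fix from entry to exit_ along the airway, inclusive."""
    fixes = AIRWAYS[airway]
    i, j = fixes.index(entry), fixes.index(exit_)
    if i <= j:
        return fixes[i:j + 1]
    # The airway is being flown in the opposite direction.
    return fixes[j:i + 1][::-1]

print(expand_segment("BPK", "UN601", "INPIP"))
```

So the three tokens in the filed plan turn into the full list of fixes, which is where "additional waypoints" would come from.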
"At this point with both the primary and backup FPRSA-R sub-systems having failed safely the FPRSA-R was no longer able to automatically process flight plans. It required restoration to normal service through manual intervention."
How can a primary AND its backup system fail safely??? Who specified this?
"The actions already undertaken or in progress are as follows: 3) A permanent software change by the manufacturer within the FPRSA-R sub-system which will prevent the critical exception from recurring for any flight plan that triggers the conditions that led to the incident."
Means: now they catch the (Java) exception. Great.
All safety critical systems are specified to halt instead of performing undefined behavior, if they encounter something that cannot be processed. An unsafe failure would be entering undefined behaviour. What would you have specified differently, that would be safer?
A backup is primarily there in case of hardware failures or for maintenance. If it behaves differently to the primary then something is wrong. Can you explain how and why you would expect a backup system running identical software to behave differently?
> … it was found to have encountered an extremely rare set of circumstances presented by a flight plan that included two identically named, but separate waypoint markers outside of UK airspace.
> This led to a ‘critical exception’ whereby both the primary system and its backup entered a fail-safe mode. The report details how, in these circumstances, the system could not reject the flight plan without a clear understanding of what possible impact it may have had. Nor could it be allowed through and risk presenting air traffic controllers with incorrect safety critical information.
A flight plan came in that had a duplicated waypoint ID at either end of the route. The flight-plan software, when trying to extract the UK portion of an overflight (origin & destination outside UK airspace), ended up focusing on both of those (identically-named but geographically-distinct) waypoints. Software thought they were duplicates, couldn't figure out what the UK portion of the flight plan was, and intentionally crashed. It did so, rather than reject the flight plan for an aircraft that may already be in the air.
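A toy illustration of this class of bug, not the actual FPRSA-R logic: if the UK portion is extracted by searching the route for an identifier, a duplicate earlier in the route can be matched instead of the intended fix. All names here (`DUPEE`, `AAAAA`, the functions and the exception type) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fix:
    ident: str
    where: str  # free-text note, only for readability

# Made-up route: "DUPEE" appears twice, at two different places.
ROUTE = [
    Fix("DUPEE", "near origin, outside UK"),
    Fix("AAAAA", "outside UK"),
    Fix("BBBBB", "UK entry"),
    Fix("CCCCC", "inside UK"),
    Fix("DDDDD", "just outside UK"),
    Fix("DUPEE", "near destination, outside UK"),
]

class AmbiguousRouteError(Exception):
    """Raised when no sensible UK portion can be extracted."""

def uk_portion_by_name(route, entry_ident, exit_ident):
    """Slice the route between two identifiers, looked up by name only."""
    idents = [fix.ident for fix in route]
    start = idents.index(entry_ident)
    end = idents.index(exit_ident)  # returns the FIRST match, i.e. the wrong DUPEE
    if end < start:
        raise AmbiguousRouteError(
            f"exit {exit_ident!r} found before entry {entry_ident!r}")
    return route[start:end + 1]

def uk_portion_by_position(route, entry_index, exit_index):
    """Slice the route by position, so duplicate names cannot be confused."""
    return route[entry_index:exit_index + 1]
```

Calling `uk_portion_by_name(ROUTE, "BBBBB", "DUPEE")` raises, because the name lookup lands on the first `DUPEE`, while `uk_portion_by_position(ROUTE, 2, 5)` returns the intended slice.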
In general: waypoints are five letters, VORs & similar are three letters, and NDBs are two or three letters.
This is an example of how older forms of identification come under stress in a modern world. It never mattered if you had a duplicate-named waypoint many countries away; waypoints were defined by intersecting lines (typically relative to two VORs), or by a set distance from a reference point (such as a VOR/DME). Plotting a route would make it obvious how the different waypoints fit in, relative to the start/end and intermediate navigational aids (VORs etc.).
But then waypoints started getting GPS coördinates, and were collected into large databases. It's a problem that has been known since it became a problem, but it still causes issues (like leap seconds!).
They were hacked. Obviously this is the first time this flight path has been filed, otherwise it would have crashed earlier.
Phone Phreaking, whilst not technically a cyber attack, is still a form of hacking of phone systems.
And email bombs exist which take out email servers and readers, and zip bombs https://en.wikipedia.org/wiki/Zip_bomb
So was this the first instance of a flight path being used as a denial of service, with of course the "Blitish" playing down its significance because it doesn't want to offend anyone due to its current isolated, precarious state?
I wonder why they can't reject the flight plan for an aircraft that's already in the air? Presumably they have any number of reasons to reject a perfectly valid flight plan that's been submitted, let alone invalid ones, so there must be a rejection mechanism?
The explanation offered by the report (from a quick skim; I found this on page 9) is:
> Having found an entry and exit point, with the latter being the duplicate and therefore geographically incorrect, the software could not extract a valid UK portion of flight plan between these two points.
> ...
> In this case the software within the FPRSA-R subsystem was unable to establish a reasonable course of action that would preserve safety and so raised a critical exception
The failure is portrayed as a reasonable thing to do, and yes, it's good the system failed safe rather than continuing with a bunch of corrupt data no one knew about. But it seems bizarre that a single dodgy flight plan forcing the whole system to shut down was an intentional part of the system design. It does sound like they don't have strong isolation around individual flight plan processing, so an exception thrown there just propagated up to bring the whole thing down.
More damningly, duplicated waypoint names with different positions are a known issue, with work ongoing to produce a globally unique set of names (from what the report says), so this is hardly unexpected. Surely any decent test plan would have included this scenario?
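By "strong isolation" I mean something along the lines of the sketch below: a failure while processing one plan is contained, and that plan is set aside for manual handling instead of taking the whole processor down. This is only an illustration under my own assumptions; `process_one` and `process_all` are hypothetical names, and whether quarantining is acceptable in a safety-critical context is exactly the question being debated here.

```python
def process_one(plan):
    """Hypothetical per-plan step: refuses routes that repeat an identifier."""
    if len(set(plan)) != len(plan):
        raise ValueError(f"repeated identifier in route: {plan}")
    return " ".join(plan)

def process_all(plans):
    """Process each plan independently; quarantine failures for manual review."""
    accepted, quarantined = [], []
    for plan in plans:
        try:
            accepted.append(process_one(plan))
        except Exception as exc:
            # The failure stays local to this plan; the loop carries on.
            quarantined.append((plan, str(exc)))
    return accepted, quarantined

plans = [
    ["BPK", "UN601", "INPIP"],
    ["DUPEE", "AAAAA", "DUPEE"],  # made-up plan with a repeated identifier
    ["LAM", "L10", "BPK"],
]
accepted, quarantined = process_all(plans)
print(accepted)
print(quarantined)
```

The plans on either side of the bad one still get processed, and the bad one ends up in a queue a human can look at.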
You need to know everything that may be in the air - if you skip the details of a flight that may be in the air, you risk routing another flight through the same space and the possibility of collision? So if you can't do that safely, the only option is to shut down; existing flights can continue but no new flights can be routed until the anomaly is resolved.
The authors of the report obviously made an effort to suggest this; but then on page 18 they nevertheless admit: "A permanent software change by the manufacturer within the FPRSA-R sub-system which will prevent the critical exception from recurring for any flight plan that triggers the conditions that led to the incident."
The purpose of a backup system is not to prevent failure - it's to improve resiliency of the system as a whole across a set of foreseen and unforeseen faults. Backup systems failing to handle any specific fault is an expected and predicted behavior. Thankfully in this case there was a backup system that prevented a complete shutdown (and, thankfully, any accident) - the manual processing of flight plans.
Safety is not only about human lives, but also about health and property (and also, e.g., critical financial and other losses, or reputational damage). The present incident has obviously caused considerable damage. We can only hope that the rest of the system does not suffer from similar omissions, and that it is not pure coincidence that even worse events have not occurred.
The first part of this argument is semantics - how do we define failure. The second part is IMHO more important - what decisions are taken with regard to the behavior of subsystems and how they influence overall system degradation. In this case the overall design prevented any loss of operational safety which, to me, is a success.