Trading Program Ran Amok, With No ‘Off’ Switch (dealbook.nytimes.com)
It looks like they were using some new algorithm, which should have made them a lot of money, had the market gone up after their massive purchases. In that case, they would have pocketed fat bonuses and would not be on the news.
That did not happen, however, so the crying and the search for a scapegoat are on. It sounds like banking business as usual: 'heads I win, tails you lose'.
Ultimately, there is a really serious problem with the concept of limited personal liability for companies engaging in speculation. It is an asymmetric arrangement, whereby the directors are entitled to the profits but are never personally responsible for the losses. With such rules of the game, it is advantageous to take crazy risks. Expect to see a lot more of this and many more taxpayer-funded bailouts.
The markets have already recovered. The S&P was down a little bit on Thursday and recovered by Friday. Knight is down 60%.
http://www.google.com/finance?q=INDEXSP%3A.INX%2C+NYSE%3AKCG
This is ultimately a situation of the market being a robust and stable dynamical system.
Not while it's being shaped by algorithms competing against each other, which you're a part of.
So, they launched a busted market making algorithm, lost a ton of money and no one is going to bail them out.
There is no government money involved.
In other words, it was creating so much volume that, when buying (or selling), it made the market go up (or down). It was then reading the price as going up (or down) and jumping on its own bandwagon. This, of itself, would create growing oscillations in the market and growing losses.
For this to work for you, you need to first create a trend and then sit back and let the suckers pile in on it and take the losses. You then return only when you want to reverse the trend again, at a profitable level (for you). I suspect the program was just too fast for its own good and not a match for the human Masters of this art.
It just kept making markets in reverse (instead of joining the bid and the offer, it bid on the offer and offered on the bid).
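To put rough numbers on that, here is a toy sketch of one round trip done the normal way and the reversed way. The quote (10.00 bid, 10.02 offer) and the 100-share size are made up for illustration; they are not Knight's figures, and `round_trip_pnl_cents` is a hypothetical helper.

```python
def round_trip_pnl_cents(buy_cents: int, sell_cents: int, shares: int) -> int:
    """Profit (positive) or loss (negative), in cents, from buying then selling."""
    return (sell_cents - buy_cents) * shares


# Hypothetical quote, in cents: bid 10.00, offer 10.02.
BID_CENTS, OFFER_CENTS = 1000, 1002

# Normal market making: buy at the bid, sell at the offer, earn the spread.
normal = round_trip_pnl_cents(BID_CENTS, OFFER_CENTS, shares=100)

# Reversed: buy at the offer, sell at the bid, pay the spread instead.
inverted = round_trip_pnl_cents(OFFER_CENTS, BID_CENTS, shares=100)

print(f"normal round trip:   {normal / 100:+.2f} USD")
print(f"inverted round trip: {inverted / 100:+.2f} USD")
```

The loss per round trip is tiny; the damage comes from repeating it at machine speed across many symbols.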
Nanex speculates that Knight ran their tester software on the real market (the tester losing money on purpose to the main algorithm). Alternatively, it could simply be a bug that sends a bid instead of an offer and vice versa. One bit flip at the wrong place could cause that.
"Wilson!!! Get back in there and SELLLLLLLL"
On one hand, you'd think the QA in finance would be pretty solid, considering that the survival of the company could be at stake (witness Knight). On the other hand, I have a feeling that even there, people just don't take it that seriously.
Would love to hear from anyone with more experience writing software for these industries.
At some point if I couldn’t stop it - I’d be tempted to just kill the power to the server rooms, all of them. There just has to be a way to cut your losses.
Edit: on a COMPLETELY unrelated note, trading firms/banks are known to actively pursue the extraction of money from their clients with bogus trades/advice http://www.nytimes.com/2012/03/14/opinion/why-i-am-leaving-g...
Turn it off at any cost. If you are forthcoming and transparent, customers will understand.
1) They bought too much stock (incorrectly)
2) realized WTF, stopped everything
2a) more likely their clients said WTF is wrong first
3) had to sell the stock for the rest of the day.
Only after they sold everything did the $440MM price tag surface. Hopefully they sold most of their positions to Goldman (instead of on the open market), so one of their investors made a boatload of cash, giving them favorable terms for a line of credit.
The most benign ones were developed for use in IT systems, e.g. ApacheBench. While these can cause disruptions if aimed at production services, this does not necessarily threaten the health of an entire enterprise.
However, the trend is that all software sectors are starting to adopt this particular technique of testing software without sufficient regard for what happens if it is released into live systems.
For example, we have Chaos Monkey, from Netflix, which randomly shuts down services in a cloud-based system.
What would happen if software which simulated a meltdown at a nuclear facility was accidentally bundled into the build system by a tired operator? Or if someone did the same with flight software?
The main software running trading platforms would presumably be supervised by another program to ensure that bad algorithms do not lose the company too much money. However, there was no such tool for the component that generated the test data.
To me, it sounds like the supervision should be done at a higher level, e.g. a wrapper around existing APIs. All software running against live systems must call into the wrapper.
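A minimal sketch of what such a wrapper might look like, assuming a single firm-wide cap on gross notional. Everything here is hypothetical: `GuardedGateway`, the `send` callback standing in for the real order API, and the limit value are my own names and numbers, not anything Knight or any exchange actually exposes.

```python
from dataclasses import dataclass
from typing import Callable


class TradingHalted(Exception):
    """Raised when the wrapper refuses to forward an order."""


@dataclass
class GuardedGateway:
    """Hypothetical wrapper that every live-trading program must call into."""

    send: Callable[[str, int, int], None]  # assumed downstream API: symbol, signed qty, price in cents
    max_gross_notional_cents: int          # firm-wide cap; the value is a policy choice
    gross_notional_cents: int = 0
    halted: bool = False

    def submit(self, symbol: str, qty: int, price_cents: int) -> None:
        if self.halted:
            raise TradingHalted("kill switch engaged; order rejected")
        notional = abs(qty) * price_cents
        if self.gross_notional_cents + notional > self.max_gross_notional_cents:
            # Trip the switch: this order and every later one is refused.
            self.halted = True
            raise TradingHalted("gross notional limit exceeded")
        self.gross_notional_cents += notional
        self.send(symbol, qty, price_cents)


sent = []
gateway = GuardedGateway(
    send=lambda symbol, qty, price: sent.append((symbol, qty, price)),
    max_gross_notional_cents=1_000_000,
)
gateway.submit("XYZ", 500, 1000)    # 500,000 cents: forwarded
gateway.submit("XYZ", -400, 1000)   # running total 900,000 cents: forwarded
try:
    gateway.submit("XYZ", 200, 1000)  # would reach 1,100,000 cents: refused
except TradingHalted as exc:
    print("halted:", exc)
```

The point of the design is that the limit lives in one place that no strategy or test tool can bypass, and that once tripped it stays tripped until a human resets it.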
Secondly, test software should conduct some kind of verification, e.g. check for evidence that it is running against a test system. This might be the presence of a nonexistent company, etc.
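One way the "nonexistent company" check could be sketched: the test tool refuses to start unless a made-up ticker is listed on the venue it is pointed at. The `ZZTEST` symbol, the function names, and the idea that the venue exposes a symbol list are all assumptions for illustration.

```python
from typing import Callable, Iterable

# Hypothetical ticker that exists only on the test venue, never on a live one.
CANARY_SYMBOL = "ZZTEST"


class NotATestSystem(Exception):
    """Raised when the target does not look like a test environment."""


def require_test_system(listed_symbols: Iterable[str]) -> None:
    """Refuse to continue unless the canary symbol is listed."""
    if CANARY_SYMBOL not in set(listed_symbols):
        raise NotATestSystem(f"{CANARY_SYMBOL} not listed; refusing to send test orders")


def run_order_generator(listed_symbols: Iterable[str],
                        send_order: Callable[[str, int], None],
                        count: int = 3) -> int:
    """Send `count` synthetic orders, but only after the environment check passes."""
    require_test_system(listed_symbols)
    for _ in range(count):
        send_order(CANARY_SYMBOL, 100)
    return count


orders = []
record = lambda symbol, qty: orders.append((symbol, qty))

print(run_order_generator(["AAA", "ZZTEST"], record))  # test venue: orders go out
try:
    run_order_generator(["AAA", "BBB"], record)         # no canary: nothing is sent
except NotATestSystem as exc:
    print("blocked:", exc)
```

The check fails closed: if the canary cannot be found for any reason, no orders are generated.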
I am more than happy to compile any other ideas you may have so that the IT industry is able to build more fail safes into software.
We are starting to see some of these fail safes in practice. E.g. when you try to send out an email to everyone in the organization, email software may ask whether you are sure you want to do that. The problem is that we haven't thought enough about these scenarios, so we don't adequately address them.
Incidentally, over in Australia, the Commonwealth Bank suffered a major downtime when its outsourcer HP accidentally pushed out system-wide updates instead of doing this to select machines as originally intended.
Infrastructure changes can be notoriously difficult to back out by simply using an "off" switch, particularly if this was some type of firmware upgrade that impacted all of their production servers. Backing it out would at a minimum require some type of reboot, which would cause problems with active trades. It could very well be that they were running an Active-Active environment and had to go Active-Passive, back out the changes from the passive environment, reboot, and surgically cut over to the passive environment. This could easily take 30 minutes.
[1] http://www.pbs.org/newshour/businessdesk/2012/06/who-benefit...
"Goldman has agreed to buy, at a discount, the shares that the trading firm had accumulated. Such a move would help Knight by taking the portfolio off its hands and freeing up capital."
What does this mean? Why would GS do this? Why would Knight do this? Couldn't they just sell them on the open market at a better price instead?
As for GS's motivation, they're buying at a discount. Due to the time sensitive nature of Knight's predicament, they're probably trading the portfolio to Goldman at a reduced rate. Unlike Knight, Goldman has the cash to sit on it for a while and sell the shares directly out into the open market, even if it takes a few days. Given the discount they bought the shares at, they're likely selling with a decent margin.
Essentially, they were buying high and selling low. Many times a second.
Does it not go directly against the spirit and purpose of having a stock market with proper investors?
Just looking at things in a big perspective, the fact that the system is designed for allowing trades at such frequencies makes it seem like markets these days no longer exist for the benefit of the listed companies.
Then again, maybe I don't know wtf I'm talking about :-S I guess I don't understand how real value can be created from such a system.
It is kinda weird that all problems on Wall Street are caused by "bugs in software" or "rogue traders", while executives are never held accountable.
Well, 400 million odd dollars in the red later, I doubt they still feel that.
I do feel sorry for them and they probably didn't deserve this huge loss. Hopefully valuable lessons can be learned.
Basically, they deployed a new HFT algo and it started buying high and selling low. oops!
Obviously every firm's goals are driven by the goals of those in control. In the case of Knight, they are largely a trader-driven firm that has arrived late to the algo party. They were looking to get ahead by being one of the first market makers on the NYSE's new retail order matching system and probably cut some corners to get there. From a risk v reward perspective it probably looked like a good bet: with no major competitors, customers would flood in and any bugs could be ironed out live. Unfortunately the 'fat tail' (http://en.wikipedia.org/wiki/Fat_tail) struck and it may have sunk their company.
For a closer look at what went wrong see http://www.zerohedge.com/news/what-happens-when-hft-algo-goe...
From a quick Google search, it seems circuit breakers do not kick in for the first 15 minutes of trading: http://www.nytimes.com/2012/08/02/business/unusual-volume-ro...
The regulation just makes it much harder to bring an unsafe product to market, and makes it clearer who to blame when people die.
EDIT: Here's a little light reading for you http://finra.complinet.com/en/display/display_main.html?rbid... (or you could just take sandpaper and rub it against your brain)
From what I remember from those accounts, traders often like to tweak algorithms at the last second and there is little to no QA before changes get pushed live.
It's sort of a byproduct of the necessity for speed. Algorithms are quite complex, and even running a basic test suite that takes a few minutes may be deemed unnecessary.
It's at stake either way.
QA is not risk free. Time spent in QA is opportunity cost lost. Many things are very time to market sensitive. One must balance "perfect" against "shipped".
This is even more true in the (electronic) financial industry.
Their primary functions are acting as an order destination and as a market maker. For efficiency's sake, an obvious conclusion would be that both functions are combined in the same software (in a market where microseconds matter). So given the choice of taking a cash hit (a potentially short term affair), or a reputation hit (a much longer term and most likely fatal affair), it's entirely possible Knight knowingly made the right decision.
It's worth noting that the eventual deficit amounts to somewhere in the region of one year's net income, hardly insurmountable (and how many investment opportunities promise close to 100% return in a single year?).
Listening to the CEO on Bloomberg, it was clear that minimizing damage to customers was their primary goal (he made this point several times in the 5 minute interview), and that he appeared comfortable with the outcome.
But I think you are right that they tried to avoid an outage. The incompetence, if any, is that they apparently did not know how much money they were losing and still kept the system going. It wasn't a calculated risk but rather an incalculable one.
I imagine it's not easy to know how much you're losing at any moment in time. They certainly knew they were building huge positions, but knowing how much they were going to lose on those positions requires an estimate of the price at which the positions can be closed (or a hedge).
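A small sketch of why the number is so slippery: the same position gives very different loss estimates depending on the exit price you assume. The position size, average cost, and both exit prices are invented for illustration, and `estimated_pnl_cents` is a hypothetical helper.

```python
def estimated_pnl_cents(position_shares: int, avg_cost_cents: int,
                        assumed_exit_cents: int) -> int:
    """Mark a position to an assumed exit price; negative means a loss.

    position_shares is signed: positive for long, negative for short.
    """
    return position_shares * (assumed_exit_cents - avg_cost_cents)


# Hypothetical position: long 1,000,000 shares at an average cost of 10.02.
SHARES, AVG_COST = 1_000_000, 1002

# Marked at the last trade (10.01), the loss looks small...
at_last_trade = estimated_pnl_cents(SHARES, AVG_COST, 1001)

# ...but if unwinding a block that size pushes the exit down to 9.90, it is not.
after_impact = estimated_pnl_cents(SHARES, AVG_COST, 990)

print(f"marked at last trade: {at_last_trade / 100:,.0f} USD")
print(f"marked after impact:  {after_impact / 100:,.0f} USD")
```

With these made-up inputs the two estimates differ by a factor of twelve, and the only thing that changed is the guess about where the position can actually be closed.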
What I cannot imagine is that it is common practice to leave this kind of decision to an individual's judgement call. There have to be rules for a situation like this. And there's only one sensible rule for a rogue algo racking up unknowable losses. Kill it and deal with the consequences later. Anything else is negligent.
My time writing trading software was never on the automated end of things, so I'm only modestly qualified to comment. But if I were doing the post-mortem on this one, the first thing I'd look for is middle management time pressure forcing a large release without adequate testing. And my standard for "adequate testing" would be pretty high.
If you're going to release something that can take down the company, it's worth making sure it works. In this case, they lost circa 400x the lifetime median income of a US worker. It's hard to imagine the upside that would have justified that kind of risk.
I'm not sure what other word you could use. Total stuff maybe?
> So given the choice of taking a cash hit (a potentially short term affair), or a reputation hit (a much longer term and most likely fatal affair), it's entirely possible Knight knowingly made the right decision.
Except thanks to their competence they have taken a massive cash hit and their reputation lies in tatters.
I suspect the damage to their reputation is so bad they will be lucky to survive.
Technically, it can take seconds to lose that much.
What they did wrong was do all their trading with the same algorithm. Way to put all your eggs in one basket.
But as a torrent of faulty trades spewed Wednesday morning from a Knight Capital Group trading program, no one at the firm managed to stop it for more than a half-hour.
http://www.ted.com/talks/kevin_slavin_how_algorithms_shape_o...
The speaker makes an argument that no one can understand how these algos interact, and that they are being studied like natural phenomena.
Most of the brokerages who routed elsewhere last week were back using them as of Friday.
The problem here though was that while some stocks had dramatic price movements that might have triggered a limit control, more heavily traded stocks were able to absorb the additional volume and the price did not move significantly. Knight were not doing anything outside of normal bands, they were buying normal volumes of stocks close to the current market price and selling close to the current price. What they were doing was illogical in a profit sense because they were buying at a high price and selling at a lower price and thus immediately losing money. I think it would be very difficult for an exchange to trap this kind of problem.
In all, the issue of determining 'normal' trading is very difficult, as markets tend to be much noisier than you might expect. The majority of trading occurs near the opens and closes of major markets (Hong Kong, London, New York) or data releases (e.g. US unemployment), so large spikes in volume and price are a regular occurrence. In equities this is even more difficult, as smaller stocks tend to be more volatile and profit reporting season increases this volatility even further. Markets are ruled by fear and greed, and falsely triggering a limit may cause larger issues than it solves.
Another technique is to provide API keys in the wrapper so that test programs will not have the keys to a live system.
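Roughly what I have in mind, as a sketch only: the wrapper owns the credentials and simply will not hand a live key to anything flagged as a test tool. `KeyVault`, the environment names, and the placeholder key strings are all hypothetical.

```python
from typing import Mapping


class KeyVault:
    """Hypothetical credential store that sits inside the trading wrapper."""

    def __init__(self, keys: Mapping[str, str]) -> None:
        self._keys = dict(keys)

    def key_for(self, environment: str, caller_is_test_tool: bool) -> str:
        # Test tools are never handed live credentials, whatever they ask for.
        if environment == "live" and caller_is_test_tool:
            raise PermissionError("test tools may not obtain live credentials")
        return self._keys[environment]


# Placeholder values; real keys would come from a secrets manager.
vault = KeyVault({"test": "test-key-placeholder", "live": "live-key-placeholder"})

print(vault.key_for("test", caller_is_test_tool=True))
try:
    vault.key_for("live", caller_is_test_tool=True)
except PermissionError as exc:
    print("denied:", exc)
```

In a real deployment the "is this a test tool" flag would have to come from something the tool cannot set itself, such as how the binary was built or which account it runs under.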
The real problem is that the risks of these test systems have not been sufficiently identified or recognized. We are all too busy creating mock systems instead of devoting sufficient oversight to the development of test software.
Made me wonder what would happen to volume if markets were open 24/7...but I'll leave that thought for another day.
Sometimes, the right people aren't being incentivized to do the right thing.
"It isn't like we found out that Knight was stealing money," Sommers [a CFTC commissioner] said.
The CFTC is just watching carefully to make sure it stays that way.
If you perturb a system over and over and each time it quickly swings back to equilibrium, that's pretty strong evidence it is stable.
"The difference reached a peak at 9:58 a.m., when the volume was six times greater."
That's pretty noticeable!
Why is that relevant?
> It's hard to imagine the upside that would have justified that kind of risk.
Actually, it's easy to imagine such an upside. Consider 800x the lifetime median income of a US worker.
Solyndra lost far more of the US taxpayer's money. Are you really suggesting that Solyndra shouldn't have been considered because the amount of money was too large?
How about CA's high speed rail project? Are you really saying that it's a bad idea just because of the amount of money involved?
I'm not claiming that Solyndra or high-speed rail are good investments; I'm just using them to demonstrate that the $500M at risk isn't a show stopper. You must consider the return.
There are lots of bets that are that large or larger. For example, every time a company sells for >$400M ....
Because it means that there's no "we couldn't afford to do it right" excuse.
> Actually, it's easy to imagine such an upside. Consider 800x the lifetime median income of a US worker.
Double or nothing on a company that size is a stupid bet.
No, 400x the average lifetime salary of a worker doesn't mean that.
> Double or nothing on a company that size is a stupid bet.
Wrong again. Double-or-nothing is often an extremely good bet. You're ignoring the odds of each outcome.
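To make the arithmetic concrete, here is a sketch of the expected gain from a double-or-nothing bet at two different win probabilities. The 90% and 40% odds are invented for illustration, and this looks only at expected value, nothing else.

```python
from fractions import Fraction


def double_or_nothing_ev(stake: int, p_win: Fraction) -> Fraction:
    """Expected gain: win `stake` with probability p_win, else lose `stake`."""
    return p_win * stake - (1 - p_win) * stake


STAKE = 400  # in units of "lifetime median incomes", following the thread

good_odds = double_or_nothing_ev(STAKE, Fraction(9, 10))  # 90% chance of doubling
bad_odds = double_or_nothing_ev(STAKE, Fraction(2, 5))    # 40% chance of doubling

print("expected gain at 90%:", good_odds)
print("expected gain at 40%:", bad_odds)
```

The expected gain works out to stake * (2p - 1), so the same bet is positive above even odds and negative below them.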
Then again, you're just spewing soundbites and getting details right doesn't help with that.
Ouch!
But creating a bug that loses your company half a billion dollars in thirty minutes and bankrupts it must be stomach-churning.
This is a risk management failure much more than a programming error.