Elderly patients 23% more likely to die if surgery is on the surgeon’s birthday (psychnewsdaily.com)
(In most careers, do people even take their birthday off?)
There's a reason physicians don't consider themselves scientists. If they make an error they can claim they knew from Tradition/Art/Authority.
It's politically impossible to outspend the US Physician Cartel (AMA), so I doubt we will have a Science based alternative or science based reform.
Funny the website that hosts the study* even displays an "altmetric" chart showing how many media talk about it, how many tweets etc. Well done science :)
It actually wasn't that many surgeries delayed, as the surgeon just juggled surgeries and consults/paperwork/insurance to fit.
If this is fairly standard practice, then an afternoon birthday surgery would be an emergency situation and, hence, more deadly. Given the paper said some surgeons take the day off entirely, any surgeon with that habit would be performing an emergency surgery.
The problem is amusingly circular. Even if you reject the conjecture in the parent comment, you will be tempted to reduce the number of birthday surgeries because of the increased mortality. This will mean that birthday surgeries are only done in even more desperate circumstances, which of course will increase the risk.
So mitigation of this problem will lead to the percentage increasing even more! Actually, it turns out that it is possibly better if the percentage is high!
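To make the case-mix argument concrete, here is a minimal sketch. Every number in it is invented for illustration (the 4% and 10% death rates and the shares of urgent cases are assumptions, not figures from the paper), and surgeon performance is identical on both days by construction.

```python
def overall_mortality(share_urgent: float,
                      routine_rate: float = 0.04,
                      urgent_rate: float = 0.10) -> float:
    """Blend two per-group death rates by the share of urgent cases.

    The rates are hypothetical; only the case mix differs between days.
    """
    return share_urgent * urgent_rate + (1 - share_urgent) * routine_rate


# Hypothetical mixes: a quarter of cases are urgent on a normal day,
# half on a birthday because deferrable cases were moved elsewhere.
normal_day = overall_mortality(share_urgent=0.25)
birthday = overall_mortality(share_urgent=0.50)

print(f"normal day: {normal_day:.1%}, birthday: {birthday:.1%}")
```

The per-group rates never change, yet the blended rate rises as more deferrable cases are moved off the birthday, which is the circularity described above.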
> The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days.
It seems to me that the analysis is quite carefully done:
> Findings were qualitatively unaffected by: using in-hospital mortality instead of 30 day mortality; additionally adjusting for the timing of the surgery; including both hospital and surgeon fixed effects in the same regression models; excluding potentially outlier surgeons with the highest mortality; using logistic regression models instead of linear probability models: using random effects models instead of fixed effects models; restricting our analysis to surgeons who performed procedures on their birthdays; additionally adjusting for the day of the year; or excluding surgeons who were born on the outlier birthdays (supplementary eTables 5-13). [...] The study findings were qualitatively unaffected when the analysis was restricted to procedures with the highest average mortality or to patients with the highest severity of illness (supplementary eTables 16 and 17).
It is not even clear that there actually is a problem. It's just a weird way to slice the data to produce an effect.
Shouldn't hospitals have multiple surgeons?
If one of them is on vacation, the other one does the work, and vice versa?
This is spot on. The causality can be both ways.
Note that it would be interesting to dig into what is really computed here, because the whole wording seems intentionally sensationalistic.
1) "23% more likely to die" seems _huge_, but it applies to an already very small chance. The mortality rate just goes from 5.6% to 7%. Using this logic, moving from 0.1% mortality rate to 0.3% would mean "you are 3 times more likely to die".
2) Comparing mortality rates only makes sense if the distribution of operation complexity is identical for these days. As the parent post suggests, it seems very likely that low-complexity operations are postponed until after a surgeon's birthday.
3) Where are the confidence intervals? I refuse to even consider looking at a statistic if error bounds and significance metrics are not provided.
That may very well all be provided in the underlying paper, but the article itself does not really discuss these points.
But that is indeed precisely what it means. The 737 MAX might have increased the accident rate from 1 in a million to 3 in a million, and that would have been a tripling. That is not sensationalistic.
> 3) Where are the confidence intervals?
In the paper: "(6.9% v 5.6%; adjusted difference 1.3%, 95% confidence interval 0.1% to 2.5%; P=0.03)"
As pointed out by jlebar, this is controlled for by comparing similar emergency surgeries.
"The patients were all Medicare beneficiaries aged 65 to 99. They had all undergone one of 17 common emergency surgical procedures between 2011 and 2014."
e.g., someone has a run of the mill cholecystitis that needs to come out. It can go when there's an opening in the surgical schedule, or tomorrow morning. That's an "emergency" - it came in through the ED, wasn't elective.
Then there's the person w/ chole that looks septic and you're afraid they're going to perf or already have. That person is going to the OR now.
Under Medicare coding, both of those are lap choles, CPT 47562. This doesn't control for that at all, except in the broadest of ways.
Also, a 65yo surgical candidate and a 99yo surgical candidate are wildly different. 99yo isn't going under the knife for anything other than immediate threat of death or unendurable pain. In the lap chole example above, I'm going with a trial of abx in the 99yo unless he's absolutely about to perf; 65yo, sure, let's take the gallbladder out - once he's progressed to sx chole, odds are really good it'll have to come out within the next two years. I think most surgeons would rather do it at 65 than 67.
Looking at the 2x2 of 65, 99, emergency, and EMERGENCY, you capture an incredibly large variety of severity and risk.
TFA says they're only counting emergency surgeries, to avoid exactly this bias.
My father is a surgeon at a small hospital and my mother just got her hip replaced a few weeks ago (at a different, larger city hospital) and the first thing he insisted on was she was scheduled as the first patient of the day.
Getting surgeons to adopt the kind of "It's obvious but point and speak or you're fired"-style checklists a la operating an aircraft has reduced complications (from the minor to deaths) by several percent in the NHS. It's perhaps worrying given how low-hanging some of these fruit are - i.e. "Do we have the right patient?".
> The patients were all Medicare beneficiaries aged 65 to 99. They had all undergone one of 17 common emergency surgical procedures between 2011 and 2014. Examples of those 17 procedures included cardiovascular surgeries, hip and femur fracture, appendectomy, and small bowel resection. The study focused on emergency surgery, so as to minimize the potential selection bias. For example, surgeons might otherwise choose patients based on their illness severity, or patients might choose their surgeon.
>The effect size of surgeons’ birthday observed in our analysis (1.3 percentage point increase or a 23% increase in mortality), though substantial, is comparable to the impact of other events, including holidays (eg, Christmas and New Year) and weekends, which have been argued to affect the quality of patient care.
Everyone had good comments, good healing, very impressive track record, etc.
Everyone but one patient. She complained about how her surgery was rushed, how quickly she was out of the operating theater, how she is scarred from it, etc.
At first, I thought she was lying. Why was she such an outlier?
I asked her for more details. It turns out her surgery was on the afternoon of the last Friday before the Christmas holidays.
After learning this, I made sure to schedule mine in the middle of the week, away from holidays. My surgery went well.
I am glad to finally have statistics on this gut feeling I had. 23% is a LOT, and the 30% for the holidays is even worse. I had made a mental note to avoid those days; now it's become a rule.
So, only a fraction of the surgeons performed operations on their birthdays. It would be interesting to compare the outcomes within the same surgeon group: surgeries on birthdays vs same-surgeon surgeries on other dates.
Is this partially that better/more experienced/senior/more affluent doctors are taking their birthday off, hence people getting more junior doctors?
It would be interesting to see how many more people die on a particular surgeon's birthday compared with the rest of their year, rather than the birthday of the surgeon a patient gets.
Don't know if it applied to surgery though.
I wonder if this applies to other domains as well.
When I hear of these funky effects I always wonder how they relate to AI. Presumably this is somehow related to some kind of lossy compression of the state of the world, maybe similar to principal components[0]? Where the "I feel grumpy" component is a mix of defendant-is-guilty, temperature-is-low and my-team-lost.
It might also be related to the binding problem[1]?
[0]https://en.wikipedia.org/wiki/Principal_component_analysis
Actual data: 2,064 of 980,876 operations were on the surgeon's birthday; at a 6.9% death rate that is about 142 deaths.
Expected (if birthdays were like any other day, using the non-birthday death rate):
1/365 = 0.0027 of 980,876 is about 2,687 operations; at the better 5.6% death rate that is about 150 deaths.
So a normal day has only about 6% more deaths despite far more operations. Basically they are doing fewer, and more urgent, emergency operations on birthdays, so the death rate goes up, it seems.
It means they are doing 23% fewer operations on their birthday than on any normal day.
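For anyone who wants to check that arithmetic, here is a rough sketch. It assumes surgeries would otherwise be spread evenly over 365 days and uses the rounded 6.9% and 5.6% rates quoted in this thread; the variable names are mine.

```python
# Figures quoted in the thread; the even spread over 365 days is an assumption.
total_ops = 980_876
birthday_ops = 2_064
birthday_rate = 0.069
other_rate = 0.056

expected_ops = total_ops / 365                 # operations on an "average" day
birthday_deaths = birthday_ops * birthday_rate
expected_deaths = expected_ops * other_rate    # deaths on an "average" day

ops_shortfall = 1 - birthday_ops / expected_ops
relative_risk = birthday_rate / other_rate

print(f"expected operations per day: {expected_ops:.0f}")
print(f"birthday deaths: {birthday_deaths:.0f}, average-day deaths: {expected_deaths:.0f}")
print(f"fewer operations on birthdays: {ops_shortfall:.1%}")
print(f"relative risk: {relative_risk:.2f}")
```

Note that the 23% shortfall in operations and the 23% rise in mortality are separate quantities that happen to come out similar.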
So, I could imagine, that the higher floors are occupied by more wealthy people, who might be less prone to alcoholism. This theory, however, doesn't check out for high rise prefab houses. But it could perhaps cause the deviation?
There was no info on building types or sqft/m2, but lower and higher floors are historically cheaper here (1st is little loud/non-private and last is little cold and flood-prone).
That doesn't exclude the possibility that surgeons may be assigned different patients on their birthdays. Some studies on the 'weekend effect' [1] seem to also control for illness severity; it's not clear if that was done here. [1] https://en.wikipedia.org/wiki/Weekend_effect
"While we welcome light-hearted fare and satire, we do not publish spoofs, hoaxes, or fabricated studies."
https://www.bmj.com/about-bmj/resources-authors/article-type...
It also is worth noting that this is published in the BMJ Christmas Issue (https://www.bmj.com/about-bmj/resources-authors/article-type...), so it should be taken with a grain of salt.
(Sorry if that’s already been suggested)
>The effect size of surgeons’ birthday observed in our analysis (1.3 percentage point increase or a 23% increase in mortality), though substantial, is comparable to the impact of other events, including holidays (eg, Christmas and New Year) and weekends, which have been argued to affect the quality of patient care.
[1] https://web.archive.org/web/20201216100507/https://www.psych...
Is there any reason Medicare would play a role in the surgeon's performance?
It's less pay, more paperwork.
Sucks if you're on Medicare. You're stuck with the doctors that do accept it, which probably weeds out some of the better practices.
Healthcare and getting old suck in the US. Think about how this might affect, say, your parents. And eventually you.
If something is 23% more likely to happen, you have to look at the original probability to get a handle on whether that increase is serious.
For example, if something has a 5% chance of happening, and it’s now 23% more likely, that means it now has a 6.15% chance of happening.
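A small sketch of that conversion, in case it helps. The function name is made up for illustration, and the baselines other than 5% are just examples taken from elsewhere in this thread.

```python
def apply_relative_increase(baseline: float, relative_increase: float) -> float:
    """Return the new probability after a relative increase (0.23 means +23%)."""
    return baseline * (1 + relative_increase)


# Same relative increase, very different absolute changes.
for baseline in (0.05, 0.056, 0.001):
    new = apply_relative_increase(baseline, 0.23)
    change_pp = (new - baseline) * 100
    print(f"{baseline:.1%} -> {new:.2%} (+{change_pp:.2f} percentage points)")
```

The first line of output corresponds to the 5% to 6.15% example; the smaller the baseline, the smaller the absolute change for the same headline percentage.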
For example:
ARTERIAL LINE
• several sterile alcohol skin wipes
• 3cc syringe with 25g needle [for skin infiltration of local anesthetic at puncture site]
• bottle of 2% lidocaine with epinephrine 1/1000
• 2x2 cotton gauze pads to use for pressure on failed puncture sites
• 3 #20 gauge plastic catheters (22 gauge for small children)
• 2 surgical towels to drape over hand and lower arm to absorb blood that accompanied successful arterial puncture
• size 7.5 sterile surgical gloves for me to wear while performing procedure
• specialized 1" waterproof plastic skin tape to secure and protect catheter in situ
I was constantly amazed by how my colleagues would have to stop and wait for something not present in the unit they were called to. Conversely, when I was called because of an inability to insert an A-line, as they were referred to, and wasn't in a place where I could assemble my desired materials, I'd proceed with the materials at hand, all the while thinking "this could have been done a lot better...."
It turns out that while it's a good idea to check things this "low-hanging", the value is far larger than catching wrong patients, so don't worry too much!
Much of the value comes from essentially disrupting routine with an opportunity to stop, and from creating a culture of speaking up. I think the NHS was the organisation to trial having the nurse run the checklist, which had the effect of empowering the "lowest level" person in the operating theatre. Studies showed that even just having everyone in the room speak once increased the chance of subsequent communication, and ultimately improved patient outcomes.
Atul Gawande was one of the key people in designing and rolling out these checklists and wrote a book about it that I'd recommend – The Checklist Manifesto.
Known as pointing and calling in transport AFAIK.
It's similar to why formal verification is so important in hardware, because it's effectively forcing you to be specific about semantics and lets the computer walk through things for you.
In aviation, standard phraseology is generally carefully designed such that (most) subsets of a phrase are distinct from the opposite phrase. For example, when ATC warns of traffic, you reply either "traffic in sight" or "negative contact". When ATC hears only half of either, they still know what you meant.
You don't realise how "bad" normal operational discipline is until you've seen it done right. The risk isn't so much that people skip a step, but that phantom steps start creeping in because people aren't quite sure of the standard. That saps a surprising amount of resources which could be used for checking mistakes. And then people get disorganised and potential holes appear.
A big part of excellence isn't doing the right steps, it is trimming out the steps that don't need to be there, to focus attention on the stuff that works.
Regarding your checklist, you should read about Semmelweis: https://en.wikipedia.org/wiki/Ignaz_Semmelweis
It's crazy to think that entire generations of doctors retired and died arguing against washing your hands!
Sad anecdote: a friend of mine had surgery to remove a problematic mole on his back, and not only did the surgeon remove the wrong one, but the surgeon even got upset when my friend mentioned something like "I thought it was further up my back".
Note that in the debate you cited, both the proponent and the opponent advocated the (continued) use of checklists.
Side note: I could see how you could do a blinded RCT, but not how you could do a double blind RCT here.
to a place where the protocol is "are you [name]?"
The difference is unnerving.
You would expect the surgeon to be worse on the day after their birthday (risk of a hangover or less sleep).
How far off exactly would it need to be before it can be random chance?
Perhaps the link could be changed to the paper itself?
Especially since OP's link won't load.
Put another way, assuming a flat distribution, on any given day 980876/365 = 2687 surgeries happen. While apparently on their birthday it’s 2064.
Saying ‘only a fraction of the surgeons performed operations on their birthdays’ is a strong exaggeration. It’s about 75% of surgeons.
They fit a model using both the case order (1st, 2nd, 3rd, etc) and the time elapsed. Either is significantly related to the case’s outcome, but when both are included, the rank explains away the time elapsed. This is obviously not compatible with increasing hunger/decreasing blood sugar, since those should depend on wall-time.
The parole board considered cases from one prison at a time. Within each prison, prisoners representing themselves went last and they tended to fare worse than those with attorneys. The judges tended to take meal breaks between prisons, and....poof, there’s the result.
If you refer to AI, there are many examples where the training data is biased. One funny example was enemy tank recognition that saw enemies whenever there was gloomy weather, because the sample images of enemy tanks were all shot in such weather conditions to make them appear sinister to human eyes.
If you refer to a mental model, I guess it might simply be a resource management problem. Just because we do not experience the distraction actively it does not mean it is not there. How exactly distraction is compensated is irrelevant to this explanation. Explaining this with mathematical terms is probably pretty arbitrary and leads to framing (in a psychological sense). But I also like speculating on AI ;)
So if, in a study like this, they looked at differences in mortality rate between: 1) men and women, 2) younger and older people, 3) younger and older surgeons, 4) surgeries in summer vs winter, 5) surgeries in morning vs evening, 6) etc etc.... and it came down all the way to n) surgeries on birthday to find an effect, then it would be almost guaranteed that such a finding is spurious.
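How fast that risk grows can be sketched as follows, assuming the comparisons are independent and each is tested at p < 0.05 (both are simplifying assumptions of mine, and nothing here shows that the authors actually ran such a search).

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p < alpha) across n independent tests when no effect is real."""
    return 1 - (1 - alpha) ** n_tests


# The more slices of the data you test, the likelier a spurious "hit".
for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: {chance_of_false_positive(n):.1%}")
```

With enough slices a "significant" result somewhere becomes very likely even in pure noise, which is why the number of comparisons tried matters as much as the p-value reported.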
Too long
Correlative causation is more correlated with causation than non-correlative causation
Too tongue twisting
Correlation is closer to causation than no correlation
I like that
Ignoring possible causation is correlated with stupidity
Too cruel
Correlation does not equal causation but they are correlated
Not bad
Anyone else?
I agree that the incessant belaboring of the difference between correlation and causation in these types of threads is tiresome, but I don't think it applies in this case.
Meanwhile, a news reporter gets your (probably spurious) correlations and announces them to the entire world as "the TRUTH! science says so, and you don't doubt science, do you?"
That's basically how science gets done in any complex field where we can't test things directly.
They never said that it did (unless they've edited their comment since you replied to it). They just said that the percentage [of deaths on surgeons' birthdays] will increase, and that is correct.
Hand-off is also something that could, in principle, be improved whereas long shifts are just butting up against the limits of human physiology.
All in the end to be taken with a grain of salt of course, as the planning component of operations eliminates most statistical properties :)
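```
import numpy as np
import pandas as pd

# Assign 980,000 operations uniformly at random to 365 days; show each day's share.
operations = np.random.randint(0, 365, 980000)
print(pd.Series(operations).value_counts().sort_values() / 980000)
```
```
# Variant with uneven days (reuses np and pd from above): the running count of
# Poisson events is bucketed in steps of 24, and the draws per bucket are counted.
x = np.random.poisson(.0089, 1000000)
x = (pd.Series(x).cumsum() / 24).round()
print(x.groupby(x).count().iloc[:365].sort_values() / 980000)
```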
I don't know which of these is true, but despite the apparent statistical significance of the finding, I wouldn't be confident assuming that the result is generally applicable. While not impossible, it strikes me as suspicious that they found no differences whatsoever in the surgeons' birthday vs non-birthday schedules. I somewhat wonder if by "no difference" they really meant "no statistically significant difference", which in this case wouldn't justify their lack of adjustment.
So what mechanism is responsible for that reduction, and is it likely to affect surgeries differently based on how urgent and specialized (and therefore dangerous) they are? Since the authors restricted it to surgeons that have done at least one surgery on their birthday, that rules out blanket "never on birthday" policies. It seems like the only mechanism that wouldn't affect them differently is "the surgeon is already on vacation in another country and can't get here for the operation" (and they choose to take vacations on their birthday more frequently). One could probably check vacation-day records relatively easily...
From the paper:
"The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days."
In this case, based on another comment above about "emergency procedure" having multitude of meanings, you're most likely wrong in that the paper has a rebuttal to the top post. The hypothesis then, is that the actual urgency of surgeries is not controlled precisely enough to state that they cannot affect the measurement.
> The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days.
> we found that patients who underwent surgery on the surgeon’s birthday were __similar in all observable characteristics__ to patients who underwent surgery on other days
Edit: nevermind, found it: https://www.bmj.com/content/bmj/suppl/2020/12/09/bmj.m4381.D...
Edit 2: it looks like solid, peer-reviewed research
I personally was under the impression that once you left childhood and the impatience of getting presents, your birthday pretty much became just a day like any other. That also seems to be how my colleagues treat it, but it might be cultural. Europe is not a homogeneous place.
Well, if you're lucky, your co-workers will chip in and buy a cake!
People often bring cake to work on birthdays, though.
Then again, I grew up having my birthday in the "herfstvakantie" and the habit kind of stuck :-)
But that's the fallacy. You can't just preemptively assume that there are no real correlations.
You definitely want to use a smaller p threshold when you look for more things, but it's quite possible to hit real correlations with a pile of plausible hypotheses.
As an example: Let's say just 1/150 of your hypotheses hit a real correlation, and you're inappropriately using a p<.05 test. Tiny signal, huge noise. But even in that pessimistic case, more than 10% of your positives are real. Far from a guarantee.
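That figure can be checked with a short sketch. It assumes every real correlation is always detected (power of 1.0), which is my reading of the example rather than something stated in it; the function name is made up.

```python
def share_of_real_positives(prior: float, alpha: float = 0.05, power: float = 1.0) -> float:
    """Fraction of 'significant' results that reflect a real correlation."""
    true_positives = prior * power          # real effects that get detected
    false_positives = (1 - prior) * alpha   # noise that passes the threshold
    return true_positives / (true_positives + false_positives)


share = share_of_real_positives(prior=1 / 150)
print(f"share of positives that are real: {share:.1%}")
```

Under that assumption the share comes out a little under 12%. With a more modest power of 0.8 it drops just below 10%, so the claim depends on how reliably real effects are detected.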
Yes of course. But the trouble is that, if you do this p-hacking expedition, you are guaranteed to find those correlations in pure noise. So if you use a procedure that will find something in noise - you cannot also use it to claim to have found something in your data.
In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false". In this case no refutation would have occurred if the correlation is false. Hence - the result is equivalent as if no test has actually been performed.
Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens". Some astro-physicists then describe that all the observed properties of that object behave just as we expect them to behave in the case of an asteroid. But the person might then reply with: "yeah, but it still might have been aliens".
I feel that the same is true for "yeah, but the correlation might still be true".
Sure, one weak result out of many doesn't pass. But not passing is a far cry from "almost guaranteed" to be spurious.
> Hence - the result is equivalent as if no test has actually been performed.
A result like that takes a big list of plausible correlations and distills it down. If you think even a handful of the original list items are likely to have merit, then the distilled list is useful for suggesting where you should collect more data.
> Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens".
What fraction of asteroids do you expect to be aliens?
If it's one in a billion, then cutting the list by a factor of 20 is useless. If it's one in a hundred, then cutting the list by a factor of 20 is very helpful.
> I feel that the same is true for "yeah, but the correlation might still be true".
It depends on the original list being sufficiently plausible. You can't distill tap water into vodka.
One simple way we can at least mitigate that problem is by requiring far lower p-values (or wider CI's), and where that's not feasible, require a much clearer-eyed explanation and acceptance of the fact that such research cannot be trivially supported by statistics, and instead additionally requires careful experimental setup and consideration of causal networks.
Basically: if you have p = 0.0001 or whatever I'm more willing to believe that publication biases and multiple testing aren't super likely to cause false positives that often. But without that, you want a clear hypothesis and proposed way to test published beforehand, and just one test, and ideally a clear hypothesis about causation etc too, so you can critically push and prod the results to try and distinguish noise from signal. A p=0.03 just isn't very obvious, at all.
In general, I think modern science is too reliant on statistics over complex systems, and in the effort to tease out significance then needs to try and correct for all kinds of known interference (confounders) and other effects; thus then need to use more advanced statistical models and less general assumptions about distributions (whether for significance, or for mathematical tractability), that it's just very hard for anyone to say they didn't make some systematic error somewhere. And sure, being an expert in the subject matter and having an expert statistician on hand helps, but making reasoning errors is too easy; too human to reliably avoid. Instead of seeking signals in noise, we should be targeting research more narrowly to parts of the puzzle that we can measure better, then use classical plain logic to put the pieces together - not try and measure the whole thing in one go. After we put all the well-measured pieces together, validating with tricky statistics is reasonable as a sanity check, but not much more than that. If common sense is hard, statistics is harder, even for statisticians.
Interpreting results like this as any more than "huh, that's something we could look into" is unwise.
They explain the reasoning for selecting this hypothesis to test:
>Operations performed on birthdays of surgeons might provide a unique opportunity to assess the relationship between personal distractions and patient outcomes, under the hypothesis that surgeons may be more likely to become distracted or feel rushed to finish procedures on their birthdays, and therefore patient outcomes might worsen on those days.
Incidentally, and interestingly, that is how bias in the real world often works: people making different assumptions for different groups in the absence of evidence.
EDIT: This need not be done maliciously, that could very well be the actual reason why they decided to look at the birthday. What is concealed is how many other possibilities, equally well justified, were considered.
They might pitch in and buy you a small present, in return.
But they definitely expect you to bring a cake or at least some sweets.
All sorts of standards, compliance, and licensing requirements.
My gut instinct is that this is incorrect, too, but I don't know enough about surgery to make a compelling argument.
I'd back myself to pick up Ruby (a language I've never touched before) and be productive, more than I'd trust a surgeon who only has experience with heart surgery to operate on my brain. Maybe that's ignorant of me.
They don't test for "I've been doing this particularly tricky type of bone biopsy right next to the spinal cord for decades" scenarios.
EDIT: I take this back. I missed the point of the argument.
I don't think that's the scenario @jrh206 was talking about, though. Most code written in Ruby doesn't have the sort of immediate risk to life or limb surgeries do.