Notes on the Perfidy of Dashboards (charity.wtf)
I worked with a behavioral economist, and we started running RCTs looking at different approaches to sharing data, and found that dashboards led to less engagement, when there was engagement it was more likely to drive ineffective interventions, and generally our dashboard groups had worse patient outcomes. Non-dashboard approaches had 20x better engagement and anywhere from 20-100% better patient outcomes (depending on the condition).
Unfortunately, both of us left the company when a bunch of engineers moved in, scoffed at our work, and immediately said "doctors need to be able to slice and dice their data" -- which, by every measure we had tested, is simply not true. But the "mission control" style thinking, where you have tons of dials and numbers flashing on a screen, pervaded because it "feels" like the right answer despite being the objectively wrong one.
In my timeline, I told the engineering team that all of their work was almost certainly for naught, that all the research said this product would completely fail, and we were basically just doing it for managerial and contractual obligations.
This gave engineering the freedom to use whatever technologies and conduct whatever technical experiments they wanted, since no-one would ever use the product, and it'd likely be shut down soon after launch for disuse.
A key hospital partner gave us a couple dozen docs to test it with. I interviewed them about how they measured their work and impact, and the data they used to improve their craft and outcomes. I asked them to review every measure on the dashboard, explain their understanding of it, and explain how their work or behavior would change based on that.
Almost to a person, the doctors said there was nothing of use to them there, as the research predicted. Some of these doctors were on the committee that specified the measures they would themselves be seeing.
The product was launched with great managerial acclaim, and promptly sunset 12 months later from disuse.
Not sure if this resonates also, but the engineers that took over all came from outside healthcare and had a strong "I'm going to apply what I know from Ticketmaster to solve healthcare!" mentality. Those of us that have 15 years of experience in healthcare would, at best, have a 10 minute "knowledge sharing" meeting with the engineering and product managers. And then we'd sit back and watch them make some really naive mistakes. [to be clear, I'm not about gatekeeping people from being involved in health tech, but rather I'm just exhausted at interacting with people with no self-awareness about the amount of things they don't know about a particular domain]
I'm still a bit bummed because I think we were actually just starting to get to some really cool, actually innovative, population health approaches that seemed effective for both improving outcomes and minimizing provider burnout. :(
We pretty quickly found that sending data ("push") was way more effective at engagement than just having a Tableau report they could go to ("pull"), even when that dashboard was linked directly within the EHR, didn't require a login, and was contextualized for the provider (basically as low friction as you could get-- they would actually venture into it 1-2 times per year).
We ran a trial where we changed how we presented data: either in terms of number of patients ("screen 20 people for depression this week") or in terms of quality rates ("your depression screening rate is 40% and going up"). Keeping the data in terms of patients led to ~20% improved screening, and in the surveys led to providers expressing more trust in the data (although, they also were more likely to say they "didn't have enough data to do [their] job", despite actually doing their job better than the other group).
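To make the two framings concrete, here is a minimal sketch that turns the same underlying numbers into either message. The function names, the target percentage, and the panel figures are hypothetical illustrations, not details from the trial.

```python
import math

def rate_message(screened: int, eligible: int) -> str:
    """Quality-rate framing: report the screening rate as a percentage."""
    rate = 100 * screened / eligible
    return f"Your depression screening rate is {rate:.0f}%."

def patient_message(screened: int, eligible: int, target_pct: int, weeks_left: int) -> str:
    """Patient-count framing: how many people to screen this week to hit the target."""
    # Patients still needed to reach the (assumed) target percentage.
    needed = max(0, math.ceil(target_pct * eligible / 100) - screened)
    # Spread the remaining work evenly over the weeks that are left.
    per_week = math.ceil(needed / weeks_left) if weeks_left > 0 else needed
    return f"Screen {per_week} people for depression this week."

# Same hypothetical panel, two framings:
print(rate_message(80, 200))
print(patient_message(80, 200, target_pct=80, weeks_left=4))
```

Both messages come from the same data; only the second one tells the provider what to do next.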
So then we took that idea and extended it from depression screening to chronic disease management (where the specific task for the provider is much more variable). We had one arm where we gave them access to data marts and trained them on how to "slice and dice" the data, and then compared that against a newsfeed that had the data pre-"sliced and diced". The engagement was higher in the newsfeed group. Interestingly, the only thing the "slice and dice" group seemed to do was look for patients without a primary care doc designated in the EHR and just assign them-- in evaluating the outcomes for this, that was the single least effective intervention they could do to improve chronic disease care (and this was validated in a follow-up study looking explicitly at the impact of PCP-assignment on patient care). So, our "newsfeed" arm ended up with, on average, around 60% better outcomes than the "slice and dice" arm.
What's funny is that through all of this, some of the leaders would also say "we need more data!!" But when we'd build a Tableau report for them, they'd use it once or twice and then never again. Or, in one case, the leader would actually use it for ineffective purposes ("we have to assign them all PCPs!!") or for things that are easily automated ("we're going to email each of our uncontrolled hypertensive patients"). I firmly believe that for doctors and data, you need to have clearly defined objectives: the goal should never be "give them access to data", but rather should be something like "make providers feel like they have the data necessary to do their job" and "improve quality rates through targeted provider intervention." Start from those first principles, and validate your assumptions at each step. I'm confident your solution won't end up with Tableau.
We boiled it down for our teams like this:
Administrators are top-down: they want a high-level view and to be able to drill down from there.
Individual physicians are bottom-up: they want "their" data about "their" patients, and maybe, sometimes, to compare to the peers they personally trust.
As with any professional group, there's some minority percentage that treats their work like a craft and knows how to use data to improve their practices; but the majority want qualitative data and value interpersonal relationships. Giving a dashboard to the latter group at all is wasting time and effort of all parties.
If your dashboard can't attribute all of its data and all of the patients referenced to match the physician's definition of "theirs," you've lost. That's the "more data" and "drill down" physicians care about.
If your dashboard isn't timely and clinical -- which generally means presented in a clinical voice, at the point of care, or otherwise when they have an opportunity to make the change you want them to make -- it's not going to be actionable. That means surfacing some alternative action right before they see a patient which might benefit from that, which is not when they're on their computer. They might be one of those doctors that never is on their computer until the very end of the day. Looking at your dashboard at 11pm about the patients from earlier today (or more likely, earlier this past quarter of the year) is not helpful.
Looking at your dashboard is non-clinical work, and doctors want to do clinical work. If you're going to make them go do a new and non-clinical thing, it has to reduce some other non-clinical thing in a way that's meaningful to them. Otherwise, they're just as likely to do an end-run around your application entirely, like the doctors who only use index cards to write notes or who fail to enter their passwords every morning and lock themselves out, so they don't have to use the EMR.
The tldr was that telling providers directly what you want, generally in an emailed newsfeed-style format was the most effective at improving actual outcomes. No slicing and dicing. No graphs. No comparisons. Just "hey, look at these 6 uncontrolled hypertensive patients, and follow-up with any that need follow-up."
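As a rough illustration of that "newsfeed" idea, the sketch below builds a plain-text digest per provider from a patient list. Everything in it is an assumption made for the example: the `Patient` fields, the provider IDs, and especially the 140/90 cutoff, which stands in for whatever measure the clinicians actually trust.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    name: str
    provider: str   # hypothetical provider identifier
    systolic: int   # last recorded blood pressure
    diastolic: int

def is_uncontrolled(p, sys_limit=140, dia_limit=90):
    # Assumed threshold for illustration only; not a rule from the source.
    return p.systolic >= sys_limit or p.diastolic >= dia_limit

def build_digest(provider, patients):
    """Return a short plain-text message, or "" when there is nothing to act on."""
    flagged = [p for p in patients if p.provider == provider and is_uncontrolled(p)]
    if not flagged:
        return ""  # no graphs, no comparisons, and no email if nothing needs doing
    lines = [f"Uncontrolled hypertensive patients to follow up with ({len(flagged)}):"]
    lines += [f"- {p.name} (last BP {p.systolic}/{p.diastolic})" for p in flagged]
    return "\n".join(lines)

panel = [
    Patient("A. Example", "dr_lee", 162, 95),
    Patient("B. Example", "dr_lee", 118, 76),
    Patient("C. Example", "dr_kim", 150, 88),
]
print(build_digest("dr_lee", panel))
```

The slicing happens once, upstream, and the provider only ever sees the resulting short list.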
Also, to caveat: I'm talking about how to engage the worker-bee providers. Not clinical leadership. Not the quality team. Not the data science/analyst team. Providers who are super busy with patient care, but also expected to manage patients between visits. Basically every experiment we ran favored the most direct, no frills, least effort approach to look at data. Which, coincidentally, was the exact opposite of what the engineering teams wanted to build :-/
>every dashboard is an answer to some long-forgotten question
>every dashboard is an invitation to pattern-match the past
>instead of interrogate the present
>every dashboard gives the illusion of correlation
>every dashboard dampens your thinking
I disagree with this on all counts. A dashboard is a way to view multiple disparate metrics in a single place. Whether they are correlated isn't important (but it is helpful).
And the author doesn't stop there...
> They tend to have percentiles like 95th, 99th, 99.9th, 99.99th, etc. Which can cover over a multitude of sins. You really want a tool that allows you to see MAX and MIN, and heatmap distributions.
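For what it's worth, the quoted point about percentiles is easy to reproduce. The sketch below uses made-up latencies and a hand-written nearest-rank percentile (one of several common definitions) to show a single extreme outlier vanishing from every percentile line while still showing up in MAX.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 9,999 fast requests and one 30-second outlier.
latencies = [20] * 9999 + [30000]

summary = {
    "p95": percentile(latencies, 95),
    "p99": percentile(latencies, 99),
    "p99.9": percentile(latencies, 99.9),
    "max": max(latencies),
}
print(summary)
```

With this data every percentile shown reports 20 ms, and only the maximum reveals the 30-second request.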
They "tend to"? "You really want"? The author is confusing their own failures/gripes around the concept of dashboards with the world's experience with dashboards. By the end of the article, I was shocked they weren't selling something.
We've spent a lot of time building Grafana dashboards and they've been extremely helpful with debugging. It doesn't solve all problems but it certainly helps narrow down where to look.
Sure, we still look at log files, use htop and a lot of other tools, but our first stop is always Grafana.
I suggest almost any book by Edward Tufte. There you'll see the beauty and value of visual information.
Instead, as data analysts, we usually want to write a bunch of SQL queries, create charts from them and expose the data to our business stakeholders. While they can see the underlying SQL queries of the metrics, it's not easy for them to modify these SQL queries, so they often get lost.
The dashboards have a long tail. For me, you need to get these four steps done beforehand:
1. Move all the company data into a data warehouse and use it as the single source of truth.
2. Model the data with a transformation tool such as dbt and Airflow.
3. Define metrics in one place on top of your data models in a collaborative way. (This layer is new, and we're tapping it at https://metriql.com)
4. Use an interactive BI tool that lets you create dashboards on top of these metrics with drill-down capability.
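To sketch what "define metrics in one place" in step 3 could look like, here is a toy metric registry that compiles to SQL. This is not metriql's or dbt's API; the `Metric` class, the table names, and `compile_query` are all hypothetical and only meant to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metric:
    name: str
    table: str        # a modeled table in the warehouse (hypothetical name)
    expression: str   # the SQL aggregate that defines the metric

# The single place where metrics are defined; dashboards never hand-write these.
METRICS = {
    m.name: m
    for m in [
        Metric("new_signups", "analytics.users", "COUNT(*)"),
        Metric("revenue", "analytics.orders", "SUM(amount)"),
    ]
}

def compile_query(metric_name: str, group_by: Optional[str] = None) -> str:
    """Turn a metric name plus an optional drill-down dimension into SQL."""
    m = METRICS[metric_name]
    if group_by is None:
        return f"SELECT {m.expression} AS {m.name} FROM {m.table}"
    return (
        f"SELECT {group_by}, {m.expression} AS {m.name} "
        f"FROM {m.table} GROUP BY {group_by}"
    )

print(compile_query("new_signups", group_by="signup_date"))
```

Stakeholders then pick a metric and a dimension instead of editing SQL, which is the drill-down in step 4.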
The point of the dashboard is so someone can say “hey, I’m not an engineer but new user signups sure are taking a nosedive this week. Can we get someone on this asap?”
Then you can point at the dashboard and say “this is a problem”.
Um...yes. And that is a very good thing. Because if there is anything the human brain is good at, it's pattern matching. Especially on visual data.
It's an extremely quick and efficient way to find out where to start the detailed debugging.
And there is a lot of value in that.
Static dashboards sound like timeseries backends, where the data is pre-aggregated (graphite / statsd, prometheus). You can't really drill down into the metrics, or can only drill down into preplanned dimensions. Grafana is a commonly used dashboarding frontend here.
Dynamic dashboards, in contrast, are dynamically aggregating data. More akin to structured logging, or maybe splunk / ELK. You have granular data, and write queries to extract, filter, and aggregate on demand. Tableau, PowerBI, Apache Superset all compete in this space.
But by focusing on the dashboard angle, the reader doesn't think too hard about why they're different, and also why you might prefer one over the other. TSDBs like Prometheus are very fast, and if you focus on collecting aggregate data, allow you to collect a lot more metrics, or sample much faster. You're probably not logging in the TSDB any labels associated with UserAgent strings, or the screen size your mobile app got, etc. By paying the price in dimensionality, you get much faster queries at lower cost. I'll let you guess which type of backend Charity's startup represents.
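A toy version of that trade-off, with invented events and field names: the pre-aggregated store keeps only a counter per preplanned label, while the raw-event store can answer questions nobody planned for, at the cost of keeping and scanning every row.

```python
from collections import Counter

# Hypothetical request events; the field names are made up for the example.
events = [
    {"status": 200, "user_agent": "ios", "latency_ms": 30},
    {"status": 500, "user_agent": "android", "latency_ms": 900},
    {"status": 200, "user_agent": "android", "latency_ms": 45},
    {"status": 500, "user_agent": "android", "latency_ms": 1200},
]

# "Static" style: aggregate at write time, keeping only the preplanned dimension.
# Cheap and fast to query, but user_agent is gone for good.
preaggregated = Counter(e["status"] for e in events)

# "Dynamic" style: keep the raw events and aggregate on demand by any field.
def count_by(rows, key, **filters):
    matching = (r for r in rows if all(r[k] == v for k, v in filters.items()))
    return Counter(r[key] for r in matching)

print(preaggregated)                                 # errors exist...
print(count_by(events, "user_agent", status=500))    # ...and here is who they hit
```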
Both have a place. I've been able to build canary dashboards that work quite well using both backends, as a proof of concept that something like Kayenta is feasible for my team. In fact, high dimensionality works against you in release engineering. The more dimensions you can compare across, the higher chance for a false positive, and the more investigations engineers have to do to rule them out. Worse, there are often confounding variables you need to go hunting for, and the dashboard won't find them for you.
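The false-positive point can be put in numbers with a back-of-the-envelope sketch. It assumes each dimension is checked independently at the same per-check false-positive rate, which real canary metrics rarely satisfy exactly; the 5% rate is just an example value.

```python
def false_positive_chance(alpha: float, comparisons: int) -> float:
    """Chance that at least one of `comparisons` independent checks fires by luck alone."""
    return 1 - (1 - alpha) ** comparisons

for n in (1, 5, 20, 100):
    print(n, round(false_positive_chance(0.05, n), 3))
```

Under those assumptions, a healthy release compared across 20 dimensions trips at least one check well over half the time, and each of those is an investigation somebody has to do.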
And execs absolutely don't want to have to care about the complex causality chain you need to model. They want 'a single number' to improve on over time. They don't want a dashboard to dive in and analyze on ten dimensions. They want to see their chosen KPIs going to the right and up. Fundamentally, the dashboard is less important than your audience.
> raise your hand if you’ve ever been in an incident review where one of the follow up tasks was, “create a dashboard that will help us find this next time”
As a disciplined software engineer, I aspire to have each and every user facing bug captured first as an automated test. This helps form trust in the software. Ideally the users themselves can choose to write the tests and submit them for me.
This is akin to metrics. I completely agree, system-KPI metrics should be relevant and short. But there's nothing stopping you from collecting an archive of previous data experiment formulas.
But man, why is it designers think light gray background with mid-gray text is a good idea? Almost unreadable for me.
Also, does anyone remember the COVID dashboard from Johns Hopkins? That one is pretty useful.
Sadly dashboard systems that encourage this are extremely rare.
1a. Define dashboards in a way that suits your argument
2. Knock em down!
3. Profit?
This, https://status.cloud.google.com/, is a static dashboard. One can make up one's mind about its usefulness and function. A blunt example, but nobody I know "debugs" using dashboards.
A dashboard is automation: if you're against dashboards, you're against automation of repetitive, boring, long-winded, laborious, hard-to-remember tasks. Dashboards aren't sacred; once one outlives its usefulness, get rid of it.
This is technically correct but doesn't approach anywhere near the criticisms the article has.
The deeper questions are: how did those metrics come to be collected, and why? What happened that resulted in those particular metrics being aggregated and displayed the way they are? What questions were being asked at the time the dashboards were created?
> a way to view multiple disparate metrics
So what? Why view them? Pretty graphs? A red/yellow/green? But to what end? This is why the statement is technically correct, but sheds no light at all on the reasons why a developer or customer support troubleshooter would care to look at the disparate metrics gathered in a dashboard.
Dashboards are created in response to certain problems and events. Those problems and events may or may not be relevant some time down the road. What happens when someone in the present with a certain set of questions or problems looks at the dashboard full of metrics capturing past questions and forgets that those questions are not today's questions?
I also like them for whatever KPIs are considered important this week. A slightly sneaky reason is that time has to be budgeted to modify the dashboard and they are very visible, so dashboards also advertise when the goalposts move. (My current employer is actually not bad about this. This lesson came from $job-1.)
At CoreOS I set up a number of dashboards specifically to socialize the normative behavior of our systems. Think of it as the difference between the person who just drives their car versus the person who knows what feels and sounds "normal".
The latter can tell when they need an oil change because of different vibrations in the engine and how the car sounds pulling up to a stop light (because of the engine sounds being reflected back into the window by parked cars).
Big surprise, it had the desired effect.
I'm especially with you on the notion of disparate metrics. While correlation is not causation, it's still a useful diagnostic tool.
Let's say someone in marketing walks past a dashboard and sees the following:
1) a new version has been pushed 2) customer tickets are up 20% over the number they're used to seeing
Does that mean that the new version caused the tickets? No. Will that allow them to ask? Absolutely. Will that urge behavior to reach out to support and release management to see if there's an interesting story to share internally (or with the world)? Hopefully.
You hit the nail on the head by calling out the absolutist / "I'm the authority on this matter" / "there is a single correct perspective" tone.
Your lack of shock at the author selling something can be remedied. They have a dog in this fight: https://www.honeycomb.io/teammember/charity-majors/
If you have no feel for what normal looks like on your system then it'll take longer to notice problems and longer to fix them.
This is what I was about to write. Most of our services have 1 or 2 dashboards showing some service KPIs - for example HTTP request throughput and response time, and also interface metrics to other subsystems - queries to postgres, messages to the message bus and so on.
With dashboards like this, you can very quickly build a deduction chain of "The customer opened a ticket, well because our 75%ile of the response time went to 20 seconds, well, because our database response times spiked to an ungodly number".
And then you can go to the dashboards about the database, and quickly narrow it down to the subsystem of the database - is something consuming the entire CPU of the database, is the IO blocked, is the network there slow.
In the happy cases, you can go from "The cluster is down" to "our database is blocked by a query" within a minute by looking at a few boards. That's very, very powerful and valuable.
And sure, at that point, the dashboards aren't useful anymore. But a map doesn't lose value because you can now see your target.
Or, to put it another way, looking at a dashboard should tell you reliable facts about the system, that lead you to further exploration.
As the post puts it, "They’re great for getting a high level sense of system functioning, and tracking important stats over long intervals. They are a good starting point for investigations."
Dashboards should not attempt to interpret anything, without being very clear about how, why, and what they're doing.
Example: response time statistics vs "responsive green/red light"
If it's important enough to have logic built on top of it, that's an alert, and that's something different.
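One way to picture the split, as a minimal sketch with a made-up threshold and sample data: the dashboard function only reports facts, while the interpretation lives in a separate alert function whose rule is spelled out.

```python
from statistics import median

def response_time_stats(samples_ms):
    """What the dashboard shows: plain facts about the samples, no judgement."""
    return {
        "count": len(samples_ms),
        "min": min(samples_ms),
        "median": median(samples_ms),
        "max": max(samples_ms),
    }

def slow_response_alert(samples_ms, threshold_ms=1000):
    """An alert: the logic is explicit, and it says exactly why it fired."""
    stats = response_time_stats(samples_ms)
    if stats["median"] > threshold_ms:
        return f"ALERT: median response time {stats['median']} ms exceeds {threshold_ms} ms"
    return None

print(response_time_stats([100, 200, 5000]))
print(slow_response_alert([1500, 2000, 2500]))
```

A green/red light is the alert function with its explanation thrown away, which is why it is hard to trust on a dashboard.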
I don't think a debugging "dashboard" is a dashboard. That conflates two very different ideas. We need a different name for that.
But snark aside, I do agree with you about the intent of a dashboard. I've always told people dashboards are there to give you historical and real-time info about the performance and rates of your systems. It doesn't show you exactly what's wrong, but it is very helpful in showing where to start looking.
Dashboards tell you that a problem exists. Just demonstrating that SOMETHING is broken means the dashboard has accomplished its task in full. The engineer's job is then to fix it.
And then once the bugs that led to the creation of that dashboard are fixed or retired, what's left for that data? It just sits there with its pretty graphs and eye-catching visualizations to snare the unwary who are looking for help debugging a different problem. In fact, they'd be best served by ignoring existing dashboards and creating new ones specific to the issue in the present, not some dead husk of a problem that looks like it might be related.
My takeaways:
- Start with first principles, such as "improve quality rates through targeted provider intervention"
- Push with simple stats works better than pull with fancy dashboards
- Slice and dice can help identify process exceptions but is not great for process improvement, whereas simple stats on a regular basis improve outcomes
The above goes double if you're running Java and your logs include stacktraces spanning 300 lines.
The point: all those measures and the dashboard created to monitor them were likely put in place because of whatever prior bug or outage was traced to not knowing those metrics. But the next problem might be something else, for which the metrics are not collected, aggregated, or displayed. Now you've got a dashboard with lots of information, but it's not showing any problems, and it's not providing any insights into why your customers are complaining and all the product people are in fire drill mode.
In the first case, the clinicians have to do analytical work (slice, dice) towards understanding the population of patients. That sounds more like epidemiology... In the latter case, how is it that clinicians will trust the recommender? Is it understood that there is a clinical rationale or authority behind the algorithm? It sounds like "uncontrolled" in this case is based on a measure that clinicians trust.
I think of dashboards as potentially good for monitoring outcomes against expectations, EDA as potentially good for focusing attention on subpopulations, and recommenders as potentially good for efficiently allocating action. In a broad way what you described is a monitoring system that pushes recommended actions out to doers. I'd venture that with busy clinicians that that needs to be pretty accurate, too, and/or that recommendations need both explicit justification and a link to collateral information.
Your comment about epidemiology/EDA/etc really hit the nail on the head. If you sit in on population health meetings at your average hospital/clinic system, you'll see that many people really don't get this. Further, people often conflate their needs/desires with that of others-- so, the data-driven administrator is quick to say "we just need doctors to be able to slice and dice their data, and then we'll have better quality scores." But they're talking about what their needs are, and it's completely not what the doctors actually need (well, and from monitoring usage of dashboards for those types, I'd argue it's also not what they need either, but that's a different issue). And, the reason I keep saying "slice and dice" is because I've heard that phrase used by every vendor I've evaluated, and in practically every strategy meeting regarding population health at multiple institutions.
I'd personally shy away from describing this issue in terms of a recommender, since that has a pretty specific connotation in the ML world, and it doesn't really line up well (e.g., there's not a well-defined objective function or a clear feedback loop to train a recommendation system on). However, getting away from that specific concept, I think it's reasonable to say that there are needs for multiple distinct but ideally-related systems in the population health world: one for analysis to be used by quality and data people, and one specifically for the clinicians doing the work.
You are talking tactics, not strategy. What are the underlying goals?