Stanford, Harvard data science no more (stanforddaily.com)
Every high school student should learn how to grapple with uncertainty, how to evaluate statistical claims and experiments, and how to interpret graphs and charts. They should understand how machine learning models work (at a high level) and internalize concepts like "significance", "error bars", and "expected value."
This training will help all students every single day of their lives, because it teaches them how to think. Society benefits from having more people with the tools to evaluate data and deal with uncertainty, especially as we face a looming epistemological crisis.
Calculus, on the other hand, will be used by very few students, and even for those few, it will not likely be used every day. Yes, it is a prerequisite for some STEM courses as part of a degree program, and so calculus can be taught to undergraduates pursuing a STEM field in their first year (or to those who take it as an elective in high school).
It's a shame that Stanford and Harvard, which set the tone for high schools and high schoolers, are going the wrong direction here.
Pet peeve: can we just go back to calling these things statistics?
While I agree with you that statistics should be more heavily emphasized at the high school level, the issue goes much deeper within American math education than the one class.
Visualization, scripting, data collection, models, simulation. edX had a great course by Guttag and Grimson. Add to this Scott E. Page’s Model Thinking. Add edX’s Data Analytics and Learning from UT Arlington. And some Tufte.
I mention these because I work in the accounting field and brought scripting to my firm through my own self-study. It’s been a superpower for me, and it solved several problems that my colleagues had tackled using Excel alone.
I’ve also studied statistics, but found it less generally useful.
How do you expect students to understand what they are doing with "data science" without learning probability and statistics, and how do you expect students to get probability and statistics without learning calculus?
I mean, Bayes' theorem. How do you get people to understand it if they don't know calculus?
Bayes' theorem follows straightforwardly from P(A & B) = P(A|B) P(B) and P(A & B) = P(B & A). The latter tells us that we can swap A and B in the former without changing the value, giving us P(A|B) P(B) = P(B|A) P(A).
Rearranging gives P(A|B) = P(B|A) P(A) / P(B), which is Bayes' theorem.
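To make the "no calculus needed" point concrete, here is a minimal Python sketch that applies the formula with nothing but arithmetic. Two things are assumptions not in the comment above: P(B) is expanded with the law of total probability, P(B) = P(B|A) P(A) + P(B|not A) P(not A), and the screening-test numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are made up for illustration. The function name `posterior` is hypothetical.

```python
from fractions import Fraction


def posterior(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) via Bayes' theorem, using only arithmetic.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    """
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical screening test: 1% prevalence, 90% sensitivity,
# 5% false-positive rate. Fractions keep the arithmetic exact.
p = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(5, 100))
print(p)  # 2/13
```

The surprisingly low answer (about 15%) is exactly the kind of intuition a stats class can build with fractions alone.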
If you want to introduce continuous distributions like the Gaussian, you can just say "area under the curve" when you need to connect the density to a numerical probability. Students don't have to know how to do the integral; in the case of a Gaussian, it's just tabulated anyway.
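A minimal Python sketch of the "area under the curve" idea, assuming Python 3.8+ for `statistics.NormalDist`: the `cdf` difference plays the role of the printed z-table, and a sum of thin rectangles under the density shows where that number comes from without any symbolic integration. The helper `area_by_rectangles` and its step count are illustrative choices.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_by_rectangles(dist, lo, hi, n=1000):
    """Approximate P(lo < X < hi) as a sum of thin midpoint rectangles under the density."""
    width = (hi - lo) / n
    return sum(dist.pdf(lo + (i + 0.5) * width) * width for i in range(n))


table_value = z.cdf(1) - z.cdf(-1)         # what a printed z-table gives you
rectangles = area_by_rectangles(z, -1, 1)  # "area under the curve", by counting rectangles
print(round(table_value, 4), round(rectangles, 4))  # 0.6827 0.6827
```

Both routes give the familiar "about 68% within one standard deviation" figure.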
I'd argue that you could teach a perfectly reasonable high school stats class using this kind of approach.
A "calculus-free" method is mostly what is done for high school physics, with occasional nods in that direction to set the students up for later. And as with physics, the obvious connection of continuous probability to calculus will be a nice motivation later on.
One analogy is how we teach probability to sophisticated engineering undergraduates. I'm not aware of undergrad engineering curricula that use measure theory. This results in awkwardness around delta "functions" and probabilities of certain sets of measure zero (sets that cannot be integrated without the Lebesgue integral).
And sure, some of those undergrads don't ever take that measure theory class, so they escape to the wild without knowing the answers to awkward questions.
>Calculus, on the other hand, will be used by very few students,
These two statements do not mesh. Understanding how machine learning models work requires Calculus.
Those two institutions are recommending more foundational (calculus) rather than applied courses (data science).
I knew a professor in the math department of a top European university who taught data science courses and swore that "data science" and "data mining" were just marketing terms invented to sell statistics.
UC CS undergrads had to take statistics for engineers and scientists.
UC CS undergrad majors in particular could end up within two courses of a math undergrad degree. Is it not the case that squishier applied courses are possible?
EE/CS undergrads had to take the entire upper-division physics track for scientists and engineers, including modern physics.
So has something changed since then and is something changing back?
> This is the [open] textbook for the Foundations of Data Science class at UC Berkeley: "Computational and Inferential Thinking: The Foundations of Data Science" http://inferentialthinking.com/ (JupyterBook w/ notebooks and MyST Markdown)
> [#1 Undergrad Data Science program, #2 ranked Graduate Statistics program]
> Data literacy is distinguished from statistical literacy since it involves understanding what data means, including the ability to read graphs and charts as well as draw conclusions from data.[6] Statistical literacy, on the other hand, refers to the "ability to read and interpret summary statistics in everyday media" such as graphs, tables, statements, surveys, and studies. [6]
Data Literacy and Statistical Literacy are essential for good leadership. For citizens to be capable of Evidence-Based Policy, we need Data Driven Journalism (DDJ) and curricular data science in the public high schools.
This is the Stanford guidance. Mathematics: four years of rigorous mathematics incorporating a solid grounding in fundamental skills (algebra, geometry, trigonometry). We also welcome additional mathematical preparation, including calculus and statistics.
This is the Harvard guidance. Update to math curricular guidance: There is no single academic path we expect all students to follow, but the strongest applicants take the most rigorous secondary school curricula available to them. We receive many questions specifically about what type of math courses students should take. Applicants to Harvard should excel in a challenging high school math sequence corresponding to their educational interests and aspirations. Rigorous and relevant data science, computer science, statistics, mathematical modeling, calculus, and other advanced math classes are given equal consideration in the application process.
It is possible to teach calculus without trig (just for polynomials) and I think it is very useful just at that level.
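As an illustration of how far the polynomial-only level goes, here is a minimal Python sketch that stores a polynomial as a coefficient list (`coeffs[k]` is the coefficient of x**k, a representation chosen here for convenience) and applies the power rule in both directions. All function names are hypothetical.

```python
def derivative(coeffs):
    """Power rule on a coefficient list: coeffs[k] is the coefficient of x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def antiderivative(coeffs, constant=0):
    """Reverse power rule; the integration constant goes in slot 0."""
    return [constant] + [c / (k + 1) for k, c in enumerate(coeffs)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x with Horner's method."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


p = [1, 2, 3]  # p(x) = 1 + 2x + 3x^2
print(derivative(p))  # [2, 6], i.e. p'(x) = 2 + 6x

P = antiderivative(p)
print(evaluate(P, 1) - evaluate(P, 0))  # 3.0, the area under p from 0 to 1
```

Derivatives, antiderivatives, and definite integrals of polynomials all reduce to this bookkeeping, which is why the no-trig version of the course is still substantial.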
A whole lot of stuff in AP Stats is a relatively dead end for many people, but geometry and geometric reasoning is necessary for all kinds of engineering-ish math.
Math is interesting in that the early foundation is so useful, but the usefulness drops off quickly, whereas I feel other areas often become more useful as I learn more. Possibly that's because I haven’t spent 15 years on those topics the way I had on math.
This is the most jarring thing I’ve read today. I can’t say I agree, but I haven’t spent 15 years studying math myself, so who am I to disagree.
For example, if you go into medicine and medical research having a good understanding of statistics is useful, but very little in calculus or analysis is useful (and even if you do need Calculus, most of the useful stuff for those fields is taught in the 1st semester of Calculus).
The Mathematics for Machine Learning book[1] frames this as a top-down vs. bottom-up problem. While both approaches have pros and cons, a sweet spot may lie somewhere in the middle, and reaching it requires embracing some inevitable backtracking (i.e., college curricula should not forget to include some courses that model the world using the math and thoroughly explain why the underlying theory and math are actually useful in describing and/or predicting reality).
PS: I also think there is still a lot of focus on solving problems manually.
[1] https://mml-book.github.io/book/mml-book.pdf, page 13.
Statements like this are a big part of the reason statisticians never trust anyone who works in "data science". The whole field is basically applied statistics/calculus and you're saying none of that is useful.
Not if you want to leave open the possibility of majoring in engineering, physical and biological science, or economics.
But really, trig isn’t a very complex topic. I don’t think you should try to avoid teaching it. I just think it’s something like a one-month topic that gets filled in as you learn calculus, linear algebra, and physics. The real intuition for trig comes from using it in other areas, and as a standalone subject it’s just boring.
I would say yes, however, the items listed in the comment I quoted fall squarely within the realm of statistics. I don’t have a problem with calling a curriculum of statistics + data manipulation tools “data science” but that’s not what’s realistically being covered in these high school programs.
Yes. I see what you're responding to--these are squarely in the statistics domain.
> not what’s realistically being covered in these high school programs.
Yes. That's where the rubber meets the road. Who coming out of higher education now will have the skills to teach this imagined hybrid course? Realistically, they have to be vetted and hired by the mathematics department and satisfy state and/or federal education standards, which are set by offices currently staffed by educators who are themselves following the standards of their office.
I was responding to the OP's premise:
> "data science" course, if designed properly, will be far more useful to students and beneficial to society than calculus.
Whether or not that objective is "realistic" given the current boundaries prescribed for high school education is another matter.
There is hope; there are modern thinkers in education out there. I referenced the UT Arlington course students and instructors referred to as DALMOOC (google it). I took this course thinking it was another data science course, and found a course taught by teachers for teachers. I hung in because their ideas were so fresh and interesting.
DALMOOC's ambition was to train teachers to encourage students to use social media to communicate their learning results, which in turn would produce the data that the teachers were being trained in the course to analyze using social media analysis techniques. DALMOOC professors encouraged participants to generate social media responses to DALMOOC coursework. Very modern. I'm not sure how long it will be before professors like George Siemens, whose brainchild DALMOOC was, get into state and federal positions of authority and influence and see their modern ideas at the high school level.
What name do you give to this "area under the curve", or the "rate of change" of this area? They are pretty fundamental concepts with important and basic properties, which affect things like local optima and minimization, and expected value and covariance, etc. I mean, you can't cover linear models and least squares without this stuff, and if you don't then I wouldn't really call it learning.
High school math isn’t, and doesn’t need to be, rigorously proof-based. If you lack some of the tooling necessary to demonstrate a proof, you can tell a student, “the proof requires calculus,” and boom, you’ve given them a reason to take an interest in the subject.
If not, you could use some limiting argument to handle the moments of a continuous uniform RV, at least, in terms of the discrete analog.
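Here is one way such a limiting argument could look, as a minimal Python sketch. The specific discrete analog (X uniform on the grid 1/n, 2/n, ..., n/n) is an assumption chosen for simplicity; the function name is hypothetical. Working with `Fraction` keeps every moment exact, so students can watch the values approach the continuous ones.

```python
from fractions import Fraction


def discrete_uniform_moments(n):
    """Mean and variance of X uniform on {1/n, 2/n, ..., n/n}, computed exactly."""
    points = [Fraction(k, n) for k in range(1, n + 1)]
    mean = sum(points) / n
    second_moment = sum(x * x for x in points) / n
    return mean, second_moment - mean ** 2


for n in (10, 100, 1000):
    mean, var = discrete_uniform_moments(n)
    print(n, float(mean), float(var))
# The mean is exactly (n + 1) / (2n) and the variance exactly (n**2 - 1) / (12 * n**2),
# so as n grows they approach 1/2 and 1/12, the moments of a continuous Uniform(0, 1).
```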
You don’t need calculus to derive least squares estimators. You can follow the logic in this Quora answer [1] to show that (e.g.) the mean is the minimum MSE estimator among constant functions, and that the conditional mean is the minimum MSE estimator among “general” (measurable L2) functions.
This derivation is familiar to many who have studied these concepts. It’s clever, it does not need differentiation, just expectation and logic.
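A minimal Python sketch of one calculus-free version of this argument for constant predictors (it may differ in detail from the linked answer). Writing x - c = (x - m) + (m - c), where m is the sample mean, and expanding the square makes the cross term vanish because the deviations from the mean sum to zero. That leaves MSE(c) = MSE(m) + (m - c)^2, which is smallest exactly at c = m. The sample data are hypothetical.

```python
def mse(data, c):
    """Mean squared error of predicting every value in data with the constant c."""
    return sum((x - c) ** 2 for x in data) / len(data)


data = [2.0, 3.0, 5.0, 10.0]  # hypothetical sample
mean = sum(data) / len(data)

# Calculus-free identity: MSE(c) = MSE(mean) + (mean - c)**2.
# The second term is never negative and is zero only at c = mean,
# so the mean minimizes the MSE among constants. No derivative needed.
for c in (0.0, 4.0, mean, 6.0):
    print(c, mse(data, c), mse(data, mean) + (mean - c) ** 2)
```

Each printed row shows the two sides of the identity agreeing, with the smallest error at the mean.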
It could be that your studies in probability were done using a certain pedagogical path, and that’s blinding you to the fact that other paths are possible.
[1] https://www.quora.com/Why-is-minimum-mean-square-error-estim...
Does it though? For example, you simply cannot teach Newton's laws of motion without knowing what a derivative is.
You absolutely can do that. You might not want to, but you can, and people do.
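A minimal Python sketch of one calculus-free framing common in algebra-based physics: treat acceleration as "change in velocity per tick" and move the object by its average velocity over each tick. The constant g = 9.8 m/s^2, the 0.01 s step, and the function name `fall` are illustrative assumptions. For constant acceleration the average-velocity step reproduces the textbook 1/2 g t^2 up to floating-point rounding, with no derivative notation anywhere.

```python
G = 9.8    # m/s^2, assumed constant gravitational acceleration
DT = 0.01  # seconds per tick (hypothetical choice)


def fall(seconds):
    """Step a dropped object forward using only 'change per tick' arithmetic.

    Each tick the velocity grows by G * DT (constant acceleration), and the
    position grows by the average velocity over the tick times DT.
    """
    position, velocity = 0.0, 0.0
    for _ in range(round(seconds / DT)):
        new_velocity = velocity + G * DT
        position += (velocity + new_velocity) / 2 * DT
        velocity = new_velocity
    return position, velocity


print(fall(2.0))  # close to (19.6, 19.6): 1/2 g t^2 and g t at t = 2 s
```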
No, but it does sort of suggest that, doesn't it?
> This is especially true for foundational courses.
Sure, but calculus is about memorizing ways to answer problems. We're not talking about real analysis, the course in which students develop the calculus and prove it works.
It really isn't. That might be how some people managed to get a passing grade, but clearly they learned nothing and squandered a once in a lifetime opportunity to get to know it.
So it really raises the question: what is the point? The only thing I can think of is college admissions. A specific selection of rigorous memorization for elite admission.
A whole lot of finance and pharmacology are about exponential functions and their derivatives and integrals, for instance. A whole lot of fields use optimization, even if "just asking the computer to do it", etc.
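As a concrete pharmacology case, here is a minimal Python sketch assuming a one-compartment model with first-order elimination, where dC/dt = -K C gives C(t) = C0 e^(-K t). The total drug exposure (area under the concentration curve, AUC) is then the integral C0 / K, and the half-life is ln 2 / K. The values of C0 and K are hypothetical. The trapezoid sum stands in for "just asking the computer to do it" and should land very close to the closed form.

```python
import math

C0 = 10.0  # initial plasma concentration, mg/L (hypothetical)
K = 0.2    # first-order elimination rate constant, 1/h (hypothetical)


def concentration(t):
    """One-compartment model with first-order elimination: C(t) = C0 * exp(-K t)."""
    return C0 * math.exp(-K * t)


def auc_trapezoid(f, t_end, steps=100_000):
    """Numerically integrate f from 0 to t_end with the trapezoid rule."""
    h = t_end / steps
    inner = sum(f(i * h) for i in range(1, steps))
    return h * (f(0) / 2 + inner + f(t_end) / 2)


exact_auc = C0 / K  # integral of C0 * exp(-K t) from 0 to infinity
numeric_auc = auc_trapezoid(concentration, t_end=200.0)  # tail beyond 200 h is negligible
half_life = math.log(2) / K  # time for the concentration to halve
print(exact_auc, numeric_auc, half_life)
```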
I admit I am weaker now in calculus and linear algebra because I lean on CAS and simulation a lot... but at least I know how it works so that I have an idea of what I'm doing.
I spent a chunk of my career optimizing FDMs and FEMs, but above and beyond that I haven't had a great need for Calculus until I started doing some deep learning. Again, very particular subfields.
And I suspect the work that you're talking about is exactly what I was thinking about when I wrote that even if Calculus is needed, it's the stuff taught in the first semester.
I think a whole lot of what we talk about in compsci... calculus is table stakes. Sure, it's not differential equations, but how do we talk about behavior at the limit or nonlinear scaling without it.
Even just making up functions that are smooth in their derivative and cross through a few points is something I've had to do a lot for decent heuristics.
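One standard way to build such a function is a piecewise cubic Hermite interpolant. Here is a minimal Python sketch. The slope rule (finite differences between neighboring points, one-sided at the ends) is an assumption similar in spirit to a Catmull-Rom spline, and the names are hypothetical. Because adjacent cubic pieces share both the value and the slope at each knot, the result passes through every point and has a continuous first derivative.

```python
def hermite_through(xs, ys):
    """Return a C1-smooth piecewise-cubic function through the points (xs[i], ys[i]).

    xs must be strictly increasing. Slopes at each knot are estimated by finite
    differences (between neighbors inside, one-sided at the ends), then each
    interval uses the cubic Hermite basis.
    """
    n = len(xs)
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((ys[hi] - ys[lo]) / (xs[hi] - xs[lo]))

    def f(x):
        # Find the interval [xs[i], xs[i + 1]] containing x (clamped to the ends).
        i = 0
        while i < n - 2 and x > xs[i + 1]:
            i += 1
        h = xs[i + 1] - xs[i]
        t = (x - xs[i]) / h
        # Cubic Hermite basis polynomials on t in [0, 1].
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return (h00 * ys[i] + h10 * h * slopes[i]
                + h01 * ys[i + 1] + h11 * h * slopes[i + 1])

    return f


f = hermite_through([0.0, 1.0, 2.0, 4.0], [0.0, 2.0, 1.0, 3.0])
print([f(x) for x in (0.0, 1.0, 2.0, 4.0)])  # [0.0, 2.0, 1.0, 3.0]
```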
> And I suspect the work that you're talking about is exactly what I was thinking about when I wrote that even if Calculus is needed, it's the stuff taught in the first semester.
What's taught in the first semester varies a lot. I'm familiar with AP Calc BC, and sure, a little bit of the stuff in the last half of the course (differential equations, vector-valued functions) is a little more esoteric for many careers. But a lot of it isn't (polar coordinates, the "practical integration" stuff that uses basic mechanics, calculator skills, etc.).
Ok, thanks for your contribution.