How Wikipedia influences judicial behavior(csail.mit.edu) |
EDIT: I hear complaints from students all the time that they are not allowed to cite Wikipedia. I tell them, no, you should instead cite the Wikipedia citations. They invariably tell me how much better they do academically because of that.
Students should be encouraged to cite Wikipedia when they found information in Wikipedia, so that when they grow up and start writing real research papers they will continue citing Wikipedia when they find information there.
Finding information somewhere and then not citing it (or citing some random other source that actually says something different) erodes the whole academic project. Any teacher who tells their students not to cite Wikipedia should be ashamed.
I guess it's fine to be idealistic here but most reviewers would look down upon your work if you do this. And that impression can be the difference between acceptance and rejection. I'm sure ideally this shouldn't happen, but it is what it is for now.
However! It's also important to teach students how sources differ in quality or "authoritativeness". The problem with citing Wikipedia is not the citing per se but relying on that source. A peer reviewed academic journal is considered more reliable, although no source should be taken as gospel and definitive truth, especially on controversial topics.
You can even cite blog posts, personal letters, and even personal oral communication! The point is to let the reader know where your info comes from. Making students memorize rules like "don't cite Wikipedia" just results in a cargo cult, not actual understanding of critical thinking related to sources.
References themselves are gameable. They argue references mentioned in Wikipedia are more likely to be cited by a judge!
This is not about the quality of Wikipedia, but about its undue influence and how easy it is to game it, references included!
The difference is that a crowdsourced resource like Wikipedia is easier to manipulate by people who understand the system. There are plenty of PR specialists who get client articles pushed into Wikipedia or updated to their liking.
Wikipedia is a treasure, but it’s also vulnerable to a bunch of different attacks.
False ideas can be spread simply by overemphasizing biased true statements and disregarding true statements that don't fit the narrative.
The effect can be multiplied by controlling the discussion through selecting the right 'questions' that are discussed.
Snopes is the exemplar.
If you're looking for major/important sources to read on a topic, not just a quick way to halfway-fake a works-cited section, I've found it valuable to locate some representative, recent academic book in the field and read the author's introduction and other pre-chapter-1 material. These will often include a lot of name-dropping of what are considered major works in the field. There may also be a list of abbreviations the book will use, and those often include several major works in the field that'll come up often in the body text.
That's your list of books and papers to find and read. Repeat that technique with each of those books and papers, too, if you want to keep going deeper.
Often you can get enough off an Amazon or Google preview of a book for this to work. Plus, libraries exist, and you can pull that kind of information out of several books (which can be handy, since anything that appears more than once deserves special attention) in less than an hour, without checking anything out. And there's always Library Genesis, which may not have every book but probably has at least one in your interest area that can be mined in this way.
Wikipedia's sort of useful for this, at least for tracking down a first work to attack with this approach, but the problem is that many articles don't cite highly-regarded or authoritative or landmark works on the topic, so much as whatever the author(s) happened to have handy or what was easiest to find online (a whole hell of a lot of great information is still not available on the Web, even in 2022, including material in very recent books, not just pre-Web ones, or is available on the Web but only in poorly- or not-indexed-by-web-search-engines under-copyright ebooks).
https://aacrjournals.org/cancerdiscovery/article/12/1/31/675...
Ed. competently replaced usefully
This isn't an entirely trivial matter, as it shows that "random" persons may be able to shape judicial and scientific narratives through wikipedia.
A wikipedia article is going to have orders of magnitude more influence than nearly any journal article or textbook, and scholars should put at least a basic amount of effort into improving them.
It should be seen as a kind of public outreach.
No, they don't. Correlation is not causation, even if you see it in a randomized experiment. With shoddy reasoning like this, it's no wonder science has a replication crisis.
More likely, today's clerks look at Wikipedia.
Fortunately, this is the kind of thing we can all sit back and laugh at. If a candidate can’t be arsed to hire a competent PR firm to handle their public profiles, then they probably don’t deserve the position.
[0] https://en.m.wikipedia.org/wiki/2022_New_Mexico_gubernatoria...
There’s even a whole “judicial philosophy” based around this method of deciding first (based on personal preference or coin tosses or bribes or whatever) and then cherry-picking citations to pretend it wasn’t really your own decision / avoid having to explain your reasoning: so-called “originalism”. And it goes back decades, long before Wikipedia.
It's true that since they find that the gain appears to be concentrated in 'positive' citations, used as justification, they probably didn't flip many decisions immediately (if any). But they also do followup linguistic analysis to show that (like the earlier studies) judges are borrowing language in describing their decisions. So you are going to have an accumulating effect here where the citations at zeroth order are used for justification, but that makes those cases better known later on, and they will be described as the article describes, and increasingly interpreted that way when read by later judges due to being precedent, and who will then copy it (that's how common law is supposed to work!). And that may well begin flipping cases.
I guess there's only one way to really find out :D
> The experiment featured Wikipedia entries authored by faculty and by law students under faculty supervision, who each had access, through their university library, to all the relevant primary and secondary legal materials available to judges and their clerks. This assurance of accuracy and of informed analysis in the content of the entries — though short of that offered by a specialist textbook — indicates that judges or lawyers would be unlikely to be misled by what they might read.
I find nothing ethically questionable at all about publishing accurate legal analysis on a case anywhere, including Wikipedia.
The issue isn't that they were misleading anyone; the issue is that they were, for research purposes, trying (successfully) to influence the outcomes of court cases without the subjects' consent.
https://en.wikipedia.org/wiki/Open_access_citation_advantage
Articles on Wikipedia and in open access journals are more accessible than paywalled sources, which means that people will read them more often.
Ed. legal precedent replaced sources for clarity of this significance
This is just that with a different medium.
I think you misread the article (or I’m misreading you). There were no counterfactual legal precedents published. They took a set of cases and published Wikipedia articles on half of them; for the other half, they did not publish articles (the non-publication was the counterfactual case, not the contents of the articles).
https://towardsdatascience.com/establishing-causality-part-1...
https://bolt.mph.ufl.edu/6050-6052/unit-2/causation-and-expe...
https://towardsdatascience.com/establishing-causality-part-1...
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235704/
https://escholarship.org/uc/item/42v4w8k1
http://ippsr.msu.edu/public-policy/michigan-wonk-blog/random...
https://www.cs.cornell.edu/courses/cs1380/2018sp/textbook/ch...
I’ll leave open the possibility that it’s everyone else that’s wrong but RCTs are used to establish causality and are as much “proof” as you’re gonna get in science.
Hell ya know what I’ll just let the actual paper explain it.
> The second, more important advantage of randomized field experiments is that they can distinguish causation from correlation. The ability to prove causal relationships derives from the combination of two characteristics. The first is having a control group, that is, a group unaffected by the intervention (in our case, publication of a Wikipedia article on the topic) that can be used as a counterfactual to estimate the size of causal effects. The second is randomization, that is, random assignment into the control and intervention groups. With sufficient data and a sound experimental design, the experiment can reduce the probability of being misled by correlation or noise to whatever arbitrarily small value is desired.
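For anyone who wants to see the two ingredients from that quote (a control group plus random assignment) in miniature, here is a rough Python sketch. Everything in it is made up for illustration: the 200 case IDs, the synthetic "citation count" outcome, and the built-in effect of 2.0 are assumptions, not numbers from the paper. The permutation test is just one common way to ask how often chance relabeling alone would produce a gap as large as the observed one; I am not claiming it is the analysis the authors ran.

```python
import random
import statistics


def randomize(units, rng):
    """Randomly split units into a treatment half and a control half."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(treated, control):
    """Estimated effect: mean treated outcome minus mean control outcome."""
    return statistics.mean(treated) - statistics.mean(control)


def permutation_p_value(treated, control, rng, n_permutations=2000):
    """Share of random relabelings with a gap at least as large as observed."""
    observed = abs(difference_in_means(treated, control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(difference_in_means(pooled[:n_treated], pooled[n_treated:]))
        if gap >= observed:
            hits += 1
    return hits / n_permutations


def simulate_citations(is_treated, rng):
    """Synthetic outcome: noisy baseline plus an assumed effect of 2.0."""
    baseline = rng.gauss(10, 3)
    return baseline + (2.0 if is_treated else 0.0)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
case_ids = list(range(200))  # hypothetical cases
treated_ids, control_ids = randomize(case_ids, rng)

treated_outcomes = [simulate_citations(True, rng) for _ in treated_ids]
control_outcomes = [simulate_citations(False, rng) for _ in control_ids]

effect = difference_in_means(treated_outcomes, control_outcomes)
p_value = permutation_p_value(treated_outcomes, control_outcomes, rng)
print(f"estimated effect: {effect:.2f}, permutation p-value: {p_value:.4f}")
```

Because assignment is random, nothing about a case (its fame, its age, its topic) can systematically line up with getting the treatment, which is the whole reason the gap can be read as an effect of the treatment rather than of something else.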
No, they're not. The real "gold standard" in science--the standard that prevails in, for example, physics or chemistry--is a controlled experiment. Not just a "randomized controlled trial", but a controlled experiment, where you can actually dictate exactly what state the things you are going to experiment on start out in. And the eventual output of controlled experiments is a predictive model--a model that can predict, accurately, what will happen if you run further experiments. That is what it takes to truly "establish causality".
But in most other domains, including the one under study here, controlled experiments simply cannot be done and predictive models with any kind of accuracy simply don't exist. The correct response to that unfortunate fact is to realize that we can never achieve the same level of confidence in these other domains as we can in domains like physics or chemistry where we can do controlled experiments. Unfortunately, the response "science" has settled on instead is to pretend that it doesn't matter--that because we can't do controlled experiments in these other domains, the universe will somehow magically lower its standards of what it takes to achieve the level of confidence we want. But the universe doesn't care what we can or can't achieve.
The image for Mark Ronchetti was uploaded by a user who shot a video of him in 2020. Since they created the video, they own the copyright to the image and can license it to Wikimedia Commons:
https://commons.wikimedia.org/wiki/File:Mark_Ronchetti.jpg
Searching for other CC-licensed images doesn't return anything:
https://duckduckgo.com/?t=ffab&q=mark+ronchetti&iax=images&i...
https://www.google.com/search?q=mark%20ronchetti&tbm=isch&tb...
I've emailed his campaign to ask if they have a photo they can license appropriately and upload to Commons. If you have a better photo of him (that you took yourself and are willing to license for free use), you can upload it here:
https://commons.wikimedia.org/wiki/File:Mark_Ronchetti_Heads...
...and requested they complete the license authorizing its use:
https://commons.wikimedia.org/wiki/Commons:Email_templates
I'm giving it about 50/50 odds they'll fill out the form and the photo will stay up, but at least I tried.
I agree with the other posters who point out that this is a bit of an unfair advantage for incumbents (who have government-sponsored public domain photos available for use). It'd be an awesome thing for volunteers to try to help with, by reaching out to less tech-savvy campaigns as I've done here.
I've been to New Mexico maybe 4-5 times in my life and have basically zero stake in this race, but I guess duty calls[0]. ;)
To be sure, I don’t even live in Mark’s state and it’s on him if he wants another picture on his profile. You’re a stellar citizen, though, for taking action on your own!
I simply wanted to point out a trend on Wikipedia. Mark is just one of more than a dozen candidates with bad pictures or, worse, empty pictures that seem to be the result of sabotage. If you look on these candidates’ pages (or Mark’s page), you can look at the revision history and determine that there were past pictures taken down or replaced near election times.
Again, I don’t care. It’s up to these candidates to fix this stuff if they want to win. With the amount of money they’re bringing in, you’d think they could hire someone to spin them up nice profiles with ‘Political Stances’ sections and quirky stories about their family life. I wonder if they intentionally keep their profiles empty to funnel traffic to their personal websites instead.
> The Hunter Biden laptop controversy involves a laptop computer that conservative media outlets claimed without evidence had belonged to Hunter Biden. They further stated that the laptop had been dropped off but never collected by an unknown individual at the Wilmington, Delaware repair shop of a blind proprietor in April 2019.
Three paragraphs saying it's all made up, you can't trust the NY Post's reporting, it's probably just Russian propaganda and then it finishes with:
> In March 2022, The New York Times reported it had authenticated some emails "from a cache of files that appears to have come from a laptop abandoned by Mr. Biden in a Delaware repair shop."[10][11] Also in March, The Washington Post reported that two security experts authenticated thousands of the 129,000 emails, though the vast majority of the laptop contents, including most of its emails, could not be authenticated.[12] Among the emails that The Washington Post was able to authenticate was the Pozharskyi email that formed the basis of the New York Post's original article.
[0] https://en.m.wikipedia.org/wiki/Hunter_Biden_laptop_controve...
When people don’t cite their actual sources it becomes orders of magnitude more difficult to figure out how they came up with their claims and trace the origin and transmission of those claims through the literature.
It's worth noting that Wikipedia is not special in this regard — those same departments probably also consider it bad style to cite Britannica (and, if they don't, they should).
Encyclopaedias are meant to be starting points for research, not the ultimate destination. Editors, both of Wikipedia and otherwise, are not expected to be subject matter experts, which is why the guidance on Wikipedia is that you're not even supposed to use primary sources as reference, but rather secondary sources[0].
0. https://en.wikipedia.org/wiki/Wikipedia:No_original_research...
But what happens instead (in people’s published journal papers! not to mention news articles, etc.) is authors lazily crib material from Wikipedia and then either cite nothing or randomly pick works from among Wikipedia’s sources to cite without ever looking at them.
If you are writing a paper you should cite where you got the information. If the only place you looked was Wikipedia, that’s not great research practice but you should still cite Wikipedia. Honesty is an even more important part of scholarship than diligence.
Of course, dishonesty often works, but it undermines the whole endeavor.
(Although, the original author still deserves credit for their work. Perhaps the citation should be to the original work "via Wikipedia".)
Moreover, if you want to claim this study was unethical, you need to show that there at least could have potentially been harm as a result of it being carried out. Please explain how publishing correct legal analysis of cases anywhere could possibly cause harm.
In short, if you want to make this claim, you need to provide actual evidence, and not just outrage.
Start with a couple keywords. Click through the "cited by n" links on the top few papers. For papers that don’t have PDFs freely available, find DOIs and put them into Sci-hub. Books can often be found at the Internet Archive, Google Books, or libgen. At the start, skim skim skim.
Look at what links forward and backward from the papers you see. Hunt for new keywords to try. Go a few hops all around the graph. It often doesn’t take too long to get a rough lay of the land.
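If it helps to picture the "few hops all around the graph" idea, here is a small Python sketch of a breadth-first walk that follows links in both directions (what a paper cites and what cites it). The toy `citations` dictionary and the paper names are invented for illustration; a real version would need some source of citation data, which I am not assuming anything about here.

```python
from collections import deque


def explore(citation_graph, seeds, max_hops):
    """Return {paper: hops} for every paper within max_hops of any seed.

    citation_graph maps each paper to the papers it cites. Links are
    followed both forward ("cites") and backward ("cited by").
    """
    # Build an undirected neighbor map from the directed citation lists.
    neighbors = {}
    for paper, cited in citation_graph.items():
        neighbors.setdefault(paper, set())
        for other in cited:
            neighbors[paper].add(other)
            neighbors.setdefault(other, set()).add(paper)

    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        paper = queue.popleft()
        if distance[paper] == max_hops:
            continue  # do not expand past the hop limit
        for other in sorted(neighbors.get(paper, ())):
            if other not in distance:
                distance[other] = distance[paper] + 1
                queue.append(other)
    return distance


# Hypothetical toy graph: "A" cites "B" and "C", "E" cites "A", and so on.
citations = {
    "A": ["B", "C"],
    "B": ["D"],
    "E": ["A"],
    "F": ["E"],
    "G": ["F"],
}

reachable = explore(citations, ["A"], max_hops=2)
for paper, hops in sorted(reachable.items(), key=lambda item: (item[1], item[0])):
    print(hops, paper)
```

Papers that turn up from several different seeds are probably the ones worth reading first, which is the same "anything that appears more than once" heuristic, just applied to the graph.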
There is a huge body of knowledge that lies (dare I say) in a Google search. You just need to know how to evaluate the search results with reasonable criteria of notability, relevance, accuracy, etc.
No shade intended, if anything I need to work on this style. I'm too nervous of making it sound like other people's ideas are my own, but then I end up writing a block of defensive-sounding citation & qualification, and nobody wants to read that...
Anyways, well said.
So yes, this is a way to determine causation in relation to what legal precedents / decisions will be cited in new cases depending on the contents of Wikipedia.
I was under the (perhaps mistaken) impression that they claimed a causal link between their Wikipedia articles and the outcome of new cases. The chosen setup cannot IMHO prove such a causal relationship, in the strict statistical sense. That would require randomizing new cases and applying the “treatment” (new Wikipedia articles) only to some of them.
You are right that his article has a poor quality photo, which seems to be due to it being the only free-licensed photo available, but it's simply wrong to use this article as evidence of some sort of "partisanship on Wikipedia". If anything it is an example of bias towards incumbents, who get professional taxpayer-funded public domain portraits that are easy to drop into Wikipedia pages.
[0] https://en.wikipedia.org/w/index.php?title=Special:Log&type=...
[1] https://en.wikipedia.org/w/index.php?title=Mark_Ronchetti&ac...
[0] https://en.wikipedia.org/w/index.php?title=Mark_Ronchetti&ol...
Interestingly, the current image on that article appears to also not be legitimately freely licensed. It is a screenshot of a YouTube video uploaded by a resort in New Mexico. The uploader[0] gives no indication on their user profile that they are affiliated with the resort, while putting a Creative Commons license on the screenshot that the YouTube page makes no indication of.
The uploader is also now suspended on all Wikimedia projects for abusively using multiple accounts, and their main account[1] has on their talk page lots of records of deleted political-oriented images with many instances of inaccurate licensing and poor quality.
(If someone has a Wikipedia account it would be nice to flag the image on Ronchetti's page for deletion!)
All this to say, Wikipedia's editing process can by its nature be pretty messy and often produce some suboptimal results, until such time as someone comes along and fixes it. This should not be confused with a concerted effort by the people running the project itself.
https://commons.wikimedia.org/wiki/File:MarkRonchettiNM.png
Most likely it was taken down due to a notice from the copyright holder:
https://commons.wikimedia.org/wiki/Commons:Volunteer_Respons...
No, they didn't. You can't do a controlled experiment on humans. Nobody has a "human source" that can stamp out a series of humans that are identical in all respects, to be used in an experiment, the way physicists have "particle sources" that can stamp out a series of identical particles. That's what "controlled experiment" means. The fact that they call one group a "control group" does not mean it's a controlled experiment. Humans can't be controlled to the degree required.
But this takes significant effort (like, a half-day of research to sort out one claim), and then sometimes back and forth with other Wikipedians to convince people that you actually chased down the real story.
The problem is that for every mistake someone is willing to put effort into fixing, there are another 100 that nobody ever notices.
So much criticism of Wikipedia seems to come down to: Wikipedia did X. I think X is wrong. Other people don't see it that way. I don't want to spend the time proving my point. How dare Wikipedia not just take me, a random internet stranger, at my word.
All I want to know is how do y'all think it could possibly work differently? Everybody thinks they are right. Nobody is intentionally wrong. Obviously if you just show up, unwilling to explain why you are right or unwilling to accept compelling counterarguments to your point, it's not going to go your way. Why would anybody think it would?
Activists are willing to invest orders of magnitude more time, energy, and discomfort into winning. They are willing to break most social norms to have their narrative become the default. They're willing to suppress facts that would support alternate narratives. They're willing to put their thumb on the scale when inconvenient facts are unavoidable. Et cetera.
Non-activists are not willing to do any of those things.
It's not about right or wrong, it's about activism: who engages in it and how much.
But sure, let's let the activists win—or force everyone to become activists to "compete". I'm sure that'll make the world a better place.
----
Or we could ban activism since it is fundamentally anti-social bullying behavior. Maybe make a "code of conduct" that prohibits it. Just spitballing here…
It's pretty simple to identify activists mechanically (and at scale): they are in the fat part of the power law for contributions. Simply limit people's ability to contribute and et voilà !, the activism problem has been vastly reduced, if not eliminated. Non-activists now have a chance.
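To make the "fat part of the power law" claim concrete, here is a rough Python sketch of how one might measure contribution concentration from an edit log, and what a per-editor cap would do to it. The editor names, the edit counts, and the cap of 10 are all invented for illustration; this only measures volume and says nothing about whether a high-volume editor is actually an activist.

```python
import math
from collections import Counter


def top_contributor_share(edit_counts, top_fraction):
    """Share of all edits made by the top `top_fraction` of editors."""
    ranked = sorted(edit_counts.values(), reverse=True)
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)


def apply_cap(edit_counts, cap):
    """Limit every editor to at most `cap` counted edits."""
    return {editor: min(count, cap) for editor, count in edit_counts.items()}


# Hypothetical edit log: one entry per edit, named by editor.
edit_log = (
    ["alice"] * 70 + ["bob"] * 15 + ["carol"] * 10 + ["dave"] * 4 + ["erin"]
)
edit_counts = Counter(edit_log)

share_before = top_contributor_share(edit_counts, top_fraction=0.2)
share_after = top_contributor_share(apply_cap(edit_counts, cap=10), top_fraction=0.2)
print(f"top 20% of editors: {share_before:.0%} of edits uncapped, "
      f"{share_after:.0%} with a cap of 10")
```

The cap flattens the distribution mechanically, but it also throws away most of the work done by the heaviest contributors, so whether that trade is worth it is exactly what the rest of this thread is arguing about.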
For example, in Australia there is a body that does sport participation statistics, Ausplay. They do this every year. It's a great source for sport statistics on Australia.
Two wikipedians decided that these statistics were not permissible in the sport in Australia article. They won :
https://en.wikipedia.org/wiki/Talk:Sport_in_Australia#RFC_on...
This is sport in Australia, which is not that controversial. Now things that are controversial like IQ or the role of heritability in ability are surely going to be problematic.
And an end result of No Consensus.
Framing this as 2 Wikipedians exercising outsized power to produce this ruling seems disingenuous at best. And their basic objection (I only bothered skimming) of bias and ambiguity in the source/data/methodology seems fairly reasonable on its face; whether it’s correct I have no idea, but it’d be a reasonable objection.
As a policy, this whole thing seems like good behavior; the only gap is in the lack of voting participants. I suppose it is a real problem if the vote can’t be recast when more people are willing, but otherwise
FYI this kind of "wrong think" is already being removed in many articles. The way it's removed is applying the existing deep and numerous rules more strictly to information which cuts against the current dominant cultural narratives. For one of the best examples I can provide, have a look at how the "Feminism" and "Men's Rights" pages are written. Completely different standards for evidence, commentary, style, and even sections. Criticism of men's rights is evident in the heading, while of course, there is no criticism of feminism in Feminism's heading.
Wikipedia co-founder Larry Sanger has described Wikipedia as "badly biased." He's 100% correct.
Sometimes the discussion will have the same correction listed but overruled by partisan Wikipedians.
You could equally well say “I find obvious errors in textbooks / lecture videos / journal articles / paper encyclopedias / ... all the time but it’s too hard to contact the author so I don’t do anything about it”.
The main difference is that in Wikipedia you can do something about it with some extra effort. So it’s actually a much better situation than most kinds of resources.
The pages that are “locked” are usually locked because they are spam magnets. Not allowing IP edits is unfortunate (and does discourage simple corrections to articles), but in the highest traffic parts of the site the work saved from not having to revert dozens of low-effort vandal posts is (at least arguably) worth the downside.
> overruled by partisan
You wouldn’t believe the amount of abject nonsense and spam that gets cleaned up by those “partisans”. But Wikipedia is an open project, the “partisans” here are just other (slightly more experienced) volunteers not in any way fundamentally different from yourself, and if you can convincingly prove your case via polite conversation you will win the argument (if there is a local dispute it’s generally possible to get more eyeballs on it by escalating to a broader group of volunteers).
* * *
P.S. someone named Slartibartfast turning down a chance to work on the real-life Hitchhiker’s Guide?
The former encourages the behaviour of finding other sources that are reliable. The latter encourages quoting Wikipedia without citing it.
No, you don't need to write off the entire field, you just need to be aware of its limitations. As you should be with any field of knowledge.
> no two animals could be identical. Even clones could be subject to random point mutations.
Yes, and any honest assessment of what we know in biology, and how confident we are in our knowledge, has to take these things into account.
Larry Sanger is involved with several competitors, including some for-profit examples, so he has a financial incentive to bash Wikipedia.
Not to say that he is necessarily wrong. I wouldn't say "badly biased", but nobody is going to claim Wikipedia is perfect.
I was curious and I looked it up:
Most people at the top of the power law are not evil people. It's difficult to be both prominent and evil. The real POV pushers tend to keep a lower profile to avoid discovery. That doesn't mean prominent people don't have beliefs, everyone does, but most people can have beliefs and behave appropriately.
I think your real objection is that it's more difficult to argue with an experienced person who is willing to devote more time. Which is true. It is, after all, why in the real world expensive lawyers are worth the money.
But why is that a bad thing? If another person simply has researched the topic more than you, they should win the argument. That is life: the more effort you put in, the more likely you get a positive outcome.
If you really believe power users are more likely to behave in bad faith or maliciously, I'd like to see some proof, because I highly doubt it's true.
How much are physicists or chemists really controlling in the lab setting? There could be plenty of confounding variables in their experiments too. Maybe "RT" in this lab for that publication for that experiment is actually 75°F and it's 71°F in your lab, or you are at different elevations. Maybe no one calibrated the instruments for years. Maybe the reagent wasn't fresh and absorbed too much moisture or oxygen from the room. Maybe an undergrad dropped the balance on the floor and was afraid to tell anyone.
To overcome those potential confounding variables and other biases, chemists and physicists often turn to the exact same statistical tests being employed by people in the social sciences. Technical replicates are the norm in hard scientific experimental design because of how many biases could be present in the laboratory. It's a chaotic environment. Good experimental design builds robustness no matter what your topic is.
A lot more than can possibly be controlled when you're gathering data from events in the real world instead of in a controlled lab environment.
> To overcome those potential confounding variables and other biases, chemists and physicists often turn to the exact same statistical tests being employed by people in the social sciences.
No, not "often"--"when they have no other choice". The preferred method of dealing with such variables is to measure them, develop predictive models for how they affect the desired outputs, and test those models in further experiments. For example, if "nominal" temperature is 75 F but temperature in labs can vary, physicists or chemists will want to do experiments over a range of temperatures, develop a predictive model for how temperature affects the results, and test the model. They won't just throw up their hands and do "statistical tests" and call it a day--unless it's impossible to do anything else. Which it almost never is in physics and chemistry.
> Good experimental design builds robustness no matter what your topic is.
And good science is aware of the limitations inherent in each specific field no matter what the field is. Good science does not make claims that are not justified in the light of the limitations of the field. "Statistical tests" simply cannot give the same level of confidence as predictive models that have been tested against further experimental data and have passed the tests. And good scientists should not pretend otherwise.
The nature of any large project is that people will disagree and not everyone will get precisely what they want.
What was interesting is just how unobjectionable this addition is and how hard it was to go through all the processes.
I've added stats in a few wikipedia articles which have all gone in just fine. With this one part of it was OMG, really, you can just object and object and keep something out if you're determined enough.
(Disclaimer: I didn’t investigate this carefully enough to have a well formed opinion about the outcome.)
Also, there are stats in the article that are completely biased: self-reported stats from sports organisations.
The people objecting to new stats had no problem with these ones.