How we made Typerighter, the Guardian’s style guide checker (theguardian.com)
Well-earned into the present day. I regularly see typos even today.
"People of ethnic group membership can change over time and with age."
They fixed it eventually (though I doubt it had anything to do with my comment).
There was also the following opinion piece, which still makes the utterly absurd claim that Jesse Jackson campaigned for the capitalization of 'African American':
https://www.theguardian.com/commentisfree/2020/oct/21/black-...
(I hope the fact that both of my examples happen to relate somewhat to race doesn't make me sound like an alt-right troll. I'm sympathetic to the article. It just seems that there was a major fact-checking or editing fail.)
https://www.theguardian.com/gnm-archive/gallery/2016/nov/18/...
But here it seems like a good choice to build on a battle-tested library of regexes, and it's clearly working well for them.
The demo looks slicker than the typical Grammarly/MS Word/native macOS grammar and spelling corrections, for those who missed it: https://www.youtube.com/watch?v=Yl0nb94N98k&feature=emb_imp_...
And the ability to flag false positives, send suggestions back, and see metrics of how the system's being used is just awesome.
Also, I'm a big fan of regex. I think -- probably thanks to jwz's famous quote -- a lot of younger programmers avoid them but they're fantastic for MATCHING. Using them in a Google sheet is a killer MVP to prove out something like this.
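To make the "regexes are fantastic for matching" point concrete, here is a minimal Python sketch of rules kept as simple rows, the way they might sit in a spreadsheet. The two rules and the `check` helper are hypothetical illustrations, not Typerighter's actual rules or API.

```python
import re

# Hypothetical rules, shaped like rows in a spreadsheet: (pattern, message).
RULES = [
    (r"\bteh\b", "Did you mean 'the'?"),
    (r"\b(\w+)\s+\1\b", "Repeated word"),
]

def check(text):
    """Return (start, end, matched_text, message) for every rule match, sorted by position."""
    matches = []
    for pattern, message in RULES:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), m.group(0), message))
    return sorted(matches)

if __name__ == "__main__":
    for start, end, found, message in check("I saw teh cat on the the mat."):
        print(f"{start}-{end}: {found!r} -> {message}")
```

Because each rule is just a pattern plus a message, a non-programmer could plausibly add or edit rows without touching the matching code.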
I suppose I still use them because I don't know of a better way to do things.
General maintainability is a priority, and we'd like to improve our rule management tooling to make the process of rule maintenance generally accessible to editorial staff. We're also working on making noisy rules match more specifically, which usually involves migrating the initial regex into LanguageTool for, e.g., pattern-matching on part of speech.
Thanks for sharing these projects, other suggestions are very welcome – we'd be interested in adding new matchers based on different tech if they were a good fit for the use case.
I suspect the biggest problem with using regexes is over-suggestion (trying to correct American English spellings in a quote, for example), but this seems like a pretty good balance of features, usability, and correctness.
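One way to picture the over-suggestion problem, and a crude way around it, is to drop any match that falls inside a quotation. The sketch below is only an assumption about how that could be done; the `RULE` pattern, the `QUOTED` pattern, and `check_outside_quotes` are hypothetical names, and the quote detection ignores nesting and unbalanced quotes.

```python
import re

# Hypothetical rule: flag the American spelling "color".
RULE = re.compile(r"\bcolor\b", re.IGNORECASE)

# Text between straight or curly double quotes (a simplification: no nesting).
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')

def check_outside_quotes(text):
    """Return (start, end) spans matching RULE that do not fall inside a quotation."""
    quoted_spans = [m.span() for m in QUOTED.finditer(text)]
    results = []
    for m in RULE.finditer(text):
        inside = any(qs <= m.start() and m.end() <= qe for qs, qe in quoted_spans)
        if not inside:
            results.append(m.span())
    return results

if __name__ == "__main__":
    sample = 'The color was odd. "I love that color," she said.'
    print(check_outside_quotes(sample))
```

Only the first "color" is reported here; the one inside the speaker's quote is left alone.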
One issue that comes with more complex systems like you mention is that the bugs become more complex. I'd imagine it's fairly easy for a journalist using this tool to know why an incorrect suggestion has been made, and that makes it easy for them to disregard it. While the error rate may improve with more complex analysis, those errors that do still happen are likely to be less understandable.
It's a bit surprising that the engineering blog appears to be embedded in the main site, though. I've worked at a news org in the past (admittedly much larger) and the engineering/meta blogs were entirely separated from the main news section. Obviously it doesn't make sense to reinvent your stack, but I'm surprised the surrounding site scaffolding isn't at least distinct to show this isn't primary news output.
I've always felt automated checks + fixes for grammar and style are miles behind where they should be by now. Checking over and over e.g. long emails for problems before you send them is super time-consuming, and that's not even considering help with tone and the overall message.
What does make it interesting is if it were applied as a GPT-2/3 module, and let loose as a reddit comment bot to train a model for engagement and provocation. Editors are essentially model supervisors, and if the object is to provoke and flatter people to sell advertising, it seems more like a compute problem to distill this process into a business.
Human writers creating organic content aren't really necessary for that, and very soon we should be able to generate content and then attribute it to loyal personalities that we stand up as minor celebrities, not unlike the old Hollywood studio system from the early 20th century, where talent was well kept, but still very much kept.
They even have a snippet of Scala code. I feel like HN must be the target audience.
- regex rules are updated frequently (let's say weekly)
- the updates are available to hundreds if not thousands of users in different locations
- all of them have the latest ruleset
- all of them capable of sending feedback regarding how useful and correct the suggestions are
- said feedback is analyzed regularly and used to refine the ruleset
The results page generated by the script could have checkboxes to mark each suggestion as useful/not-useful/incorrect and a submit button, with this feedback saved in MySQL.
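The feedback loop described above (collect a verdict per suggestion, then review which rules are noisy) can be sketched in a few lines. This is an illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table layout, the `record_feedback` and `noisy_rules` helpers, and the 0.5 threshold are all hypothetical.

```python
import sqlite3

# The three verdicts suggested for the results page.
VERDICTS = ("useful", "not-useful", "incorrect")

def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) feedback table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback ("
        "rule_id TEXT NOT NULL, verdict TEXT NOT NULL)"
    )
    return conn

def record_feedback(conn, rule_id, verdict):
    """Store one checkbox verdict for one rule."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    conn.execute(
        "INSERT INTO feedback (rule_id, verdict) VALUES (?, ?)",
        (rule_id, verdict),
    )
    conn.commit()

def noisy_rules(conn, threshold=0.5):
    """Return rule ids whose share of non-useful feedback exceeds the threshold."""
    rows = conn.execute(
        "SELECT rule_id, AVG(verdict != 'useful') FROM feedback GROUP BY rule_id"
    ).fetchall()
    return sorted(rule_id for rule_id, bad_share in rows if bad_share > threshold)
```

Running `noisy_rules` on a schedule would give whoever maintains the ruleset a short list of rules to tighten or retire.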
(I'm not sure whether this qualifies as "without ... services")
Early 21st century -- hopefully there is more to come :D
I have a deep antipathy to the Murdoch press. Having said that, they used to have remarkably high editorial standards in the Australian flagship newspaper (called "The Australian"). Then News Ltd went on a cost-cutting drive, around the time of the collapse of their main competitor, Fairfax Ltd (caused by Warwick Fairfax, who basically wrecked the empire: he now consults in the USA on .. how to be a successful entrepreneur!). It merged a huge amount of sub-editor functions into a JV with Fairfax, which subsequently basically failed in place: they sacked the good staff and kept an outsource agency which had no clue. The Oz is now pretty U/S for basic grammar AND spelling.
They also routinely now do pun leads. This was funny for about half a second, ever. The Graun also does far too many pun leads. I think all newspapers wind up there.
A good fictional account of news headlines and the journalistic pressures of newspaper writing in happier days is in "Leaven of Malice" by Robertson Davies: a fictional Canadian newspaper, beset by a cruel trick played in "hatches, matches and dispatches". "The Shipping News" has its moments too.
2. Can't they at least fix typos quickly?
3. I still regularly see trivial grammar errors (repeated words, etc.) in opinion pieces on the Guardian, not just breaking news and liveblogs. I guess some of those opinion pieces might be treated as "fast-turnaround journalism", AKA "hot takes". The rate of simple typos there makes me wonder about more important things like factual accuracy.
Edit to add: reading the article more closely, it sounds like they've only started using this new system quite recently, so hopefully it will help them improve. I stand by my opinion that in recent years the rate of typos and grammar errors has been higher on the Guardian than most other comparable news sites.