Proselint (proselint.com)
> Get that off of me before I catch on fire! > Needless variant. 'catch fire' is the preferred form
I don't think I've ever heard anyone say "catch fire" rather than "catch on fire".
From the UK if that changes anything.
http://jugad2.blogspot.in/2015/07/cut-crap-absolutely-essent...
One of the more challenging sections in the GMAT entails sentence correction. A proselint-enabled GMAT prep for sentence correction would be very valuable.
Is it possible to specify new rules in a high-level way?
Can it learn from examples?
Does it work on a sentence-by-sentence basis only, or does it "grasp" complete paragraphs?
https://github.com/amperser/proselint/blob/master/proselint/...
https://github.com/amperser/proselint/blob/master/proselint/...
It depends on your interpretation of NLP. In a sense, all of the rules are hard coded, and so it does string token processing that happens to be informed by contributed interpretations of style guides' rules for usage. Thus, most of the NLP has been performed by the human programmers interpreting those rules.
Though we are interested in extensions in the direction of robust machine NLP approaches able to meet the other goals of proselint, that presents many challenges (including some I mention in response to your third question). Nonetheless, this is an active area of research.
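To make "hard coded" concrete, here is a minimal sketch of what a rule of this kind could look like. It is an illustration only, not proselint's actual code: the function name `check_preferred_forms` and the rule table are hypothetical, and the single entry is borrowed from the "catch on fire" example quoted earlier in the thread.

```python
import re

# Hypothetical rule table, not taken from proselint: each entry maps a
# discouraged phrase to the form a style guide prefers.
PREFERRED_FORMS = {
    "catch on fire": "catch fire",
}


def check_preferred_forms(text):
    """Return (start, end, message) for each discouraged phrase in text."""
    errors = []
    for variant, preferred in PREFERRED_FORMS.items():
        # Whole-phrase, case-insensitive match on the raw string.
        pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            message = "Needless variant. '{}' is the preferred form.".format(preferred)
            errors.append((match.start(), match.end(), message))
    return errors


if __name__ == "__main__":
    for error in check_preferred_forms("Get that off of me before I catch on fire!"):
        print(error)
```

All of the linguistic judgment sits in the table, which a person fills in after reading a usage guide; the code itself only does string matching.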
> Is it possible to specify new rules in a high-level way?
In short, no, but it is an area of active research on our part to develop a rule-templating engine for exactly this purpose. "High-level" is subjective though, so there may always be someone who intends to ask about a level higher than the interface that we provide at the time that this question is asked.
> Can it learn from examples?
In a sense, yes, all of the rules have been learned by people from the example text in guides and translated to linting rules. But I do not think that was your intended question.
If instead you mean: you would provide it a set of examples of your writing and it would induce a rule, no it does not do that currently, and may not for quite some time.
Stylistic rule induction is a difficult – though interesting – problem (as is rule induction more generally). It is not something we are intrinsically opposed to, but the simplest version of learning from examples would violate two core principles of the design of proselint.
First, our rules are taken from and organised around the advice provided by respected authors in their writing on linguistic style.
Second, any inductive method will be intrinsically uncertain about the rules that it induces. This uncertainty will always be opposed to our aim of having a low false alarm rate, making inductive methods possible but subject to extensive tuning and testing. This suggests that further development of a test set outside of the examples provided would be needed, to ensure coverage of any of the rules that the examples would suggest inducing.
Additionally, almost all state-of-the-art machine learning systems would require a set of relevant labeled examples of usage errors and non-errors that would somehow generalise to the examples that you would like to provide it. Even specifying the data format would be difficult; if you have any insights as to how this would be done, please develop them below, it can only be helpful and aid progress in this direction.
> Does it work on a sentence-by-sentence basis only, or does it "grasp" complete paragraphs?
I think the easiest way for you to answer this question is for you to see it in action at this website: http://proselint.com/write/
I should mention that longer range dependencies require greater computational power which brushes up against another aim of proselint, to be fast enough to run on reasonably large files as a real-time linter. This may not always be the case in all instantiations of proselint, but for now this is true.
If you have paragraph level rules that you might want to suggest (like the issue I just created when writing this response: https://github.com/amperser/proselint/issues/310), please do! It is even more helpful if you can find an authoritative reference to include as part of your issue, because that will be needed to incorporate the rule into proselint.
Similarly, where Tesla Model S is concerned: Ford Motor Company had something like this round about 1908. (Where "something like this" is "has four wheels and no horses")
You tell me it does cool things. Great, show me. I've looked about on the various pages and can see only one example and I don't understand it:
text.md:0:10: wallace.uncomparables Comparison of an uncomparable: 'unique' can not be compared.
What's the context of this, what's the error it would have caught in my writing? The tool is in a perfect place to show this off as it's text.
But yes, there should be examples on the front page.
Does it accept 'nearly unique' ?
Is this regression ad absurdum or argumentum ad silencium?
Which I think is equally as misguided as trying to force "data" to be plural, and that "less than 3" is wrong.
I am among them. Here's why:
(1) There are already other words that express related concepts that are subject to gradation: "rare", "special", "unusual", and "extraordinary" come to mind.
(2) The original meaning of "unique", namely "one of a kind", is an important concept. If we let the word's meaning get lost, we will not be able to express that meaning as easily.
But you can easily define it to mean something else. And you can even make "uniqueness" comparable.
Semi-off-topic, but the notion of "more unique" reminds me of Sapolsky's TED talk about humans as the "uniquiest" animal.
https://www.ted.com/talks/robert_sapolsky_the_uniqueness_of_...
1. Writing isn't coding. In coding, you can do various types of "cargo cult programming" and "copypasta" and what-have-you -- in other words, as long as the code runs you don't necessarily have to know why or how a programming idiom or convention works, or how/why expressing it one way in code is better than expressing it another way in code. This is definitionally untrue of writing. If you don't know the why/how of something, then it's better for you to botch it and let the reader attempt to parse it so at least they know what they're dealing with and how to interpret it ("oh, this guy's a non-native speaker, so I'll adjust my reception accordingly" or "ah, this person is kind of clueless about the whole sexist language thing, which is good info for me.").
2. 90% of writing style advice falls into one of two categories: a) hotly debated, and b) totally wrong. Most of it is in the latter category, and this includes Strunk & White (just use google for numerous takedowns of that text). I looked through the PR queue and saw that it consists of eager coders finding style advice from various sources and trying to work that into the tool. That is terrible, terrible, terrible... This will guarantee that the tool will represent a collection of awful writing advice gleaned from dubious sources and wielded with unforgiving ignorance.
This tool may be a terrible idea, but the idea of automated prose linting is not terrible. Most beginner to intermediate writers have tics, and as an editor I often have a couple of writer-specific find/replace things I do when I get a new piece from a particular writer (e.g. "this person uses 'however' when she means 'but', and this person overuses these four business jargon terms", etc.). If editors were able to easily compose and execute writer-specific linters from within something like Wordpress, that would probably be pretty great.
But this particular command line tool is destined to be either totally unused or massively abused.
I'm sorry, I hate to be mean... or, actually, there is a small part of me that enjoys playing Mr. Party Pooper when I see a mob of enthusiastic programmers trying to tie down some great cultural Gulliver with a thousand tiny little automated, black-and-white rules.
This positively screams for an online interface to test drive.
> detecting grammatical errors is AI-complete, requiring human-level intelligence to get things right.
(emphasis mine)
First, there's a problem of usage. When in CS we say that a problem is class-complete (like NP-complete), we mean that the problem belongs to the class (which in this case is true, because human-level intelligence can check grammar), but also that it is class-hard, which informally means "at least as hard as the hardest problems in class", and more formally means that any other problem in class can be cheaply reduced to the problem, and so finding a suitable solution to the problem is identical to finding a suitable solution to all other problems in class. Not only is checking grammar not known to be "AI-complete", then, but we don't even know that human-level intelligence is necessary to solve it.
But the reason this bothers me even though I fully understand the statement was made informally, is a little deeper than that: we don't even know what "human-level intelligence" (or intelligence in general) is, let alone what AI means. That people refer to AI as if it's a thing rather than a very vague notion, clouds how people think of AI research as well as intelligence. I would have simply said "we don't know of good algorithms to dependably check grammar, and this appears to be a very hard problem that may require intelligence".
I ran it on a couple 800 word emails and it didn't catch anything except me using 2 spaces instead of 1 in one place. I also ran it on my city's sidewalk maintenance ordinance, and it didn't report anything.
And yes, `python3` for the win. :)
[^fn1]: If you wanted to be truly precise, it's a parametric family of metrics.
http://matt.might.net/articles/shell-scripts-for-passive-voi...
Some might say you can mark lines as not being linted, but that then makes the change vulnerable to bikeshedding. For some people, being able to force the conversation to not happen because the linter is authoritative might be good. Personally, I prefer to follow the guidelines while staying aware that they are there to aid understanding for future coders, not to enforce adherence to a standard.
Prose isn't code.
Many key elements of good writing are based around the idea of knowing the rules, and then carefully breaking them.
It sounds really awesome but it's very hard to tell if it's going to be more annoying or more useful. Maybe it would be useful to have some example linting errors on the homepage.
Either way, I really love the idea!
One needed improvement: display the offending line on errors. Then you don't have to toggle between file and console to contextualize the errors.
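Until something like that is built in, a small wrapper could do it. This is a rough sketch under assumptions: the output shape `<file>:<line>:<column>: <check> <message>` is inferred from the single example line quoted in this thread, the zero-based line and column numbering is a guess based on that example reporting line 0, and `with_context` is a hypothetical helper, not part of proselint.

```python
import re

# Assumed output shape, based on the one example line quoted in this thread:
#   <file>:<line>:<column>: <check> <message>
OUTPUT_LINE = re.compile(r"^(?P<path>.+?):(?P<line>\d+):(?P<col>\d+): (?P<rest>.*)$")


def with_context(output_line, source_lines, zero_based=True):
    """Return the linter message followed by the source line it points at.

    zero_based is an assumption: the quoted example reports line 0.
    """
    match = OUTPUT_LINE.match(output_line)
    if match is None:
        return output_line  # not a line we recognise; pass it through
    index = int(match.group("line")) - (0 if zero_based else 1)
    if not 0 <= index < len(source_lines):
        return output_line  # line number outside the file; pass it through
    col = int(match.group("col"))  # treated as a zero-based offset (assumption)
    return "\n".join([
        output_line,
        "    " + source_lines[index],
        "    " + " " * col + "^",
    ])


if __name__ == "__main__":
    source = ["This is a very unique idea."]
    report = ("text.md:0:10: wallace.uncomparables Comparison of an "
              "uncomparable: 'unique' can not be compared.")
    print(with_context(report, source))
```

Passing the source lines in as a list keeps the helper independent of how the file is read, so the same function would work for an editor buffer.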
EDIT: I must be blind - they do mention an ST plugin (although they don't link to it). https://packagecontrol.io/packages/SublimeLinter-contrib-pro...
Have copy on web site be intentionally incorrect, red-underlined with (small modals? tooltips?) that show what's been corrected/suggested by the tool.
Is there any word on business model / the intentions of the developers? Is it something that's being open sourced and then integration assistance would be commercialised?
Could you include a sample .proselintrc? rc files tend to have very different opinions on how to be formatted: dictionaries, JSON, bash-argument syntax, and so on. (EDIT: Ah, found one: https://github.com/amperser/proselint/blob/cd428bb0ecc5530c1.... Can’t quite get it to ignore butterick, though.)
I find it a little curious that you use a Markdown example and lint for curly quotes and unicode ellipses by default (butterick), since Markdown discourages such pre-formatting in its syntax, but that’s just hairsplitting, and I can tell by your swelling Issues count that you have plenty of that as it is. :)
Looking forward to some formatting/syntax highlighting in the CLI output, but I know you have your hands full as it is.
Nice idea, but you need to catch homophone errors.
As czechdeveloper has pointed out in this thread, it would also be nice to have a set of rules specifically for academic writing and/or for non-native speakers (e.g., Asian scientists seem prone to overuse "the").
I guess a first step would be to have an extensible set of tags for the rules - both language-specifying ones (i.e., any_language, american_english, british_english, german, ...) and genre-specifying ones (any_genre, prose, poetry, academic, technical, ...). Furthermore, an easy way to select a subset of rules by tag (e.g., british_english and academic) would be necessary.
Would that fit within your goals for proselint?
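To illustrate the tagging idea, here is a minimal sketch of how rules could carry language and genre tags and be selected by them. Everything in it is hypothetical: the `Rule` type, the rule names, and the convention that `any_language` and `any_genre` act as wildcards are my assumptions, not anything proselint provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    languages: frozenset  # e.g. {"american_english"} or {"any_language"}
    genres: frozenset     # e.g. {"academic"} or {"any_genre"}


def applies(rule, language, genre):
    """A rule applies if it is tagged with the value or with the wildcard."""
    language_ok = "any_language" in rule.languages or language in rule.languages
    genre_ok = "any_genre" in rule.genres or genre in rule.genres
    return language_ok and genre_ok


def select_rules(rules, language, genre):
    """Return the names of all rules that apply to this language and genre."""
    return [rule.name for rule in rules if applies(rule, language, genre)]


# Hypothetical rule set, for illustration only.
ENGLISH = frozenset({"american_english", "british_english"})
RULES = [
    Rule("doubled_words", frozenset({"any_language"}), frozenset({"any_genre"})),
    Rule("overused_the", ENGLISH, frozenset({"academic"})),
    Rule("cliches", ENGLISH, frozenset({"prose"})),
    Rule("us_spelling", frozenset({"american_english"}), frozenset({"any_genre"})),
]

if __name__ == "__main__":
    print(select_rules(RULES, "british_english", "academic"))
    # -> ['doubled_words', 'overused_the']
```

Keeping language and genre as separate tag sets avoids having to enumerate every combination, and new tags can be added without touching the selection logic.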
Was that deliberate?
After testing this against a dozen of my blog posts, I'm not terribly impressed with the output. I get more immediate value out of MarkedApp's keyword drawer and word repetition visualization.
We're interested in incorporating deeper NLP. In particular, we've been eyeing https://github.com/spacy-io/spaCy.
To the author(s): Your website, as far as I could tell, doesn't tell me how to install it; I had to go to GitHub to realize it was pip-installable. You should consider adding that to the main page.
Bug report — it told me I had too many exclamation marks in a Markdown file with a number of images in it.
re 2, you'll see at http://proselint.com/approach/ that one of the guiding principles of Proselint is that we defer to experts. In practice, that's meant almost all the advice comes from Bryan Garner's usage guide, Garner's Modern American Usage. He is a careful compiler of advice and you'll find that he is almost never "totally wrong", and when his advice is debated, he knows it, notes it, and provides a thoughtful discussion.
re 1, we think of Proselint as eventually being useful as a training tool, a way to learn the conventions. Note that natural languages are large, with so many low-frequency terms that nobody can learn the whole language. Why err if an automated tool can help? Consider for example demonyms, what you call people from a certain place. How many people know, for example, that people from Manchester are Mancunians, not Manchesterians? Rather than call someone by the wrong name, with Proselint the voice of an expert gently corrects you, and you learn a cool new word.
We aren't a mob of programmers, we are three people who love language, respect it, and think we're 2% of the way to making a great tool, one that The New Yorker could run over its stories to flag issues that its own editors would flag anyways. (In fact, we've done this, running Proselint over a corpus of highly vetted text, and have found numerous issues.)
Widespread use of proselint to correct this type of thing wouldn't improve writing. Rather, it would just add another interpretive option to the above range of scenarios, i.e. "ah, I can tell that this writer did or did not run that proselint tool before submission, because their text is or is not littered with boilerplate proselintisms."
The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.
Bryan Garner might be a careful compiler but doesn't seem to be a linguist and seems to be a traditionalist who makes simple errors.
e.g. http://itre.cis.upenn.edu/~myl/languagelog/archives/001869.h...
"His chapter is unfortunately full of repetitions of stupidities of the past tradition in English grammar — more of them than you could shake a stick at."
http://languagelog.ldc.upenn.edu/nll/?p=5630
"So why did Bryan Garner, a highly intelligent and insightful person, make this elementary error?"
http://www.arrantpedantry.com/2007/01/02/editing-chicago/
"A good editor should know that Bryan Garner’s take on the subject is misleading and incorrect. It’s become apparent to me that many of the self-appointed guardians of the language don’t even know what it is they’re guarding."
etc. etc.
> This tool may be a terrible idea, but the idea of automated prose linting is not terrible.
So which is it? The idea of the tool is prose linting, and you've now stated both that you dislike it quite a bit, and that it's not terrible.
Part of what I think you may be missing, is that it doesn't need to be an all inclusive set of generally terrible, conflicting suggestions. With code style checkers we've already mostly solved this problem, by both storing metadata regarding the source of the rules, and allowing this metadata to be referenced when making custom rulesets. Perl::Critic[1] is a good example of this. It allows you to use the default ruleset and select a severity of criticism, or it allows an organisation (or individual) to create their own custom ruleset to enforce how they want their code to look.
Keeping this in mind, what if the default ruleset was curated to have select rules from multiple sources, but allowed you to easily take a source and use its rules? For example, if I want to write using Strunk & White today, that might be as easy as a command line flag, or downloading a specifically compiled ruleset. If I want to use something else, the same. If I want to make my own custom ruleset based on rules from multiple rulesets and a few of my own thrown in, that should be possible too.
1) Editing fiction by Terry Pratchett -- not too useful. 2) Editing a newspaper article -- maybe it would catch a few typo-level issues that crept in under deadline pressure, but a professional writer wouldn't lean on it. 3) A non-native speaker of English running meeting minutes through it before blasting out the e-mail -- that has a lot of utility. (Actually, the "I went to engineering school because I dislike writing." native speaker of English would benefit from linting that e-mail, too.)
You dislike this tool the same way welders dislike computer welding or the same way truck drivers will dislike automated driving.
Everyone wants to believe their job is so complex that a computer will never be able to perform the same task adequately. Is critiquing a sentence really as complex as driving a car in heavy traffic? Or playing chess? Or finding faces in photographs? Or winning on Jeopardy?
You look like my wife complaining of GPS devices because sometimes they err or take us to dangerous places. It is just a tool; you can just ignore its recommendations.
This could be a benefit in industries where the goal is to have homogenous writing that meets a given set of specifications/standards. Some ideas:
1. Peer-reviewed scientific writing and/or abstracts
2. Manuals
3. Materials written for a subset of language (EFL, pidgin, children's books)
4. Documentation
But still, it corrects incorrect things that my spell checker doesn't see, like inconsistent spacing and 'goofy approximations' like (R) for ®. (Depends on your definition of incorrect, but I personally would not mind at all if these things were homogenized for everyone, it would not take any richness out of the English language).
What I'd like (--help doesn't list such an option) would be to be able to enable some checks with a flag while disabling other parts (the ones that contain suggestions you can elect to break).
First, much of the advice is that certain word sequences are problematic without suggesting any particular replacement text. There are a few reasons for this (including the computational natures of error-detection vs. solution-recommendation problems). The reason most relevant to your concern is that solution-recommendations are more likely to produce a homogenizing effect because they have a driving effect, wherein using a particular set of words is deemed superior to another set of words. Much in the way that the diversity of life-forms has arisen because of selective pressures, by eliminating the least fit combinations of words, the native variation in writing can flourish all the more readily.
The goal is not to homogenize text for the sake of uniformity, but rather to identify those cases that have been identified by respected authors and usage guides as being specifically problematic. Any text that is sufficiently artful and compelling to have not been specifically addressed by these sources should not be able to be caught by the linter. Novelty will continue to introduce new usages, and some of them will be poor. Authors identified as trustworthy may point these out, but this will only be in retrospect. If you do not trust a guide's point of view, our strongest recommendation would be to turn off the modules associated with that guide. You can see some of the module names and a high-level description here: http://proselint.com/checks/.
Finally, I will modify a quote in the Foreword[^fn2] by Robert Bringhurst in The Elements of Typographic Style (version 3.2, 2004):
> [Language usage] thrives as a shared concern — and there are no paths at all where there are no shared desires and directions. A [language user] determined to forge new routes must move, like other solitary travelers, through uninhabited country and against the grain of the land, crossing common thoroughfares in the silence before dawn. The subject [of proselint] is not [stylistic] solitude, but the old, well-traveled roads at the core of the tradition: paths that each of us is free to follow or not, and to enter and leave when we choose — if only we know the paths are there and have a sense of where they lead. That freedom is denied us if the tradition is concealed or left for dead. Originality is everywhere, but much originality is blocked if the way back to earlier discoveries is cut or overgrown.
[^fn2]: Only because we are on the topic of historical traditions and stylistic guides, it should be mentioned that a foreword – according to book design tradition – would be written by an individual other than the author about the author, the book, and usually the relation between them. In this case, the section in Bringhurst's masterpiece labeled "Foreword" would likely be better described as "Preface" or "Introduction". Given his knowledge of book design, I shall assume that this was a conscious departure from the road of tradition, even if I cannot appreciate the new view that it offers.
I was writing some C earlier and my linter warned me about "incrementing a void pointer". However, I understood the context better than my linter, knew that I'd be compiling with gcc (which allows void pointer arithmetic), so I ignored the warning and carried on. My code compiled and ran nicely.
When it comes to static analysis, I think (creative) writers, like programmers, wouldn't care about warnings. This is already true of spell-checkers (e.g. my letter-writing character is English, but my text-editor's yelling about "colour").
For a skilled writer who takes time to write "proper" prose, probably not very useful.
But for me, as a non-native english speaker who writes a lot of short english texts (emails, documentation, HN comments and so on), it could probably help.
For example, since I write both US and British English every day, a consistency warning is certainly helpful. I would also like a linter to help the flow of text, for example by pointing out when you aren't mixing up your sentence lengths in a good way. Oh, in that last sentence I accidentally missed that I used first person in the sentence before that! A linter as a chrome plugin would have pointed that out.
I can imagine a tool like this making it much easier for journalists to follow a newspaper's style guide or something similar.
My impression of this might have been different if the list of rules included CMOS instead of something that tells me not to use the term "jump the gun" because it's a cliche.
That said, I think there's a better way to approach this. Rather than linting based on a list of rules, I'd prefer a more technical approach that highlighted actual issues, such as garden path sentences, ambiguous pronouns, doubled words, etc.
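Of the issues listed, doubled words are the easiest to detect mechanically. A minimal sketch, assuming a plain regular expression is good enough; `find_doubled_words` is a hypothetical helper, and it will also flag legitimate repetitions such as "had had" or "that that", so a real check would need exceptions.

```python
import re

# A word, some whitespace, then the same word again (case-insensitive).
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text):
    """Return (offset, word) for each immediately repeated word in text."""
    return [(match.start(), match.group(1)) for match in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(find_doubled_words("Paris in the the spring"))
```

Garden path sentences and ambiguous pronouns are a different matter: they need at least a parser, which is where the string-matching approach stops being enough.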
People in Australia might disagree. As might people in Bermuda, Colombia, Canada, Hong Kong, Argentina, ...
https://en.wikipedia.org/wiki/Dollar#Other_national_currenci...
The link I posted really concerns the insufferability of someone who corrects technicalities of language rather than a discussion on whether uniqueness is a countable property.
This tool is a blunt instrument. Writing is an art.
Semantics aside, it's not important to slm_HN's point. We can call it an AI, an algorithm, or just a computer, and in any case it's still possible for it to find errors beyond spelling ones.
"... there is a small part of me that enjoys playing Mr. Party Pooper when I see a mob of enthusiastic programmers trying to tie down some great cultural Gulliver with a thousand tiny little automated, black-and-white rules."
I'd reexamine that part, if I were you. I suspect it may be bigger than you think it is, especially since you've already pigeonholed the creators.
I was referring to a hypothetical American creative writer, writing a scene in which a British character writes a letter. In this hypothetical work, written in US English, there would then be a section of text that used UK English spellings. The naive spell-checker would not understand the context, and would flag these as misspellings.
This was meant to be analogous to my "incrementing a void pointer" example; the static analysis tool produces warnings which the author knows to ignore. In the C programming case, my function was passed the size of the objects comprising the array pointed to by the void pointer, so the linter was wrong to tell me I was making a mistake. Similarly, the spell-checker was wrong to say "change this instance of 'colour' to 'color'".
Similar considerations apply to prose linters.
Polonius would be a lesser character if shed of cliches, and a good writer would know to ignore the linter's opinions on the matter.
If the only goal of writing were to allow accurate assessment of the writer, then I would agree. But there are other reasons for writing — informing, persuading, clarifying, &c. — where writing clear, consistent, and idiomatic prose can help. Yours is a condemnation of all attempts to improve writing beyond the first-draft capabilities of the author.
> The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.
Agreed, Proselint is not the right tool to improve genuinely bad writing. Reading great authors and sweating through drafts is what we'd recommend to get better at the craft, too.
From a reader-centric point of view, I can understand lamenting the loss of this information channel. From the author's stance, I can imagine wanting to tighten up alternate channels of information and present a clearer message. The author always has this ability, through natural circumstance, effort or research, so this tool would do nothing but make it easier. As a reader, it may change the assessment to whether they ran a proselint-like tool or not, but in the end those are just assumptions. The writer could be making specific choices to disregard the linting tool on purpose. In the end, reading is still an interpretive experience; this just allows authors more options.
> The way to improve genuinely bad writing is not with rules and tools -- it's with lots of reading, a little mentorship, and lots and lots and lots of practice.
Generally good advice for anything, but I think it's worth noting that different people learn in different ways, and providing more methods for learning is generally an improvement, and opens the field to more people. Tools that look to circumvent historical methods for achieving skill often face an uphill battle from those that used those historical methods. It's easy to see why, as it looks like it has devalued much of the hard work they put into their skills. This may be true to an extent, but the gains often far outweigh this, as making a skill accessible to more people has wide ranging benefits for society in general.
In more concrete terms, I see no reason why a tool like this can't be a multiplier for mentorship and practice. At the very least it enables exposure to ideas that might not have been encountered before.
To the library authors, Proselint looks very cool!
The second link in particular is tendentious. It claims Garner gives "a savage indictment of the behavior and character of those who use Stage 1 words [new usages]" in his book MAU.
But if you follow to the linked page from MAU, you read that Garner is, in an appendix, giving a series of wry analogies for the process of acceptance of new terms -- not a savage indictment at all. In other words, Garner is not himself saying all new usages have "a grade of F", etc., he's saying that is how some new usages will be perceived, in a very gross and qualitative sense, by a strict static conception of the language.
Since Garner comes right out and explicitly says all of the above, the link you cite comes off as picking a fight. There's nothing there.
Having read MAU (back in its first edition), I have to say that Garner strikes me as a very good guide to usage. I still enjoy perusing the book.
Taken as a whole, do you really have significant issues with MAU as a usage guide?
Wasn't my intention - merely pointing out that he's not a linguist and making simple errors should give anyone using him as an "authority" considerable pause.
> do you really have significant issues with MAU as a usage guide?
I am neither an American nor a linguist - which makes me doubly unqualified to comment. That I leave to experts.
Again, the idea of prose linting is not terrible, and in fact I do a hacked up version of it with a set of standard "find/replace" operations for specific writers who have specific issues. But a giant, general-purpose ball of rules of dubious provenance applied to a generic abstraction called "prose", is what I take issue with.
Is there an accessible, comprehensive, easy-to-read guide like Garner's Modern American Usage that's considered more accurate? There don't seem to be many options.
(I have a copy of GMAU and enjoy it, but mostly for discussion of usage, not the details of grammar)
If the context is explicitly local (such as a local newspaper, menu), then $10 is sufficient in the United States.
"USD 10 looks weird, why did you do that', or 'that on the pricing page looks funny, can you fix it up a bit'
It seems $ (or the equivalent currency symbol for other currencies) has a place in many people's minds as the sign that the number next to it is currency, and they seem to find it weird when amounts are 'written correctly' without that symbol.
That's going to need a citation. To be sure there's plenty of style guides which say "don't do that, do [this other variant instead]" but where's the standard that makes this unequivocal?
E.g. (2, 4, 7) (2, 4, 8) (2, 8, 4) (2, 4, 7) (1, 4, 8) (0, 9, 3) (987, 4, 7)
When asked what are the "most unique" sets in that list, you'd probably be acting deliberately obtuse if you chose anything but one of the last two.
I can imagine defining uniqueness as a function returning a real number in the closed interval [0, 1] instead of a boolean value. For example:
let U(p, x, X) be the uniqueness of property (function) p(x) for element x of set X
U(p, x, X) = 1.0 - (size of X')/(size of X\{x}), where
X' = set of all elements x' of X such that p(x') = p(x) and x' != x
Property p of element x of set X is strictly unique when, and only when, U(p, x, X) = 1.
When is it useful? For example, when talking about minimizing hash collisions for given data.
Another way of thinking about it: uniqueness is 1 minus the probability that an element chosen uniformly at random from X, after removing x, has the same value of p as x.
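The definition above translates almost directly into code. A minimal sketch: the collection is passed as a list and the element is picked by position so that duplicate values are handled, and returning 1.0 for a single-element collection is a convention I chose, since the formula divides by zero there.

```python
def uniqueness(p, items, index):
    """U(p, x, X) = 1 - |X'| / |X without x|, where x = items[index]
    and X' holds the other elements whose p-value equals p(x)."""
    x = items[index]
    others = items[:index] + items[index + 1:]
    if not others:
        # The formula is undefined for a lone element; treating it as
        # strictly unique is an assumption made for this sketch.
        return 1.0
    same = sum(1 for other in others if p(other) == p(x))
    return 1.0 - same / len(others)


if __name__ == "__main__":
    words = ["cat", "dog", "horse", "bird"]
    # Property p is the word length; "cat" shares it with "dog" only.
    print(uniqueness(len, words, 0))
    # "horse" is the only five-letter word, so it is strictly unique.
    print(uniqueness(len, words, 2))
```

With p set to a hash function, a value of 1 for every element means no collisions in the data set.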
Instead of saying everything is unique we could simply say that there is nothing. A thing is itself an abstraction. The concrete world is without inherently distinct things. We must abstract things for "unique" to describe something at all. As you implied, this process is arbitrary. Every way in which you could abstract things implies a distinct notion of "uniqueness". To simply select one "uniqueness" (like mathematics) is arbitrary. But to consider every possible "uniqueness" equally is also arbitrary. Without prioritizing forms of "uniqueness" we can only construct a partially ordered set. So when you avoid a fixation on mathematics, things can be more, less or "incomparably" unique.
I suspect most pairs of things are incomparably unique. Further, I suspect most binary qualities are predominantly incomparable. I don't know that you should never say things like "more unique" but it might be fair to issue a warning in a prose linter. Any binary quality used as a continuum requires an arbitrary combination of its distinct forms. If this isn't specified then it only has meaning for those who already know what it is.
From a common sense perspective, it's obvious that there are things. Sure, you can point out the flux and decay of all entities, but still, this table here is a coherent thing even if it's made from parts in a temporary arrangement.
In some sense, philosophy itself is destroyed when you go down the path of denying objects, since philosophy crucially deals with concepts, and concepts are "thought objects."
Harman describes two modes of denying objects: undermining and overmining. Undermining is the tendency to say "really, this object is just a composition of these other particles," while overmining is the tendency to say "this object is just a modulation in a grand monistic entity."
Instead of that, he recommends an ontology of objects that's pretty interesting and fun to read about. He would, I think, agree that objects are unique in that they are (in programmer jargon) "pointer equal" to only themselves... and each real object, for that reason, has an infinity of potential that's never exhausted by any "arbitrary" perception of it... yet still, we perceive other objects not directly, but through aesthetic caricatures, and on that level you might have different degrees of uniqueness.
You need to define some relation on that set to get classes of abstraction. And that's exactly what abstraction means :)
An easy fix for this non-problem is to use "distinct" instead of "unique".
If you mean similar, may I commend the word similar? ;-)