The funny rules of SpamAssassin in 2023 (updown.io)
And this plays into the strengths of the big mail networks in detection. It's a bonus to them that every time they block a smaller host there is a good chance that sender will consider a move to office365 or Google Workspace for their mail.
As an aside, not sure if OP is related to them but updown.io is a nice service and I appreciate the simple PAYG pricing! For what it's worth their mails seem to get through successfully to me too.
Also for those facing mail delivery issues (or just practicing good email hygiene) - I recommend www.mail-tester.com - they give you an email address to send a mail to and carry out a heap of tests - including checking against SpamAssassin + blacklists, SPF/DNS/etc testing.
The irony is that a substantial amount of the spam I receive comes from those platforms.
- https://mecsa.jrc.ec.europa.eu/en/
Are excellent tools for checking your "deliverability".
I switched back to GMail a few months ago, and not only do I see less stuff in my Junk folder (indicating Google is blocking stuff rather than identifying it) but also I have not seen a single false positive. Hopefully that means Google is more effective, but there's no way to tell if I'm missing legitimate email. So far, no complaints.
Not related in any way except as a happy customer. They added a blog recently, and this article caught my eye because of the nightmare that mail delivery is for everyone.
I found it particularly ironic that you now have to think like a spammer (i.e. look at spam detection engine source code to find a way to circumvent their heuristics) in order to get your totally valid email delivered (^_^).
edit: typo
I couldn't agree with this more. I want people to remember this whenever the topic of decentralization or federation comes up. People see this as a technical problem. It's not. It's a political and organizational problem. Even with email, which is fully decentralized (other than the ICANN TLDs), running your own node is still incredibly difficult. And the reasons aren't technical at all.
Brevity has value. Having to bloat content (an email to get past anti-spam; a cooking blog to rank better in Google SEO; ...) brings back memories of high-school English papers, or their modern equivalent, ChatGPT.
Any smart spammer will just tweak his spam to not hit these rules... And if he hasn't, it's because the vast majority of people don't use SpamAssassin
The problem wasn't just the number of FPs (which were much higher than the 'Cuda) -- it was that they came from real people, who were often common senders. This is not corporate email, or anything that was even remotely spam (except as SA's crazy ruleset determined). These all required whitelisting, and it became a real chore for all my users to keep up with all the whitelisting.
So back to the Barracuda for another year. It lets a little more spam through, but virtually no FPs. I just couldn't make SA get the same performance, even with many tweaks to the weights and rulesets.
I basically trash all emails not in my contact lists. Easy.
Most spammers and marketing/sales sleazoids never think they are doing anything wrong. They are totally incapable of empathy. Or they know they are scum and don't care. Either way.
OP talks about adding "invisible text" and other such common spammer tactics to get around some of the rules. Zero self-awareness.
At no point did this person ever think "did I do something wrong?". No, it's that shitty SpamAssassin!
Well-known rules will block most spam, some with occasional collateral damage but many with no realistic chance of collateral damage.
Entity-encoding @ as &#64; in email addresses in HTML will block the vast majority of email address harvesters, with no collateral damage.
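To make the entity trick concrete, here is a minimal Python sketch, assuming you generate the page yourself; `obfuscate_email` and `mailto_link` are hypothetical helpers, not from any library. Browsers decode `&#64;` back to `@` both in visible text and in the `href` attribute, so readers see and click a normal address, while a scraper grepping the raw HTML for `user@domain` finds nothing.

```python
import html


def obfuscate_email(address: str) -> str:
    """HTML-escape the address, then replace every '@' with &#64;."""
    return html.escape(address, quote=True).replace("@", "&#64;")


def mailto_link(address: str) -> str:
    """Build a clickable link whose raw HTML never contains a literal '@'."""
    encoded = obfuscate_email(address)
    return f'<a href="mailto:{encoded}">{encoded}</a>'


if __name__ == "__main__":
    print(mailto_link("jane@example.com"))
    # What a browser (or any HTML parser) turns the entity back into:
    print(html.unescape(obfuscate_email("jane@example.com")))
```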
Adding a honeypot field to an HTML form, with the label “If you are human, leave this field blank” and hidden by CSS, will catch practically all spam submissions, with no collateral damage.
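A sketch of the server-side half of the honeypot, assuming a form handler that receives the POST body as a dict; the field name `website` and both function names are hypothetical. Humans never see the CSS-hidden input, so a non-blank value means a bot filled in every field it found.

```python
from typing import Mapping

# Hypothetical name for the decoy input. The matching markup would be
# something like (hidden from humans with CSS, e.g. `.hp { display: none; }`):
#   <label class="hp">If you are human, leave this field blank
#     <input type="text" name="website" tabindex="-1" autocomplete="off">
#   </label>
HONEYPOT_FIELD = "website"


def is_probably_bot(form: Mapping[str, str]) -> bool:
    """True if the hidden decoy field came back non-blank."""
    return form.get(HONEYPOT_FIELD, "").strip() != ""


def handle_submission(form: Mapping[str, str]) -> str:
    """Decide what to do with a contact-form POST (hypothetical handler)."""
    if is_probably_bot(form):
        # Drop it silently; the caller still renders the usual thank-you page.
        return "discarded"
    return "delivered"
```

Showing the normal thank-you page even for discarded submissions is deliberate: it gives the bot no signal that it was caught.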
I am sure there are plenty of smart spammers, but it also seems like a lot of spam comes from folks using scripts and email lists they use without fully understanding. It appears SpamAssassin would help with those operations.
So I wasn’t expecting Postgrey to provide much benefit. As it happens, in 10 years of running my own mail server, it’s the only anti-spam measure I’ve had to bother with.
Spam is all about high-volume/~no-cost delivery of crap. Time spent tweaking the spam - to evade $Defense_1, $Defense_2, etc. - is added cost. Especially if $Defense_n is only used by a few of the prospective victims (folks too savvy or paranoid to be suckered do not count), then tweaking to get around $Defense_n is a losing strategy for the spammer.
Bingo. Not that there aren't a lot of people running SA, but spammers want to be able to deliver to the big players(1) (gmail, o365, etc), not to the smaller outfits running SA. It's not worth their time to optimize for a rounding error in the deliverability equation.
(1) Unless they're selling 'targeting' services where you're paying to deliver to a specific domain/user which might be behind SA. Plenty do, but that's a little bit farther down the criminality spectrum and vastly less volume than shilling peener pills or warranty extension scams.
edit: formatting
Each rule has a score associated with it. By default a message needs to reach 5.0 to be marked as "spam":
* https://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_Spa...
The threshold is configurable. A header is added after processing, e.g. (on a system where the threshold was lowered to 4.0):
X-Spam-Status: Yes, score=21.6 required=4.0 […]
* https://cwiki.apache.org/confluence/display/SPAMASSASSIN/X+S...
One can then choose what to do with this information (via procmail or Sieve). There is another header as well:
> X-Spam-Level: This displays your spam level with asterisks, with one asterisk displayed per point, rounded down. For example, if your overall SpamAssassin score is 4.3, it will display ****. If you score less than 1, for example, 0.5, it will display nothing.
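A small Python sketch of how a filter script might read these headers, assuming the `score=… required=…` layout shown above; the function names are made up for illustration. The star count follows the quoted rule (one per whole point, none below 1). Treating a score exactly equal to the threshold as spam, and not capping the number of stars, are assumptions worth checking against the SpamAssassin docs.

```python
import math
import re
from typing import Tuple

# Matches the "score=21.6 required=4.0" part of X-Spam-Status.
_STATUS_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def parse_spam_status(header: str) -> Tuple[float, float]:
    """Return (score, required) from an X-Spam-Status header value."""
    m = _STATUS_RE.search(header)
    if m is None:
        raise ValueError(f"no score/required pair in {header!r}")
    return float(m.group(1)), float(m.group(2))


def spam_level(score: float) -> str:
    """One asterisk per whole point, rounded down; empty below 1 and for negative scores."""
    return "*" * max(0, math.floor(score))


def is_spam(score: float, required: float = 5.0) -> bool:
    # Assumption: a score equal to the threshold counts as spam.
    return score >= required


if __name__ == "__main__":
    score, required = parse_spam_status("Yes, score=21.6 required=4.0 tests=...")
    print(score, required, is_spam(score, required), spam_level(4.3))
```

This is also why a common procmail/Sieve rule just looks for five asterisks in `X-Spam-Level`: the header contains `*****` exactly when the score is 5.0 or more, whatever threshold is configured.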
But that's a problem that will resolve itself over time, in a variety of ways. And the spam systems can play the same tricks with only invoking it on a fraction of emails too, of course. It's just that, at current expense levels, that would be a very small fraction indeed. I'd hazard that trying to use modern AI on spam classification at scale could easily consume 10x-100x of all current AI hardware and still make less of a dent than you'd hope.
https://spamassassin.apache.org/full/3.0.x/dist/doc/sa-learn... https://cwiki.apache.org/confluence/display/spamassassin/Bay...
Then there are increasing tiers of cost that you would only run after it becomes likely that the message is acceptable. As you say, you would only run an antivirus on a message on the verge of delivery, because decoding the attachment and running the AV (in an expensive sandbox) is so costly.
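The tiering idea can be sketched as a pipeline sorted by cost that stops at the first rejection, so expensive stages (like the sandboxed antivirus) only see mail that survived the cheap ones. Everything here is hypothetical: the stage names, the relative costs, the blocklisted address (from the 203.0.113.0/24 documentation range), and the stub checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]


@dataclass
class Stage:
    name: str
    cost: float  # relative cost; illustrative numbers only
    check: Callable[[Message], Optional[str]]  # "reject" stops the pipeline, None passes


def run_pipeline(msg: Message, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run stages cheapest-first, stopping at the first rejection.

    Returns the verdict and the names of the stages that actually ran.
    """
    ran: List[str] = []
    for stage in sorted(stages, key=lambda s: s.cost):
        ran.append(stage.name)
        if stage.check(msg) == "reject":
            return f"rejected by {stage.name}", ran
    return "delivered", ran


BLOCKLIST = {"203.0.113.7"}  # hypothetical entry


def ip_blocklist(msg: Message) -> Optional[str]:
    return "reject" if msg["ip"] in BLOCKLIST else None


def content_score(msg: Message) -> Optional[str]:
    # Stand-in for a rule-based score such as SpamAssassin's.
    return "reject" if float(msg["score"]) >= 5.0 else None


def antivirus(msg: Message) -> Optional[str]:
    # Stand-in for decoding attachments and scanning them in a sandbox.
    return "reject" if msg.get("has_malware") else None


STAGES = [
    Stage("antivirus", 100.0, antivirus),
    Stage("ip_blocklist", 1.0, ip_blocklist),
    Stage("content_score", 10.0, content_score),
]
```

With this ordering, a message from a blocklisted IP is rejected after the cheapest stage and never reaches the antivirus.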
Somewhat out of context, but greylisting works as well as the day it came out.
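For anyone who hasn't seen it, the core of greylisting fits in a few lines. This Python sketch assumes the classic scheme: key on the (client IP, envelope sender, envelope recipient) triplet, temporarily reject the first attempt, and accept retries after a minimum delay. The `Greylist` class and its 300-second default are illustrative; real implementations such as Postgrey add expiry, whitelists, and optional subnet matching on top. It works because legitimate MTAs queue and retry after a 4xx reply, while many spam bots never come back.

```python
import time
from typing import Dict, Optional, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    def __init__(self, min_delay: float = 300.0) -> None:
        self.min_delay = min_delay  # seconds before a retry is accepted (illustrative)
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str, now: Optional[float] = None) -> str:
        """Return "defer" (answer with a 4xx temporary failure) or "accept"."""
        now = time.time() if now is None else now
        key = (ip, sender.lower(), rcpt.lower())
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            # First attempt, or a retry that came back too quickly.
            return "defer"
        return "accept"
```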
It occurred to me recently that LLM-style tokenization + bayesian classification would be a sweet upgrade for spambayes, which always struggled with ad-hoc tokenization rules.
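Roughly what that would look like: a naive Bayes classifier where the tokenizer is a pluggable function. This Python sketch uses a plain regex word tokenizer as a stand-in; swapping in a subword (LLM-style) tokenizer is the hypothetical upgrade being described and is not shown. It uses Laplace smoothing and compares log-probabilities to avoid underflow. It is not SpamBayes' actual algorithm, which combines token scores differently.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def word_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer; an LLM-style subword tokenizer would plug in here.
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayes:
    def __init__(self, tokenize: Callable[[str], List[str]] = word_tokenize) -> None:
        self.tokenize = tokenize
        self.counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def learn(self, text: str, label: str) -> None:
        """Add one labelled message ("spam" or "ham") to the token counts."""
        self.counts[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text)."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_p = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            # Smoothed log prior plus Laplace-smoothed log likelihood of each token
            # (the extra +1 in the denominator reserves room for unseen tokens).
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in self.tokenize(text):
                lp += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
            log_p[label] = lp
        # Normalise the two log scores without overflow.
        m = max(log_p.values())
        s = math.exp(log_p["spam"] - m)
        h = math.exp(log_p["ham"] - m)
        return s / (s + h)
```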
(I don't think of it as "client side"; it was integrated with my system via procmail on what I'd call the "server side". You could use it in other ways, including as an Outlook plugin, way back in the day. Or it could connect to an IMAP mailbox and filter messages it found, etc. Really versatile tool for its time)
https://workspace.google.com/blog/identity-and-security/an-o...
You received this message because you are subscribed to the Google Groups "jan-09" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to jan-09+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jan-09/014101da368d$0bdc8160$23958420$@gmail.com.
I cannot email that unsubscribe address because it says I am not subscribed. I cannot visit that page either, because I have not subscribed to that group. I've had to set up a special filter to look for that footer.
I am not the only one with this issue. See https://support.google.com/groups/thread/68075070/i-get-goog... .
... Wait! You've indirectly helped solve the issue!
They are being sent to "info@" my domain, an alias that forwards to my real account. I set up a new outgoing account with that From address, sent from there, and managed to get Google to unsubscribe something I never agreed to in the first place.
It's been like this for a year, and with multiple attempts to fix it.
Thank you!
A few months later, they started bouncing mail from my server's new IP address, and that, too, wasn't their fault of course: "we're not seeing a block for your IP address so there cannot be bounces". Denying reality is super effective. The punchline was that they had blocked the new ISP's whole range rather than just my IP, so they weren't getting any hits when searching for my IP address. I found this out through some back-and-forth with a friendly sysadmin at the ISP, who was also banging their head against MS' wall...
These people must be so underpaid that they're probably paying MS for the privilege of working for such an upstanding business.
(If you haven’t realized, this is why Gmail has SMTP message origination disabled by default — these days requiring not only enabling it for your Gmail account, but also fiddling with app passwords to get it working. If it was enabled by default, the “spam from stolen credentials” problem would be so, so much worse. Whereas, at least with the webapp route, Google can block you if you look like a bot [i.e. if you’re doing an insufficiently good job at fooling them.])
If anything I'm nervous to recommend Google because they flag too many legitimate emails as spam. After years of not checking, I'm checking spam again.
Does your company do outbound marketing/sales?
I've seen multiple companies spin up outbound email marketing campaigns where someone compiles a list of 5000 email addresses based on certain demographics, and then send automated emails (that look not automated) over the course of a month, rinse, repeat. Google Workspace will let you do this, but if you're too aggressive with email volume it can kill the reputation, and therefore deliverability of any email from that domain.
(Which is why most companies send outbound sales emails from a domain other than their primary domain to separate out the sending domain reputation)
Good guess, but we don't. I also checked DKIM/SPF when this happened and all appeared in order.
In short: you can probably do better than 3-5 spams per week with SA.
The big problem is the entire thing is a beast to configure with all the documentation of a Babylonian cuneiform stone tablet.
I'm more curious about the opposite metric: how many legitimate emails a week aren't getting delivered to you? Because that seems to be the real flaw in SpamAssassin: the false-positive rate.
And the spamassassin users don't usually have much visibility into this, so when emails don't get to them they just blame the sender.
My false-positive rate is very low, maybe a couple per month. However, I can predict with a high degree of accuracy when a piece of email is likely to land in the spam folder. Things like confirmation emails, registration emails, etc. are guaranteed to land in the spam folder. It's pretty hard for any system to accommodate those without allowing spam to get by.
That's fine by me, though, because I know when to check my spam folder.
I also have Dovecot set to learn ham every time I file an email from the inbox to a folder, for good measure.
1 spam a month would be .018% of emails and 5 x 4 spams a month would be .364% of emails
So I would have gotten about .346 percentage points more spam relative to the number of emails. In reality, because I don't see all of the mails, it's less. Is a touch more than a third of a percent a 'big difference'? YMMV.
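The same arithmetic, worked in Python. The monthly volume isn't stated; about 5,500 emails a month is an assumption back-calculated from the quoted percentages (1/5500 ≈ .018%, 20/5500 ≈ .364%).

```python
def share_pct(spam_per_month: float, mails_per_month: float) -> float:
    """Spam as a percentage of all mail received in a month."""
    return 100 * spam_per_month / mails_per_month


# Assumption: back-calculated from the percentages above, not stated in the comment.
MAILS_PER_MONTH = 5500

other = share_pct(1, MAILS_PER_MONTH)             # 1 spam a month
spamassassin = share_pct(5 * 4, MAILS_PER_MONTH)  # 5 spams a week, roughly 20 a month

print(f"{other:.3f}% vs {spamassassin:.3f}%: {spamassassin - other:.3f} percentage points")
```

Subtracting the unrounded shares gives about .345 points; the .346 above comes from subtracting the already-rounded figures.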