The woes of sanitizing SVGs (muffin.ink)
A useful thing I learned recently is that, while CSP headers are usually set using HTTP headers, you can also reliably set them directly in HTML - for example for HTML generated directly on a page where HTTP headers don't come into play:
<iframe sandbox="allow-scripts" srcdoc="
<meta http-equiv='Content-Security-Policy'
content=&quot;default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline';&quot;>
<!-- untrusted content here -->
"></iframe>
It feels like this shouldn't work, because JavaScript in the untrusted content could use the DOM to delete or alter that meta tag... but it turns out all modern browsers specifically lock that down, treating those CSP rules as permanent as soon as that meta tag has loaded, before any malicious code has the chance to subvert them.
I had Claude Code run some experiments to help demonstrate this a few weeks ago: https://github.com/simonw/research/tree/main/test-csp-iframe...
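For what it's worth, here is a minimal Python sketch of generating that markup server-side. The function name `sandboxed_iframe` is made up for illustration. The one thing it adds to the snippet above is escaping the whole inner document, so untrusted content cannot close the srcdoc attribute early and land outside the sandbox.

```python
from html import escape

# Same policy as the snippet above, with the CSP keywords in single quotes.
CSP = "default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'"


def sandboxed_iframe(untrusted_html: str) -> str:
    """Wrap untrusted HTML in a sandboxed iframe with a meta-tag CSP (hypothetical helper)."""
    # The meta tag goes first so the policy is in place before the untrusted content is parsed.
    inner = (
        f'<meta http-equiv="Content-Security-Policy" content="{CSP}">\n'
        f"{untrusted_html}"
    )
    # Escape &, <, > and both quote characters so nothing can terminate the srcdoc attribute.
    return f'<iframe sandbox="allow-scripts" srcdoc="{escape(inner, quote=True)}"></iframe>'
```

The browser decodes the attribute value before parsing it as a document, so the untrusted markup still renders inside the iframe; it just can't escape the attribute.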
This should be in addition to heavily restricting CSP on user content. (Hmm, surely all images should be served with the CSP header set.)
A better approach here would be to just serve svg with Content-security-policy: script-src 'none'; sandbox
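A rough sketch of what serving it that way could look like, as a plain WSGI app in Python. The helper name `make_svg_app` is hypothetical, and the header value is just the one suggested above; a real deployment would more likely set this in the web server or CDN config.

```python
SVG_HEADERS = [
    ("Content-Type", "image/svg+xml"),
    # The policy suggested above: no scripts, and treat the document as sandboxed.
    ("Content-Security-Policy", "script-src 'none'; sandbox"),
]


def make_svg_app(svg_bytes: bytes):
    """Return a WSGI app that serves one SVG with the restrictive headers (hypothetical helper)."""

    def app(environ, start_response):
        headers = SVG_HEADERS + [("Content-Length", str(len(svg_bytes)))]
        start_response("200 OK", headers)
        return [svg_bytes]

    return app
```

For a quick local test this can be run with `wsgiref.simple_server.make_server` from the standard library.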
I've been just using plain typescript/html and it's so easy to say "yeah all of that rendered content goes into an iframe", I've got all of d3 entirely sandboxed away with a strict CSP and no origin.
I do hope that iframe sandboxing grows some new primitives. It's still quite hacky - null origins suck and I want a virtual/sandbox origin primitive as well as better messaging primitives.
For something like this that's security critical I'd really like to see each of the browser vendors publishing detailed, trustworthy documentation about their implementations.
The technology itself is very widely deployed due to banner ads, so it's at least thoroughly exercised.
But for all my self-righteous bluster the inline version was news to me. Hacker news. Awesome. Thank you.
I do feel that there are two distinct types of SVG - "bunch of paths with fills" and "clever dangerous stuff" - where most real SVGs are of the former type.
Fully expect this to be shot down by someone that's thought about this problem for longer than the 120 seconds I just spent. :)
Even worse, OP's latest post "Every version of Scratch is vulnerable to arbitrary code execution" just tells you how exactly to exploit something similar today in the current version with no mention of responsible disclosure except a plug to say, "hey, check out my project, this one doesn't have RCE!" This is so irresponsible it borders on malicious.
https://developer.mozilla.org/en-US/docs/Web/API/HTML_Saniti...
https://developer.mozilla.org/en-US/docs/Web/API/HTML_Saniti...
I'm sure it'd just open up a whole other can of worms though... not to mention having to wait for browsers to actually support it.
The real solution here is definitely CSP + basic sanitisation though.
This would allow an update to the xmlns to
<svg xmlns="http://www.w3.org/2000/ssvg">
Which would allow the image to force SSVG mode and disable all non-approved features, but you could also update the image tag so the client could force security on potentially insecure SVGs:
<img type="ssvg" src="/insecure.svg">
(This isn't a comment on the challenges in proper sanitization fwiw, as I've needed to do various of the same things myself)
[1]: https://developer.mozilla.org/en-US/docs/Web/API/SVGGraphics...
> Example from Scratch's test suite:
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" xmlns="http://www.w3.org/2000/svg">
<circle cx="250" cy="250" r="50" fill="red" />
<script type="text/javascript"><![CDATA[
alert('from the svg!')
]]></script>
</svg>
Is this really an issue? This is the method that the Chrome team's polyfill to replace XSLT suggests you use. https://github.com/mfreed7/xslt_polyfill/tree/main#usage
But for the most part, I 100% agree, and I've been considering making a format for my own use-cases. I think the biggest issue is in agreeing as to what subset is necessary; plus, of course, getting any level of adoption (though the latter isn't a factor for my own use ... except in the sense that there are no tools to help).
For example, do we need animations? Gradients? If so on the latter, what kind?
This version 3 could have the version number changed to 2 in order to do cool SVG things, i.e. full-fat SVG as version 2 is now. But you could just flip the 2 to a 3 on upload, so any embedded URLs are harmless.
This could be useful for the creator too, as it is helpful to have layers of source images in bitmap format to work with, and you can easily export such things accidentally.
This kind of stuff will get way worse with LLMs. They like just stacking more and more code on top of workarounds.
It is not, and never was, an image format. It's a markup language.
Like, this post didn't even mention presentational attributes, such as how the cursor attribute can contain a URL that gets loaded. Or any of the other tricky parts of SVG sanitization, like using a DTD to bypass things.
Is it because the SVG parser/renderer being used is an entire library, and it would be prohibitive to write your own SVG parser/renderer or insert your own code into the existing one?
You could change the default behavior to the “safer” behavior. And then add some sort of “danger mode” attribute. But… devs are usually hesitant to do something that would break legitimate code, such as changing the default behavior would do.
The infamous you can't parse (X)HTML with regex¹ meme is from 2009, yet this fix was done in 2019. I guess the SO answer never mentioned SVG.
Tag names, attributes, attribute values, event callback default-cancelers... so many ways to declare that this node and its children shouldn't parse/evaluate scripts.
As Jay-Z said: "I've got 99 solutions, fixing a problem ain't one"
I can't imagine the cumulative number of man hours wasted on this problem when the vast majority of users were just looking for a way to make their logos look sharp.
I would even go further: HTML should never have supported scripting.
I'd love to see an agreed standard like OpenGL vs OpenGL ES for SVG. SVG-ES. Everyone agrees on the static, non-scripted elements that should work.
If someone formalizes this as a new format, please give it a new name! tvg tiny vector graphics? savg safe vector graphics?
And keep the scope as simple as possible so it actually ships! Don’t try implementing a binary format or something.
It sounds like the linked post was about someone using a blacklist instead of a whitelist. It doesn't matter how tiny your subset is if you allow through stuff you don't recognize.
For the most part SVG is safe. The dangerous parts are pretty obvious - script tag, image tag, feImage tag, attributes starting with on, embedding HTML in <foreignObject>, DTD tricks, namespace tricks, CSS that loads external stuff (keep in mind presentational attributes too; it's not just the style attribute/tag).
The rest of it is pretty safe.
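To make the whitelist point concrete, here is a rough Python sketch using only the standard library. Everything in it is an assumption for illustration: the tag and attribute lists are a small made-up subset, not a reviewed one, and the function names are hypothetical. It rejects any input with a DTD, drops every element and attribute it does not recognize (which covers script, image, feImage, foreignObject, on* handlers, style and namespaced attributes), and drops any attribute value that contains url(...) or characters outside a conservative set.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Illustrative allowlists only; a real list needs review against the SVG spec.
ALLOWED_TAGS = {
    f"{{{SVG_NS}}}{name}"
    for name in ("svg", "g", "path", "circle", "ellipse", "rect", "line",
                 "polyline", "polygon", "title", "desc")
}
ALLOWED_ATTRS = {
    "version", "viewBox", "width", "height", "x", "y", "cx", "cy", "r", "rx", "ry",
    "x1", "y1", "x2", "y2", "d", "points", "fill", "stroke", "stroke-width",
    "opacity", "transform",
}
# Conservative value check: no backslashes (CSS escapes), quotes, colons or slashes.
SAFE_VALUE = re.compile(r"[A-Za-z0-9#.,%()+\-\s]*")


def _clean(element: ET.Element) -> None:
    # Drop every child that is not on the tag allowlist, then recurse into the rest.
    for child in list(element):
        if child.tag in ALLOWED_TAGS:
            _clean(child)
        else:
            element.remove(child)
    # Drop unknown attributes and any value that is not plainly harmless.
    for name, value in list(element.attrib.items()):
        if (name not in ALLOWED_ATTRS
                or not SAFE_VALUE.fullmatch(value)
                or "url(" in value.lower()):
            del element.attrib[name]


def sanitize_svg(source: str) -> str:
    """Return a copy of the SVG containing only allowlisted tags and attributes."""
    lowered = source.lower()
    if "<!doctype" in lowered or "<!entity" in lowered:
        raise ValueError("DTDs are rejected outright")
    root = ET.fromstring(source)  # comments and processing instructions are dropped by default
    if root.tag != f"{{{SVG_NS}}}svg":
        raise ValueError("root must be <svg> in the SVG namespace")
    _clean(root)
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    return ET.tostring(root, encoding="unicode")
```

Note the trade-off: this also throws away legitimate things such as local gradient references (fill="url(#grad)"), which is the price of not recognizing them. It is a sketch, not an audited sanitizer, so it should sit alongside CSP or <img> embedding rather than replace them.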
Though I think it's still a draft, it does appear to be a requirement for BIMI - https://en.wikipedia.org/wiki/Brand_Indicators_for_Message_I...
If you ever need to interface with other tools that generate SVG you now need to have a way of essentially transpiling SVG from the wild into your tamed SVGs. Oftentimes this is done by hand, by a software developer and designer (sometimes the same person).
And this is for basic functionality that your designers expect and have trivial controls for in their vector editors, like "add a drop shadow."
The article goes into some issues with sanitization itself, and except for <script> these are a bunch of reasonable things that someone might expect to work or not have issues with. Sandboxing rendering isn't an unreasonable approach if you're not writing the parser and renderer yourself.
Look at what Microsoft did with Excel--the dangerous stuff is behind a switch.
Thus, solution:
Add two bits to the tag.
SVG1 does not execute any sort of script.
SVG2 does not follow links.
SVG3 is actually SVG1 + SVG2 as these are bit flags, not numbers.
Additional bits are reserved for future use if any other issues are found.
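As a sketch of how those bit flags would combine, in Python; the class, member and function names are made up, and the "SVG<n>" tag is the hypothetical one from the proposal above, not anything a browser understands.

```python
from enum import IntFlag


class SvgRestriction(IntFlag):
    """Bit flags from the proposal above; the class and member names are made up."""
    NO_SCRIPT = 1  # "SVG1": never execute script
    NO_LINKS = 2   # "SVG2": never follow links


def parse_restrictions(tag: str) -> SvgRestriction:
    """Read the numeric suffix of a hypothetical 'SVG<n>' tag as a bit field."""
    if not tag.startswith("SVG") or not tag[3:].isdigit():
        raise ValueError(f"not an SVG<n> tag: {tag!r}")
    return SvgRestriction(int(tag[3:]))
```

So parse_restrictions("SVG3") carries both restrictions, and a renderer would check each bit independently instead of comparing version numbers.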
The only real safety is in the engine, not by any sanitizer.
* Simplified paths (no shapes, only one kind of object)
* DTDs, Attributes, CSS and references are pre-resolved
* Invisible elements and comments are removed
Resolving shared items may cause sizes to increase drastically, however. That kind of repetition is exactly what compression handles well.
1: https://docs.rs/resvg/latest/resvg/ 2: https://docs.rs/usvg/latest/usvg/
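A quick way to see the compression point, as a Python sketch. The path data is made up and stands in for one shared shape after references have been resolved into copies. Real expanded files repeat less exactly (each copy tends to have its own transform), so the ratio would be less dramatic than in this toy case.

```python
import zlib

# One made-up path standing in for a shared shape after <use> references are resolved.
PATH = '<path d="M10 10 L90 10 L90 90 L10 90 Z" fill="#336699"/>\n'


def expanded_svg(copies: int) -> bytes:
    """Build an SVG where the shared shape has been duplicated inline."""
    body = PATH * copies
    return f'<svg xmlns="http://www.w3.org/2000/svg">\n{body}</svg>'.encode("utf-8")


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib (deflate) compression at the highest level."""
    return len(zlib.compress(data, 9))
```

Comparing len(expanded_svg(1000)) with compressed_size(expanded_svg(1000)) shows how much of the blow-up deflate claws back.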
Sanitization-wise, it's already possible to strip scripting and anything else you want from SVGs. It's just that a library like DOMPurify, to avoid ballooning in size, doesn't include, say, a preset for the extra parsing needed to make inline SVGs behave the way browsers treat IMG embeds, so it's up to devs to add their own.
But yeah, a world where a simple attribute to achieve the same effect as an IMG embed but for inlined SVGs would be nice.
Think of prior technologies like Display PostScript and .doc, where a data format ended up with big problems from its embedded "exec" type features.
There is iframe srcdoc if you want to do this.
By turning it into a document boundary when you use the sandbox attribute, kinda similar to loading an svg file inside of an <img> tag.
and yeah you could get 90% of the way there with an iframe srcdoc, but I was imagining some kind of cross between an <iframe> sandboxed into its own origin, and an <img> where it still has its own intrinsic size.
but it was mainly just a throw away thought, I've not really thought it through much deeper than that.
does that work for you?
But also so that setting up a CSS transform: scale(10000) can't take over the entire viewport, it'd be constrained to an iframe-like boundary (exactly like an <img>) but still remain as an inline SVG, sort of like an <iframe srcdoc>. So scripts on the parent/host HTML document can still manipulate it like the rest of the DOM, but the inner <svg> elements are all "inert" for want of a better word.
Actually I don't know off the top of my head what happens with an SVG file inside of a <img> when it references external images (either cross-domain or not.) I know scripts and animations get disabled, so I'd take a guess and say some CSS gets blocked too.
Again I've not really thought terribly hard about it, or if it's actually useful at all, and I'm betting it'd be filled with even more foot-guns than there are right now. I'm just thinking out loud.
If you're going that route, add CSP headers on HTTP level to disable scripting, and/or host the SVG on a separate domain that has nothing valuable, or use data: URLs.
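A minimal sketch of the data: URL option in Python, with hypothetical helper names. It only shows the encoding step; whether this fits depends on file sizes, since base64 adds roughly a third.

```python
import base64
from html import escape


def svg_data_url(svg_bytes: bytes) -> str:
    """Encode an SVG as a base64 data: URL."""
    encoded = base64.b64encode(svg_bytes).decode("ascii")
    return f"data:image/svg+xml;base64,{encoded}"


def img_tag(svg_bytes: bytes, alt: str = "") -> str:
    """Build an <img> element that embeds the SVG via a data: URL (hypothetical helper)."""
    # Base64 output contains no quote characters, so only the alt text needs escaping.
    return f'<img src="{svg_data_url(svg_bytes)}" alt="{escape(alt, quote=True)}">'
```

Used as an <img> source, the SVG is rendered as an image, so scripts inside it do not run.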
I suppose an actual exception is Content-Disposition. If you want the user to save a file, you need to serve it with dest == document as far as I know.
Right now if I want to render untrusted content and if I use React I have to escape from using React to leverage this, using https://react.dev/reference/react-dom/server/renderToString
And using null origins has tons of UX problems - virtual / sandbox origins would solve this. https://gist.github.com/ddworken/309363b5d140bcc5ff6b39fa4a8...
There's just a lot more work to do before I expect to see this. It would solve so many problems though. I personally put d3, markdown rendering, etc, all into iframe sandboxes, which means the entire library could be malicious and it won't matter. But it requires way more effort than I'd like.
Like opening a PNG in a new tab is harmless but opening an SVG in a new tab is opening a pretty substantial can of worms.
Yes, that was a large part of the thrust back in the day. Even if it wasn't officially a goal of the SVG working group, there was a lack of an open standards-based alternative to what Flash was able to do, and the developers of the SVG standard saw that adding animation/tweening wouldn't take much given what browsers were already becoming capable of.
I don't remember precisely, but I don't think you could script it from the DOM; I don't see how that could work if it's a plugin.
A malformed JPEG or PNG might have potential vulnerabilities but they are considered a failure of the browser or parser lib to mitigate.
An SVG however has vulnerabilities and those are directly built into the spec of well formed SVGs.
So users can view SVGs embedded in our site and they are regular vanilla SVG images. But say the user copies a link to this image (which we serve via our site or a CDN).
They share the image with a friend via URL, and their friend clicks the link, opening it directly in Firefox or Chrome. Now all the scripts in the SVG can execute, and the image can rewrite the DOM to present itself as a fake website prompting them to log into their Bluesky/atproto account to view the content. So said friend types their credentials in and the script in the SVG sends them back to the attacker's C&C server.