The woes of sanitizing SVGs

271 points by varun_ch 21 days ago | 109 comments

simonw 21 days ago |

I'm glad this article includes the only credible fix for the HTTP leak problems: CSP.

A useful thing I learned recently is that, while CSP is usually set using HTTP headers, you can also reliably set it directly in HTML - for example, for HTML generated directly on a page where HTTP headers don't come into play:

  <iframe sandbox="allow-scripts" srcdoc="
    <meta http-equiv='Content-Security-Policy'
        content=&quot;default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline';&quot;>
    <!-- untrusted content here -->
  "></iframe>

It feels like this shouldn't work, because JavaScript in the untrusted content could use the DOM to delete or alter that meta tag... but it turns out all modern browsers specifically lock that down, treating those CSP rules as permanent as soon as that meta tag has loaded, before any malicious code has the chance to subvert them.

I had Claude Code run some experiments to help demonstrate this a few weeks ago: https://github.com/simonw/research/tree/main/test-csp-iframe...

rafram 21 days ago | |

And any additional CSP directives can only narrow what's allowed. Also works with headers plus <meta> - <meta>s can restrict the CSP even more than what the headers specified, but they can't widen it.

amluto 20 days ago | |

An idea I’ve been kicking around (which isn’t quite applicable to this use case, I think) is to aggressively restrict the Sec-Fetch- headers on user content. If a server is willing to serve up an untrustworthy SVG, it could refuse to serve it at all unless Sec-Fetch-Dest has the correct value, and ‘document’ and ‘iframe’ would not be correct values. This would make it more difficult to fool a user or their browser by, for example, linking to an SVG file, or using a less-secure mechanism like embed to load it.

This should be in addition to heavily restricting CSP on user content. (Hmm, surely all images should be served with the CSP header set.)
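
A rough sketch of that gate in TypeScript. The helper name `allowUserSvgFetch` and the decision to allow only `image` are assumptions for illustration, not something from the article. Browsers that predate Fetch Metadata send no Sec-Fetch-Dest header at all, so refusing a missing header is just one possible choice.

```typescript
// Hypothetical helper: decide whether to serve an untrusted SVG based on
// the Sec-Fetch-Dest request header. Names are illustrative only.
const ALLOWED_DESTS = new Set<string>(["image"]);

function allowUserSvgFetch(secFetchDest: string | undefined): boolean {
  // Older browsers send no Fetch Metadata headers. This sketch refuses them
  // (an assumption: a real server might instead fall back to CSP alone).
  if (secFetchDest === undefined) return false;
  // 'document', 'iframe', 'embed' and 'object' all fall through to false.
  return ALLOWED_DESTS.has(secFetchDest.trim().toLowerCase());
}
```

A request made by an <img> tag carries `Sec-Fetch-Dest: image` and passes. A direct navigation carries `document` and is refused.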

bawolff 20 days ago | | |

You can bypass the Sec-Fetch headers via service workers, I think.

A better approach here would be to just serve SVG with `Content-Security-Policy: script-src 'none'; sandbox`.
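
As a minimal sketch, a server could attach that policy to every user-uploaded SVG response. The function name `untrustedSvgHeaders` is hypothetical, and the `X-Content-Type-Options` line is an extra assumption that is not part of the suggestion above.

```typescript
// Hypothetical helper returning response headers for user-uploaded SVG.
function untrustedSvgHeaders(): Record<string, string> {
  return {
    "Content-Type": "image/svg+xml",
    // script-src 'none' blocks script execution; the sandbox directive
    // applies iframe-style sandboxing to the SVG document itself.
    "Content-Security-Policy": "script-src 'none'; sandbox",
    // Assumption, not from the comment: also disable MIME sniffing.
    "X-Content-Type-Options": "nosniff",
  };
}
```

Note that the `sandbox` directive only works in the HTTP header form of CSP, not in a <meta> tag.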

staticassertion 20 days ago | |

iframe sandboxing is wildly underleveraged. I think it's because it doesn't work well with "modern" app development - you need the ability to slice bits and pieces out yourself.

I've just been using plain TypeScript/HTML and it's so easy to say "yeah, all of that rendered content goes into an iframe". I've got all of d3 entirely sandboxed away with a strict CSP and no origin.

I do hope that iframe sandboxing grows some new primitives. It's still quite hacky - null origins suck and I want a virtual/sandbox origin primitive as well as better messaging primitives.

simonw 20 days ago | | |

I think the reason it's underleveraged is that there's so little useful documentation about it - particularly about its support in different browsers.

For something like this that's security critical I'd really like to see each of the browser vendors publishing detailed, trustworthy documentation about their implementations.

The technology itself is very widely deployed due to banner ads, so it's at least thoroughly exercised.

philipallstar 20 days ago | |

Wow - I had no idea. That's really useful, and probably much easier to implement by javascripters than something that might be set in nginx.

som 20 days ago | |

I read this whole post silently mouthing a "CSP" mantra as each new vulnerability was discovered, years apart no less. Elated when I got to the revelation towards the end.

But for all my self-righteous bluster the inline version was news to me. Hacker news. Awesome. Thank you.

tracker1 21 days ago | |

Nice, favorited... thinking this could be useful for an email reader to support CSS, but not scripts.

arcfour 20 days ago | | |

Most extant email readers support such a limited subset of CSS that nobody is likely to send emails with anything beyond very basic CSS though, unless your reader gains a ton of traction I suppose.

prettyblocks 19 days ago | |

You can also use JS to inject a CSP, and the browser will enforce the strictest policy.

recursive 21 days ago | |

I did not know about `srcdoc`, but it looks like that's still vulnerable to injection by using a double quote and </iframe> to escape the sandbox. If this is constructed in a hygienic way using DOM manipulation, it seems like it could work, but it definitely seems possible to screw up.

rafram 21 days ago | | |

If you're constructing your unsandboxed parent document HTML using string concatenation, you might as well not use the sandboxed iframe at all. But presumably someone who bothers to sandbox untrusted content also knows about setAttribute(), or the srcdoc JS property.

simonw 20 days ago | | |

You can entity-encode the content in the srcdoc= attribute to robustly solve that problem, or populate it via the DOM.
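
A minimal sketch of the entity-encoding route, for the case where the markup is built as a string (for example in a server-side template). The names `escapeAttr` and `sandboxedIframe` are hypothetical, and the CSP shown is only an example policy.

```typescript
// Escape text for use inside a double-quoted HTML attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so the entities added below survive
    .replace(/"/g, "&quot;") // prevents breaking out of srcdoc="..."
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap untrusted HTML in a sandboxed iframe.
function sandboxedIframe(untrustedHtml: string): string {
  // Example policy only; adjust to whatever the embedded content needs.
  const csp =
    "<meta http-equiv=\"Content-Security-Policy\" " +
    "content=\"default-src 'none'; style-src 'unsafe-inline'\">";
  return `<iframe sandbox srcdoc="${escapeAttr(csp + untrustedHtml)}"></iframe>`;
}
```

With this, an input such as `"></iframe><script>alert(1)</script>` stays inside the attribute value, because every double quote in it becomes `&quot;`. The browser decodes the entities only when it parses the srcdoc document inside the sandbox.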

tracker1 21 days ago | | |

s/"/&quot;/g

andybak 21 days ago |

My first thought is "support a tiny subset of svg that probably still covers 90% of real-world use cases".

I do feel that there are two distinct types of SVG - "bunch of paths with fills" and "clever dangerous stuff" - where most real SVGs are of the former type.

Fully expect this to be shot down by someone that's thought about this problem for longer than the 120 seconds I just spent. :)
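
For what it's worth, the "tiny subset" idea amounts to an allowlist applied to an already-parsed tree. The sketch below assumes a real XML parser has produced the `SvgNode` tree. The type, the function names, and both allowlists are made up for illustration and are not a vetted safe subset.

```typescript
// A pre-parsed element. Producing this tree with a real XML parser is assumed.
interface SvgNode {
  tag: string;
  attrs: Record<string, string>;
  children: SvgNode[];
}

// Illustrative allowlists only; a real subset would need careful review.
const ALLOWED_TAGS = new Set<string>([
  "svg", "g", "path", "rect", "circle", "ellipse", "line", "polyline", "polygon",
]);
const ALLOWED_ATTRS = new Set<string>([
  "d", "fill", "stroke", "stroke-width", "transform", "viewBox", "width", "height",
  "x", "y", "cx", "cy", "r", "rx", "ry", "x1", "y1", "x2", "y2", "points",
]);

// Drop values that could reference other resources, e.g. url(...) paints.
function isSafeValue(value: string): boolean {
  return !/url\s*\(/i.test(value) && !/^\s*(javascript|data):/i.test(value);
}

// Returns a filtered copy, or null when the element itself is not allowed.
function keepAllowed(node: SvgNode): SvgNode | null {
  if (!ALLOWED_TAGS.has(node.tag)) return null; // drops the whole subtree
  const attrs: Record<string, string> = {};
  for (const name of Object.keys(node.attrs)) {
    const value = node.attrs[name];
    if (ALLOWED_ATTRS.has(name) && isSafeValue(value)) attrs[name] = value;
  }
  const children = node.children
    .map(keepAllowed)
    .filter((child): child is SvgNode => child !== null);
  return { tag: node.tag, attrs, children };
}
```

The obvious cost is that anything outside the list is lost: gradients referenced through `fill="url(#id)"`, text, filters, and so on all disappear in this sketch.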

nmilo 20 days ago |

I'm sorry because I love the Scratch project but this has to be said: they found XSS in SVGs in a surface with attacker-controlled access to Node and their fix was sanitizing it using regex??? And this was discovered by a user on Scratch?

Even worse, OP's latest post "Every version of Scratch is vulnerable to arbitrary code execution" just tells you how exactly to exploit something similar today in the current version with no mention of responsible disclosure except a plug to say, "hey, check out my project, this one doesn't have RCE!" This is so irresponsible it borders on malicious.

inkmuffin 20 days ago | |

That post mentions that I disclosed this to Scratch in February 2024. The POC in that post is functionally identical to a POC I provided them back then and in various subsequent communications.

evilpie 21 days ago |

The HTML Sanitizer API has a subset of SVG that is allowed by the default configuration. It won't help you with sanitizing CSS at all, however: style is simply not allowed by default.

https://developer.mozilla.org/en-US/docs/Web/API/HTML_Saniti...

Grokify 20 days ago | |

Good reference, along with the article. I built an SVG sanitizer in Go and will look to these to make it more strict.

philo23 21 days ago |

It'd be nice if there was a sandbox attribute you could add to inline <svg> tags, like the <iframe sandbox> attribute, that'd let you opt out of all the potentially "dynamic" stuff inside of an SVG, like scripts and event handlers, or even just literally sandbox the entire thing from accessing the "parent" HTML page's context/cookies/etc., just like an iframe.

I'm sure it'd just open up a whole other can of worms though... not to mention having to wait for browsers to actually support it.

The real solution here is definitely CSP + basic sanitisation though.

ikkun 21 days ago |

I do wish TinyVG or similar would take off, but I don't see that ever actually happening. The only thing I think it's missing is animation support, which is pretty niche but not as niche as <script> tags.

https://tinyvg.tech/

hajile 20 days ago | |

We need a secondary official SSVG (Secure SVG) spec so the changes can be guaranteed by browsers and other implementors.

This would allow an update to the xmlns to

    <svg xmlns="http://www.w3.org/2000/ssvg">

That would allow the image to force SSVG mode and disable all non-approved features. You could also update the image tag so the client could force security on potentially insecure SVGs:

    <img type="ssvg" src="/insecure.svg">

spankalee 21 days ago |

This is, by the way, why Google Slides doesn't have SVG support even though there's a nearly 15-year-old ticket requesting the feature.

esprehn 20 days ago | |

They totally could sanitize it, just like they did with the gadget proxy, they just choose not to fix this ancient request.

It's like my ticket to add data URL support to Sheets. It just gets punted year after year.

ssocolow 21 days ago | |

Workaround: https://simonsocolow.com/tech/uploading-svg-to-google-slides...

Springtime 21 days ago |

It seems the reason they're inlined in the page at all is to briefly measure things like bounding boxes (not sure of the full extent, as the article didn't cover that) before subsequent removal. I'm not familiar with Scratch and its use of user-submitted SVGs, but I'd be curious to read more about what they're doing that required it be inlined specifically.

(This isn't a comment on the challenges in proper sanitization fwiw, as I've needed to do various of the same things myself)

inkmuffin 20 days ago | |

They want to run getBBox [1] which requires the SVG to be in the DOM somewhere - otherwise it throws an error. They need to do this because SVGs tend to have very inaccurate viewboxes, especially when working with SVGs made in old versions of Scratch. getBBox is the easiest way to get a more accurate understanding of how big the stuff in the SVG is.

[1]: https://developer.mozilla.org/en-US/docs/Web/API/SVGGraphics...

djoldman 20 days ago |

Cloudflare went down this road a bit:

https://github.com/cloudflare/svg-hush

Devasta 20 days ago |

> In 2019, a few months after the initial release of Scratch 3, Scratch discovered that SVGs can contain <script> tags that Scratch would cause to be executed when the SVG loads. This is known as an XSS.

> Example from Scratch's test suite:

  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
    "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg">
    <circle cx="250" cy="250" r="50" fill="red" />
    <script type="text/javascript"><![CDATA[
        alert('from the svg!')
    ]]></script>
  </svg>

Is this really an issue? This is the method that the Chrome team's polyfill to replace XSLT suggests you use. https://github.com/mfreed7/xslt_polyfill/tree/main#usage

inkmuffin 20 days ago | |

This was the example from their test suite. I didn't want to clone and build a 2019 copy of Scratch to test it end-to-end since the specifics weren't super important anyway.

codedokode 20 days ago |

I don't like that SVG uses things like CSS and JS and requires pulling in the whole browser to display. Instead of being a simple vector image format, it became just an extension of HTML. Maybe we need a new format, and if someone decides to do it, please add the ability to embed fonts and wrap text, plus decent animations.

DarkUranium 20 days ago | |

Wrapping text is a bit tricky because of differences in text wrapping algorithms. Though I suppose an "easy" fix would be to be able to specify a very specific algorithm (to ensure equal representation across systems), or allowing custom (possibly better-quality) wrapping.

But for the most part, I 100% agree, and I've been considering making a format for my own use-cases. I think the biggest issue is in agreeing as to what subset is necessary; plus, of course, getting any level of adoption (though the latter isn't a factor for my own use ... except in the sense that there are no tools to help).

For example, do we need animations? Gradients? If so on the latter, what kind?

Theodores 20 days ago |

Maybe we need a dumbed down version 3 of SVG where the browser knows it is not to do anything that requires fetching a URL, to make the image as harmless as a JPG.

A version 3 file could have its version number changed to 2 in order to do cool SVG things, i.e. full-fat SVG as version 2 is now. But you could just flip the 2 to a 3 on upload, so any embedded URLs are harmless.

This could be useful for the creator too, as it is helpful to have layers of source images in bitmap format to work with, and you can easily export such things accidentally.

Hendrikto 20 days ago |

> Stacking more and more complexity into sanitization is clearly a doomed approach. We are more than 5 major revisions deep and yet there are still known holes. People are actively sharing projects on the Scratch website bypassing SVG sanitization. And the moment browsers decide to implement the latest CSS specs, even more holes will open up.

This kind of stuff will get way worse with LLMs. They like just stacking more and more code on top of workarounds.

etchalon 21 days ago |

I don't understand why it wasn't immediately understood that SVG is as dangerous as HTML.

It is not, and never was, an image format. It's a markup language.

nulltrace 20 days ago | |

Browsers already treat the same SVG differently depending on how you embed it. <img> strips scripts and external resource loads. <object> and inline don't. People test with img tags, looks fine, then someone switches the embed method and everything opens up.

OneDeuxTriSeiGo 20 days ago | | |

it'd be nice if there was a way to declare in the URL that a given SVG could only be treated as an image so that you could safely open SVG urls, etc without exposing yourself to the dangers of embed/inline.

recursive 20 days ago | |

A markup language can be an image format. The "G" is for "Graphics" after all.

bawolff 20 days ago |

These aren't really SVG-specific issues. They are all pretty standard XSS vectors that apply to HTML and are very well known.

Like, this post didn't even mention presentational attributes, such as how the cursor attribute can contain a URL that gets loaded. Or any of the other tricky parts of SVG sanitization, like using a DTD to bypass things.

Liftyee 20 days ago |

I'm not familiar with the details of real software development, so I don't know why it's not possible to just "not give the SVG part of the code internet access" or "perform sanitization on post-decoding (url, hex, etc) data".

Is it because the SVG parser/renderer being used is an entire library, and it would be prohibitive to write your own SVG parser/renderer or insert your own code into the existing one?

drfloyd51 20 days ago | |

Some of the suggestions are kind of exactly that. But they specify not a change to the default behavior, but a new behavior based on the presence of a new attribute.

You could change the default behavior to the “safer” behavior. And then add some sort of “danger mode” attribute. But… devs are usually hesitant to do something that would break legitimate code, such as changing the default behavior would do.

kevinmgranger 21 days ago |

> This was fixed by using a regular expression to remove script tags.

The infamous "you can't parse (X)HTML with regex"¹ meme is from 2009, yet this fix was done in 2019. I guess the SO answer never mentioned SVG.

1: https://stackoverflow.com/revisions/1732454/1
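
To make the failure mode concrete, here is a small sketch. The regex below is NOT the one Scratch used (the article's exact pattern isn't quoted here). It is just a typical naive attempt, and `naiveStripScripts` is a made-up name.

```typescript
// Illustrative only: a typical naive "remove script tags" pass,
// not the actual regex from Scratch.
function naiveStripScripts(svg: string): string {
  return svg.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
}

// Removing the inner, empty script elements splices the outer
// fragments back together into a working script tag.
const payload = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>";
const stripped = naiveStripScripts(payload);
// stripped === "<script>alert(1)</script>"
```

A single replace pass never rescans its own output, so the sanitizer ends up assembling the very tag it was meant to remove. Looping until the string stops changing fixes this one case, but not the encoding, entity, and DTD tricks mentioned elsewhere in the thread.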

jancsika 21 days ago |

For the "<script>" stuff: regardless of how the thing is spelled or otherwise obscured, the HTML5 parser eventually knows when it's gotten hold of a script tag. Oops, we got one in a NOSCRIPTTAG context. Let's poop out.

Tag names, attributes, attribute values, event callback default-cancelers... so many ways to declare that this node and its children shouldn't parse/evaluate scripts.

As Jay-Z said: "I've got 99 solutions, fixing a problem ain't one"

esafak 21 days ago |

Is there a browser-friendly vector alternative?

simonw 20 days ago | |

SVG in an <img> tag can't execute scripts.

esprehn 20 days ago | | |

It also can't inherit CSS variables, which is unfortunate since it means the image doesn't respect the theme.

Pxtl 20 days ago |

I keep saying: I know there are good reasons we got "html but for vector art" but I wanted "jpeg but for vector art". I don't generally have to worry about sanitizing jpegs.

NooneAtAll3 20 days ago |

wait... scratch is just a browser?

inkmuffin 20 days ago | |

Since 2019, Scratch is written to run in a standard web browser, replacing the older Flash runtime/editor. The desktop app uses Electron.

wingi 20 days ago |

thank you for this post.

SpyCoder77 21 days ago |

I did not expect to see GarboMuffin.