Reverse Engineering TikTok's VM Obfuscation (Part 2) (ibiyemiabiodun.com)

245 points by laptou 3 years ago | 103 comments

nullpt_rs 3 years ago

Nice work. I was basically going to cover the same topics in my second part, but it looks like you beat me to it. If you'd like to collaborate with me on the next portion, feel free to contact me on Discord (veritas#0001) or by email (f@nullpt.rs).

cute_boi 3 years ago

Hmm, can you clarify how you managed to send the request in your original article? I used "Copy as cURL" in the Network tab and tried to send the request with curl, but got no response. Are they using fingerprinting or cookie shenanigans?

kayson 3 years ago

What's the point of such heavy obfuscation? Are they afraid of someone cloning their frontend? At first glance it seems like a waste of performance...

renonce 3 years ago

I think they are just reusing technologies commonly used in China on the TikTok website so they don't have to write it twice. In China, many frontend developers write code for WeChat mini programs. A WeChat mini program is just a typical HTML website with extra APIs, plus the restriction that code has to be uploaded for review before it is deployed, and `eval` (and anything else that executes a string directly as script) is disallowed to prevent sneaking code past review. Many developers built their own JavaScript VMs and interpreters to bypass that restriction, and that same code happens to be reused on all HTML-based platforms, even outside WeChat.
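Why an interpreter gets past an `eval` ban can be shown with a minimal sketch, assuming the program is shipped as plain data (here JSON) and walked by an interpreter that was itself part of the reviewed bundle. The `Expr` shape and the `evaluate` function are hypothetical and are not taken from WeChat or TikTok; the point is only that parsing JSON and walking a tree never executes a string as script.

```typescript
// A program shipped as plain data instead of as a string handed to eval.
type Expr =
  | { op: "num"; value: number }
  | { op: "var"; name: string }
  | { op: "add" | "mul"; left: Expr; right: Expr };

// Walk the tree recursively; no eval or new Function is involved.
function evaluate(expr: Expr, env: Record<string, number>): number {
  switch (expr.op) {
    case "num":
      return expr.value;
    case "var":
      if (!(expr.name in env)) throw new Error(`unbound variable ${expr.name}`);
      return env[expr.name];
    case "add":
      return evaluate(expr.left, env) + evaluate(expr.right, env);
    case "mul":
      return evaluate(expr.left, env) * evaluate(expr.right, env);
    default:
      throw new Error("unknown op");
  }
}

// This string could be fetched after review; JSON.parse only builds data.
const source =
  '{"op":"add","left":{"op":"var","name":"x"},"right":{"op":"mul","left":{"op":"num","value":3},"right":{"op":"num","value":4}}}';
const program = JSON.parse(source) as Expr;
console.log(evaluate(program, { x: 5 })); // 17
```

The reviewed code only ever contains the interpreter; what it computes is decided by whatever data it is fed later.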

dna_polymerase 3 years ago

So they won't let you do eval but don't mind a whole VM? This doesn't sound right, unless eval touches something deeper and might pose a security risk, while the VM runs inside the normal sandbox. Can you elaborate a bit, or hint at further resources? It sounds fascinating.

mach1ne 3 years ago

They do heavy data collection, which is a PR hit if it's too obvious. Obfuscation could be a general policy driven by this.

thaumasiotes 3 years ago

Wouldn't you detect data collection by watching what gets sent, not by debugging the app?

est 3 years ago

You'd be surprised by China's underground content farms. They went to great lengths to automate content generation by decrypting and exploiting apps' private APIs.

vladvasiliu 3 years ago

By "apps' private APIs" do you mean "apps other than TikTok running on the same device"?

Isn't that supposed to be prevented by the OS via the permission system?

TobyTheDog123 3 years ago

Most likely bot prevention and data security.

There's not a huge performance hit: a portion runs on page load, and a smaller portion runs on each HTTP request (which isn't too often).

jeroenhd 3 years ago

Bots just run copies of Chrome these days; you're not protecting anyone with this obfuscation if the bots you're targeting can simply run your code and automate the UI.

With the obfuscated fingerprinting demonstrated by the virtualized code, I reckon they're doing something more malicious.

javier2 3 years ago

At first I thought this looked kind of scary, but it makes more sense as bot prevention, obfuscating as much as possible to hinder manipulation of data. Consider services that manipulate view counts or follower numbers for payment. These undermine the revenue models of TikTok, Instagram and Reddit alike.

lwansbrough 3 years ago

In the first post about this, it was revealed they're exfiltrating detailed metrics covertly by piggybacking on regular requests.
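What "piggybacking on regular requests" means mechanically can be sketched as follows. This is a generic illustration under assumptions, not the first article's actual encoding: the `ClientMetrics` fields, the `x-data` parameter name and the `piggyback` helper are all hypothetical, and a real implementation would likely encode or encrypt the payload rather than send readable JSON.

```typescript
// Hypothetical data gathered once, e.g. at page load.
interface ClientMetrics {
  screen: string;
  timezone: string;
  language: string;
}

// Attach the metrics to an ordinary API URL as one extra query parameter.
// The parameter name "x-data" is made up for this sketch.
function piggyback(apiUrl: string, metrics: ClientMetrics): string {
  const url = new URL(apiUrl);
  url.searchParams.set("x-data", JSON.stringify(metrics)); // URL-encoded automatically
  return url.toString();
}

const metrics: ClientMetrics = { screen: "1920x1080", timezone: "UTC", language: "en-US" };
const requestUrl = piggyback("https://api.example.com/feed?count=10", metrics);
console.log(requestUrl);
```

The request still looks like a normal feed call with its original parameters intact; the extra data simply rides along.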

thrdbndndn 3 years ago

Lots of web services obfuscate their front-end scripts by default; it's not really that uncommon.

madeofpalk 3 years ago

There's JS minification - renaming variables to single letters, eliminating dead code paths, minimising whitespace, etc - that's exceptionally common and will be the default for most frontend workflows.

That is not really 'Obfuscation', at least to the degree that TikTok is doing.

ZephyrBlu 3 years ago

That's obfuscating the code. What TikTok is doing is obfuscating the execution which is very uncommon.
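The difference between obfuscating the code and obfuscating the execution can be illustrated with a minimal sketch. The opcodes, the `strings` table and the `run` function are hypothetical and do not reproduce TikTok's format. Minified code still spells out the property names it reads; in VM form those names become operands in a data table, and the interpreter only performs generic `obj[key]` lookups, so reading the source shows an interpreter rather than what it actually does.

```typescript
// Minified JS still names what it reads, e.g.:  const a = navigator.userAgent;
// In VM form the names are only operands, and the interpreter just does obj[key].
const strings = ["navigator", "userAgent"]; // string table; a real VM might encode this too
const GLOBAL = 0; // push globals[strings[operand]]
const GET = 1; // pop obj, push obj[strings[operand]]

function run(code: number[], globals: Record<string, unknown>): unknown {
  const stack: unknown[] = [];
  let pc = 0;
  while (pc < code.length) {
    const op = code[pc++];
    const key = strings[code[pc++]]; // both opcodes here take one string operand
    if (op === GLOBAL) {
      stack.push(globals[key]);
    } else if (op === GET) {
      const obj = stack.pop() as Record<string, unknown>;
      stack.push(obj[key]);
    } else {
      throw new Error(`bad opcode ${op}`);
    }
  }
  return stack.pop();
}

// A stand-in for the browser's global object, so the sketch runs outside a browser.
const fakeGlobals = { navigator: { userAgent: "ExampleBrowser/1.0" } };
console.log(run([GLOBAL, 0, GET, 1], fakeGlobals)); // ExampleBrowser/1.0
```

Swap the bytecode array and the same interpreter reads a completely different property, which is why static analysis of the interpreter alone reveals little.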

2OEH8eoCRo0 3 years ago

Maybe they're using stolen IP, copyrighted code, violating licenses, etc?

cactusplant7374 3 years ago

At the business / management level, who decides to green light a project like this and what are their reasons? Is it regulatory obscurity or denying the competition easy understanding of the code?

kerneloops 3 years ago

> Using these accessors, the VMs become able to do anything that JS can do.

In fact, the source language is likely JS itself: someone builds a compiler that turns JS into some sort of VM bytecode, which is then shipped back out inside JS. I know that Tencent has a similar VM; an interesting aspect of that VM is that the instruction set depends on the code being compiled (the opcodes are dynamically generated and shuffled at compile time), so unused instructions are never generated.
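The scheme described above (an instruction set derived from the program being compiled, with opcodes shuffled per compile) can be sketched as follows. This is an illustration under assumptions, not Tencent's or TikTok's compiler: the mnemonics, `assignOpcodes` and `compile` are hypothetical, and the mulberry32 PRNG with a Fisher-Yates shuffle is simply one convenient way to make the numbering depend on a seed.

```typescript
type Mnemonic = string;

// mulberry32: a small seeded PRNG, so a given build is reproducible from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Number only the instructions this program uses, in a seed-dependent shuffled order.
function assignOpcodes(program: Mnemonic[], seed: number): Map<Mnemonic, number> {
  const used = Array.from(new Set(program)); // unused instructions never get an opcode
  const codes = used.map((_, i) => i);
  const rand = mulberry32(seed);
  for (let i = codes.length - 1; i > 0; i--) {
    // Fisher-Yates shuffle
    const j = Math.floor(rand() * (i + 1));
    [codes[i], codes[j]] = [codes[j], codes[i]];
  }
  return new Map(used.map((m, i): [Mnemonic, number] => [m, codes[i]]));
}

// Encode the program and build the opcode -> handler table the emitted interpreter would need.
function compile(program: Mnemonic[], seed: number) {
  const table = assignOpcodes(program, seed);
  const bytecode = program.map((m) => table.get(m)!);
  const handlers = new Map(Array.from(table, ([m, code]): [number, Mnemonic] => [code, m]));
  return { bytecode, handlers };
}

const program: Mnemonic[] = ["PUSH", "PUSH", "ADD", "PRINT"];
const build = compile(program, 42);
console.log(build.bytecode, build.handlers.size); // handlers.size is 3: PUSH, ADD, PRINT
```

Only instructions that appear in the program receive an opcode, so an interpreter emitted alongside this bytecode needs no handlers for anything else, and recompiling with another seed generally produces a different numbering.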

TobyTheDog123 3 years ago

I, too, was disappointed that this was not a continuation of https://www.nullpt.rs/reverse-engineering-tiktok-vm-1, but as someone who could really use a way to interface with TikTok's API (for legitimate non-bot reasons that allow users to interface with TikTok differently), I'm all for more eyes on this problem.

jeroenhd 3 years ago

Interesting! I wonder whether this virtual machine is handcrafted or auto-generated. When I look at obfuscated code like this, I always wonder whether the authors couldn't just run their generator a second time and come up with a completely new format that existing reverse engineering efforts wouldn't work against.

I don't know what they're trying to obfuscate, but it must be worth hiding to justify running such inefficient JavaScript on clients around the world. I can't think of any non-malicious reason to develop such a system for a website about silly videos.

jonatron 3 years ago

dynamic string extraction: https://gist.github.com/jonatron/f7ec44e7ffd41c4dd50d51b3451...

sylware 3 years ago

And then we can use a small program to view TikTok videos, without needing one of the Vanguard/BlackRock-financed, absurdly and grotesquely massive and complex web engines (Blink/Gecko/WebKit).

ralphc 3 years ago

Can someone explain why TikTok needs a virtual machine? What are they doing with it that can't be done with a normal web app?

KiwiJohnno 3 years ago

They are doing it to obfuscate the capabilities and usage of their app.

Yes, this should set off big warning bells.

The_Link 3 years ago

As a thought exercise:

Botter: [attempts to spin up a phone VM for botting] Okay TikTok, start up.
TikTok: What hardware am I on?
Botter: [inserts common phone hardware here]
TikTok: Okay, then this should work: [hyper-specific firmware/bytecode virtualization commands and syscalls]
(segfault)

tinus_hn 3 years ago

Apps can’t run unsigned code on iOS; everything has to be in the bundle submitted for review. That’s why you can’t have a third-party JIT compiler.

ghostly_s 3 years ago

I skimmed the original post but still don't really grasp how these VMs are deployed. I assume they are running server-side (client-side would be a violation of App Store policies at least, right)? Is a dedicated VM spun up for each user session? Or just a sandbox in which various services run?

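kaba0 3 years ago

    // Dispatch loop: read the next opcode, run its handler, repeat.
    const PUSH = 0, ADD = 1, PRINT = 2;
    const code = [PUSH, 2, PUSH, 3, ADD, PRINT]; // bytecode for print(2 + 3)
    const stack: number[] = [];
    let pc = 0;
    while (pc < code.length) {
      switch (code[pc++]) {
        case PUSH: stack.push(code[pc++]); break;
        case ADD: stack.push(stack.pop()! + stack.pop()!); break;
        case PRINT: console.log(stack.pop()); break;
      }
    }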

yayr 3 years ago

Is there any open-source reference implementation of such a VM that is dynamically generated from JS/TS?

umasi 3 years ago

As others have already touched on, it’s rather disingenuous to pretend that you are “picking up where he left off” after copying his title and labeling your article as part 2, no? Especially considering the first author made it very clear he intended to publish further work on the topic. Further, this article reads a lot like an attempt to beat him to the punch and to hijack his series. But correct me if I’m wrong?

laptou 3 years ago

You're right that copying their title exactly was probably not the right thing to do, and may cause confusion. I'm not trying to hijack anything, I think there's enough room for both of us to research this at the same time.

umasi 3 years ago

Thanks for your response; that seems reasonable to me. That being said, this post is at best impolite, and at worst intentionally "scooping" another reverse engineer. It seems like you two are chatting now, though, so I'll leave you to figure it out.

captainmuon 3 years ago

It's sad how much human effort is spent creating obfuscation schemes and then analyzing them, or creating proprietary things and then reverse engineering them.

Sometimes I wonder if we could just make everything open. No obfuscation, no captchas, just a neat API for everything. Of course that wouldn't work ceteris paribus (everything else unchanged), due to bad actors, spam, or just competitors who want to take your work. But if you changed the incentives, making society non-adversarial and not profit-oriented, then all that gating and obfuscation would become unnecessary.

CGamesPlay 3 years ago

Note: this is not a continuation of the work by the same author. This is a new author who took the original and did further research. I think it's a bit disingenuous to call this by the same name, "part 2".

jeroenhd 3 years ago

I think it's actually more truthful to admit that this is the continuation of an earlier post than it is to say that this is new work. This is just the continuation of the process the first post documented.

The relation between this article and the one it's based upon is clearly indicated in the first paragraph. I don't think this is disingenuous at all.

vore 3 years ago

If Brandon Sanderson is allowed to finish The Wheel of Time, this author is allowed to call it part 2 ;-)

CGamesPlay 3 years ago

Yeah, but that was 2 years after the original author died. This is a month after the original publication.

I just feel it would have been more tasteful to choose a title different from the one the original author is obviously going to use for their next blog post.

mrsaint 3 years ago

How could Apple properly review something like this? Isn't one of Apple's selling points that they review each app for malicious activity before it makes it to the App Store?

valleyer 3 years ago

So, a tricky piece here is that this appears to be behavior of the TikTok web site. Obviously Apple makes no attempt (nor claim) to review the behavior of every web site accessible in Safari from an iPhone. And other native apps can embed WebKit-based web views into their apps.

The good news is that the scope of "malicious activity" is (at least in theory) much smaller when you constrain it to what web sites can do, as opposed to the scope of what can be done by executing ARM instructions and making syscalls.

The bad news is that the scope of "things web sites can do" keeps growing and is fingerprintable.

emsy 3 years ago

Apple has previously banned Apps for their backend content if they didn't like it. It's just that TikTok is too big and Apple is full of shit.

angulardragon03 3 years ago

> the code that is deployed on TikTok's _website_

This isn't regarding the app at all, which is likely not as heavily obfuscated as this (mostly because you can't just "view source" on an app).

Mindwipe 3 years ago

> How could Apple properly review something like this? Isn't it one of Apple's selling pitches that they'd review each app for malicious activity before it makes it to the app store?

They couldn't. Apple does not perform any meaningful review of apps for malicious activity; they do it for rent-seeking.

perttir 3 years ago

I used to develop an Apache Cordova application that had strong obfuscation using javascript-obfuscator. Apple didn't care.

pjmlp 3 years ago

They can't, and they would most likely kick the app out of the store, which is why this is in the website code.

slimebot 3 years ago

Yeah, I'd also like to understand why they are doing this.

Everyone expects these sites to scrape as much personal information as possible (China did not invent that; they are following), but beyond that, any additional imagined state-run initiative would be server-side, right? What is worth hiding in the front end beyond preventing people from reusing their code? (Using a VM for that would be overkill, as light obfuscation would be enough.)

ActionHank 3 years ago

Instead of writing a custom decompiler, it might be a bit quicker to pose it as a question to ChatGPT.

steponlego 3 years ago

This is pretty cool. Imagine what Tik Tok could do from a kid's phone when their parent is working on their laptop, on the LAN.

lxgr 3 years ago

What exactly could it do using a VM that it can’t already do using native JavaScript (or anything that compiles to it) or WASM?

Note that the parent article is about the website, not mobile app.

snazz 3 years ago

iOS requires user permission for each app to make local network requests (Settings -> Privacy & Security). So in theory this shouldn’t be possible, unless ByteDance came up with a way to bypass it that made it through Apple’s review process.

micromacrofoot 3 years ago

I imagine TikTok could get pretty far simply asking users for permission. A significant number of users would probably allow it. Fortunately watchdogs would probably get it into the news pretty quickly.

steponlego 3 years ago

You could log into any OS X machine by just pressing enter at the login screen enough times, for over a decade. Apple has trash security and they've stopped even saying what their security updates fix.

beepbooptheory 3 years ago

Tell us, what could it do?

steponlego 3 years ago

Open a reverse proxy and scan your LAN direct from Beijing for one.