A hands-on introduction to static code analysis (deepsource.io)
AST-level analysis is certainly useful. Everybody should be using some sort of style checker. But AST pattern matching is such a different technique from the stuff used to do bug finding that I worry these blog posts will give the wrong impression about what static analysis can and can't do.
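To make the distinction concrete, here is a minimal sketch of an AST pattern-matching check using Python's standard `ast` module. The function name `find_none_comparisons` and the sample `SOURCE` are made up for illustration. It flags a literal `== None`, but it does not notice the second comparison, where `None` arrives through a variable. Catching that needs some form of dataflow reasoning, which is a different technique.

```python
import ast

# Hypothetical sample input, only here to exercise the check.
SOURCE = """
def f(x):
    if x == None:
        return 0
    y = None
    if x == y:
        return 1
    return 2
"""

def is_none_literal(node):
    """True if the node is the literal constant None."""
    return isinstance(node, ast.Constant) and node.value is None

def find_none_comparisons(source):
    """Return line numbers where `== None` or `!= None` appears literally."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # A chained comparison `a == b == c` has operands [a, b, c]
        # and one operator between each adjacent pair.
        operands = [node.left, *node.comparators]
        for i, op in enumerate(node.ops):
            if isinstance(op, (ast.Eq, ast.NotEq)) and (
                is_none_literal(operands[i]) or is_none_literal(operands[i + 1])
            ):
                hits.append(node.lineno)
    return sorted(hits)

print(find_none_comparisons(SOURCE))
```

Only the `x == None` line is reported. The `x == y` comparison looks like any other comparison to the pattern matcher, even though `y` is always `None` at that point.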
I'd love to see blog posts about interprocedural pointer analysis, for example.
Interprocedural pointer analysis: yes, a lot trickier than these, but definitely juicier. Will try to write a post on it in the coming weeks.
If you want to go deeper, Principles of Program Analysis is a popular reference: https://www.amazon.com/dp/3540654100/
https://cacm.acm.org/magazines/2010/2/69354-a-few-billion-li...
Also worth checking out is BAP, the Binary Analysis Platform, which is the successor project to BitBlaze and, for my money, one of the most fascinating binary analysis frameworks out there. It was the only one of the DARPA CGC entries that ran on real binaries, not the much less complicated ones developed specifically for the challenge.
This way, analyzing the code is a simple "button press" and works out of the box on every Xcode project.
Soon after, Microsoft followed suit in Visual Studio (even though in my experience, the MS analyzer doesn't catch quite as many things as the clang analyzer).
Before that, static analyzers were those no doubt useful but obscure "magic tools" which were very hard to integrate into an existing build process.
Even the most useful tool will be ignored when it is hard to use.
I have a question: how difficult is it to implement the AST? It seems like that is the bulk of the work for this kind of static code analysis.
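One data point, assuming the code being analyzed is Python: you may not have to implement the AST at all, because the standard library's `ast` module parses source into a tree for you. The sketch below (the helper name `node_type_counts` is made up) parses a one-line program and counts the node types it contains. For other languages the answer depends on whether a reusable parser exists.

```python
import ast
from collections import Counter

def node_type_counts(source):
    """Parse the source and count how often each AST node type occurs."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

counts = node_type_counts("total = price * qty + tax")

# Four names (total, price, qty, tax) and two binary operations (*, +).
print(counts["Name"], counts["BinOp"])
```

With the tree handed to you like this, most of the remaining effort goes into deciding what to look for in it.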
SSA is also not even universal among IRs for static analysis at this point. Heap-SSA is growing in popularity for complex dataflow problems involving fields.
This is useful because it reduces many program analysis design questions to questions of which lattice to use. It also allows you to compare algorithms by comparing their lattices, which makes it easier to see how algorithms are related.
The cost is that this approach will be pretty alien if you don't have experience with abstract algebra or related fields. If you do have that experience, I don't think it requires mathematical maturity beyond an undergraduate level.
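For anyone who hasn't seen the lattice framing before, here is a minimal sketch using the flat lattice commonly used for constant propagation. All names (`BOTTOM`, `TOP`, `join`, `join_envs`) are illustrative and not taken from any particular tool. Each variable maps to `BOTTOM` (no information yet), a concrete constant, or `TOP` (could be anything), and merging two control-flow branches is just the lattice join applied per variable.

```python
class _Sentinel:
    """A named marker object for the lattice's extreme elements."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

BOTTOM = _Sentinel("BOTTOM")  # no information yet
TOP = _Sentinel("TOP")        # value is not a known constant

def join(a, b):
    """Least upper bound of two elements in the flat constant lattice."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    if a is TOP or b is TOP:
        return TOP
    # Two constants: keep the value if they agree, otherwise give up.
    return a if a == b else TOP

def join_envs(env1, env2):
    """Merge the facts from two branches, variable by variable.

    Simplifying assumption: a variable missing from one branch is
    treated as BOTTOM there.
    """
    names = env1.keys() | env2.keys()
    return {n: join(env1.get(n, BOTTOM), env2.get(n, BOTTOM)) for n in names}

# Facts at the end of the two arms of an if/else.
then_branch = {"x": 1, "y": 2}
else_branch = {"x": 1, "y": 3}
merged = join_envs(then_branch, else_branch)
print(merged["x"], merged["y"])
```

Swapping in a different analysis, say signs or intervals, mostly means swapping in a different `join` and set of elements, which is the sense in which the design question becomes "which lattice".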