Scaling Static Analyses at Facebook (m-cacm.acm.org)
That's why I created tools to convert the output of different analyzers into a common CSV format that can be loaded into a database and used to compare results across tools, or across versions of the code (e.g., after fixing errors reported by the tools).
These tools currently work with cppcheck, clang and PVS-Studio and can be found here: http://btorpey.github.io/blog/categories/static-analysis/
Personally, I'm happier with plain old text files that can be manipulated with awk, grep, etc., can be loaded into a database if needed (since they're CSV files), and can also be compared using my all-time favorite software, Beyond Compare (http://btorpey.github.io/blog/2013/01/29/beyond-compare/).
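The compare-across-versions workflow described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical column layout `tool,file,line,id,message`; the actual format produced by the tools linked above may differ, and the sample rows are made up. Warnings are matched on everything except the line number, and a `Counter` is used so duplicate warnings are counted rather than collapsed.

```python
import csv
import io
from collections import Counter

def load_warnings(text):
    """Parse CSV text with an assumed header: tool,file,line,id,message."""
    rows = csv.DictReader(io.StringIO(text))
    # Deliberately leave the line number out of the key: unrelated edits
    # shift line numbers between versions of the code.
    return Counter((r["tool"], r["file"], r["id"], r["message"]) for r in rows)

def compare(before_text, after_text):
    """Return (fixed, introduced) warnings as Counters of keys."""
    before, after = load_warnings(before_text), load_warnings(after_text)
    # Counter subtraction keeps only positive counts.
    return before - after, after - before

# Made-up sample rows for illustration.
BEFORE = """\
tool,file,line,id,message
cppcheck,src/a.c,12,nullPointer,Possible null pointer dereference: p
clang,src/b.c,40,deadcode.DeadStores,Value stored to 'x' is never read
"""
AFTER = """\
tool,file,line,id,message
clang,src/b.c,42,deadcode.DeadStores,Value stored to 'x' is never read
cppcheck,src/c.c,7,memleak,Memory leak: buf
"""

fixed, introduced = compare(BEFORE, AFTER)
for key in fixed:
    print("fixed:", key)
for key in introduced:
    print("new:  ", key)
```

Because the key ignores line numbers, the clang dead-store warning is treated as the same finding even though it moved from line 40 to 42. The trade-off is that identical warnings in the same file are told apart only by their count.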
"Overall, the error trace found by Infer has 61 steps, and the source of null, the call to X509_gmtime_adj() goes five procedures deep and it eventually encounters a return of null at call-depth 4."
I think the example Amazon gave for TLA+ was a trace of thirty-something steps. Most people simply can't follow a 61-step path through software in their heads. And tests always have a coverage problem.
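To make the "five procedures deep" part of the quote concrete, here is a toy sketch of summary-style propagation: each procedure gets a one-bit summary ("may return null"), computed to a fixed point over the call graph, plus a breadth-first search for the shortest witness chain. This is far simpler than what Infer actually does, and the call chain is hypothetical: `proc0` stands in for the `X509_gmtime_adj()` call, and `proc1` to `proc4` are placeholders, not OpenSSL's real internals.

```python
from collections import deque

# Hypothetical call graph: each procedure lists the callees whose return
# value it passes straight back to its own caller.
RETURNS_FROM = {
    "proc0": ["proc1"],  # stand-in for the X509_gmtime_adj() call
    "proc1": ["proc2"],
    "proc2": ["proc3"],
    "proc3": ["proc4"],
    "proc4": [],
}
RETURNS_NULL_DIRECTLY = {"proc4"}

def may_return_null(returns_from, direct):
    """Least fixed point: a procedure may return null if it does so directly
    or passes through the result of a callee that may."""
    nullable = set(direct)
    changed = True
    while changed:
        changed = False
        for proc, callees in returns_from.items():
            if proc not in nullable and any(c in nullable for c in callees):
                nullable.add(proc)
                changed = True
    return nullable

def null_witness(start, returns_from, direct):
    """Shortest call chain from start to a procedure that returns null (BFS)."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in direct:
            return path
        for callee in returns_from.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None

nullable = may_return_null(RETURNS_FROM, RETURNS_NULL_DIRECTLY)
path = null_witness("proc0", RETURNS_FROM, RETURNS_NULL_DIRECTLY)
print(sorted(nullable))
print(" -> ".join(path), f"(null returned at call-depth {len(path) - 1})")
```

The witness chain spans five procedures with the null returned at call-depth 4, matching the shape described in the quote. A real trace is much longer (61 steps) because it also records the statements executed inside each procedure.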
> For the server-side, we have over 100-million lines of Hack code, which Zoncolan can process in less than 30 minutes. Additionally, we have 10s of millions of both mobile (Android and Objective C) code and backend C++ code.
> All codebases see thousands of code modifications each day and our tools run on each code change. For Zoncolan, this can amount to analyzing one trillion lines of code (LOC) per day.
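A quick back-of-the-envelope check on the quoted figures. Treating the round numbers as exact is an assumption for illustration; since a full run takes *less* than 30 minutes, the throughput below is a lower bound and the run-hours an upper bound.

```python
# Figures quoted above, treated as exact for illustration.
SERVER_LOC = 100_000_000          # "over 100-million lines of Hack code"
FULL_RUN_SECONDS = 30 * 60        # "less than 30 minutes" per full run
LOC_PER_DAY = 1_000_000_000_000   # "one trillion lines of code (LOC) per day"

# Lower bound on throughput, since a full run takes under 30 minutes.
min_loc_per_second = SERVER_LOC / FULL_RUN_SECONDS

# How many whole-codebase passes a trillion LOC per day is equivalent to.
full_run_equivalents = LOC_PER_DAY / SERVER_LOC

# Upper bound on total hours of full runs per day.
max_run_hours = full_run_equivalents * FULL_RUN_SECONDS / 3600

print(f"throughput >= {min_loc_per_second:,.0f} LOC/s")
print(f"equivalent ~= {full_run_equivalents:,.0f} full runs/day")
print(f"run time   <= {max_run_hours:,.0f} hours/day")
```

In other words, "one trillion LOC per day" is roughly ten thousand passes over the 100-million-line server codebase every day.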
Is that 11 "missed bugs" across the 100M lines of server-side code per run, or in total?
I think this is where languages with stronger built-in analysis (e.g., Rust) win: the results are better, and since the analysis always runs as part of compilation, there's never a sudden flood of newly reported bugs (as there would be if you ran Coverity on a legacy C++ codebase for the first time).
> We also use the traditional security programs to measure missed bugs (that is, the vulnerabilities for which there is a Zoncolan category), but the tool failed to report them. To date, we have had about 11 missed bugs, some of them caused by a bug in the tool or incomplete modeling.
A missed bug is presumably one that the tool is designed to catch but failed to flag during the period it has been running.
Hopefully Software Heritage (https://www.softwareheritage.org) will help with that.
Edit: It worked again right after I posted this comment.
> diff time [i.e., in the standard code-review workflow] deployment saw a 70% fix rate, where a more traditional "offline" or "batch" deployment (where bug lists are presented to engineers, outside their workflow) saw a 0% fix rate.
That's the difference between "static analysis presented as part of the workflow a developer goes through anyway" and "static analysis presented after the fact". If you're in a position to enforce a code-review workflow that tools can hook into, then "at code review time" works, but "at compile time" is better still: it shortens the feedback loop and ensures everybody sees the issues while they're still thinking about the code, even in smaller teams with ad-hoc or nonexistent code-review setups.
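The diff-time idea in the quote boils down to reporting only what the change under review touches. A minimal sketch, assuming warnings arrive as hypothetical `(path, line, message)` tuples and the change is available as a unified diff (e.g., from `git diff`): parse the `@@` hunk headers to find which new-side lines were added, then keep only warnings on those lines. This line-filtering heuristic is a common simplification, not necessarily what Facebook's tools do.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10,3 +10,4 @@";
# group 1 is the first line number on the new side.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def changed_lines(diff_text):
    """Map each file in a unified diff to the new-side line numbers it adds."""
    changed = {}
    path, new_line = None, 0
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            target = line[4:]
            path = target[2:] if target.startswith("b/") else target
            changed.setdefault(path, set())
        elif (m := HUNK.match(line)):
            new_line = int(m.group(1))
        elif path is None:
            continue  # still in the header of the first file
        elif line.startswith("+"):
            changed[path].add(new_line)
            new_line += 1
        elif line.startswith(" "):
            new_line += 1  # context line: present on both sides
        # "-" lines (and "--- a/..." headers) exist only on the old side
    return changed

def diff_time_report(warnings, diff_text):
    """Keep only warnings on lines the change adds or modifies."""
    touched = changed_lines(diff_text)
    return [w for w in warnings if w[1] in touched.get(w[0], set())]

# Hypothetical diff and analyzer output, for illustration only.
DIFF = """\
--- a/src/foo.c
+++ b/src/foo.c
@@ -10,3 +10,4 @@
 int x = 0;
+char *p = lookup(x);
 use(x);
+use(*p);
"""
WARNINGS = [
    ("src/foo.c", 13, "null dereference"),  # on an added line
    ("src/foo.c", 40, "unused variable"),   # pre-existing, untouched
    ("src/bar.c", 11, "resource leak"),     # file not in this diff
]
report = diff_time_report(WARNINGS, DIFF)
for path, line, message in report:
    print(f"{path}:{line}: {message}")
```

Only the null dereference on line 13 survives the filter; the other two warnings are still real, but they are exactly the kind of "bug list presented outside the workflow" that the quote says nobody fixes.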
I mean, ultimately we agree. Most people don't trust static analysis tools because they've had bad experiences with them. I just think most people should give them another try. The state of the art in that space is quite good.
But the balance between deep analysis and a low false-positive rate remains elusive. I'd be stunned if FB had really achieved a breakthrough in this area.
I do want to be wrong about this.
https://runtimeverification.com/match/1.0-SNAPSHOT/docs/benc...
I bring them up because they built the open-source K Framework and a formal semantics for C. Another commenter says PVS-Studio is pretty good. Since Synopsys owns Coverity now, I'd recommend RV-Match (little to no false positives), followed by PVS-Studio.