How extensive is the database? Just free themes/plugins from one source, or popular themes from several sources? Paid ones too?
Any namespace clashes where you have to dig deeper to tell which theme or plugin it is?
Were you able to fully automate the creation and updating of the signature database?
WordPress has a predictable path structure, and we use that to extract theme and plugin slugs (textual IDs). For plugins that don't load any JS or stylesheets, we look for other signatures.
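To illustrate the path-based extraction, here is a minimal sketch that assumes the default /wp-content/plugins/<slug>/ and /wp-content/themes/<slug>/ layout. The function name, regex, and sample markup are made up for the example and are not the actual wpdetective code.

```python
import re

# Assumes the default layout: /wp-content/plugins/<slug>/ or /wp-content/themes/<slug>/
ASSET_PATH = re.compile(r"/wp-content/(plugins|themes)/([A-Za-z0-9_.-]+)/")


def extract_slugs(html):
    """Return (plugin_slugs, theme_slugs) found anywhere in the page source."""
    plugins, themes = set(), set()
    for kind, slug in ASSET_PATH.findall(html):
        (plugins if kind == "plugins" else themes).add(slug)
    return plugins, themes


# Hypothetical page source used only for illustration.
sample = """
<link rel="stylesheet" href="https://example.com/wp-content/themes/twentyseventeen/style.css">
<script src="https://example.com/wp-content/plugins/contact-form-7/includes/js/scripts.js"></script>
"""

plugins, themes = extract_slugs(sample)
print(sorted(plugins), sorted(themes))
```

Matching against the raw page source is the simplest approach, but it will also pick up paths that merely appear in text or inline scripts.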
Once we have the slugs, we do a lookup in the official WordPress theme/plugin repository and get all the info we need (plugin description, icon, author, etc.).
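A rough sketch of what such a lookup could look like, assuming the public api.wordpress.org "info" endpoints are used. Whether wpdetective queries these endpoints is an assumption, and the helper names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the wordpress.org info API, version 1.2.
API_BASE = "https://api.wordpress.org"


def info_url(kind, slug):
    """Build the lookup URL for a 'plugin' or 'theme' slug."""
    if kind not in ("plugin", "theme"):
        raise ValueError("kind must be 'plugin' or 'theme'")
    query = urlencode({"action": f"{kind}_information", "request[slug]": slug})
    return f"{API_BASE}/{kind}s/info/1.2/?{query}"


def fetch_info(kind, slug, timeout=10):
    """Return the decoded metadata, or None if the request fails with an HTTP error."""
    try:
        with urlopen(info_url(kind, slug), timeout=timeout) as resp:
            return json.load(resp)
    except HTTPError:
        # Slugs that are not in the repository (e.g. paid plugins) are expected to land here.
        return None


print(info_url("plugin", "akismet"))
```

Only info_url runs here. fetch_info needs network access and is left for the reader to call.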
For example:
http://wpdetective.io/wordpress.org
That trips it up a bit with false positives, finding plugins that aren't plugins.
http://www.wpthemedetector.com/
All these websites scan and output similar unknown plugins, as if they share the same database or use the same method to detect plugins and themes.
This site is very much WordPress: https://www.berghs.se/ but wpdetective won't detect WordPress at all. It has a custom WP_CONTENT_DIR.
Here's a similar but less polished version my team created some years ago: http://wppluginchecker.earthpeople.se/?wordpress-site=https%...
It tries a few common values for WP_CONTENT_DIR and runs completely in the browser, should you want to take a peek at how we detect WP.
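For the curious, the "try a few common values" idea can be sketched roughly like this. The candidate list is hypothetical, not the list our checker actually uses, and the sketch is Python rather than the browser-side code.

```python
import re
from collections import Counter

# Hypothetical candidates for a renamed content directory; adjust to taste.
CANDIDATE_DIRS = ["wp-content", "content", "app", "assets"]


def guess_content_dir(html, candidates=CANDIDATE_DIRS):
    """Return the candidate directory with the most plugins/themes asset paths, or None."""
    counts = Counter()
    for name in candidates:
        pattern = re.compile(
            r"/" + re.escape(name) + r"/(?:plugins|themes)/[A-Za-z0-9_.-]+/"
        )
        counts[name] = len(pattern.findall(html))
    if not counts:
        return None
    best, hits = counts.most_common(1)[0]
    return best if hits else None


# Hypothetical markup from a site with a renamed content directory.
page = """
<link rel="stylesheet" href="/app/themes/my-theme/style.css">
<script src="/app/plugins/some-plugin/main.js"></script>
"""
print(guess_content_dir(page))
```

Once a directory name is found, the usual slug extraction can run with that name in place of wp-content.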
It has been fixed now: http://wpdetective.io/www.berghs.se
edit: apparently custom code
I'd be interested in the maintenance strategies you have in place (if any).
I assume that for plugins that don't output styles or scripts you use other methods, maybe some HTML output etc., so you've probably hard-coded a lot of stuff for some popular plugins.
How have you set up your tests, and how do you plan on knowing when a certain plugin stops emitting the signature you're checking for? Most probably an E2E test with a local theme containing everything. Care to share tech specifics on this part?
There are some very popular plugins (Yoast SEO, Jetpack, W3 Total Cache) that don't import additional files. For these we have hardcoded patterns (under 100). We do not have anything in place for checking if these patterns break.
We could automate creating a WordPress installation, installing the plugin we want to check, triggering a wp detective scan and then checking the results. But I am not sure it is worth the engineering effort.
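The checking step of such a pipeline could be as small as the sketch below. Both callables are hypothetical hooks: render_page would install the plugin on a throwaway WordPress site and return its front-page HTML, and scan stands in for the detector. Neither exists today.

```python
def find_broken_signatures(expected_slugs, render_page, scan):
    """Return the slugs whose signature no longer shows up in a fresh install.

    render_page(slug) -> HTML of a test site with that plugin active (hypothetical hook).
    scan(html)        -> set of slugs the detector reports (hypothetical hook).
    """
    broken = []
    for slug in expected_slugs:
        html = render_page(slug)
        if slug not in scan(html):
            broken.append(slug)
    return broken


# Stubbed example: the second plugin has stopped emitting its marker.
fake_pages = {
    "plugin-a": "<!-- plugin-a marker -->",
    "plugin-b": "<html></html>",
}


def fake_scan(html):
    return {slug for slug in fake_pages if f"{slug} marker" in html}


print(find_broken_signatures(["plugin-a", "plugin-b"], fake_pages.get, fake_scan))
```

The expensive part is render_page, which is where the engineering effort would go.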
You assume correctly, that is one of the methods we are using.
We also look for signatures in the code that certain plugins output. For example, Yoast SEO usually adds an HTML comment at the end of the page identifying itself.
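As a rough illustration of the hardcoded-pattern approach, assuming the Yoast comment contains the phrase "Yoast SEO plugin". The exact wording can vary between versions, and the table below is an example, not our real pattern list.

```python
import re

# Maps a plugin slug to a pattern expected in the rendered HTML.
# Assumption: Yoast's comment includes the text "Yoast SEO plugin".
SIGNATURES = {
    "wordpress-seo": re.compile(r"<!--[^>]*Yoast SEO plugin", re.IGNORECASE),
}


def detect_by_signature(html, signatures=SIGNATURES):
    """Return the slugs whose signature pattern appears in the page."""
    return {slug for slug, pattern in signatures.items() if pattern.search(html)}


# Hypothetical page ending with a self-identifying comment.
page = "<html><body><p>Hello</p><!-- / Yoast SEO plugin. --></body></html>"
print(detect_by_signature(page))
```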
Still great.
We collect this information from the WordPress plugin repository, but it only contains free plugins. Paid plugins are not listed in a central place where we can query their metadata.
"we do a lookup in the official WordPress theme/plugin repository"
So many popular themes and plugins, especially paid ones, won't show up...because they aren't in that repo.
I would guess you'd have better luck parsing the html and extracting the href attributes of any <link> tags, src attributes of <script> tags, etc. Then pattern matching only against that.
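Something along these lines, as a sketch using only the standard library. The class and function names are made up, and it assumes the default wp-content directory.

```python
import re
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collects href values of <link> tags and src values of <script>/<img> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag in ("script", "img") and attrs.get("src"):
            self.urls.append(attrs["src"])


# Assumes the default content directory name.
SLUG = re.compile(r"/wp-content/(plugins|themes)/([^/?#]+)/")


def slugs_from_assets(html):
    """Return {(kind, slug)} matched only against real asset URLs."""
    collector = AssetCollector()
    collector.feed(html)
    found = set()
    for url in collector.urls:
        match = SLUG.search(url)
        if match:
            found.add((match.group(1), match.group(2)))
    return found


# The path inside the <p> text is ignored because it is not an asset URL.
page = """
<link rel="stylesheet" href="/wp-content/themes/twentyseventeen/style.css" />
<script src="/wp-content/plugins/akismet/_inc/form.js"></script>
<p>Upload the folder to /wp-content/plugins/fake-plugin/ to install.</p>
"""
print(sorted(slugs_from_assets(page)))
```

Restricting the match to tag attributes should cut down on "plugins that aren't plugins", at the cost of missing assets that are injected by inline scripts.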