I suppose a few heuristics and some sort of recognition of state machines would suffice to be able to test 80% of all sites.
- "We have these wireframes of this new major feature, let's get a group of people from our target demographics in a room and have them react to it and give us early feedback"
- "We think we have an idea for a new product for managing widgets, so we want to talk to professional widget managers about how they currently manage their widgets; and test our hypothesis that this is a better way of doing it"
As the previous poster pointed out, this is something that isn't easily or readily generalisable into AI.