I think they just look at the traffic shape. Real-time traffic produces a steady stream of similar-sized packets at a near-constant rate, whereas texting or image traffic is much more bursty.
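
To make the traffic-shape idea concrete, here is a rough sketch of how such a heuristic could work. It is not how any particular carrier or middlebox actually does it. The function names, the `(timestamp, size)` input format, and the `max_cv=0.5` threshold are all my own assumptions for illustration. It flags a flow as "real-time looking" when both packet sizes and inter-packet gaps have low relative variation.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (0.0 if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def looks_real_time(packets, max_cv=0.5):
    """Guess whether a flow resembles a call.

    packets: list of (timestamp_seconds, size_bytes), sorted by time.
    max_cv:  hypothetical threshold, chosen arbitrarily for this sketch.
    """
    if len(packets) < 3:
        return False  # too few packets to say anything about the shape
    sizes = [size for _, size in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    # Steady flow: similar sizes AND evenly spaced packets.
    return (coefficient_of_variation(sizes) <= max_cv
            and coefficient_of_variation(gaps) <= max_cv)


# Synthetic flows (made-up numbers, not captured traffic).
# A voice-like flow: one ~160-byte packet every 20 ms.
call = [(i * 0.02, 160 + (i % 3)) for i in range(100)]
# A chat-like flow: short bursts of mixed sizes separated by long idle gaps.
chat = [(0.0, 90), (0.01, 1400), (0.02, 1400),
        (5.0, 60), (5.01, 1400), (30.0, 80)]

print(looks_real_time(call), looks_real_time(chat))
```

Note that this only looks at sizes and timing, so it would work even when the payload is fully encrypted, which is the point of the traffic-shape argument.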
My guess is that calls use a different layer 7 protocol than HTTP, and that an identifier for that protocol leaks in the TLS/SSL header. Is this guess correct?