The idea would be to make an educated guess at where each word occurs in the video, going off the timing and subtitle text from pysrt, and build a dict mapping each word to the times at which it occurs. You could then use MoviePy to stitch together a video version of the generated dialogue by looking up the appropriate clip for each word. For example, given a subtitle entry like this one:
7
00:00:23,060 --> 00:00:24,619
give a turnaround version
However, I am not sure of the best way to go about something like this.
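
Here is a minimal sketch of the word-timing guess I have in mind. It assumes each word takes a share of the subtitle's duration proportional to its length in characters, which is only a rough heuristic. The function names (`to_seconds`, `guess_word_times`, `build_word_index`) are hypothetical, and the subtitle is hard-coded as a plain tuple here rather than read with pysrt, so the example runs on its own.

```python
from collections import defaultdict


def to_seconds(timestamp):
    """Convert an SRT timestamp such as '00:00:23,060' to seconds."""
    hours, minutes, rest = timestamp.split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def guess_word_times(text, start, end):
    """Split [start, end] among the words in proportion to their length.

    Assumption: longer words take longer to say. This is a guess, not
    real alignment against the audio.
    """
    words = text.lower().split()
    total_chars = sum(len(word) for word in words)
    spans = []
    cursor = start
    for word in words:
        duration = (end - start) * len(word) / total_chars
        spans.append((word, cursor, cursor + duration))
        cursor += duration
    return spans


def build_word_index(subtitles):
    """Map each word to a list of (start, end) guesses, in seconds."""
    index = defaultdict(list)
    for text, start, end in subtitles:
        for word, word_start, word_end in guess_word_times(text, start, end):
            index[word].append((word_start, word_end))
    return index


# The sample subtitle from above, as a (text, start, end) tuple.
subtitles = [
    ("give a turnaround version",
     to_seconds("00:00:23,060"),
     to_seconds("00:00:24,619")),
]
word_index = build_word_index(subtitles)
```

With pysrt, the tuples could presumably be built from each item's `text`, `start` and `end` instead of being hard-coded. The `(start, end)` pairs in `word_index` are what I would then hand to MoviePy to cut a clip per word and concatenate the clips in the order of the generated dialogue.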