Hello, I had an idea that might allow for automatic video and audio capture. I would try it myself, but as far as I can tell this project isn’t open source.
So basically, the user downloads an external application that the extension talks to. When the user presses a button to export a card, the extension tells the external app to take a screenshot. Once the external app reports that the screenshot is done, the extension seeks the video player to the beginning of the subtitle line, tells the external app to start recording, and resumes the video. When playback reaches the end of the subtitle line, the extension tells the external app to stop recording, and the app sends back the audio (or video) and the image as base64 strings in its JSON response. With those base64 strings, the extension can then upload the card to Anki through AnkiConnect.
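Here's a rough sketch of the two data shapes involved, written in Python just to keep it short (the extension itself would do the equivalent in JavaScript). The response keys ("status", "image", "audio"), the file names, and the function names are all made up for illustration. The note layout follows AnkiConnect's addNote action as I understand it, where media can be attached as base64 through a "data" entry, so please check it against the AnkiConnect docs.

```python
import base64
import json


def build_capture_response(image_bytes: bytes, audio_bytes: bytes) -> str:
    """External app side: pack the captured media into a JSON response.

    The key names here are hypothetical, not an existing protocol.
    """
    return json.dumps({
        "status": "done",
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    })


def build_add_note_request(response_json: str, deck: str, model: str,
                           fields: dict, image_field: str,
                           audio_field: str) -> dict:
    """Extension side: turn the response into an AnkiConnect addNote request."""
    media = json.loads(response_json)
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": model,
                "fields": fields,
                # File names are placeholders; real ones should be unique.
                "picture": [{
                    "data": media["image"],
                    "filename": "card_screenshot.png",
                    "fields": [image_field],
                }],
                "audio": [{
                    "data": media["audio"],
                    "filename": "card_clip.mp3",
                    "fields": [audio_field],
                }],
            }
        },
    }
```

The resulting dict would be sent as the JSON body of a POST to wherever AnkiConnect is listening.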
You could probably also get the external app to record only the video player portion of the window by combining window.screenX and window.innerWidth (plus their vertical counterparts) with videoPlayerElement.getBoundingClientRect().
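A minimal sketch of that calculation, assuming the extension sends the raw numbers to the external app. Besides the properties mentioned above, it also uses window.outerWidth, window.outerHeight and window.devicePixelRatio, which are my additions. It assumes the browser's toolbars sit entirely above the page and that the left and right window borders are equal, which will not hold for every browser or OS, so treat the result as an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropRect:
    x: int
    y: int
    width: int
    height: int


def video_crop_rect(screen_x: float, screen_y: float,
                    outer_width: float, outer_height: float,
                    inner_width: float, inner_height: float,
                    rect_left: float, rect_top: float,
                    rect_width: float, rect_height: float,
                    device_pixel_ratio: float = 1.0) -> CropRect:
    """Estimate the video element's rectangle in physical screen pixels.

    screen_*, outer_* and inner_* come from the window object; rect_* come
    from getBoundingClientRect() on the video element (all CSS pixels).
    """
    # Assumption: side borders are symmetric.
    border_x = (outer_width - inner_width) / 2
    # Assumption: all browser chrome (tabs, address bar) is above the page.
    chrome_top = outer_height - inner_height

    x = screen_x + border_x + rect_left
    y = screen_y + chrome_top + rect_top

    # Convert CSS pixels to physical pixels for the capture tool.
    s = device_pixel_ratio
    return CropRect(round(x * s), round(y * s),
                    round(rect_width * s), round(rect_height * s))
```

For example, a window at (100, 50) with 80 CSS pixels of toolbar and a 640x360 player at (40, 60) inside the page would give a crop starting at (140, 190).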
If for some reason those are unreliable, you could maybe also just pass the window title to FFmpeg.
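As a sketch of the window-title route: on Windows, FFmpeg's gdigrab input device accepts title=<window title> as its input. The helper below only builds the argument list; the function name, the frame rate, and the idea of passing a fixed duration (taken from the subtitle's start and end times) are my assumptions. Note that gdigrab captures video only, so the audio would still need a separate input.

```python
def gdigrab_command(window_title: str, duration_s: float,
                    output_path: str, framerate: int = 30) -> list[str]:
    """Build an FFmpeg command that records one window by its title.

    Windows only (gdigrab). The duration is assumed to be known up front
    from the subtitle timing rather than stopped by a later message.
    """
    return [
        "ffmpeg",
        "-y",                             # overwrite the output file if present
        "-f", "gdigrab",                  # Windows GDI screen-capture device
        "-framerate", str(framerate),
        "-i", f"title={window_title}",    # capture the window with this title
        "-t", f"{duration_s:.3f}",        # stop after the subtitle's length
        output_path,
    ]
```

The list can be handed straight to subprocess.run, which avoids having to quote a title that contains spaces.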