For example, a beginner will find it frustrating to watch a video that uses many low-frequency words. If too many words in a sentence are marked purple, learning becomes ineffective. Perhaps you could calculate some kind of adequacy score for each video and display it as the first (fake) subtitle of that video.
A useful scoring system could work something like this:
- add 3 points for every sentence that has no more than 1 orange word in it
- add 1 point for every sentence that has exactly 2 orange words in it
- add 1 point for every orange word that is repeated at least 2 times (in separate sentences)
- subtract 1 point for every purple word in a sentence
- subtract 0.75 points for every orange word that is not repeated at least once (in separate sentences)
- subtract 0.5 points for every orange word in a sentence with 3 or more orange words, unless that word previously appeared as the only orange word of a sentence
- subtract 0.5 points for every 10 seconds without people speaking (we could skip such sequences but that would likely diminish comprehension)
- subtract 0.25 points for every sentence which has only green words
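To make the rules above a bit more concrete, here is a rough Python sketch of how they might be tallied. Everything in it is an assumption for illustration only: the function name `adequacy_score`, the input format (each sentence as a list of `(word, color)` pairs) and the `silent_gaps` list of pause lengths in seconds are all made up, not an existing API. I also had to pick interpretations where the rules are open: orange words are counted as distinct words per sentence, "repeated at least 2 times" is read as "appears in 3 or more separate sentences", "not repeated at least once" as "appears in only one sentence", and a pause costs 0.5 per full 10 seconds.

```python
from collections import defaultdict


def adequacy_score(sentences, silent_gaps=()):
    """Tally the proposed rules for one video (hypothetical helper).

    sentences:   list of sentences, each a list of (word, color) pairs,
                 with color in {"green", "orange", "purple"} (assumed format).
    silent_gaps: lengths in seconds of stretches without speech (assumed input).
    """
    score = 0.0
    sentences_with = defaultdict(set)  # orange word -> indices of sentences containing it
    seen_alone = set()  # orange words that were already the only orange word of a sentence

    for i, sentence in enumerate(sentences):
        colors = [color for _, color in sentence]
        orange = {word for word, color in sentence if color == "orange"}

        if len(orange) <= 1:
            score += 3  # at most 1 orange word
        elif len(orange) == 2:
            score += 1  # exactly 2 orange words
        else:
            # 3 or more orange words: penalize those never seen alone before
            score -= 0.5 * len(orange - seen_alone)

        score -= colors.count("purple")  # 1 point per purple word

        if sentence and all(color == "green" for color in colors):
            score -= 0.25  # sentence has only green words

        if len(orange) == 1:
            seen_alone |= orange
        for word in orange:
            sentences_with[word].add(i)

    for indices in sentences_with.values():
        if len(indices) >= 3:
            score += 1  # repeated at least twice, in separate sentences
        elif len(indices) == 1:
            score -= 0.75  # never repeated

    # 0.5 points per full 10 seconds of silence
    score -= 0.5 * sum(int(gap // 10) for gap in silent_gaps)
    return score


example = [
    [("I", "green"), ("like", "green"), ("tea", "orange")],
    [("tea", "orange"), ("and", "green"), ("cake", "orange")],
    [("tea", "orange"), ("cake", "orange"), ("taste", "orange"), ("exquisite", "purple")],
    [("yes", "green")],
]
print(adequacy_score(example, silent_gaps=[25.0, 4.0]))  # prints 4.0
```

One thing the sketch makes visible: a sentence with only green words has no orange word, so under a literal reading it earns the 3 points and loses 0.25 at the same time. Whether that is intended probably needs deciding.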
What do others think?