Word Highlighting / Suggested Words based on Word Frequency

David_Wilkinson · May 31, 2020, 4:04pm

From the YouTube thread:

I added the backend code for this feature over the last few days. We used to have this feature, but it got taken out during a heavy refactor. So, the way it will work is you set your level in settings (1000, 2000, 3000 words etc.), and the program will color words above that level (that aren’t already saved) as… purple… for example. (We used to grey them out actually.) Maybe it could be a bit smarter, and would auto-adjust the level so that a certain percent of words are ‘white’ (‘suggested to learn’) for each video. Haven’t really thought about it seriously yet, I’ll make a new thread for that discussion.
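To make the rule above concrete, here is a minimal sketch, not the actual backend code. It assumes a frequency list that maps each word to its rank (1 = most frequent) and a set of saved words; `classify_word`, `auto_level`, and the sample data are all hypothetical names made up for illustration. Words missing from the frequency list are treated as infrequent, which is also an assumption.

```python
from typing import Dict, Iterable, List, Set


def classify_word(word: str, rank: Dict[str, int], saved: Set[str], level: int) -> str:
    """Return 'saved', 'suggested' (left white) or 'infrequent' (purple/grey)."""
    key = word.lower()
    if key in saved:
        return "saved"
    word_rank = rank.get(key)
    # Assumption: a word that is not in the frequency list counts as infrequent.
    if word_rank is not None and word_rank <= level:
        return "suggested"
    return "infrequent"


def auto_level(words: Iterable[str], rank: Dict[str, int], saved: Set[str],
               levels: List[int], target_share: float) -> int:
    """Pick the smallest level at which at least target_share of the
    unsaved words in a video come out as 'suggested'."""
    unsaved = [w for w in words if w.lower() not in saved]
    ordered = sorted(levels)
    if not unsaved:
        return ordered[0]
    for level in ordered:
        suggested = sum(
            1 for w in unsaved
            if classify_word(w, rank, saved, level) == "suggested"
        )
        if suggested / len(unsaved) >= target_share:
            return level
    # No level reaches the target share: fall back to the largest one.
    return ordered[-1]


# Hypothetical sample data, for illustration only.
rank = {"the": 1, "house": 250, "river": 1800, "ominous": 7200}
saved = {"house"}
words = ["The", "house", "river", "ominous", "quux"]

labels = [classify_word(w, rank, saved, 1000) for w in words]
chosen = auto_level(words, rank, saved, [1000, 2000, 3000], 0.5)
```

With this sample data and a level of 1000, only "The" stays white; asking for at least half of the unsaved words to be white moves the level up to 2000.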

More info coming soon.

David_Wilkinson · May 31, 2020, 9:15pm

Here’s how we are thinking of showing infrequent words (as purplish grey). Suggestions?

Also thinking about moving saved word highlighting into colored underlines, to make the text less ‘rainbow’.

snowman · May 31, 2020, 11:38pm

What you suggested sounds good to me.

From those frequent words left in white, perhaps it would be useful to give the option to mark them also according to their grammar function in the sentence (noun, verb, adjective, adverb, preposition). That can help you get familiar with complex new sentence structures, and anticipate what each word would correspond with in the translation. I would do that with the underlines you suggested. You can also try to apply this to all the words in the sentence rather than just white words.

If I select the first 3,000, I would still like a special color to highlight the first 1,000 within those, as frequency there is the most significant by far (there is little difference in frequency above this). Maybe slightly bolding them would suffice (reducing colors).

Also, perhaps it would be better to color the frequent words in purple, instead of the infrequent ones, since those are the ones we’re interested in, anyway. When no purple words remain you will know you completed phase 1 in your studies (doing it like this you’re also reducing the color purple with time, which is better, so as not to overload the user with too many different colors).

Since white is the default color, if you highlight just the infrequent words, it is likely that your code would leave stragglers behind in white that it failed to catch, a color which would suggest that they’re frequent, while they’re not. That is another reason why it would be better to highlight the focus group directly rather than the other way around.

It actually gives me great pleasure seeing it all turn green as it is now. I quite like it.
If it still bugs you guys, perhaps you can offer 2 display styles to choose from to satisfy everyone’s preferences.

Calvosaez · June 3, 2020, 12:52pm

Hi David, when I add a card to Anki I use blue, so if a blue word appears I know I have it in my Anki to study.
I know you are focused on improving the tool for Netflix and YouTube, but if you can make some provisional code to highlight words in text, PDFs, and websites, I would pay you for a 5-year subscription XDD No joke.

In a perfect world I could add words to my database manually, but even just what you have now, working with text, would be a big help for studying.

Sorry, I think this is the third time I’ve asked you about this.

David_Wilkinson · June 7, 2020, 12:22am

Hey, sorry for not replying. It’s something we’re planning soon, at least for websites. We had something working a couple of months ago, but decided not to deploy it at that time… now it has to wait for the next full ‘webstore’ extension update (we update the YouTube and Netflix code more frequently by another method).

David_Wilkinson · June 7, 2020, 12:41am

From those frequent words left in white, perhaps it would be useful to give the option to mark them also according to their grammar function in the sentence (noun, verb, adjective, adverb, preposition).

This is something we could do… just we are limited by ‘visual bandwidth’… color is used for saved words now… I wonder if that would have been better saved for something else. Maybe we could add an ‘x’ key to expand the subtitle over a bigger screen area and annotate it with extra data (including all single-word definitions and word frequency numbers).

I would do that with the underlines you suggested.

Ah, ok. Hm.

If I select the first 3,000, I would still like a special color to highlight the first 1,000 within those, as frequency there is the most significant by far (there is little difference in frequency above this). Maybe slightly bolding them would suffice (reducing colors).

Og suggested showing frequency as a color gradient… uh, my thought was that then you need to figure out where a word sits on the gradient, then make a decision about whether you should learn the word… vs. being told ‘yes’ vs. ‘no’, which should involve less mental overhead. Maybe two levels is a nice compromise.
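As a rough sketch of the two-level compromise (nothing here is implemented, and the names are made up for illustration): words inside the chosen level get one of two tiers, with the first 1,000 marked as a ‘core’ group that could be slightly bolded. `frequency_tier`, `core_level`, and the style names in `TIER_STYLE` are all assumptions.

```python
from typing import Optional


def frequency_tier(rank: Optional[int], level: int, core_level: int = 1000) -> str:
    """Map a frequency rank (1 = most frequent) to one of three tiers.

    'core'       : inside both the chosen level and the first core_level words
    'frequent'   : inside the chosen level, outside the core group
    'infrequent' : above the chosen level, or not in the frequency list
    """
    if rank is None or rank > level:
        return "infrequent"
    if rank <= core_level:
        return "core"
    return "frequent"


# Hypothetical mapping from tier to display style.
TIER_STYLE = {
    "core": "white-bold",
    "frequent": "white",
    "infrequent": "purple-grey",
}

# A user who picked the first 3,000 words:
tiers = [frequency_tier(r, 3000) for r in (500, 2500, 4000, None)]
```

The viewer still gets a yes/no answer (white vs. purple-grey), and the bold weight only adds a hint about which white words matter most.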

If it still bugs you guys, perhaps you can offer 2 display styles to choose from to satisfy everyone’s preferences.

Also considering.

Thanks for your thoughts.

snowman · June 13, 2020, 6:36pm

That overlay key idea for extra annotation is pretty nice. Here is another thing you can do.
Check out how this guy is creating his flashcards:

Link to the video: https://youtu.be/quN1br34nIM?t=531

You can do the same automatic color correlation between the machine translation and the normal subtitle, with a pretty high accuracy, as it’s a direct translation (as opposed to human translation, which can differ quite a bit). This would be super helpful for beginners in a new language.

I think the best way to display this information, while avoiding your color overloading problem, would be by simply temporarily highlighting the word when you hover over it together with its corresponding part in the machine translation. This is a simple way to display the focus here, without overloading the user with unnecessary information.

Corresponding colored underlines, in both subtitles, would be another possibility to display this information. Then you can just match with your eyes.
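Here is a minimal sketch of the hover lookup, assuming the word alignment between the subtitle and the machine translation is already available as pairs of token positions. How those pairs are produced is left open, and `build_alignment_index`, `hover_targets`, and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple


def build_alignment_index(pairs: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Group (subtitle_position, translation_position) pairs by subtitle position."""
    index: Dict[int, List[int]] = {}
    for source_pos, target_pos in pairs:
        index.setdefault(source_pos, []).append(target_pos)
    return index


def hover_targets(target_tokens: List[str], index: Dict[int, List[int]],
                  hovered: int) -> List[str]:
    """Return the translation words to highlight for the hovered subtitle word."""
    return [target_tokens[pos] for pos in index.get(hovered, [])]


# Hypothetical example: Spanish subtitle and its machine translation.
subtitle = ["Tengo", "hambre"]
translation = ["I", "am", "hungry"]
# Assumed alignment: "Tengo" -> "I am", "hambre" -> "hungry".
pairs = [(0, 0), (0, 1), (1, 2)]

index = build_alignment_index(pairs)
highlight_for_tengo = hover_targets(translation, index, 0)
highlight_for_hambre = hover_targets(translation, index, 1)
```

One word can map to several words in the translation, so the lookup returns a list. The same index could drive matching underline colors instead of hover highlighting.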

You would then also be able to supercharge your Anki decks, if you export that ability there as well.

I haven’t checked the Anki export feature yet, but auto-generating screenshots, or adding images from Google Images to the decks, might also be a feature worth considering.
