Youtube Extension - Now Online!

UPDATE: (September 2021) LLY is no longer in beta and has been merged with LLN. The extension is now called Language Reactor (published by Dioco).

This thread is about LLY (Language Learning with YouTube), a sister extension to LLN.

It’s a beta (test) version, so you are all guinea pigs (test users).
image
(that’s you)

Youtube has videos on any topic, in practically any language.
Russian + owls? Ok: Снег дали! Сова в бешенстве от того, что ей не дают под ним сидеть - YouTube
German + diy? Ok: Magnet bauen | sowas wie ne Anleitung | Kliemannsland - YouTube
Cartoons in Hungarian? Ok: Megrázó halálok a rajzfilmekben | A gyerekkor vége - YouTube
French travel documentaries? Ok: SEB EN PAPOUASIE (documentaire) - YouTube

The thing is, Youtube tries to hide foreign content from you. Here’s what you do. We made these extra buttons near the search bar on Youtube. The one on the left makes the search only show results with ‘proper’ captions (not auto-captions or no captions), and the one on the right translates the search text to the language you select.

Here, we translated to French:

And here are the results, mostly in French:

Some searches will be a bit language-ambiguous still (like, if you search for ‘Brazil’). In that case, you can change your Youtube UI language to the language you are studying. Instructions here: http://languagelearningwithnetflix.com/youtube_language_instructions.html

When you open a video, the interface should be mostly familiar; it’s like LLN.

Some notes:

  • Maybe half the videos on Youtube have no captions at all (no ‘proper’ and no ‘autocaptions’). LLY can’t do much with these videos.
  • A̶u̶t̶o̶c̶a̶p̶t̶i̶o̶n̶s̶ ̶w̶o̶r̶k̶ ̶O̶K̶,̶ ̶b̶u̶t̶ ̶t̶h̶e̶y̶ ̶a̶r̶e̶ ̶t̶h̶e̶y̶ ̶a̶r̶e̶ ̶b̶r̶o̶k̶e̶n̶ ̶i̶n̶t̶o̶ ̶s̶m̶a̶l̶l̶e̶r̶ ̶p̶i̶e̶c̶e̶s̶ ̶t̶h̶a̶t̶ ̶r̶e̶g̶u̶l̶a̶r̶ ̶c̶a̶p̶t̶i̶o̶n̶s̶.̶ ̶I̶’̶l̶l̶ ̶w̶o̶r̶k̶ ̶o̶n̶ ̶i̶m̶p̶r̶o̶v̶i̶n̶g̶ ̶t̶h̶e̶ ̶s̶i̶t̶u̶a̶t̶i̶o̶n̶ ̶s̶o̶o̶n̶.̶
  • T̶h̶e̶ ̶d̶i̶c̶t̶i̶o̶n̶a̶r̶y̶ ̶a̶n̶d̶ ̶s̶a̶v̶i̶n̶g̶ ̶w̶o̶r̶d̶s̶ ̶d̶o̶n̶e̶s̶’̶t̶ ̶w̶o̶r̶k̶ ̶f̶o̶r̶ ̶a̶ ̶f̶e̶w̶ ̶l̶a̶n̶g̶u̶a̶g̶e̶s̶ ̶c̶u̶r̶r̶e̶n̶t̶l̶y̶,̶ ̶I̶’̶l̶l̶ ̶f̶i̶x̶ ̶t̶h̶i̶s̶ ̶s̶o̶o̶n̶.̶ Also you can’t save entire captions currently.
  • T̶h̶e̶r̶e̶’̶s̶ ̶n̶o̶ ̶A̶u̶t̶o̶p̶a̶u̶s̶e̶ ̶t̶o̶g̶g̶l̶e̶.̶ ̶O̶g̶n̶j̶e̶n̶ ̶t̶o̶o̶k̶ ̶i̶t̶ ̶o̶u̶t̶ ̶(̶?̶)̶.̶ ̶Y̶o̶u̶ ̶h̶a̶v̶e̶ ̶t̶o̶ ̶u̶s̶e̶ ̶t̶h̶e̶ ̶’̶Q̶’̶ ̶k̶e̶y̶ ̶f̶o̶r̶ ̶n̶o̶w̶.̶
  • T̶h̶e̶ ̶d̶i̶c̶t̶i̶o̶n̶a̶r̶y̶ ̶i̶s̶ ̶v̶e̶r̶y̶ ̶s̶l̶o̶w̶ ̶a̶t̶ ̶t̶h̶e̶ ̶m̶o̶m̶e̶n̶t̶.̶ ̶W̶i̶l̶l̶ ̶b̶e̶ ̶f̶i̶x̶e̶d̶ ̶s̶o̶o̶n̶ ̶a̶n̶d̶ ̶f̶a̶s̶t̶e̶r̶ ̶t̶h̶a̶n̶ ̶e̶v̶e̶r̶.̶
  • Hit the ‘t’ key when watching a video to go to ‘theatre mode’. It’s prettier.
  • L̶o̶a̶d̶i̶n̶g̶ ̶s̶u̶b̶t̶i̶t̶l̶e̶s̶ ̶i̶s̶ ̶f̶a̶i̶l̶i̶n̶g̶ ̶s̶o̶m̶e̶t̶i̶m̶e̶s̶.̶
  • S̶u̶b̶t̶i̶t̶l̶e̶s̶ ̶t̶a̶k̶e̶ ̶5̶s̶ ̶o̶r̶ ̶s̶o̶ ̶t̶o̶ ̶l̶o̶a̶d̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶r̶e̶’̶s̶ ̶n̶o̶ ̶i̶n̶d̶i̶c̶a̶t̶i̶o̶n̶ ̶o̶f̶ ̶l̶o̶a̶d̶i̶n̶g̶.̶

UPDATE: The extension is now available on the Chrome Webstore.

UPDATE: (March 1st) It’s my birthday! Another day coding!
UPDATE: (March 5th) New version (v1.0.0) uploaded to webstore.

That’s amazing! Do you know when you’ll be able to release it?

We’ll put a BETA version online, hopefully in the next few days. It’ll be a separate extension (for now at least). There will be some rough edges at the beginning. Currently I’m finishing setting up our servers to handle yt support, and need to do a bit more testing of the ‘front-end’ to make sure subs are loaded correctly.

So, stand by for the install link! I think you can subscribe to this thread to get an update when there’s a new post. We’ll need your feedback to help us improve it.

Now available in the webstore, see first link. We’ll be actively working on improving it over the next few days.

Thank you so much!!!

I’m excited for the possibilities of this, but a little frustrated at the tools for finding videos currently.

The ‘CC’ button in the search bar doesn’t seem to work - I get the same videos in results either way. If there was a working tool that ensured that a search only returned results with subtitles in both the target language and the learner’s first language, that would be a huge huge help.

Please let us know when you’ve added Mandarin videos to the LLY Catalogue. I think Mandarin probably has an argument for being a language that should be prioritised as one of the first to be given the same curated treatment that English currently has on there.

Let me explain: virtually every Chinese learning video I can find on YouTube seems to have English and Chinese open captions on screen together already, so no closed captions for either (must be annoying for people learning from a different first language!). And just about every Chinese TV show on there has Chinese open captions already on screen and only translation languages as closed captions (if you’re lucky).

I’ve been scouring YouTube for suitable Chinese content that has closed captions in both Mandarin and English, but it seems that, because of the huge number of Chinese dialects, it’s completely standard for Chinese videos and TV shows to have open captions burned into the video, and as a result closed captions are redundant, so very rarely provided.

On the plus side, there are a few Chinese TV shows with Mandarin open captions and English closed captions that I was previously finding frustrating to watch because English overlapped the Chinese and I couldn’t quickly skip back to play the last caption again. In these cases LLY does at least move the subtitles under the video (if the side bar is open) and allow me to move back and forth with arrow keys. So that’s an improvement over before.

Ah, it seems that if you press Enter to submit the search then the ‘CC’ button has no effect but if you click the magnifying glass search button to submit the form, then it does produce results that have subtitles. However, I then have the same problem as always, finding videos that have both Simplified Mandarin and English subtitles.

Is there a search tool where you can specify two subtitle languages and have it only return results where both languages are provided? That would be an absolutely killer feature.

Incidentally, I’m not currently managing to get any videos to show the translation (English) subtitles under the selected subtitle language; it just leaves a gap at the bottom. It will at least show the machine translation if I turn that on. For example, I’ve tried every subtitle language on this video and none produced the English subtitles at the bottom, and the same happened when I changed the translation language to another on the list: https://www.youtube.com/watch?v=QY0AMmLuiqk

Also even though I’m on YouTube and it says it’s LLY, the settings menu is showing the message about how to fix Netflix error M7063-1013.

Oh and the extension doesn’t seem to know what to do with subtitles called ‘Chinese (China)’ and ‘Chinese (Taiwan)’ rather than ‘Chinese (Simplified)’ and ‘Chinese (Traditional)’, even though they display fine with LLY turned off, no subtitles load at all when it’s on - for example, see https://www.youtube.com/watch?v=I5xRQiZ_7u4

Edit: No, that can’t be the issue with that video - the Langfocus video works fine and that also has subtitle tracks called ‘Chinese (China)’ and ‘Chinese (Taiwan)’. There must be some other problem with videos from this other channel.

Thanks, looks like you found an issue with the search bar, doesn’t work with enter key. We’ll do something with that.

I tried a few videos with Chinese, didn’t see any immediate problem:


(Steven Universe in Chinese, nice.)

The current webstore version will just fail to retrieve the subtitles sometimes. In that case you can just reload the page (F5), and things might work the second time. I’ve made some fixes in the last few days that may well have eliminated that issue, will have to test a bit more. We’re also adding a mechanism that will allow us to push updates and fixes faster… minutes instead of days.

One thing to note is that if you can find a video with any kind of Chinese subtitles (‘autocaptions/automatic speech recognition’ or regular closed captions), the extension should be able to provide a machine translation at least… that should almost always be available. There’s no need to find videos with captions in two languages. Actually, because it’s not very common on YT to have multiple ‘human’ translation tracks available, LLY currently doesn’t support showing them as the secondary, lower, translation track, only machine translation.

The suggested search method should work quite OK:

  1. Type something in English into the search bar (or Chinese directly).
  2. Use the translate button to translate that text to Chinese. I put ‘driving a ferrari’ => ‘驾驶法拉利’. Select the CC button and click the search button.
  3. Open any video; you should have Chinese primary subtitles and, if enabled in the settings, an English/German/French etc. translation.

All YT videos are in theory supported (as long as they have some kind of captions). You’re not limited to a catalogue. Later, if we accumulate some data about Youtube videos, we can build more powerful search methods that could be more convenient.

We’ll put a new update online shortly (1 week?) that should be more stable than the current code.

Thanks for feedback, it really helps!

btw Pinyin is coming for Chinese, for LLN and LLY, as soon as LLY is mostly stable.

Heads up, new version (v1.0.0) has been submitted to Chrome Webstore. Much more solid all round, and with a brand new dictionary. Should be available in 1-3 days, after it clears review. Enjoy!

Is it still in review? I still have v0.1.0, not v1.0.0.

It was rejected twice, we are removing some more permissions and trying again. :-/

Is it still being rejected?

Hey. Sorry guys, I got pretty distracted and started a ventilator project (http://openvent.org/). But, I am back now, working on LLN/LLY.

Good news, after another few tries, LLY is now updated, and working very nicely. :slight_smile:

Please take a look at Arabic. LLY doesn’t work properly with this language.
It works ok with Hebrew but not Arabic.

Also, when 2 subtitles are present, LLY doesn’t select the English translation to display alongside the transcription, as it does with Netflix.

Please also allow dragging the subtitles to change their placement. Often there is an embedded translation in the video and LLY blocks the view (this is especially the case when you watch content intended for language learners).

Some simple stats and percentages regarding white and green words would also be very handy. It would be very motivating to visualize progress in that fashion, and to judge in advance how much you know from a given new video.

When hovering over the transcript, the video is paused, and it should remain paused when you subsequently hover over the pop-up definition box of a certain word.

Thank you for the awesome extension. Keep it up.

Hi. Do you have a link for the video that doesn’t work? I tried an Arabic video, this one seems ok:



There’s no technical reason why Arabic shouldn’t work, but, there’s a lot that can go wrong when you are bolting on stuff to somebody else’s website. Huh. This extension isn’t bad actually… I can study Arabic finally without having to try to read news articles.

You’re right, we only use the machine translations, not ‘human translations’… this is a todo, the code is still there and can be hooked up, but I didn’t think it’s a high priority, most yt videos only have subtitles in the original language, if at all… depends what part of Youtube you are watching of course.

drag and change the placement - yeah, Og mentioned doing that, I think it’s quick. We made the subs a bit transparent a couple of days ago. I would like to bring back the “bottom panel” option (that’s what we call it) from the Netflix version to Youtube, when in fullscreen, so subs are not over the video at all.

stats - probably we’ll look at this after we have word frequency stuff working again.

hovering over the transcript - yes, I think you are right. Pause on hover was hacked on quickly, the logic needs improving.

Thanks for helpful suggestions, I’ll get Og to look these over too.

Sesame street, that’s where it’s at: https://www.youtube.com/watch?v=STl5nE_IqCg

There is a lot of good stuff on Youtube… too much even. Sometimes you have to dig though. We were thinking of making some kind of catalogue where users can recommend videos by pressing a button while watching the video… or… something similar.

At the exact same timestamp you posted, I get “translation not found” in Arabic for every word in the video. Example sentences are found for only some of the words, and a definition is never, or practically never (not sure), displayed. Color marking also doesn’t work at all (only for the first word in the sentence). Extension version 1.0.1. Another user I know has confirmed this as well. The same applies to LLN. So, from my POV, it’s basically completely broken for that language.

True, but I was watching a ton of YouTube long before this extension emerged, so I have been able to collect a lot of such content in many languages. If you search for it, you will find it. So far I have been using the “DualSub” extension for YouTube to get this functionality.

TED talks are one example where there are multiple subtitles in many different languages.

A simple counter for each colored word would be a great start. You already have all that information. No need to implement some special frequency logic yet. If you download the JSON and search “green” or “blue” that is the only info you need at the moment. Also the percentage of white in any given text can be easily calculated and displayed to the user.
Otherwise, once you mark a few thousand words, you start to doubt that the database size is even increasing anymore, since it kinda gives you the impression that it “looks the same” each and every time. I would even go as far as to say that this would be a pretty critical feature for long-term user engagement. It’s one of the main indicators motivating you to push forward.
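To make the counter idea concrete, here is a minimal sketch in Python. The real layout of the exported JSON isn’t shown in this thread, so the shape used here (a plain mapping from word to color) and the names `color_stats`, `saved_words` and `caption_tokens` are assumptions for illustration only.

```python
from collections import Counter


def color_stats(saved_words, caption_tokens):
    """Count saved words per color and the share of unmarked ('white') tokens.

    saved_words: dict mapping a lowercase word to its color (assumed export shape).
    caption_tokens: list of words taken from a video's captions.
    """
    counts = Counter(saved_words.values())
    total = len(caption_tokens)
    # A token counts as 'white' if it has no saved color at all.
    white = sum(1 for token in caption_tokens if token.lower() not in saved_words)
    white_pct = 100.0 * white / total if total else 0.0
    return counts, white_pct


# Hypothetical data, only to show the call.
saved = {"bonjour": "green", "merci": "green", "chat": "blue"}
tokens = ["Bonjour", "le", "chat", "merci"]
counts, white_pct = color_stats(saved, tokens)
print(dict(counts), white_pct)
```

Nothing here needs frequency data; it only reads colors that are already stored.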

Other suggestions:

  1. There should be an “ignore list/color” for names of people and literal numbers, for example, and other such useless information. There should be a way to distinguish them separately from other white words and without contaminating your database by saving them.

  2. You currently offer Google translation of an entire sentence, but often you want to translate only part of the sentence to increase your comprehension of a difficult portion of it. The Google translation changes depending on the portion you select, so this comparison ability is very helpful and something that I use constantly. I currently use the sidebar from your extension combined with the “google translate” extension in order to achieve that.

a) You should be able to do all of that only from within your extension

b) You should allow text from the subtitle on the main screen to be selectable, so you can do it there as well (currently you can only click on words there)

c) Reverso Context has pretty awesome phrase detection, which is much superior to Google Translate in most cases. That is another 3rd-party extension that I need to use in combination with yours. Either duplicating its functionality or offering better integration with it would also be great. It is one of the best dictionaries out there.

  3. There should be an ability/color to save complete phrases or “chunks”. It’s quite important.

Other bugs I have seen:

  1. Punctuation marks break color marking recognition. For example, “so…” is treated as a different word from “so” because of the three dots, so you have to recolor it. I have seen that on LLN, but it probably applies to LLY too. Please strip any punctuation from a word that is being saved (it saves them with the punctuation).

Another related issue, for LLN with “CC” marked subtitles, sometimes you can’t mark one of the words attached to a square bracket. For example: “[wind howling]”, either “wind” or “howling” would not be markable due to the square bracket (not always the case).

  2. Pretty rarely, but you can find instances where you can mark the same exact word with two different colors at the same time. That might happen even in the exact same sentence. They are written exactly the same, and there is no difference between them that I can detect, but the extension thinks they’re different words for some reason. So this functionality isn’t really perfect yet. Just giving you the heads up.

  3. Words written in short form like “we’ll” are counted as two separate words: “we” and “ll”. You can mark each part in a different color. Might be intended behavior but not sure… it seems incorrect.
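The punctuation and bracket problems above could be handled by normalizing each token before it is looked up or saved. This is only a sketch in Python, not the extension’s actual code; `normalize_word` is a hypothetical helper. It trims punctuation from both ends of a token, lowercases it, and leaves inner apostrophes alone so that “we’ll” stays one word.

```python
import unicodedata


def _is_punct(ch):
    # Unicode general categories starting with 'P' cover ., …, [, ], quotes, etc.
    return unicodedata.category(ch).startswith("P")


def normalize_word(token):
    """Strip leading/trailing punctuation and lowercase the rest.

    Punctuation inside the word (e.g. the apostrophe in "we’ll") is kept.
    """
    start, end = 0, len(token)
    while start < end and _is_punct(token[start]):
        start += 1
    while end > start and _is_punct(token[end - 1]):
        end -= 1
    return token[start:end].lower()


for raw in ["so…", "So", "[wind", "howling]", "we’ll"]:
    print(repr(raw), "->", repr(normalize_word(raw)))
```

One trade-off: a trailing apostrophe (as in a plural possessive) is stripped as well, which may or may not be what you want for a given language.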

“translation not found” for every word - ok, sounds like the code is trash. We’ll investigate. What language are you translating to, btw?

For the colors, you are marking all known words as green. Ok. This was Og’s original idea, and I think LingQ does it like that. I didn’t want to push that aspect too hard, as I thought it would focus the user on ‘whack-a-mole’ clicking rather than trying to process and understand the language itself, as whole sentences… I am cautious about placing too much emphasis on individual words and their translations. Words can be used in different contexts and have quite different ‘purposes’ in those contexts… it’s hard to say ‘Ok I know this word’… well, sometimes you can, depends on the word. :upside_down_face: Anyway, LLN is not trying to force a paradigm… it’s a tool that can be used by learners with different habits. I guess we could add some stats like you mention in the settings panel, it would be easy. I would like to encourage users to focus their attention on the audio as much as possible. I just implemented the ‘hide primary subs’ function… we could maybe offer stats about how many subs, and how many translations, the user had to reveal… although it’s a question of practical UI for that functionality. hm.

The ignore list, that’s a good suggestion. I’ll see if our NLP tools can handle this task automatically, I’m not sure.

Regarding ‘partial translations’, we can perhaps do this with the data we already have, it’s in the TODO list (word align the machine translation and the primary subtitles). It’s just a question of time, there’s just two of us, maintenance/bug fixing/refactoring takes about half our time, planning/emails/forum/advertising/writing instructions/etcetc. the other half, and new features the other half. :upside_down_face:

Thanks for the heads up on the other points, I will refer to these as I rework some of the NLP code over the next few days.

Default - English.

Thanks, man. Sorry for overloading you with stuff.

I actually only came here to inform you about Arabic, which was the main and most critical thing bothering me. The other stuff doesn’t bother me so much, as I have my (slightly inconvenient) workarounds to deal with it, but since I was already here… I thought I would use the chance to tell you about the other things and difficulties that occurred to me.

Hopefully, you would also be able to add TTS support for Arabic (Google Translate supports this).
Tip: Reverso-context and Al Jazeera offer much better quality audio than the Google Translate engine.

Take your time. No rush. Only wanted to make sure you’re informed about the issues.
Arabic is the most urgent problem of all the things mentioned before.

Good luck! Your extension rocks, dude.

P.S.: Another idea: for starting to learn a new language from scratch using your extension, it would be pretty damn cool if you could auto-mark (highlight) the first 1000 (up to 3000) most common words for the user, so they would know what to focus on and not get lost in the sea of it. I think you guys might have already implemented such a feature, or a similar one, in one of your first releases, if I’m not mistaken.

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
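A rough Python sketch of the auto-marking idea, assuming you already have a frequency list as a plain list of words ordered from most to least common (how it gets loaded is left out). `highlight_common` is a hypothetical name, not something the extension provides.

```python
def highlight_common(tokens, frequency_list, limit=1000):
    """Pair each caption token with a flag saying whether it is in the top `limit` words.

    frequency_list: words ordered from most to least common (assumed input).
    """
    top = {word.lower() for word in frequency_list[:limit]}
    return [(token, token.lower() in top) for token in tokens]


# Tiny made-up list, just to show the shape of the result.
freq = ["the", "sat", "of"]
print(highlight_common(["The", "owl", "sat"], freq, limit=2))
```

The `limit` parameter covers the 1000-to-3000 range mentioned above; the flagged tokens are the ones a beginner would focus on first.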

Yup, that’s it. I think I’ll stop now. I could probably go on like that forever and ever. :upside_down_face:
It was good talking with you.

I didn’t find it hindering me in any way. The opposite if anything. It adds a great value.
One of your better features, for sure. A great motivator as well (one of the most critical aspects in language learning).

That’s a non-concern for me. It’s just a measure of progress, which also helps you notice repeated words that you know you have seen at least once before. It makes you more aware and helps recall. They can be green and then one day go back to blue again. It’s constantly fluid and changing. Nothing is set in stone and there is no rule that they can’t go back. You just judge each word based on your comfort level at that specific moment. If you aren’t too strict with yourself and “what you know”, what you described wouldn’t bother any user. The further along you are in a language, the more accurate those stats become, given sufficient time.

That is also the viewpoint of Steve Kaufmann, the founder of LingQ, btw. You can see him talk about such matters on his YouTube channel, if you’re interested in hearing more about his perspective on such subjects.

You shouldn’t obsess too much with how well you know each word. Just flow with the flow.
Eventually the green would become “firm green” with enough time.
If you record in your JSON structure the last date a color was set/modified, you can also in turn supply stats about relative “firmness”, I guess (every word older than a couple of months, for example).
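As a sketch of how that “firmness” stat might look in Python: the field names `color` and `modified` (an ISO date string) are assumptions, since the current JSON structure isn’t shown here, and the 60-day threshold is just a stand-in for “a couple of months”.

```python
from datetime import date, timedelta


def firm_words(entries, today, min_age_days=60):
    """Return green words whose color was last changed at least `min_age_days` ago.

    entries: dict mapping word -> {"color": str, "modified": "YYYY-MM-DD"} (assumed shape).
    """
    cutoff = today - timedelta(days=min_age_days)
    return sorted(
        word
        for word, entry in entries.items()
        if entry["color"] == "green"
        and date.fromisoformat(entry["modified"]) <= cutoff
    )


# Hypothetical entries, only to show the call.
entries = {
    "chat": {"color": "green", "modified": "2020-01-10"},
    "chien": {"color": "green", "modified": "2020-04-01"},
    "oiseau": {"color": "blue", "modified": "2020-01-01"},
}
print(firm_words(entries, today=date(2020, 4, 15)))
```

Changing a word’s color would simply reset its `modified` date, so a word that goes back to blue and later to green again starts over.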

Don’t hesitate to change the color back and forth constantly. The engagement with that in itself also helps reinforce the word in your memory.