Youtube Extension - Now Online!

Is it still in review? I still have v0.1.0, not v1.0.0.

It was rejected twice, we are removing some more permissions and trying again. :-/

Is it still being rejected?

Hey. Sorry guys, I got pretty distracted and started a ventilator project (http://openvent.org/). But, I am back now, working on LLN/LLY.

Good news, after another few tries, LLY is now updated, and working very nicely. :slight_smile:

Please take a look at Arabic. LLY doesn’t work properly with this language.
It works ok with Hebrew but not Arabic.

Also, when 2 subtitles are present, LLY doesn’t select the English translation to display alongside the transcription, as it does with Netflix.

Please also allow dragging the subtitles to change their placement. Often there is an embedded translation in the video and LLY blocks the view (this is especially the case when you watch content intended for language learners).

Some simple stats and percentages regarding white and green words would also be very handy. Visualizing progress that way would be very motivating, and it would let you judge in advance how much you know from a given new video.

When hovering over the transcript, the video is paused, and it should remain paused when you subsequently move over to the pop-up definition box of a certain word.

Thank you for the awesome extension. Keep it up.

Hi. Do you have a link for the video that doesn’t work? I tried an Arabic video, this one seems ok:



There’s no technical reason why Arabic shouldn’t work, but, there’s a lot that can go wrong when you are bolting on stuff to somebody else’s website. Huh. This extension isn’t bad actually… I can study Arabic finally without having to try to read news articles.

You’re right, we only use the machine translations, not ‘human translations’… this is a todo, the code is still there and can be hooked up, but I didn’t think it’s a high priority, most yt videos only have subtitles in the original language, if at all… depends what part of Youtube you are watching of course.

drag and change the placement - yeah, Og mentioned doing that, I think it’s quick. We made the subs a bit transparent a couple of days ago. I would like to bring back the “bottom panel” option (that’s what we call it) from the Netflix version to Youtube, when in fullscreen, so subs are not over the video at all.

stats - probably we’ll look at this after we have word frequency stuff working again.

hovering over the transcript - yes, I think you are right. Pause on hover was hacked on quickly, the logic needs improving.
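A minimal sketch of the pause-on-hover logic discussed above, written in Python for readability rather than as the extension's actual code. The class name, the region names and the short grace period are all assumptions for illustration: the idea is that the video only resumes once the pointer has been outside every tracked region (transcript, definition pop-up) for a moment, so moving from one to the other does not restart playback.

```python
class HoverPause:
    """Keeps the video paused while the pointer is over any tracked region."""

    def __init__(self, grace_seconds=0.3):
        self.grace_seconds = grace_seconds  # hypothetical delay before resuming
        self.hovered = set()                # regions currently under the pointer
        self.paused_by_hover = False        # True only if hovering caused the pause
        self.left_at = None                 # time the pointer left the last region

    def enter(self, region, playing):
        """Pointer entered a region; remember whether hovering paused the video."""
        self.hovered.add(region)
        self.left_at = None
        if playing:
            self.paused_by_hover = True

    def leave(self, region, now):
        """Pointer left a region; start the grace period if nothing is hovered."""
        self.hovered.discard(region)
        if not self.hovered:
            self.left_at = now

    def should_resume(self, now):
        """True once, after the pointer has stayed outside all regions long enough."""
        if self.hovered or not self.paused_by_hover or self.left_at is None:
            return False
        if now - self.left_at >= self.grace_seconds:
            self.paused_by_hover = False
            self.left_at = None
            return True
        return False
```

With this shape, leaving the transcript and entering the pop-up a tenth of a second later never triggers a resume, and a video the user paused manually is never restarted by hovering.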

Thanks for helpful suggestions, I’ll get Og to look these over too.

Sesame street, that’s where it’s at: https://www.youtube.com/watch?v=STl5nE_IqCg

There is a lot of good stuff on Youtube… too much even. Sometimes you have to dig though. We were thinking to make some kind of catalogue where users can recommend videos by pressing a button while watching the video… or… something similar.

For the same exact timestamp you posted I get “translation not found” in Arabic for every word in the video. Only some example sentences are found for some of the words. However, a definition is never, or practically never (not sure), displayed. Color marking also doesn’t work at all (only for the first word in the sentence). Extension version 1.0.1. Another user I know has confirmed this as well. The same thing applies to LLN. So, basically, it’s completely broken from my POV for that language.

True, but I have been watching a ton of Youtube since way before that extension emerged, so I have been able to collect a lot of such content in many languages. If you search for it, you will find it. I have been using the extension “DualSub” for Youtube to give me this functionality thus far.

Ted talks are one example where there are multiple subtitles in many different languages.

A simple counter for each colored word would be a great start. You already have all that information. No need to implement some special frequency logic yet. If you download the JSON and search “green” or “blue” that is the only info you need at the moment. Also the percentage of white in any given text can be easily calculated and displayed to the user.
Otherwise, once you mark a few thousand words, you start to doubt that the database size is even increasing anymore, since it kinda gives you the impression that it “looks the same” each and every time. I would even go as far as to say that it would be a pretty critical feature for long-term user engagement. One of the main indicators motivating you to push forward.
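A rough sketch of the counter described above, in Python. The shape of the exported JSON is an assumption (a list of objects with hypothetical `word` and `color` keys); the real export may be structured differently, so treat the field names as placeholders.

```python
import json
from collections import Counter


def color_counts(export_json):
    """Count saved words per color in a JSON export.

    Assumes (hypothetically) a list of {"word": ..., "color": ...} objects.
    """
    items = json.loads(export_json)
    return Counter(item["color"] for item in items)


def white_percentage(tokens, saved_colors):
    """Percentage of tokens in a text that have no saved color ('white')."""
    if not tokens:
        return 0.0
    white = sum(1 for t in tokens if t.lower() not in saved_colors)
    return 100.0 * white / len(tokens)
```

For example, with three saved words (two green, one blue), `color_counts` gives the totals per color, and `white_percentage(["La", "casa", "del", "perro"], saved)` reports how much of a new sentence is still unmarked.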

Other suggestions:

  1. There should be an “ignore list/color” for names of people and literal numbers, for example, and other such useless information. There should be a way to distinguish them separately from other white words and without contaminating your database by saving them.

  2. You currently offer google translation of an entire sentence, but often you want to translate only part of the sentence to increase your comprehension of a difficult portion of it. Google translation changes depending on the portion you select, so this comparison ability is very helpful and something that I use constantly. I currently use the sidebar from your extension combined with the “google translate” extension in order to achieve that.

a) You should be able to do all of that only from within your extension

b) You should allow text from the subtitle on the main screen to be selectable, so you can do it there as well (currently you can only click on words there)

c) Reverso context has pretty awesome phrase detection, which is much superior to google translate in most cases. That is another 3rd party extension that I need to use in combination with yours. Either duplicating its functionality or offering better integration with it would also be great. It is one of the best dictionaries out there.

  3. There should be an ability/color to save complete phrases or “chunks”. It’s quite important.

Other bugs I have seen:

  1. Punctuation marks break color marking recognition. For example, with “so…”, due to the 3 dots, it thinks it’s a different word from “so”, and thus you have to recolor it. I have seen that on LLN but it probably applies to LLY too. Please strip any punctuation from a word that is being saved (it saves them with the punctuation).

Another related issue, for LLN with “CC” marked subtitles, sometimes you can’t mark one of the words attached to a square bracket. For example: “[wind howling]”, either “wind” or “howling” would not be markable due to the square bracket (not always the case).

  2. Pretty rarely, but you can find instances where you can mark the same exact word with two different colors at the same time. That might happen even in the exact same sentence. They are written exactly the same, and there is no difference between them that I can detect, but the extension thinks they’re different words for some reason. So this functionality isn’t really perfect yet. Just giving you the heads up.

  3. Words written in short form like “we’ll” are counted as two separate words: “we” and “ll”. You can mark each part in a different color. Might be intended behavior but not sure… it seems incorrect.
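To illustrate the punctuation issue from the list above, here is a small Python sketch of normalizing a token before it is saved. The function name is hypothetical and this is not the extension's own code; it only strips punctuation and symbols from the edges of a token, so “so…” and “[wind” match “so” and “wind”, while an apostrophe inside a word such as “we’ll” is left alone.

```python
import unicodedata


def normalize_word(token):
    """Strip leading/trailing punctuation and symbols, then lowercase.

    Characters inside the word (e.g. the apostrophe in a contraction)
    are kept, so only the edges of the token are cleaned.
    """
    start, end = 0, len(token)
    # Unicode general categories starting with "P" are punctuation, "S" symbols.
    while start < end and unicodedata.category(token[start])[0] in "PS":
        start += 1
    while end > start and unicodedata.category(token[end - 1])[0] in "PS":
        end -= 1
    return token[start:end].lower()
```

Whether a contraction should be stored as one word or split into parts is a separate design decision; this sketch simply does not split it.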

“translation not found” for every word - ok, sounds like the code is trash. We’ll investigate. What language are you translating to, btw?

For the colors, you are marking all known words as green. Ok. This was Og’s original idea, and I think LingQ does it like that. I didn’t want to push that aspect too hard, as I thought it would focus the user on ‘whack-a-mole’ clicking rather than trying to process and understand the language itself, as whole sentences… I am cautious about placing too much emphasis on individual words and their translations. Words can be used in different contexts and have quite different ‘purposes’ in those contexts… it’s hard to say ‘Ok I know this word’… well, sometimes you can, depends on the word. :upside_down_face: Anyway, LLN is not trying to force a paradigm… it’s a tool that can be used by learners with different habits. I guess we could add some stats like you mention in the settings panel, it would be easy. I would like to encourage users to focus their attention on the audio as much as possible. I just implemented the ‘hide primary subs’ function… we could maybe offer stats about how many subs, and how many translations, the user had to reveal… although it’s a question of practical UI for that functionality. hm.

The ignore list, that’s a good suggestion. I’ll see if our NLP tools can handle this task automatically, I’m not sure.

Regarding ‘partial translations’, we can perhaps do this with the data we already have, it’s in the TODO list (word align the machine translation and the primary subtitles). It’s just a question of time, there’s just two of us, maintenance/bug fixing/refactoring takes about half our time, planning/emails/forum/advertising/writing instructions/etcetc. the other half, and new features the other half. :upside_down_face:

Thanks for the heads up on the other points, I will refer to these as I rework some of the NLP code in the next days.

Default - English.

Thanks, man. Sorry for overloading you with stuff.

I actually only came here to inform you about Arabic, which was the main and most critical thing bothering me. The other stuff doesn’t bother me so much, as I have my (slightly inconvenient) workarounds to deal with it, but since I was already here… I thought I would use the chance to tell you about other things and difficulties which occurred to me.

Hopefully, you would also be able to add TTS support for Arabic (Google Translate supports this).
Tip: Reverso-context and Al Jazeera offer much better quality audio than the Google Translate engine.

Take your time. No rush. Only wanted to make sure you’re informed about the issues.
Arabic is the most urgent problem from all the things mentioned before.

Good luck! Your extension rocks, dude.

P.S: Another idea, for starting to learn a new language from scratch using your extension, it would be pretty damn cool if you could auto-mark (highlight) the first 1000 (up to 3000) most common words for the user, so they would know what to focus on and not get lost in the sea of it. I think you guys might have already implemented such a feature, or a similar one, in one of your first releases, if I’m not mistaken.

https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
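The auto-marking idea could look roughly like this in Python. It assumes a frequency list already loaded as a Python list ordered from most to least common (for example, taken from one of the lists linked above); the function names and the shape of the saved-word set are illustrative assumptions only.

```python
def words_to_highlight(frequency_ranked, known_words, limit=1000):
    """Top `limit` words of a frequency list that the user has not saved yet."""
    top = frequency_ranked[:limit]
    return [w for w in top if w not in known_words]


def highlight_in_subtitle(tokens, highlight_set):
    """Pair each subtitle token with a flag saying whether to highlight it."""
    return [(t, t.lower() in highlight_set) for t in tokens]
```

A beginner would then see only the common words they have not marked yet highlighted in each subtitle line, and the `limit` could be raised from 1000 towards 3000 as they progress.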

Yup, that’s it. I think I’ll stop now. I could probably go on like that forever and ever. :upside_down_face:
It was good talking with you.

I didn’t find it hindering me in any way. The opposite if anything. It adds a great value.
One of your better features, for sure. A great motivator as well (one of the most critical aspects in language learning).

That’s a non-concern for me. It’s just a measure of progress, which also helps you notice repeating words that you know you have seen before at least once. It makes you more aware and helps recall. They can be green and then one day go back to blue again. It’s constantly fluid and changing. Nothing is set in stone and there is no rule that they can’t regress. You just judge each word based on your comfort level at that specific moment. If you aren’t too strict with yourself and “what you know”, what you described wouldn’t bother any user. The further along you are in a language, the more accurate those stats become, given sufficient time.

That is also the viewpoint of Steve Kaufmann, the founder of LingQ, btw. You can see him talk about such matters on his Youtube channel, if you’re interested to hear more about his perspective on such subjects.

You shouldn’t obsess too much with how well you know each word. Just flow with the flow.
Eventually the green would become “firm green” with enough time.
If you record in your JSON structure the last date a color was set/modified, you can also in turn supply stats about relative “firmness”, I guess (every word older than a couple of months, for example).
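As a sketch of that “firmness” idea in Python: assume each saved entry carried a hypothetical `modified` field holding the ISO date on which its color was last changed (the post above only proposes recording this; the field name and the 60-day threshold are assumptions).

```python
from datetime import date, timedelta


def firm_green(entries, today, min_age_days=60):
    """Words marked green whose color has not changed for `min_age_days` or more.

    Each entry is assumed to look like
    {"word": ..., "color": ..., "modified": "YYYY-MM-DD"}.
    """
    cutoff = today - timedelta(days=min_age_days)
    return [
        e["word"]
        for e in entries
        if e["color"] == "green" and date.fromisoformat(e["modified"]) <= cutoff
    ]
```

Dividing `len(firm_green(...))` by the total number of green words would give a simple “firm green” percentage to show next to the other stats.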

Don’t hesitate to change the color back and forth constantly. The engagement with that in itself also helps reinforce the word in your memory.

Hi! Thank you so much for your work.
I make videos in which I speak Japanese.
Is it possible for me to fix the English translation?

Hi! I am a great fan of LLN. Thank you so much for offering this to the world!

I had been waiting for years for LLN to exist, after discovering the movie theaters of Bern (Switzerland), which, for a decade or two, have been displaying movies with three subtitles at once: the original version (usually English), the German translation, and the French translation.
I had always dreamed that my DVDs could do the same.
And there you come, with your double subtitles! Thanks, guys!

So, now, I am thrilled: I am testing LLY on this 2015 TEDx video (Bill Gates talking about a possible viral outbreak):

TEDx videos are great because they usually come with several sets of human-made subtitles. So here I am, watching most of the video with English human-made subtitles, and a few parts with “Français” subtitles, and then one more time with “Français Canada” subtitles (especially a fragment where Bill Gates was using the word “equity” in a way too puzzling for me, poor random Frenchman).
I wish I could display its English human-made subtitles AND its “Français Canada” human-made subtitles at the same time, but I have not figured out how to do this yet. Did I miss anything?

This was discussed here just a moment ago. It’s on the TODO list.
Only machine translation is available at the moment. I’m not affiliated with the extension.

Workaround for the meantime: Install the “DualSub” extension and select English there and French in LLY, or the reverse order if you study English. It will get you what you want.

LLN works ok with Filipino but it doesn’t work in LLY (pretty much like the Arabic situation I described before).

Suggestion: Sometimes the subtitle language is mislabeled on Youtube (or 2 languages might be in the same subtitle). It would be useful to allow the user to personally set the correct language of the transcript in such a case.

Consider also that Youtube’s automatic speech language detection fails if the speaker starts the video by saying a few greeting words in one language for the first few seconds, and then switches to another language for the rest of the video.

Another reason why that might be useful.

EDIT: I just saw it’s in the todo list. Looking forward to it!

Thanks a LOT for this extension! I’m a Spanish language tutor and I create videos in Spanish with carefully synced subtitles for my students, so this is just what I needed.

Question: Besides using machine translation, can LLY load a second subtitle track already in the video? For example, for this video about why we should give up shaking hands, I carefully crafted both the Spanish subs and English subs (which are a rather literal translation so they match the Spanish subs closely), so it would be great to use the English subs I uploaded instead of the machine translation:

Hey. So we think we have fixed the issues with Arabic, the new code should be live in 24hrs or so. I checked out automatically excluding ‘named entities’ from the coloring system, it’s possible, but the lib I was looking at (stanza) is computationally extremely heavy, will have to look if there’s an alternative. Og is looking at some of the issues you mentioned around punctuation and saving words, but it might have to wait till next time we revisit that code. Adding stats, drag subtitles (or other solution), dict hover/pause behaviour to TODO. Phrase detection (phrasal verbs and compound nouns etc., will check out Reverso) and word frequency was something already on my mental todo list. :slight_smile:

“you can mark the same exact word with two different colors at the same time”

Example: ‘Can I open this can of soup?’ - the first ‘can’ is recognised as a verb, the second as a noun, so they should be saved separately. Also, the lemma form is saved, not the form in the text. Of course it could be more broken code you have found. :open_mouth:
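To make the ‘can’ example concrete, here is a small Python sketch of saving colors under a (lemma, part of speech) key. The class and the tag strings are hypothetical, and the tagging itself is assumed to come from whatever NLP tool is in use; the point is only that the same spelling can legitimately carry two colors.

```python
class WordStore:
    """Stores a color per (lemma, part-of-speech) pair instead of per spelling."""

    def __init__(self):
        self._colors = {}

    def mark(self, lemma, pos, color):
        # The lemma, not the surface form from the subtitle, is what gets saved.
        self._colors[(lemma.lower(), pos)] = color

    def color_of(self, lemma, pos):
        # Unsaved words are treated as 'white'.
        return self._colors.get((lemma.lower(), pos), "white")
```

Marking `("can", "VERB")` green and `("can", "NOUN")` blue then gives two different colors in ‘Can I open this can of soup?’, which matches the behaviour described above.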

Hi. :slight_smile: I’m glad you like it. We were hoping teachers on Youtube would find it useful. Using a second Youtube track as a translation is possible (we do this on Netflix with our other extension), we’ll add this feature back, but it might take some weeks before it’s ready. btw, I tried your video, setting the Youtube subtitles to Spanish, and using machine translation from the extension, the results were good. They are nice videos.

Our extension tries to set the native language subtitles for the video you are watching on Youtube. It does this by checking whether ‘ASR’ (automatic speech recognition/auto captions) is available for that video, and then, using that language, looking for ‘human’ subtitles. This is not always possible though. Youtube doesn’t make it easy to detect the ‘native’ subs. For your video, Youtube set English subtitles. As no ASR track was available, LLY couldn’t detect the native language. You could include some small data in your video description which the extension could read, or we could keep some data in our database about your videos, including which caption language to set. If we keep some data in our database, we could add special features for your video (subtitles with highlighted words… links for pdf downloads… etc.). If you have some feature requests, I’m interested to hear them and help. We even thought a little about making a kind of ‘Netflix for language classes’, hosting videos (without ads) ourselves.
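The detection described above can be sketched in Python as follows. This is not the extension's code: the track dictionaries with `lang` and `kind` keys are a made-up representation, not Youtube's data format, and falling back to the ASR track when no human track matches is an assumption of this sketch.

```python
def pick_native_track(tracks):
    """Pick the subtitle track in the video's spoken language, if detectable.

    `tracks` is a hypothetical list of {"lang": "es-MX", "kind": "asr" | "human"}.
    Returns None when there is no ASR track to infer the language from.
    """
    asr = next((t for t in tracks if t["kind"] == "asr"), None)
    if asr is None:
        return None  # native language cannot be inferred
    base = asr["lang"].split("-")[0]  # compare 'es' with 'es-MX' etc.
    for t in tracks:
        if t["kind"] == "human" and t["lang"].split("-")[0] == base:
            return t
    return asr  # assumption: use auto captions if no human track matches
```

A video with only human English and Spanish tracks and no ASR track returns `None` here, which is the case where a hint in the video description or in a database would be needed.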

Hello. :slight_smile: We only support using machine translation for the second subtitles at the moment, but I’ll soon add the feature to use a different Youtube subtitle track as the second subtitle.

@Jerome_Poirrier Glad you like the extension, I think this is your request too.
