Extending LR: Web browsing with automatic partial translations

Preface:
This is not my idea; I saw another tool that does something similar, but it was not implemented as well as it could have been. Combining this idea with LR’s personalized word database would take it to another level.

The idea is simple but powerful:

The idea:
When you are browsing the internet and you read any web-page, any parts of the page which would translate to a KNOWN WORD (or LEARNING WORD) in your TL (Target Language) will be replaced in the middle of the original-language sentence. For example:

Original text:
For example, if I'm learning Chinese, this sentence here might look something like this (based on my own Chinese level)
With addon:
For example, 如果我学习中文,这个句子 might look something 这样 (based on 我自己的中文水平)

My Chinese is likely not accurate, but I’m sure you can see the idea. We can see color coded known/learning words and get all the usual features with LR’s pop-up dictionary on mouse-over and the ability to mark the word as known/learning/don’t learn etc.

In the beginning, you will be reading original-language sentences with individual words translated. As your vocabulary grows, you’ll be seeing more and more full sentences in your TL.

Challenges
The main challenge is finding effective NLP systems to correctly handle sentence structures.
Here is my suggested approach/workflow:

  1. The original language text is fully translated into the TL behind-the-scenes (the user doesn’t see it). Translating the full text helps to ensure proper context compared to translating individual words. Modern ML models are very good at this.

  2. The translated text is parsed sentence-by-sentence to find groups of known/learning/unknown words and match them to the corresponding original-language text.

  • If less than 50% of the TL sentence’s text is known/learning, the original-language sentence structure is used, with the known/learning words shown in the TL.
  • If 50% or more of the TL sentence’s text is known/learning, the TL sentence structure is used, with the remaining (unknown) words shown in the original language.
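
The two rules above can be sketched roughly as follows. This is only an illustration, not how LR works: it assumes the full translation and a word alignment between the two sentences are already available, it counts tokens rather than characters, and it treats a sentence at exactly 50% like the TL case. The names `mix_sentence`, `alignment`, and `known` are hypothetical.

```python
def mix_sentence(src_tokens, tl_tokens, alignment, known):
    """Blend an original-language sentence with its TL translation.

    src_tokens: tokens of the original-language sentence
    tl_tokens:  tokens of the full TL translation
    alignment:  list of (src_index, tl_index) pairs (assumed to be given)
    known:      set of TL words the user has marked known/learning
    """
    if not tl_tokens:
        return list(src_tokens)

    known_ratio = sum(tok in known for tok in tl_tokens) / len(tl_tokens)
    out = []
    used = set()  # indices already emitted, to avoid duplicated words

    if known_ratio < 0.5:
        # Mostly unknown: keep the original word order and swap in
        # TL words only where every aligned TL word is known.
        src_to_tl = {}
        for s, t in alignment:
            src_to_tl.setdefault(s, []).append(t)
        for i, tok in enumerate(src_tokens):
            tl_idxs = sorted(src_to_tl.get(i, []))
            if tl_idxs and all(tl_tokens[t] in known for t in tl_idxs):
                for t in tl_idxs:
                    if t not in used:
                        used.add(t)
                        out.append(tl_tokens[t])
            else:
                out.append(tok)
        return out

    # Mostly known: keep the TL word order and fall back to the
    # aligned original-language words for unknown TL words.
    tl_to_src = {}
    for s, t in alignment:
        tl_to_src.setdefault(t, []).append(s)
    for j, tok in enumerate(tl_tokens):
        if tok in known:
            out.append(tok)
            continue
        src_idxs = sorted(tl_to_src.get(j, []))
        if not src_idxs:
            out.append(tok)  # nothing aligned: leave the TL word as-is
        for s in src_idxs:
            if s not in used:
                used.add(s)
                out.append(src_tokens[s])
    return out


src = ["the", "red", "house"]
tl = ["la", "casa", "roja"]
align = [(0, 0), (1, 2), (2, 1)]

print(mix_sentence(src, tl, align, {"casa"}))        # ['the', 'red', 'casa']
print(mix_sentence(src, tl, align, {"la", "casa"}))  # ['la', 'casa', 'red']
```

With one known word the English word order is kept; with two known words the Spanish word order is kept and only the unknown word falls back to English. A real implementation would also need to handle multi-word phrases and punctuation, which this sketch ignores.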

Personal Note
This could possibly be a separate addon from the usual LR addon, which handles video.

Hmm. I thought about this kind of thing before. I find the idea of switching out words to another language, hmm, pretty weird :open_mouth:… the substituted word won’t have quite the same meaning and usage… you could only substitute maybe some kinds of words… grammar gets butchered. Then again… people mix languages in speech (Spanglish etc.) and it works well enough. Something that I would be more inclined to make: translate sites you visit into your target language… and we can annotate words you don’t know with translations below the word. Doing it in-page is potentially more ergonomic, but tricky to get right so that it works on all pages (readlang never quite worked right on all pages, I don’t think, with links etc.). I’ve been doing work with some guys recently to make better translation models available for LR, so we can build features like that.

One difficulty in the workflow you mention is matching up which words in the translation correspond to which words in the source text. Translation models can output alignment data… the Microsoft API doesn’t reliably output this data… with our own models, it might be better. There are tools like fast_align (GitHub - clab/fast_align: Simple, fast unsupervised word aligner), but I don’t know if it would work well enough.
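
For reference, fast_align reads one tokenized sentence pair per line in the form `source ||| target` and writes alignments as space-separated `i-j` pairs, where `i` is the zero-based index of a source word and `j` the index of the aligned target word. A minimal sketch of preparing the input and parsing the output could look like this; `format_pair` and `parse_alignment` are hypothetical helper names, not part of fast_align.

```python
def format_pair(src_tokens, tl_tokens):
    """Build one input line in the 'source ||| target' format."""
    return " ".join(src_tokens) + " ||| " + " ".join(tl_tokens)


def parse_alignment(line):
    """Turn an output line such as '0-0 1-2 2-1' into index pairs."""
    pairs = []
    for item in line.split():
        src_idx, tl_idx = item.split("-")
        pairs.append((int(src_idx), int(tl_idx)))
    return pairs


print(format_pair(["the", "red", "house"], ["la", "casa", "roja"]))
# the red house ||| la casa roja
print(parse_alignment("0-0 1-2 2-1"))
# [(0, 0), (1, 2), (2, 1)]
```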

Probably this web-page annotation stuff isn’t top of my list at the moment (working on better translation, a better YouTube catalogue, a library of texts, getting a proper app etc.). Injecting code to modify existing websites is a bit miserable and fiddly work, we know about this. :slight_smile: I can dig around and see if there’s some existing code that could help.

As an aside, we should document our server APIs (NLP, saved items, translation etc.) so that someone could make this kind of feature as a separate extension.

Hi David, I appreciate the detailed response!

Indeed, I read somewhere that you guys are working on upgrading the NLP models, which is one reason that I think you would be a great candidate to implement this.

That’s really interesting. Good to know there are some tools already available. If they’re not production ready yet, I imagine they will be by the end of the year, given the current rate of ML progress.

This could also work really well. It’s like throwing the user into a fully immersive TL internet with LR’s Text Mode at their side to help them navigate. I dig it.

Of course, this would definitely be something for later down the road. I love the direction that things have been moving in over the past few months; you’ve made a lot of great usability changes and provided a lot of very valuable tools. Really great work! I can’t imagine how busy you guys are, working on so many different things at the same time.

That would be extremely cool!
I’m a developer myself and would really love to work with you guys in some form. Unfortunately I won’t be available before the end of 2023… If you had an API available, maybe I could dabble during my free time :stuck_out_tongue: