Extending LR: Web browsing with automatic partial translations

Preface:
This is not my idea; I saw another tool that does something similar, but it was not implemented as well as it could have been. Combining this idea with LR’s personalized word database would take it to another level.

The idea is simple but powerful:

The idea:
When you browse the internet and read any web page, any part of the page that would translate to a KNOWN WORD (or LEARNING WORD) in your TL (Target Language) is replaced by that TL word, right in the middle of the original-language sentence. For example:

Original text:
For example, if I'm learning Chinese, this sentence here might look something like this (based on my own Chinese level)
With addon:
For example, 如果我学习中文，这个句子 might look something 这样 (based on 我自己的中文水平)

My Chinese is likely not accurate, but I’m sure you get the idea. Known/learning words would be color-coded, and we would get all the usual LR features: the pop-up dictionary on mouse-over, the ability to mark a word as known/learning/don’t learn, etc.

In the beginning, you will be reading original-language sentences with individual words translated. As your vocabulary grows, you’ll see more and more full sentences in your TL.

Challenges
The main challenge is finding effective NLP systems that handle sentence structure correctly.
Here is my suggested approach/workflow:

  1. The original-language text is fully translated into the TL behind the scenes (the user doesn’t see it). Translating the full text preserves context far better than translating individual words, and modern ML models are very good at this.

  2. The translated text is parsed sentence-by-sentence to find groups of known/learning/unknown words and match them to the corresponding original-language text.

  • If less than 50% of the TL sentence’s text is known/learning, the original-language sentence structure is used, with the known/learning words shown in the TL.
  • If more than 50% of the TL sentence’s text is known/learning, the TL sentence structure is used, with the words that are neither known nor learning shown in the original language.
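A minimal Python sketch of the step 2 decision rule, assuming the translation step already produced a tokenized TL sentence, a status per TL token from the user’s word database, and word alignments as (source index, TL index) pairs. `Token`, the status labels, `known_ratio` and `render_sentence` are hypothetical names for illustration, not LR’s actual API. “Share of the text” is measured in tokens rather than characters, and an exact 50% tie goes to the TL-structure branch.

```python
from dataclasses import dataclass

# Hypothetical status labels; LR's real word database may use different ones.
FAMILIAR = {"known", "learning"}


@dataclass
class Token:
    text: str
    status: str  # "known", "learning" or "unknown"


def known_ratio(tl_tokens):
    """Share of TL tokens that are known or learning (0.0 for an empty sentence)."""
    if not tl_tokens:
        return 0.0
    return sum(t.status in FAMILIAR for t in tl_tokens) / len(tl_tokens)


def render_sentence(src_words, tl_tokens, alignment, threshold=0.5):
    """Return the display segments for one sentence.

    alignment: list of (src_index, tl_index) pairs from a word aligner.
    """
    out, emitted = [], set()
    if known_ratio(tl_tokens) < threshold:
        # Mostly unfamiliar: keep the source word order and swap in TL words
        # whose aligned tokens are all known/learning.
        for i, word in enumerate(src_words):
            tl_ids = sorted({j for s, j in alignment if s == i})
            if tl_ids and all(tl_tokens[j].status in FAMILIAR for j in tl_ids):
                new_ids = [j for j in tl_ids if j not in emitted]
                emitted.update(new_ids)
                if new_ids:  # several source words may share one TL word
                    out.append("".join(tl_tokens[j].text for j in new_ids))
            else:
                out.append(word)
        return out
    # Mostly familiar: keep the TL word order and fall back to the aligned
    # source words for tokens that are neither known nor learning.
    for j, tok in enumerate(tl_tokens):
        aligned = {s for s, t in alignment if t == j}
        if tok.status in FAMILIAR or not aligned:
            # Familiar token, or an unknown token with no source word to fall back to.
            out.append(tok.text)
            continue
        new_ids = sorted(aligned - emitted)
        emitted.update(new_ids)
        if new_ids:
            out.append(" ".join(src_words[s] for s in new_ids))
    return out
```

When several source words align to one TL word (e.g. “this” and “sentence” both aligned to 这个句子), the TL word is shown only once. Joining TL tokens with no separator assumes a TL such as Chinese that is written without spaces.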

Personal Note
This could possibly be a separate addon from the usual LR addon that handles video.

Hmm. I’ve thought about this kind of thing before. I find the idea of swapping out words for another language, hmm, pretty weird :open_mouth:… the substituted word won’t have quite the same meaning and usage… you could probably only substitute some kinds of words… grammar gets butchered. Then again… people mix languages in speech (Spanglish etc.) and it works well enough. Something I would be more inclined to make: translate the sites you visit into your target language… and annotate the words you don’t know with translations below them. Doing it in-page is potentially more ergonomic, but tricky to get right so that it works on all pages (Readlang never quite worked right on all pages, I don’t think, with links etc.). I’ve been working with some guys recently to make better translation models available for LR, so we can build features like that.

One difficulty in the workflow you mention is matching up which words in the translation correspond to which words in the source text. Translation models can output alignment data… the Microsoft API doesn’t reliably output this data… with our own models, it might be better. There are tools like fast_align (GitHub - clab/fast_align: Simple, fast unsupervised word aligner); I don’t know whether it would work well enough.
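For reference, fast_align expects tokenized parallel text, one sentence pair per line with the source and its translation separated by ` ||| `, and it writes alignments in the “Pharaoh” `i-j` format (zero-indexed source word i aligned to target word j). Below is a minimal sketch of preparing an input line and parsing the output; the helper names are hypothetical, and running the aligner itself (e.g. `./fast_align -i pairs.txt -d -o -v > forward.align`) is left out. Because fast_align is unsupervised and learns from the whole input file, a single sentence pair on its own would likely align poorly, and Chinese text would need word segmentation first.

```python
def to_fast_align_line(src_tokens, tl_tokens):
    """Format one tokenized sentence pair as a fast_align input line."""
    return " ".join(src_tokens) + " ||| " + " ".join(tl_tokens)


def parse_pharaoh(line):
    """Parse a Pharaoh-format line such as '0-0 1-2' into (src, tl) index pairs."""
    pairs = []
    for item in line.split():
        src, tl = item.split("-")
        pairs.append((int(src), int(tl)))
    return pairs


def tl_to_src(pairs):
    """Group alignments by TL index: {tl_index: [src_index, ...]}."""
    groups = {}
    for src, tl in pairs:
        groups.setdefault(tl, []).append(src)
    return {tl: sorted(srcs) for tl, srcs in groups.items()}


if __name__ == "__main__":
    print(to_fast_align_line(["this", "sentence"], ["这个", "句子"]))
    print(tl_to_src(parse_pharaoh("0-0 1-0 2-1")))
```

The (src, tl) pairs from `parse_pharaoh` are the kind of alignment input the step 2 rule needs.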

This web-page annotation stuff probably isn’t top of my list at the moment (I’m working on better translation, a better YouTube catalogue, a library of texts, getting a proper app, etc.). Injecting code to modify existing websites is miserable, fiddly work; we know this well. :slight_smile: I can dig around and see if there’s some existing code that could help.

As an aside, we should document our server APIs (NLP, saved items, translation, etc.); then someone could build this kind of feature as a separate extension.


Hi David, I appreciate the detailed response!

Indeed, I read somewhere that you guys are working on upgrading the NLP models, which is one reason that I think you would be a great candidate to implement this.

That’s really interesting. Good to know there are some tools already available. If they’re not production-ready yet, I imagine they will be by the end of the year, given the current rate of ML progress.

This could also work really well. It’s like throwing the user into a fully immersive TL internet with LR’s Text Mode at their side to help them navigate. I dig it.

Of course, this would definitely be something for later down the road. I love the direction things have been moving in over the past few months; you’ve made a lot of great usability changes and provided many very valuable tools. Really great work! I can’t imagine how busy you guys are, working on so many different things at the same time.

That would be extremely cool!
I’m a developer myself and would really love to work with you guys in some form. Unfortunately, I won’t be available before the end of 2023… If you had an API available, maybe I could dabble in my free time :stuck_out_tongue: