I often use LR to watch Chinese shows with both Mandarin and English subtitles. One thing I have noticed after using it for several hours is that it doesn’t seem to parse chengyu very well. In case you are not familiar with the term, Mandarin Chinese has a TON (many thousands) of idiomatic expressions used in daily conversation, which are almost always exactly four characters long. Each string of four characters acts as a single word whose meaning is often totally different from that of the individual characters that make it up.
To give one example, the phrase/word 亂七八糟 means a giant mess, or something that is in complete disorder. But the individual characters in that word don’t necessarily reflect that meaning when you look at them one by one. For example, the two characters in the middle are the numbers 7 (七) and 8 (八).
The app doesn’t seem to recognize chengyu and group the four characters together into a single meaning. Instead, it lists the meanings of the individual characters separately.
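To make the grouping idea concrete, here is a rough Python sketch of what I have in mind. I don't know how LR actually segments subtitles, so everything here is an assumption: the `CHENGYU` set is a tiny made-up sample, `segment` is a hypothetical helper, and falling back to single characters just stands in for whatever the app currently does.

```python
# Hypothetical sample lexicon; a real one would be loaded from a much larger file.
CHENGYU = {"亂七八糟", "一石二鳥"}

def segment(text, lexicon=CHENGYU, size=4):
    """Group known four-character chengyu; emit every other character on its own."""
    tokens = []
    i = 0
    while i < len(text):
        chunk = text[i:i + size]
        if len(chunk) == size and chunk in lexicon:
            # The next four characters form a known chengyu: keep them together.
            tokens.append(chunk)
            i += size
        else:
            # Assumed fallback: treat the character individually.
            tokens.append(text[i])
            i += 1
    return tokens

print(segment("房間亂七八糟的"))  # ['房', '間', '亂七八糟', '的']
```

Checking the four-character window before falling back means 七 and 八 only show up as the numbers 7 and 8 when they are not part of a listed chengyu.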
Is there a way to improve this? I might be able to assist. In addition to being an avid student of Chinese, I’m also an ETL developer in my day job. If you would like, I could probably generate a machine-readable file (e.g. XML or JSON) of thousands of the most commonly used chengyu and send it to you all, so you could use it to map the chengyu in the subtitles better. Not sure if that would be helpful, but I would just need to know how you all want it formatted so it can be imported into whatever database you are using on the backend.
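As a starting point for the format discussion, here is one possible shape for the JSON version, sketched in Python. The field names (`chengyu`, `pinyin`, `meaning`) and the helpers `to_json` and `to_lookup` are only my guesses at what might be convenient, not anything LR uses today, so feel free to tell me what you would actually need.

```python
import json

# Hypothetical record layout: one object per chengyu.
entries = [
    {
        "chengyu": "亂七八糟",
        "pinyin": "luàn qī bā zāo",
        "meaning": "a giant mess; in complete disorder",
    },
]

def to_json(records):
    # ensure_ascii=False keeps the Chinese characters readable in the output file.
    return json.dumps(records, ensure_ascii=False, indent=2)

def to_lookup(json_text):
    # Build a chengyu -> meaning mapping, e.g. for loading into a lookup table.
    return {record["chengyu"]: record["meaning"] for record in json.loads(json_text)}

print(to_json(entries))
```

A flat list of records like this should be easy to bulk-load into most databases, and I could just as easily produce XML or CSV with the same fields.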
Keep up the great work, this is by far my favorite language learning app!