Pinyin support for Chinese (Mandarin) subtitles?

On a similar subject, I’d like to know what hsk level a word I’ve selected is. I’m HSK 5 and it would be nice to know if a word is 5/6 leveled (or god forbid 4 and I haven’t learned it :stuck_out_tongue:) or if it’s just super high leveled and therefore not worth my study at this time.

I found a workaround until this is implemented in the addon. See this gist on github: https://gist.github.com/klausondrag/6bf80ec882019260de1e498bc7516d08

Just discovered this app and it is almost exactly what I need to learn Mandarin… almost.
Just get that Pinyin support working and you will have a new monthly subscriber for many years.

Would also love Pinyin! This has been super helpful so far, thank you!
If Pinyin were on here too, I think I’d be able to achieve my learning goal much more easily.

also would love pinyin!!! ive been copying and pasting everything into google translate to get pinyin but it takes like 10x longer than it could w added pinyin</3

Hey guys. Thanks for being patient. The Youtube extension is fairly stable now, so I’m switching to working on improving the linguistics processing code… pinyin… and some other cool stuff (word frequencies, maybe transliteration of other scripts, recognising phrasal verbs and compound nouns etc.).

See: Feature TODO List and roadmap

Pinyin support would be a massive game changer for learning Mandarin with LLN - following for updates

Hi there! I love your extension and product. I just want to add my enthusiasm for pinyin, which would be kinda essential for most Mandarin Chinese learners like myself! I’m excited and I’ll wait for the good news haha! Good luck!

Hey. Pinyin is coded up, Og is just finishing the front-end code. We’ll most likely deploy to Youtube (LLY) tomorrow. We might have introduced a bug into LLN, so we’ll test some more, but that should follow soon too.

Long suffering Chinese learners: It’s online. Sorry it took so long :disappointed_relieved:

Refresh your Youtube/Netflix page a couple of times. We don’t speak Chinese so please let us know if there’s something not right.

How do I turn the pinyin off?

Thanks for adding this! I’d already noticed it on YouTube, but a few refreshes got it working on Netflix too.

My feedback is that this will generally be a huge huge help and makes LLN/LLY a lot more useful for Chinese learners, but that there absolutely needs to be the option to turn it off, and greater control of how and when it appears.

There are stages in learning where you’re trying to learn the characters without pinyin. At this point you should only be looking up the pinyin if you want to check that you remembered correctly, or see what the tones are.

I think for a significant portion of Chinese learners, they’re only going to want the pinyin to appear on hover or click in the definition that appears - and I’ve reloaded several times and it still doesn’t seem to be there yet. Is this feature planned?

I’m not quite at the level where I only want pinyin in the definition modals, I would like to keep the pinyin on screen, but I’m finding that putting pinyin at the top means that I’m not actually reading the characters, which I’m pretty good at doing for a few hundred of the most common characters. I think an option to put the pinyin below the characters would cause me to read the characters first, then glance down at the pinyin only if I need it, aiding my learning.

Ultimately my goal is to be able to read books, signs and subtitles, none of which would provide the pinyin, so there’s a point at which it would be limiting how much I’m learning by having the pinyin ‘training wheels’ always on screen.

Similarly, I think that pinyin should be blurred out when ‘hide translations’ is on. Ideally this functionality would have more granular control and there’d be a third option ‘hide pronunciation’ that would also apply to similar additions in Japanese.

The only other bit of feedback would be about how layout is working both in subtitles and the sidebar transcript - I’ll cover transcript later, it needs a different approach.

I would prefer it if the subtitle layout logic attempted to centre the pinyin alternative directly above (or below) the individual character it relates to.

Currently there seems to be some guessing of which sets of characters make up ‘words’ (often these are quite arbitrary) and then all the pinyin for those characters is centred within that ‘word’ rather than the individual characters.

I’m not sure if this way of doing things would be desirable even if the word matching was 100% accurate, but when the groupings are often different from even the machine translation’s interpretation, this becomes extremely distracting.

(This relates to a separate issue with the definitions, but that should be in its own separate topic, I think.)

It’s even worse in the transcript sidebar where the characters start getting grouped into dubious chunks and the pinyin mostly displays as simple sentences.

This is really not desirable. The characters should never be clumped up in that way - a gap makes it look like it’s a different sentence or cuts words in two.

I’d say in the sidebar you want the pinyin to be one sentence, and the Chinese characters another. This is how pinyin alternatives would be provided in a textbook, it’s more like a translation than an annotation or key.

This has also had the side effect of breaking copy-paste from the transcript. This is the mess resulting from copying one line of subtitle dialogue now:

zhè

wèi wú xiàn zuò
魏无羡做
de

dōng xī
东西
jiù shì
就是


xíng

This should simply be provided as plain text, one line each, with no extra gaps in either:

zhè wèi wú xiàn zuò de dōng xī jiù shì bù xíng
这魏无羡做的东西就是不行

I can quickly read either of those at a glance. With gaps added, they become harder to read and I’m distracted by why it’s decided that 魏无羡做 is a word and 不行 isn’t.
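To make the requested copy-paste format concrete, here is a minimal Python sketch. It assumes the extension already holds each subtitle as a list of (characters, pinyin syllables) chunks; the `Token` type, the `to_plain_lines` name, and the chunking itself are hypothetical and only for illustration, not LLN's actual data model.

```python
from typing import List, Tuple

# Hypothetical chunk format: (characters, pinyin syllables for those characters).
Token = Tuple[str, List[str]]

def to_plain_lines(tokens: List[Token]) -> Tuple[str, str]:
    """Flatten chunks into one pinyin line and one gap-free character line."""
    # Pinyin: one space between syllables, regardless of chunk boundaries.
    pinyin_line = " ".join(s for _, syllables in tokens for s in syllables)
    # Characters: joined with no separator at all.
    hanzi_line = "".join(chars for chars, _ in tokens)
    return pinyin_line, hanzi_line

# Illustrative chunks for the subtitle line quoted above.
tokens = [
    ("这", ["zhè"]),
    ("魏无羡做", ["wèi", "wú", "xiàn", "zuò"]),
    ("的", ["de"]),
    ("东西", ["dōng", "xī"]),
    ("就是", ["jiù", "shì"]),
    ("不", ["bù"]),
    ("行", ["xíng"]),
]

pinyin_line, hanzi_line = to_plain_lines(tokens)
print(pinyin_line)  # zhè wèi wú xiàn zuò de dōng xī jiù shì bù xíng
print(hanzi_line)   # 这魏无羡做的东西就是不行
```

Because the chunk boundaries are thrown away when the lines are built, a questionable segmentation no longer leaks into the copied text.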

Oh and you’ve done this to all my saved items:

Having tested Anki export, I’m relieved to find that the lines aren’t mangled on export to Anki, but equally, there’s no pinyin field included.

Generally all my flashcards have the Chinese sentence on the front and both the pinyin and the English translations on the back.

Here’s an example with card definition above, preview of front of card below (audio would play):

And here’s the card after you click to see the rear (audio plays again):

I hope this makes the point that the pinyin should be treated as a type of translation, an intermediary step on the way to a translation.

While we’re talking about Anki export, in general it would be great if you could provide everything as simple named plain text fields that I can then set up my own formatting rules to change into cards - or at least give me the option to do it that way.

Anki is extremely powerful and customisable, but not if you’ve forced everything into your own ‘Front’ and ‘Back’ fields and filled them with HTML. Just trying to edit them to put the TV show title and episode on the other side was extremely laborious.

Here’s an example of what a card format definition looks like for Chinese sentence mining flashcards:

And a different, highly customised deck for learning Chinese characters, where I’ve added my own animated GIFs:

I’ve just found the answer to this in a thread asking about the Japanese equivalent.

To turn them off you need to go to settings (the cog icon next to the LLN or LLY logo and On/Off control) and find the ‘Show Transliterations:’ drop down, that should look like this:

From there “No transliterations” means “Don’t show pinyin”, “With original form” means “Include pinyin” and “Transliterations only” means “Only show pinyin”. (Hopefully these will be changed to include the word ‘pinyin’, to match what’s been done with the Japanese version.)

I’d only seen pinyin referred to as ‘romanisation’ before now, so my eyes completely skipped over ‘Transliterations’ and assumed it said ‘Translations’ when I went looking for this setting.

Hope this has helped!

The ‘separate issue’ I mentioned in the comment above about the layout / formatting of the pinyin in chunks is now documented here in excruciating detail (and a tl;dr at the start):

Hey, thanks for this useful feedback. We’ll come back to Chinese shortly and sort it out properly. Maybe we can get a couple of the quicker items done in the next couple of days, will post updates here. :slight_smile:

Og coded up the draggable subs:

(purple words are the new/old word frequency highlighting)

Ok, so tasks for Chinese:
– Add pinyin to hover dict (if not already displayed), and full dict.
– Pinyin below subs, and a ‘blur transliteration’ mode (Hopefully the change doesn’t upset anyone).

Centre the pinyin alternative directly above (or below) the individual character it relates to.

Question, can Chinese characters always be mapped to the same pinyin? Can the pinyin change when characters appear as part of combinations? If not, I think we should be able to do it. We’d convert every Chinese character to pinyin in isolation.

Currently there seems to be some guessing of which sets of characters make up ‘words’.

Yes, there absolutely is. We could upgrade to jieba as a word tokeniser for Chinese, it’s probably a bit better but not perfect either… Maybe Tencent etc. made a better tool, it would need to be researched and integrated.
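For anyone wondering why the word boundaries can look arbitrary, here is a small Python sketch of forward maximum matching, one of the simplest dictionary-based segmenters. To be clear, this is not jieba and not necessarily what LLN runs; the function name and the tiny vocabularies are made up for illustration.

```python
from typing import List, Set

def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy segmentation: at each position take the longest vocabulary match."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words

line = "这魏无羡做的东西就是不行"

# Toy vocabulary without the character's name: the name falls apart.
print(forward_max_match(line, {"东西", "就是", "不行"}))
# ['这', '魏', '无', '羡', '做', '的', '东西', '就是', '不行']

# Same line once the name is in the vocabulary.
print(forward_max_match(line, {"魏无羡", "东西", "就是", "不行"}))
# ['这', '魏无羡', '做', '的', '东西', '就是', '不行']
```

The output depends entirely on what is in the vocabulary, which is why names and rarer words tend to get chunked oddly.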

This is really not desirable. The characters should never be clumped up in that way - a gap makes it look like it’s a different sentence or cuts words in two.

Ok, spaces between Chinese chars are undesirable. We’ll try to implement your suggestion.

Oh and you’ve done this to all my saved items:

Will fix shortly.

Pinyin to Anki.

While we’re talking about Anki export, in general it would be great if you could provide everything as simple named plain text fields that I can then set up my own formatting rules to change into cards - or at least give me the option to do it that way.

We made CSV export, I think it’s what you want, can be imported in Anki if you set up the fields right, although it’s missing pinyin still.
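As a rough sketch of what a plain-text export with a pinyin column could look like, here is a short Python example using the standard `csv` module. The field names, the `export_rows` helper, the sample translation, and the show title are all hypothetical; the point is only that each value lands in its own tab-separated column with no HTML, so Anki's text import can map columns to whatever note fields you define.

```python
import csv
import io
from typing import Dict, List

# Hypothetical field names; they would need to match the Anki note type.
FIELDS = ["Sentence", "Pinyin", "Translation", "Source"]

def export_rows(rows: List[Dict[str, str]]) -> str:
    """Serialise saved items as tab-separated plain text, one note per line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    # No header row is written, so every line becomes a note on import.
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

rows = [
    {
        "Sentence": "这魏无羡做的东西就是不行",
        "Pinyin": "zhè wèi wú xiàn zuò de dōng xī jiù shì bù xíng",
        "Translation": "Illustrative translation goes here",
        "Source": "Example Show S01E01",
    }
]

tsv = export_rows(rows)
print(tsv, end="")
```

With separate columns, putting the show title on the back of the card is a template change in Anki rather than an edit to every note.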

We used ‘Transliterations’ because I found a library that can transliterate 50 languages or so… we were going to make it a feature for many languages (Hindi, Russian etc.), but, on closer inspection it wasn’t so great.

[From the other thread]

You need to be able to define all the characters in the compound because so many Chinese words are compounds and often learning a new word and its characters actually teaches you two or more new words, and gives you the tools to intuitively understand what other unfamiliar words mean.

Maybe you can use a custom dictionary url that breaks down the word in chars? You can open the last used custom dict with shift + click. We could try showing definitions for each character in the word… I’m not sure it will be as useful as a dedicated Chinese dictionary. If you have a suggestion, we can add it to the list of external dictionaries.

‘cat-head-eagle’ :slight_smile:

There’s more to respond to, I’ll get to it soon. Thanks for the detailed feedback.

The pronunciation might change for some characters. Just to give an example:
了 as a particle is most of the time ‘le’, although in ‘了解’ it is ‘liǎo jiě’. Similarly, in some grammatical constructs like 看不了、看得了、去不了、去得了 it is ‘liǎo’. See complement liao.

  • 睡覺 ‘shuì jiào’ vs. 覺得 ‘jué de’ (this is one of the reasons why good word splitting is important)
  • 得 which can be ‘de’ or ‘děi’

Although I would say, having pinyin correct at least for words consisting of multiple characters and maybe at least the 不了/得了 pattern is good enough for the time being.
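The "good enough for the time being" approach could look roughly like the Python sketch below: check a word-level reading first, and only fall back to a per-character default when the word is unknown. The two tables are tiny hand-written stand-ins using the examples from this post (the defaults for 睡, 解 and 覺 are my own additions), and `word_to_pinyin` is a hypothetical name, not an existing library function.

```python
from typing import Dict

# Per-character defaults (the most common reading in isolation); illustrative only.
CHAR_DEFAULT: Dict[str, str] = {
    "了": "le",
    "解": "jiě",
    "睡": "shuì",
    "覺": "jué",
    "得": "de",
}

# Word-level readings that override the per-character defaults.
WORD_READINGS: Dict[str, str] = {
    "了解": "liǎo jiě",
    "睡覺": "shuì jiào",
    "覺得": "jué de",
}

def word_to_pinyin(word: str) -> str:
    """Prefer the reading of the whole word; otherwise go character by character."""
    if word in WORD_READINGS:
        return WORD_READINGS[word]
    # Unknown characters are passed through unchanged rather than guessed.
    return " ".join(CHAR_DEFAULT.get(ch, ch) for ch in word)

print(word_to_pinyin("了"))    # le
print(word_to_pinyin("了解"))  # liǎo jiě
print(word_to_pinyin("睡覺"))  # shuì jiào
print(word_to_pinyin("覺得"))  # jué de
```

This only works as well as the word splitting that feeds it: if 睡覺 arrives as two separate characters, 覺 falls back to its default and comes out wrong.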

Ruby Character might have some inspiration.
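As a hedged illustration of the ruby idea, the Python sketch below builds standard HTML `<ruby>`/`<rt>` markup with one annotation per character, which keeps each pinyin syllable centred on its own character. The `to_ruby` helper and the input pairs are hypothetical, and I have no idea how the extension actually renders subtitles.

```python
from html import escape
from typing import List, Tuple

def to_ruby(pairs: List[Tuple[str, str]]) -> str:
    """Wrap each character in its own <ruby> element with its pinyin in <rt>."""
    return "".join(
        f"<ruby>{escape(char)}<rt>{escape(pinyin)}</rt></ruby>"
        for char, pinyin in pairs
    )

print(to_ruby([("东", "dōng"), ("西", "xī")]))
# <ruby>东<rt>dōng</rt></ruby><ruby>西<rt>xī</rt></ruby>
```

Because the characters themselves stay adjacent in the markup, no gaps are introduced between them. CSS also defines a `ruby-position` property (`over` / `under`) that could in principle move the pinyin below the characters, though browser support would need checking.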