Settings for how Chinese names are machine translated?

Is there any possibility of controlling whether the English machine translations for Chinese subtitles include the names of English language celebrities and companies?

Whatever translation system is currently being used really likes mangling the Chinese names of fictional Chinese people and organisations by over-matching with English language names that often don’t even use the same characters in their Chinese equivalents.

The Netflix live action version of The King’s Avatar has variously mistranslated the team name 嘉世 (Excellent Era) as ‘Castrol’ (Chinese name 嘉实多), ‘Carion’ and one time it was even ‘The Kardashians’ (Chinese name 卡戴珊)!

I think the most appropriate way to handle 嘉世 as an unknown name would just be to romanise it to ‘Jiashi’.

The same show has one of the female characters called by the pet name 少唐 (Little Tang), which the machine translations consistently change to ‘Don Jr.’. (Both Donalds, Duck and Trump, are 唐纳德 in Chinese.) But image searching 少唐 doesn’t produce any photos of Donald Trump Junior, and this sort of thing is honestly pretty distracting.

Again, in the same show, a fictional person 乔一帆 (Qiao Yifan) is machine translated as ‘Joe Yifan’. 乔 is the Chinese character used for the name Joe (and in the title of Jojo Rabbit (乔乔兔)), but it’s also a Chinese family name so when it’s next to a Chinese given name, going with ‘Joe’ consistently in all the episodes seems inappropriate.
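To make the suggestion concrete, here is a rough Python sketch of the order of checks I have in mind: only use a foreign name when the whole string matches it exactly, treat a Chinese surname plus given name as a Chinese name, and otherwise fall back to plain romanisation. Everything in it is an assumption for illustration: the function name `render_name`, the three tiny lookup tables, and the toneless pinyin. A real system would need full surname and pinyin data, and I have no idea how the actual translation API works internally.

```python
# Hypothetical, tiny lookup tables for illustration only.
SURNAMES = {"乔": "Qiao", "魏": "Wei"}
PINYIN = {"一": "yi", "帆": "fan", "无": "wu", "羡": "xian", "嘉": "jia", "世": "shi"}
# Foreign names are matched only when the WHOLE string is identical.
FOREIGN_NAMES = {"乔": "Joe", "卡戴珊": "Kardashian", "嘉实多": "Castrol"}


def render_name(name: str) -> str:
    """Render a string that has already been identified as a name."""
    if not name:
        return name
    # 1. Exact match against a known foreign transliteration.
    if name in FOREIGN_NAMES:
        return FOREIGN_NAMES[name]
    # 2. Chinese surname followed by a one- or two-character given name.
    if (name[0] in SURNAMES and 2 <= len(name) <= 3
            and all(c in PINYIN for c in name[1:])):
        given = "".join(PINYIN[c] for c in name[1:])
        return f"{SURNAMES[name[0]]} {given.capitalize()}"
    # 3. Unknown name: romanise it if every character is in the table.
    if all(c in PINYIN for c in name):
        return "".join(PINYIN[c] for c in name).capitalize()
    # 4. Otherwise leave the characters alone rather than guess.
    return name
```

With these tables, 乔一帆 comes out as ‘Qiao Yifan’ and 嘉世 as ‘Jiashi’, while 乔 on its own would still be ‘Joe’.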

Is there any way to control this unwanted over-matching of Chinese names? The Chrome extension I use to give translations of Chinese words as hover text almost never does this and when it does it’s almost always an alternate definition, not first choice.

If not, could we perhaps submit corrections or alternative translations for particular names?

I realise the answer is probably no, but I thought there’s no harm in asking!

Thanks as always for all your hard work :slightly_smiling_face:

Hmm, it would maybe be possible to do something… translations are handled by a translate API… if we isolated certain entities in the source text, and constructed a glossary of suitable translations, and submitted them to the API too… but it’s quite some work for a ‘niggle’… I think there’s still a lot of low-hanging fruit, featurewise, for us to try first. It’s good that you brought this to our attention. Submitting corrections, yes, that might happen sometime. :slight_smile:
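As a minimal sketch of the glossary idea, assuming the translation call can be treated as a plain function from text to text: swap each glossary term for a placeholder token before translating, then put the preferred English back afterwards. The names `GLOSSARY` and `translate_with_glossary`, the `__TERMn__` token format and the `translate` callable are all hypothetical. A real engine might mangle the placeholders, and some translation services offer a built-in glossary feature that would be a better fit if the one in use here has it.

```python
from typing import Callable, Dict

# Hypothetical glossary: source term -> preferred English rendering.
GLOSSARY: Dict[str, str] = {"嘉世": "Excellent Era", "乔一帆": "Qiao Yifan"}


def translate_with_glossary(text: str,
                            translate: Callable[[str], str],
                            glossary: Dict[str, str] = GLOSSARY) -> str:
    """Protect glossary terms with placeholders, translate, then restore them."""
    restore = {}
    # Handle longer terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        if term in text:
            text = text.replace(term, token)
            restore[token] = glossary[term]
    # `translate` stands in for whatever machine translation call is used.
    translated = translate(text)
    for token, english in restore.items():
        translated = translated.replace(token, english)
    return translated
```

User-submitted corrections would then only need to add entries to the glossary, not touch the translation step itself.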

I alluded to this elsewhere today, but there’s a much wider and more significant problem with how your API handles ‘words’ and their definitions in Chinese, which becomes extremely obvious once the pinyin romanisation causes these clumps of characters to be highlighted.

The tl;dr is that Chinese doesn’t have words, it has characters, and it doesn’t have word boundaries that an API can reliably find. If hover text and clickable definitions are to be at all useful then you need to be able to define all the individual characters and pairings of characters: 1) because if your APIs get it wrong, it’s impossible to work out what the sentence actually means, and 2) because new words contain new characters that you’ll need to know elsewhere.

The slightly longer summary is: The API currently somewhat arbitrarily decides what is a word and then blocks you from seeing any other definition for that ‘word’ or its components. This is particularly annoying if it’s decided that a set of characters in the middle of a sentence is ‘The Kardashians’ but, that aside, the wider problem affects multiple sentences in every scene and severely impedes how useful LLN is for Chinese language learning - unless you install additional Chrome extensions to work around the problem (which have been broken by some previous versions of LLN).

You need to be able to define all the characters in the compound because so many Chinese words are compounds and often learning a new word and its characters actually teaches you two or more new words, and gives you the tools to intuitively understand what other unfamiliar words mean.

For example ‘owl’ is ‘cat-head-eagle’. If you don’t know the word for ‘eagle’ and you come across ‘owl’ (as I didn’t), then clicking or hovering for definitions should allow you to see what the individual characters mean. If not, then I’ll have seen two characters I know and one I don’t that together make a word ‘cat-head-???’ that means ‘owl’, but I’ve only learned ‘owl’. If your hover or click-for-definition system works well, then instead I’ve learned two words ‘owl’ and ‘eagle’ (and hawk/falcon), and that ‘head’ can be used like ‘headed’.
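Here is a small Python sketch of what I mean by being able to see inside a compound, using the owl word described above (猫头鹰). The mini-dictionary and the function name `breakdown` are made up for illustration, and the glosses are abbreviated. The point is only that a lookup can return the whole word plus every smaller piece that has its own entry.

```python
# Hypothetical mini-dictionary for illustration; glosses are abbreviated.
DICTIONARY = {
    "猫头鹰": "owl",
    "猫": "cat",
    "头": "head",
    "鹰": "eagle; hawk; falcon",
}


def breakdown(chunk, dictionary=DICTIONARY):
    """Return (text, definition) for the chunk and every sub-span with an entry.

    Longer spans come first, so the whole-word meaning leads and the
    single-character meanings follow in reading order.
    """
    entries = []
    for length in range(len(chunk), 0, -1):
        for start in range(len(chunk) - length + 1):
            piece = chunk[start:start + length]
            if piece in dictionary:
                entries.append((piece, dictionary[piece]))
    return entries
```

Calling `breakdown("猫头鹰")` gives the ‘owl’ entry first, followed by ‘cat’, ‘head’ and ‘eagle; hawk; falcon’, which is exactly the information a learner needs to pick up ‘eagle’ along the way.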

And now the far too long explanation, with a detailed worked example:

The API you’re using has some very strange ideas about what counts as a word. Let’s take the example I used earlier in the pinyin thread…

So the sentence is:

这魏无羡做的东西就是不行
zhè wèi wú xiàn zuò de dōng xī jiù shì bù xíng

Machine translation is:
This wei’s envious thing is not possible

Human translation is:
Anyway, Wei’s inventions are unqualified!

The API has split it as follows - I’ve added the definitions it gives for each chunk, with item 1 being what you get on hover, and the rest being what’s given on click:

zhè

  1. it
  2. pron. this, these
  3. adj. now

wèi wú xiàn zuò
魏无羡做

  1. wei wuxian does
  2. Wei Wuxian does

de

  1. possessive particle
  2. adj. possessive particle
  3. pre. of
  4. noun. aim
  5. adv. really and truly.
  6. ablative cause suffix
  7. -self

dōng xī
东西

  1. things, something, stuff
  2. noun. thing, stuff, east and west

jiù shì
就是

  1. just
  2. adj. even, exactly, very, precisely, just like, in the same way
  3. conj. if

bù

  1. not, never
  2. adv. not
  3. adj. no
  4. noun. no
  5. non-, a-, il-, im-, in-, ir-, un-

xíng

  1. do, travel, walk
  2. verb. do, travel, walk, go
  3. noun. row, behavior, conduct, profession, behavior
  4. adj. professional, capable, competent, temporary

Please note, I am still very much a novice and learning as I go. I am most certainly not a translator! But here’s how I’d break that sentence down:

The machine translation has taken the ‘Zhè’ to mean ‘This’, but in the context of a conversation, it’s also often more like an interjection where we’d say something like ‘Anyway’ or ‘As I see it’ or ‘However’.

Next the API has decided that Wei Wuxian is a name (in this case we’re lucky and it’s right), but it’s gone further and decided that ‘Wei Wuxian does’ should be taken as a single word with no ability to see the meanings of any of the characters within it. Now, this isn’t what the machine translation has decided: it thinks ‘Wei’ is probably a name, although it isn’t confident enough to give it a capital letter, and that Wuxian means ‘envious’.

So, looking at what LLN is displaying, a learner might reasonably ask, “Where has the machine translation got ‘envious’ from?” There is no way to know. The definition is given solely as ‘Wei Wuxian does’.

A learner should be able to break that definition down further to discover that ‘Wèi’ is a family name, which long ago literally meant ‘tower over a palace gateway’. ‘Wúxiàn’ is a given name, made up of the characters ‘wú’, which means something lacking, and ‘xiàn’, which means envy - and so the given name could be taken to mean ‘no envy’. Given that this is a literary character, some of that meaning could be intended as a poetic implication, so it’s helpful to know those meanings, even if you’re not wondering why a machine translation is adding ‘envious’ to everything whenever the person is mentioned.

‘zuò’ means ‘do’ or ‘make’. It’s often used as a compound, so, for example, if you ‘make-meal’ you’re cooking.

The ‘de’ particle hasn’t been included in this definition clump, but it probably should be, given that it either puts the emphasis on the word before or joins the words around it by defining a possessive relationship (unless it’s the rare case where it’s a word and not a particle, which this isn’t).

For example, you might put a ‘de’ on the end of the ‘zuò’ to emphasise how the fact that you have made or done the thing is to be taken as the main subject of the sentence. In fact, that happens earlier in the scene, someone asks who made the magic compass they’re discussing, and we’re told it was Wei Wuxian who made it - the sentence ends ‘Wèi Wúxiàn zuòde’.

Moving on, ‘dōngxī’ means thing, things, or stuff, although it’s very often left out of translations because it’s used in compounds like ‘buy-stuff’ for ‘shopping’. If you make it the subject of a sentence then it would be ‘something’. It literally means ‘east-west’, which the definition on click does include.

Now, how the ‘de’ particle joins characters together can change things radically. For example, if we made the sentence fragment that we’re deciding is a ‘word’ into ‘zuò de dōngxī’, that would mean ‘things to do’. But if you decided to add the ‘xiàn’ to this lump then ‘xiàn zuò de dōngxī’ might mean ‘envious things’, but if you also add the ‘wú’, then ‘wú xiàn zuò de dōngxī’ becomes the opposite, ‘things you don’t envy’.

Of course you shouldn’t do that, it should be ‘Wúxiàn zuò de dōngxī’, ‘things that Wuxian makes’ (although it could also mean things that Wuxian does), or in this case Wuxian’s inventions, because in the context of this conversation, Wei Wuxian has made a magical compass that detects demons, and the characters on screen are discussing if it works.

This is because the possessive particle ‘de’ is actually applying to Wei Wuxian here, due to a Chinese grammar rule where the possessive ‘de’ moves after the verb. So ‘Mom’s cakes’ is ‘妈妈的蛋糕’ / ‘Māma de dàngāo’ but ‘The cakes that mom makes’ is constructed more like “Mom make’s cakes”: ‘妈妈做的蛋糕’ / ‘Māma zuò de dàngāo’. So, yes, we’re definitely talking about things that Wei made, not what Wei does.

So then we have ‘jiù shì’. ‘Jiù’ means a huge number of things on its own, such as ‘at once’, ‘right away’, ‘as early as’, ‘to suffer’, ‘goes well with’ (food), ‘to take advantage of’, ‘with regards to’, ‘concerning’ etc etc. ‘Shì’ means ‘is’, ‘are’, ‘am’, ‘yes’, ‘to be’ (it’s more like an ‘equals’ than an ‘is’). Together, ‘jiùshì’ means ‘just’, ‘exactly’, ‘precisely’, ‘even’, ‘if’, but it is often a form of emphasis that you mean exactly as you say - you may choose to translate this simply by putting an exclamation mark at the end of the English sentence, or just put in ‘just’, whatever seems natural.

‘Bù’ as a prefix is used for negation. On its own it might mean ‘No!’

行 has been romanised as ‘xíng’ here. ‘Xíng’ means things like ‘to walk’, ‘to go’, ‘to travel’, ‘temporary’, ‘makeshift’, ‘capable’, ‘competent’, ‘effective’, ‘alright’, ‘OK’, ‘will do’, ‘behaviour’, etc. If you say it on its own it’s an affirmation that you’re going to do the thing you were asked to do, or that what’s just been said is acceptable, much like ‘OK’.

If you said ‘Bù xíng’ on its own, it’s the opposite of OK and you’re saying ‘No way!’ (especially if you say it twice), or it’s communicating something like ‘can’t’, ‘no good’, ‘incapable’, ‘doesn’t work’ or ‘out of the question’. (It could also be a negation of one of the meanings of ‘xíng’ above. A native speaker would know what was natural in each context, the rest of us watch a lot of Chinese TV to help us work it out.)

So again, we could clump some preceding characters into this ‘no way’, so that ‘jiùshì bùxíng’ might be ‘just can’t’ and ‘dōngxī jiùshì bùxíng’ might be ‘something just doesn’t work’, but we know that the ‘something’ here is ‘things made by Wei Wuxian’, so we’re saying “Anyway, the things that are made by Wei Wuxian just don’t work.”

Except, when you’re talking about whether a machine or gadget doesn’t work, you wouldn’t say it that way. So this probably has something of the secondary meanings to it, either ‘ineffective’ or ‘are unprofessional’, which is supported by the Netflix translator choosing to use ‘are unqualified’. So maybe you might take translator’s license and make it “are unproven” or “can’t be trusted” or “aren’t up to the job”. “Just don’t work” is probably fine though. Although perhaps I’ll discover later that this fantasy setting has a professional body for certifying magical devices - it’s quite possible.

(For an added complication, 行 has more than one pronunciation and can also be pronounced ‘háng’. When it’s said that way, it means ‘row’, ‘line’, ‘rank’, ‘commercial firm’, ‘line of business’, ‘profession’. The dialogue clearly says ‘xíng’, but the list of definitions included on click has the ‘háng’ meaning as its second definition, with no mention that this is a completely different word that isn’t pronounced the way the voice has just spoken it.)
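One way to avoid mixing the two up, sketched in Python under the assumption that definitions can be stored per reading: key each character’s entries by pinyin, show the reading that matches the romanisation already displayed first, and label any others. The structure `READINGS` and the function `definitions_for` are hypothetical, and the glosses are just the ones listed above.

```python
# Hypothetical entry structure: definitions are stored per reading, so a
# character with two pronunciations keeps its meanings apart.
READINGS = {
    "行": {
        "xíng": ["to walk", "to go", "to travel", "capable", "competent", "OK"],
        "háng": ["row", "line", "rank", "profession", "line of business"],
    },
}


def definitions_for(char, spoken_pinyin, readings=READINGS):
    """Return (reading, definitions) pairs with the spoken reading first."""
    by_reading = readings.get(char, {})
    # sorted() is stable: the matching reading gets key False and moves to
    # the front, while the other readings keep their original order.
    ordered = sorted(by_reading, key=lambda reading: reading != spoken_pinyin)
    return [(reading, by_reading[reading]) for reading in ordered]
```

For the sentence above, `definitions_for("行", "xíng")` would list the ‘xíng’ meanings first and the ‘háng’ ones afterwards under their own label, so nobody mistakes ‘row’ for a meaning of what was actually said.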

Worked example ends.

Here’s a link to the relevant episode if you want to check this out, see conversation 4 minutes in: https://www.netflix.com/watch/81200232?trackId=14170286&tctx=2%2C1%2C8675eadc-33c4-41d7-bec2-e54421a66912-32476802%2Cbfc2f7dd-3b3f-422c-bb90-3a74966c22bf_42730065X3XX1590099970523%2Cbfc2f7dd-3b3f-422c-bb90-3a74966c22bf_ROOT%2C

So hopefully I’ve made the point that you need to be able to see definitions for each of the characters within a compound word or name to have any hope of translating a sentence containing unfamiliar words or characters, or to understand why the machine translation is so wildly wrong, or why the human translator chose a different way of saying the same thing, or to get the most out of learning new words.

And that’s what I’ve been doing all the time I’ve been using LLN, and the way that I was able to provide you with all those breakdowns of individual characters.

I use another Chrome extension to add hover text definitions to Chinese sentences. It’s called Zhongwen: Chinese-English Dictionary and can be found here: https://chrome.google.com/webstore/detail/zhongwen-chinese-english/kkmlkkjojmombglmlpbpapmhcaljjkde

This allows me to hover over any character and see the pinyin, the simplified and traditional forms and the definitions, including any for readings that are written the same but pronounced differently. It also sometimes has keyboard shortcut grammar notes.

Currently it’s working well with LLN subtitles, although it wasn’t on some previous versions. It allows me to hover over individual characters within ‘words’ that LLN has chosen to give definitions to and make sense of them. When it matches multiple characters as a word starting with the character hovered, it gives definitions of the combination and of the individual character the mouse is hovering on, so you can move to each of the characters within a word like owl/cat-head-eagle.
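The hover behaviour I’m describing can be sketched in a few lines of Python. This is my guess at the general approach, not Zhongwen’s actual code: starting from the hovered character, try the longest run of characters first and keep every run that has a dictionary entry, ending with the single character. The function name `hover_entries`, the `max_len` limit and the sample dictionary are assumptions for illustration.

```python
# Hypothetical sample entries, using glosses from the worked example above.
SAMPLE_DICTIONARY = {
    "东西": "thing; stuff",
    "东": "east",
    "西": "west",
    "就是": "just; exactly",
    "就": "at once; right away",
    "是": "is; to be",
}


def hover_entries(sentence, index, dictionary=SAMPLE_DICTIONARY, max_len=4):
    """Return dictionary entries that start at sentence[index], longest first."""
    entries = []
    longest = min(max_len, len(sentence) - index)
    for length in range(longest, 0, -1):
        piece = sentence[index:index + length]
        if piece in dictionary:
            entries.append((piece, dictionary[piece]))
    return entries
```

Hovering on 东 in 这魏无羡做的东西就是不行 would return both 东西 and 东, while hovering on 西 would return only 西, so every character stays reachable no matter how the sentence was chunked.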



[image]

And in LLN:

[image: Screen Shot 2020-05-22 at 21.32.17]

Hopefully you now have a vague idea of what the problem is, and why within the wider picture this is more than a niggle?

If you don’t have plans to fix any of this, it would be amazing if you could test against hover-based dictionary Chrome extensions like Zhongwen and make sure that changes to LLY don’t break them.

I’m pretty sure that the way pinyin is being formatted in the sidebar transcripts is actually breaking Zhongwen’s ability to recognise potential words within sentences - see the comment elsewhere on pinyin formatting and add this to the reasons why pinyin and characters should just be 2 complete lines and not chunked together in the transcripts.

All this is also a reason why it’s so important to find a way for us to include the English subtitles on LLY videos when they’re provided. Machine translation is pretty impressively bad at Chinese when it gets it wrong.

Anyway, as always, huge huge thanks for all the hard work! I am really genuinely extremely grateful for all the time and effort you put into creating, maintaining and improving these tools. I know that everyone has a lot going on right now and there are other priorities, so totally understand if this isn’t anywhere near the top of your to-do lists! I hope you take my worryingly long and time-consuming feedback as encouragement for a genuinely amazing project, and not as entitled moaning (and you did ask on Twitter! :wink:)

Have a great weekend, or whatever time it is whenever you get around to slogging through this monster comment! :sweat_smile:
