Thai transliteration is lacking

Most Thai teachers online and on YouTube use a system very similar to Paiboon. It is consistent and has tone marks (important for a tonal language…). I do not know what you currently use, but it is not very useful. Is it your own system?

Please check thai-language.com for a better one (choose the Paiboon option). thai2english.com also uses something similar.

I do know this can be hard to implement. Anyway, thanks for reading.

As a workaround, please let us edit the transliterations that are shown, for example by pasting our own text over them.

P.S. It seems thai2english now also has a great batch transliteration service (just like thai-language, except that one sometimes has no result for a few words). See “Automatic Thai to English transliteration - fast, easy and accurate”. It even seems possible to reuse this in your add-on: send a POST request and scrape the results, or contact hello @ thai2english dot com and ask whether they would support your add-on through an API or something.

The current Thai transliteration is seriously lacking.
Not only does it ignore tones, it is also sometimes completely wrong.
For example, it transliterates “mahalai” as “manai”.

Devs,
Thai is a tonal language.
You need a Thai Transliteration that conveys tones.
Tones are extremely difficult to look up; being told the tone as soon as you hover over a word, via a popup transliteration, would be a game changer.

The transliteration system I recommend is the official one used by the Thai government:
the Royal Thai Precise System of Transcription.
Keep it standard. Happy to advise on what this entails.

Not to be confused with the Royal Thai General System of Transcription (RTGS), which is essentially the system you’re using now.
Because RTGS is only meant for limited contexts such as traffic signage, it omits tones and a few other features that are important for learners.

The good news is that the “General” system you currently have is easily expandable to the “Precise” system.

This is an issue with the transliteration not being able to discern where a syllable or word begins and ends.
E.g., letters such as ร and ล are written as “n” only when they close a syllable; at the start of a syllable they are “r” and “l”.

Thus, in the above situation, it seems to treat the ล in “mahalai” as a final consonant and ends up with “manai”, or something like that.

But honestly, this issue is not nearly as serious as the lack of tone marks, because word endings and beginnings are easy to tell from the Thai script itself. Tone is far harder.

To calculate tones yourself, you’d need to run through a complex decision tree, and frankly, it’s just easier to form an implicit visual understanding through repeated pattern recognition, i.e., hovering over words and quickly seeing the tone in the transcription. (Working through the rules by hand is not fun and slows everything down.)

Well, as Paiboon and Royal Thai were already proposed (thai2english uses the third, invented by them), I’m going to make my own proposal as well.
I’d prefer AUA transcription developed by J. Marvin Brown over those two standards. Why?
First of all, it’s closer to IPA than Paiboon and especially Royal Thai (which is often discouraged for learners anyway). And who really wants to learn yet another standard when it’s not that widespread anyway?
Second, I don’t like Paiboon’s convention for representing aspirated and non-aspirated consonants. Transcribing ก as g just feels wrong.
Finally, all my learning materials use AUA or its slightly modified version, so I’d just prefer sticking to one standard.

But honestly, devs, just implement whichever one can be made to work correctly, because the current one is just useless.

@procion
I know you mean well… but this is exactly what we should not be doing.

I even agree with you that AUA is in some ways a superior transliteration system (especially over Paiboon and Thai2English),
but we need to keep it as simple as possible for the developers.
If we keep proposing competing transliteration systems (and Thai has a million), we’re not gonna get any functioning system at all.

  1. The reason we should use the Royal Thai Precise System of Transcription is that it is the government-backed one, used in the largest variety of official contexts and databases.

    • There are theoretically better transliteration schemes for Chinese than Pinyin, but no one would dare suggest an alternative because it’s the official and most widespread one.
  2. AUA uses special symbols for vowels such as “ʉ”, as do Paiboon and others, whereas the Royal Thai System uses the “ue” digraph.

    • While it’s nice to have 1 symbol for 1 vowel sound, you would also have to employ “combining diacritics” to add tone marks to these special characters, and doing so may lead to weird rendering issues. This is not an issue with the 5 basic Latin vowels that the Royal Thai System uses, because these are available with tone marks as “pre-combined characters”, i.e., à â á ǎ vs. ʉ + ◌́, etc.
  3. Language Reactor is already using the Royal Thai system. They just need to upgrade it from the “General” variant intended for street signs to the “Precise” variant intended for students, which does include tone marks and two other letter distinctions. Thus this approach should be easier for the LR team.
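
To make the point in item 2 about pre-combined characters concrete, here is a small Python sketch using only the standard `unicodedata` module. The helper name `compose` is made up for this post. It checks how many code points are left after NFC normalization: the five basic vowels collapse into a single character with each of the four tone diacritics, while ʉ keeps its separate combining mark.

```python
import unicodedata

# Combining marks typically used as tone diacritics in romanized Thai.
GRAVE, ACUTE, CIRCUMFLEX, CARON = "\u0300", "\u0301", "\u0302", "\u030c"
TONE_MARKS = [GRAVE, ACUTE, CIRCUMFLEX, CARON]

def compose(base: str, mark: str) -> str:
    """Hypothetical helper: base letter + combining mark, NFC-normalized."""
    return unicodedata.normalize("NFC", base + mark)

# Each basic Latin vowel has a precomposed form for all four marks,
# so NFC yields exactly one code point per combination.
basic = {v + m: compose(v, m) for v in "aeiou" for m in TONE_MARKS}

# U+0289 (ʉ) has no precomposed accented forms, so the mark stays separate.
u_bar_high = compose("\u0289", ACUTE)

print(all(len(s) == 1 for s in basic.values()))
print(len(u_bar_high))
```

Whether a two-code-point sequence actually renders badly depends on the font and the renderer, so this only shows where the risk comes from, not that it will happen.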

Alternatively, another solution would be a transliteration option that shows “tones only”. Unlike Chinese, Thai does have an alphabet (abugida) where, with few situational exceptions, each letter corresponds to one sound. So the serious student does not need transliteration of consonants and vowels for long.
But determining tone is the hard part because that is not a one-to-one correspondence indicated by a certain symbol, but more of a math equation that slows down macro comprehension at the sentence-level.
(and something most native Thai speakers never learn explicitly at all)

Therefore, for the non-native learner of Thai, if we had a transliteration option that showed us just M, L, F, H, R (for Mid, Low, Falling, High, and Rising tones)… that alone would be enough and an absolute game changer.
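
For anyone curious what that “math equation” looks like, here is a minimal Python sketch of the standard textbook tone rules, returning the M/L/F/H/R letters suggested above. It is only the last step: it assumes each syllable has already been segmented and analyzed (class of the initial consonant, tone mark, live or dead ending, vowel length), which is the genuinely hard part, and it ignores leading ห, irregular words and loanwords. The function name `tone_letter` and its parameters are made up for illustration; this is not how LR or any existing tool is known to work.

```python
from typing import Optional

# Thai tone marks (Unicode code points).
MAI_EK, MAI_THO, MAI_TRI, MAI_CHATTAWA = "\u0e48", "\u0e49", "\u0e4a", "\u0e4b"

def tone_letter(consonant_class: str, mark: Optional[str],
                live: bool, long_vowel: bool) -> str:
    """Return M, L, F, H or R for one pre-analyzed syllable.

    consonant_class: "mid", "high" or "low" (class of the initial consonant)
    mark: one of the four tone marks, or None
    live: True if the syllable ends in a long vowel or a sonorant
    long_vowel: only matters for dead syllables with a low-class initial
    """
    if mark == MAI_TRI:        # normally only written on mid-class consonants
        return "H"
    if mark == MAI_CHATTAWA:   # normally only written on mid-class consonants
        return "R"
    if mark == MAI_EK:
        return "F" if consonant_class == "low" else "L"
    if mark == MAI_THO:
        return "H" if consonant_class == "low" else "F"
    # No tone mark: class, live/dead and vowel length decide.
    if live:
        return "R" if consonant_class == "high" else "M"
    if consonant_class == "low":
        return "F" if long_vowel else "H"
    return "L"

# มี: low class, no mark, live syllable
print(tone_letter("low", None, live=True, long_vowel=True))
# ที่: low class with mai ek
print(tone_letter("low", MAI_EK, live=True, long_vowel=True))
# บท: mid class, no mark, dead syllable
print(tone_letter("mid", None, live=False, long_vowel=False))
```

The rules themselves fit in a dozen lines; the real work for a tool is getting the syllable analysis right before this function is ever called.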

Thai transliteration will always be lacking, no matter which system you use.

For example: for years I said the word Koh (island) the way it is written, but the K has to sound somewhere between a K and a G, and the O has to sound different as well.

Make the effort to learn Thai script and believe me, you’re going to be so happy that you did.

Still no ‘official’ reply. The point is: the current transliteration is useless because tones are missing and it makes mistakes all the time (try kru, which becomes khnu… uhm, what?).
Do not use IPA; it takes serious study of its own to understand. Nice for academic linguists, not for learning a language.
As far as I know, the “Royal Thai Precise System of Transcription” is not used anywhere, nor does it have any online tools or presence, and the non-precise one is completely useless too (just look at the name of the international airport and how it is actually pronounced). The Thai2english API would be easiest, if they allow it. By the way, reading the Thai letters is doable. Getting the right tone without already knowing the word takes a complex algorithm. Ask any Thai speaker: they probably can’t even tell you the tone, they just know the sound. I wish there were some offline, open-source transliteration algorithm as an alternative. Does anyone know of one?

Details about different transliteration systems: Pronunciation Guide Systems for Thai - Thai Language - slice-of-thai.com

Tone marks would be nice but are not required, since you can work the tone out just by reading the Thai. But there are far too many misleading transliterations that should be an easy fix. For example arai, which means “what”, is shown as anai. Even Royal Thai writes it as arai.

มี or ที่, for example, are shown as mi and thi, when they should read mii/mee or thii/thee to emphasize the long vowels. From what I’ve seen, long vowels are not indicated at all.
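
The two complaints above (arai shown as anai, and long vowels not being marked) both come down to handling each syllable by position. Here is a rough Python sketch of that idea. It assumes the text has already been split into syllables with initial consonant, vowel and final consonant identified, which is exactly the step that seems to be failing; the `Syllable` class, the `romanize` function, the tiny lookup tables and the “double the vowel letter” convention are all made up for illustration and cover only a handful of letters.

```python
from dataclasses import dataclass
from typing import List, Optional

# The same Thai letter maps to different Latin letters by position.
# Tiny illustrative tables with RTGS-style values, not a complete mapping.
INITIAL = {"อ": "", "ร": "r", "ล": "l", "ม": "m", "ท": "th", "บ": "b", "ห": "h"}
FINAL = {"ร": "n", "ล": "n", "ม": "m", "ท": "t", "บ": "p"}

@dataclass
class Syllable:
    initial: str                  # Thai initial consonant
    vowel: str                    # vowel, already romanized (e.g. "a", "i", "ai")
    long: bool = False            # True for a long vowel
    final: Optional[str] = None   # Thai final consonant, if any

def romanize(syllables: List[Syllable], mark_length: bool = False) -> str:
    """Romanize pre-segmented syllables; optionally double long single vowels."""
    parts = []
    for s in syllables:
        vowel = s.vowel
        if mark_length and s.long and len(vowel) == 1:
            vowel *= 2            # hypothetical convention: "i" becomes "ii"
        coda = FINAL[s.final] if s.final else ""
        parts.append(INITIAL[s.initial] + vowel + coda)
    return "".join(parts)

# อะไร: ร starts the second syllable, so it stays "r".
print(romanize([Syllable("อ", "a"), Syllable("ร", "ai")]))
# อาหาร: ร closes the last syllable, so it becomes "n".
print(romanize([Syllable("อ", "a", long=True),
                Syllable("ห", "a", long=True, final="ร")]))
# มี with and without length marking.
print(romanize([Syllable("ม", "i", long=True)]))
print(romanize([Syllable("ม", "i", long=True)], mark_length=True))
```

With correct syllable boundaries the r/n choice is a plain table lookup, and marking vowel length is a one-line option rather than a different system.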

จริงๆ should read Jing, because of จริ, but it is shown as ‘ChNing’. บท should read Bot/Bhot, and LR gives me Btb.

There also seem to be false consonants and certain rules that are ignored?

Also, the HTML export doesn’t separate words like in the subtitles.