Small QoL recommendations

Thomas_B · September 22, 2020, 5:17pm

No excuse necessary, Korean is hard!

I. Verbs / Adjectives

Fair warning… this answer starts simple, but then gets much messier.

The simple:

For verbs (and adjectives, which conjugate the same way), the “dictionary” form ends in “다”.
To get the “true” root, you just remove that “다”. In other words, for any conjugation, the “다” is removed and a new ending is applied to the remaining stem.

Example:

먹다 (to eat) + 었어 (informal past) = 먹었어 (I ate)
mokda + osso = mokosso

Easy!
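A minimal sketch of the simple case, assuming the input is a dictionary form and no sound changes apply. The function name `conjugate_naive` is made up for illustration.

```python
def conjugate_naive(dictionary_form: str, ending: str) -> str:
    """Strip the final 다 and glue the ending on, with no sound changes."""
    if not dictionary_form.endswith("다") or len(dictionary_form) < 2:
        raise ValueError(f"not a dictionary form: {dictionary_form!r}")
    stem = dictionary_form[:-1]  # 먹다 -> 먹
    return stem + ending


print(conjugate_naive("먹다", "었어"))  # 먹었어
```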

But…

If the remaining verb stem ends in a vowel, the vowel often shifts or merges with the endings.

가다 (to go) + 었어 (informal past) = 갔어 (I went)
Explanation: the ㅓ in the ending flips to ㅏ to match the ㅏ in the stem (가 + 았어), and then the two ㅏ sounds merge into one, leaving 갔어
Bad transliteration: kada + osso = kasso
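Here is a rough sketch of how that one rule might look in code, using the arithmetic layout of precomposed Hangul syllables in Unicode. It only covers the vowel choice (았어 after ㅏ/ㅗ stems, 었어 otherwise) and the merge for open stems ending in ㅏ or ㅓ. Everything else (하다, 오다, and the other rules) is deliberately left out, and the helper names are hypothetical.

```python
HANGUL_BASE = 0xAC00          # code point of 가, the first precomposed syllable
VOWEL_A, VOWEL_EO, VOWEL_O = 0, 4, 8   # vowel indices of ㅏ, ㅓ, ㅗ
TAIL_SS = 20                  # final-consonant index of ㅆ


def decompose(syllable: str):
    """Split one Hangul syllable into (lead, vowel, tail) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail


def compose(lead: int, vowel: int, tail: int) -> str:
    """Inverse of decompose()."""
    return chr(HANGUL_BASE + (lead * 21 + vowel) * 28 + tail)


def informal_past(dictionary_form: str) -> str:
    """Informal past for the two cases described above only."""
    if not dictionary_form.endswith("다") or len(dictionary_form) < 2:
        raise ValueError(f"not a dictionary form: {dictionary_form!r}")
    stem = dictionary_form[:-1]
    lead, vowel, tail = decompose(stem[-1])
    if tail == 0 and vowel in (VOWEL_A, VOWEL_EO):
        # Open stem in ㅏ/ㅓ: the ending's vowel merges away and ㅆ
        # attaches to the stem syllable (가 + 았어 -> 갔어).
        return stem[:-1] + compose(lead, vowel, TAIL_SS) + "어"
    # Otherwise pick the ending by the stem's last vowel.
    ending = "았어" if vowel in (VOWEL_A, VOWEL_O) else "었어"
    return stem + ending


print(informal_past("가다"))  # 갔어
print(informal_past("먹다"))  # 먹었어
```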

And sometimes consonants shift too.

쉽다 (to be easy) + 었어 = 쉬웠어 (it was easy)
Explanation: the 다 is removed, then the final ㅂ drops off the stem and turns into ㅜ at the start of the next character, then ㅜ merges with ㅓ into ㅝ (쉬 + 우 + 었어 = 쉬웠어)
Bad transliteration: suipda + osso = suiuosso
(I don’t know the official transliterations, but it’d be something like this, just to give you the feel for it)
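A sketch of that ㅂ rule, under a big assumption: the caller already knows the verb is one of the ㅂ-irregular ones. Regular ㅂ stems such as 입다 or 잡다 keep their ㅂ, and nothing in the spelling tells the two groups apart, so that knowledge has to come from a word list. The two stems that take 왔어 instead of 웠어 (돕다, 곱다) are hard-coded here as an illustration, and the function names are hypothetical.

```python
HANGUL_BASE = 0xAC00   # code point of 가
TAIL_B = 17            # final-consonant index of ㅂ
TAKES_WA = {"돕다", "곱다"}  # ㅂ-irregulars that become 도왔어 / 고왔어


def decompose(syllable: str):
    """Split one Hangul syllable into (lead, vowel, tail) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < 11172:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    lead, rest = divmod(offset, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return lead, vowel, tail


def compose(lead: int, vowel: int, tail: int) -> str:
    return chr(HANGUL_BASE + (lead * 21 + vowel) * 28 + tail)


def b_irregular_past(dictionary_form: str) -> str:
    """Informal past of a verb the caller knows to be ㅂ-irregular."""
    if not dictionary_form.endswith("다") or len(dictionary_form) < 2:
        raise ValueError(f"not a dictionary form: {dictionary_form!r}")
    stem = dictionary_form[:-1]
    lead, vowel, tail = decompose(stem[-1])
    if tail != TAIL_B:
        raise ValueError(f"stem does not end in ㅂ: {stem!r}")
    open_syllable = compose(lead, vowel, 0)        # 쉽 -> 쉬
    ending = "왔어" if dictionary_form in TAKES_WA else "웠어"
    return stem[:-1] + open_syllable + ending


print(b_irregular_past("쉽다"))  # 쉬웠어
```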

So, there are 20ish rules like this. Once you code all these rules and exceptions you’re well on your way to learning Korean.

II. You

네가 (you) and 내가 (I) are irregular forms unrelated to the verb rules above. There’s no general rule behind them: when you add ‘가’ (a subject marker) to 너 (you) or 나 (I), the vowel shifts for no apparent reason. The humble 저 (I) does the same thing and becomes 제가.
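Since these are a closed handful of forms rather than a rule, a plain lookup table is probably enough. A small sketch follows; the table and function names are made up, and 제가 (from the humble 저) is included as a third form of the same kind.

```python
# Surface form -> (pronoun, subject marker). A fixed list, not a rule.
PRONOUN_SUBJECT_FORMS = {
    "내가": ("나", "가"),
    "네가": ("너", "가"),
    "제가": ("저", "가"),
}


def split_pronoun_subject(token: str):
    """Return (pronoun, marker) for the irregular forms, else None."""
    return PRONOUN_SUBJECT_FORMS.get(token)


print(split_pronoun_subject("네가"))  # ('너', '가')
```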

So, back to verbs/adjectives…

(1) You can just subtract the 다 for an 80% solution. (60%?)
(2) You can code an entire morphological analyzer for Korean… which seems like a massive resource burn…
(3) Pick an analyzer/tagger from KoNLPy and learn its API. https://konlpy.org/ko/v0.4.1/morph/
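To give a feel for the KoNLPy route, here is a sketch that keeps the tagger behind a plain function so the rest of the code doesn't care which analyzer is used. `lookup_keys` and `fake_pos` are hypothetical names. The stand-in tagger returns canned, illustrative output (not real analyzer output) so the snippet runs without KoNLPy or Java installed.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Any function that maps a sentence to (morpheme, tag) pairs,
# which is the shape KoNLPy taggers return from pos().
PosFn = Callable[[str], Iterable[Tuple[str, str]]]


def lookup_keys(pos_fn: PosFn, sentence: str, wanted_tags: Set[str]) -> List[str]:
    """Keep only the morphemes whose tag is in wanted_tags."""
    return [morph for morph, tag in pos_fn(sentence) if tag in wanted_tags]


def fake_pos(sentence: str):
    """Stand-in tagger with canned, illustrative output."""
    canned = {
        "밥을 먹었어": [("밥", "Noun"), ("을", "Josa"), ("먹다", "Verb")],
    }
    return canned.get(sentence, [])


print(lookup_keys(fake_pos, "밥을 먹었어", {"Noun", "Verb"}))  # ['밥', '먹다']

# With KoNLPy and Java set up, a real tagger should slot in like this
# (Okt is the class name in recent releases; the linked v0.4.1 docs
# call it Twitter, and tag names differ between taggers):
#   from konlpy.tag import Okt
#   okt = Okt()
#   lookup_keys(lambda s: okt.pos(s, stem=True), "밥을 먹었어",
#               {"Noun", "Verb", "Adjective"})
```

Passing the tagger in as a function also makes it easy to swap analyzers later, or to cache results if speed turns out to matter.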

As I’ve thought about this for my own projects, I feel like (3) is the textbook option. The analyzers are pretty big research projects, and still not perfect, so they made me much more humble about my ability to build something like that from scratch. The APIs took a few hours to get the hang of, but they were pretty intuitive. Most of them wrap Java libraries though, so since it’s not pure Python the setup isn’t trivial. Not sure how it would interface with your code; it might mean a messy refactor on your end.

Also not sure about speed. Are you preprocessing all the subtitles or doing it live?
