Small QoL recommendations

Thomas_B · July 1, 2020, 2:57pm

Korean has so many endings, and it blurs the line between “grammatical ending” and “compound word.” That could be a nightmare to process. One popular Korean learning resource, “Korean Grammar in Use”, is a three-book series all about grammatical endings.

So, after thinking about it for a hot minute (probably way less than your team has), I can imagine a couple of approaches:

The “simplify everything” approach: just take the roots and ignore all the endings.

So you’d take this subtitle:

나는 네가 먹고 있는 것을 알았어 (“I knew you were eating”)

and just process it like this:

나 - 너 - 먹다 - 있다 - 것 - 알다

Meaning, if I’ve marked 먹다 (“to eat”) as highlighted, it will mark this form, 먹고, as well. Maybe it shows the root on hover so I understand what’s going on as a user.
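A minimal sketch of the “simplify everything” idea, assuming each token has already been mapped to its dictionary form. The LEMMAS table and the function names are hypothetical; a real system would get the roots from a Korean morphological analyzer, not a hand-written dict.

```python
# Hypothetical lemma table: surface token -> dictionary form (root).
# In practice this would come from a morphological analyzer.
LEMMAS = {
    "나는": "나",
    "네가": "너",
    "먹고": "먹다",
    "있는": "있다",
    "것을": "것",
    "알았어": "알다",
}


def lemmatize(sentence: str) -> list[tuple[str, str]]:
    """Pair each whitespace-separated token with its root (falls back to the token)."""
    return [(tok, LEMMAS.get(tok, tok)) for tok in sentence.split()]


def highlight(sentence: str, known: set[str]) -> list[tuple[str, str, bool]]:
    """Flag a surface token as known whenever its root is in the learner's known set."""
    return [(tok, root, root in known) for tok, root in lemmatize(sentence)]


if __name__ == "__main__":
    for tok, root, hit in highlight("나는 네가 먹고 있는 것을 알았어", {"먹다"}):
        # The root is kept next to the token so a UI could show it on hover.
        print(tok, root, "HIGHLIGHT" if hit else "")
```

Marking only 먹다 as known is enough to highlight 먹고, and the stored root is what a hover tooltip could display.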

Pros:

Simplifies processing
This is how most people consciously parse sentences anyway, with grammar processing mostly happening instinctively (MIA and AJATT are big proponents of this philosophy)

Cons:

Controversial: a lot of people like treating grammar as a first-class citizen in language learning
False positives: the system could insist a learner knows words that look very strange, words that have changed dramatically through the addition of many stacked particles
Inconsistent with how you probably handle other languages, and for scalability you probably don’t want too many special cases.

The “completionist” approach: let people highlight grammatical principles too, treat those as separate words.

So say the system encounters 먹고 싶어요 (“I want to eat”).

It breaks it down into 먹다 (“to eat”) + -고 (“-to X”) + 싶다 (“to want”) + -아/어요 (“present tense, polite style”).

IFF I’ve tagged both 먹다 AND -고, then the system would highlight “먹고” fully.

If I’ve tagged 먹다 BUT NOT -고, it will only highlight half the word.

If I’ve tagged -고 only, then it will highlight that part only.

In general, it will treat roots and endings as separate words.
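Here’s roughly how the partial highlighting could work, assuming an analyzer has already split each word into morphemes. The ANALYSIS table is hard-coded and hypothetical; it just mirrors the 먹고 싶어요 breakdown above. Adjacent known morphemes are merged, so a fully known word comes out as one highlighted block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str  # characters as they appear in the subtitle
    tag: str      # entry the learner can mark as known (root or ending)


# Hypothetical analyzer output, hard-coded for illustration.
ANALYSIS = {
    "먹고": [Morpheme("먹", "먹다"), Morpheme("고", "-고")],
    "싶어요": [Morpheme("싶", "싶다"), Morpheme("어요", "-아/어요")],
}


def render(word: str, known: set[str]) -> str:
    """Wrap runs of known morphemes in [brackets]; unknown ones stay plain."""
    out: list[str] = []
    run = ""  # surface text of the current highlighted run
    for m in ANALYSIS.get(word, [Morpheme(word, word)]):
        if m.tag in known:
            run += m.surface
        else:
            if run:
                out.append(f"[{run}]")
                run = ""
            out.append(m.surface)
    if run:
        out.append(f"[{run}]")
    return "".join(out)


if __name__ == "__main__":
    # Both tagged -> whole word; root only -> first half; ending only -> second half.
    for known in ({"먹다", "-고"}, {"먹다"}, {"-고"}):
        print(sorted(known), render("먹고", known))
```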

Pros:

Accuracy
Going back and looking at the subtitles, it seems like your system is already trying to break down words into their components, so you may have a head start on this.

Cons:

Complexity. The engine for breaking words down will need a lot of tuning, maybe including just hard-coding a lot of corner cases.

There might be a third way? You guys have moved crazy fast tackling a lot of languages, when a developer could lose years figuring out any one of them, so I’m impressed and sure you’ll find a good path.

Good luck, and thanks for building this; it’s definitely worth the subscription. If you need any help testing anything, let me know.

Thomas

Here’s a tough case:

물어보다 is the commonly used word for “to ask.” It’s really 묻다 (“to ask”) plus the -아/어보다 ending, which means “to try to do.”

Nobody says 묻다 (“to ask”) on its own, because its conjugated forms collide with those of 물다 (“to bite”): both become 물어 before a vowel-initial ending, and you really don’t want people to misunderstand when you say you want to ask them something.
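To see the collision concretely, here’s a sketch using the standard Unicode arithmetic for precomposed Hangul syllables: U+AC00 + (initial × 21 + medial) × 28 + final. 묻다 is ㄷ-irregular, so its final ㄷ becomes ㄹ before a vowel-initial ending, and 묻 + 어 surfaces as 물어, the same string 물다 produces. The D_IRREGULAR set and the add_eo helper are toy assumptions: many ㄷ-final verbs (e.g. 받다) stay regular, and vowel harmony (아 vs 어) is ignored here.

```python
HANGUL_BASE = 0xAC00          # first precomposed syllable, 가
N_INITIALS, N_MEDIALS, N_FINALS = 19, 21, 28
FINAL_D, FINAL_L = 7, 8       # jongseong indices of ㄷ and ㄹ in Unicode order


def decompose(syl: str) -> tuple[int, int, int]:
    """Split a precomposed Hangul syllable into (initial, medial, final) indices."""
    offset = ord(syl) - HANGUL_BASE
    if not 0 <= offset < N_INITIALS * N_MEDIALS * N_FINALS:
        raise ValueError(f"{syl!r} is not a precomposed Hangul syllable")
    lv, final = divmod(offset, N_FINALS)
    initial, medial = divmod(lv, N_MEDIALS)
    return initial, medial, final


def compose(initial: int, medial: int, final: int) -> str:
    """Inverse of decompose()."""
    return chr(HANGUL_BASE + (initial * N_MEDIALS + medial) * N_FINALS + final)


# Toy assumption: verbs whose stem-final ㄷ turns into ㄹ before a vowel.
D_IRREGULAR = {"묻다"}


def add_eo(verb: str) -> str:
    """Attach the ending -어 to a verb stem (vowel harmony not handled)."""
    stem = verb[:-1]  # drop the dictionary-form -다
    if verb in D_IRREGULAR:
        i, m, f = decompose(stem[-1])
        if f == FINAL_D:
            stem = stem[:-1] + compose(i, m, FINAL_L)
    return stem + "어"


if __name__ == "__main__":
    # "ask" and "bite" end up with the same surface form.
    print(add_eo("묻다"), add_eo("물다"))
```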

So… do you tag 물어보다 as 묻다 + some ending? I think you just hard-code that one as a single word, because that’s how people think of it, and how it’s listed in common vocab lists.

좋아하다 (“to like”) is another really common one of these. Technically 좋다 plus an ending, but nobody seems to think of it like that.
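One way to handle these, sketched under the assumption that some generic splitter already exists: check a hand-maintained list of lexicalized words first, and only fall back to the split when the word isn’t on it. GENERIC_SPLIT, LEXICALIZED and vocab_entries are all hypothetical names for illustration.

```python
# Hypothetical output of a generic rule-based splitter that ignores usage.
GENERIC_SPLIT = {
    "물어보다": ["묻다", "-아/어보다"],
    "좋아하다": ["좋다", "-아/어하다"],
    "먹어보다": ["먹다", "-아/어보다"],
}

# Hand-maintained exceptions: forms learners and vocab lists treat as one word.
LEXICALIZED = {"물어보다", "좋아하다"}


def vocab_entries(verb: str) -> list[str]:
    """Return the entries a learner would tag for this dictionary form."""
    if verb in LEXICALIZED:
        return [verb]                      # the override wins: keep it whole
    return GENERIC_SPLIT.get(verb, [verb])  # otherwise trust the splitter


if __name__ == "__main__":
    for v in ("물어보다", "좋아하다", "먹어보다"):
        print(v, vocab_entries(v))
```

A productive combination like 먹어보다 (“to try eating”) still gets split, so the exception list only needs to cover words learners really treat as single vocabulary items.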
