Source of the Corpus used

bharatingermany · October 18, 2023, 2:32pm

Dear fellow learners,

I would like to know how the frequency list was made. Generally, frequency lists and dictionaries state the data on which they are based. I saw David mention somewhere that it is based on NLP, but I do not understand what that means.

If anybody can give me some more context, that would be lovely!

Thank you for your time!
Bharat

joan_LanguageProcess · October 20, 2023, 12:43pm

Hi @bharatingermany,

Here is what I can tell you from the deep dive I’ve done in the forum in the past:

I know that they pull frequency lists from opensubtitles[.org].

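And after Googling it, “NLP” stands for natural language processing: the field of computing that deals with analyzing human-language text. It is not a programming language.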
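
For some more context, a frequency list is essentially a ranked word count over a large body of text (the corpus). Here is a minimal sketch of the idea in Python. To be clear, this is not Language Reactor's actual method, which I don't know. The function name `build_frequency_list`, the simple regex tokenizer, and the three sample subtitle lines are all my own assumptions for illustration. A real NLP pipeline would usually also group word forms together (for example "go", "goes", "went").

```python
import re
from collections import Counter

# Hypothetical helper, for illustration only (not Language Reactor's code).
def build_frequency_list(lines, top_n=None):
    """Count how often each word occurs and return (word, count) pairs,
    most frequent first."""
    counts = Counter()
    for line in lines:
        # Assumed tokenizer: lowercase, then take runs of letters,
        # optionally with one apostrophe inside (so "don't" stays whole).
        tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", line.lower())
        counts.update(tokens)
    return counts.most_common(top_n)

# Made-up stand-in for a subtitle corpus.
sample_subtitles = [
    "I don't know what you mean.",
    "You know what I mean!",
    "What do you mean?",
]

frequency_list = build_frequency_list(sample_subtitles)
for rank, (word, count) in enumerate(frequency_list, start=1):
    print(rank, word, count)
```

The rank printed in the first column is what a "frequency number" would correspond to in this sketch: the lower the rank, the more common the word is in the corpus.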

I hope some of this is helpful.
