The tmcn package
2025-10-23
Let’s explore tmcn!
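The tmcn package is a text-mining toolkit for Chinese. It includes tools for converting between Simplified and Traditional characters, romanizing text to Pinyin, and removing stop words.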
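Here is a minimal first sketch, assuming tmcn is installed. `toPinyin()` turns Chinese text into toneless Pinyin. Applying it to the words in the table below is one plausible way to produce the Pinyin column. The vector `words` is simply the table's first column typed in by hand.

```r
library(tmcn)

# the words from the table below, in Simplified characters (typed in by hand)
words <- c("爸爸", "妈妈", "圣奥拉夫", "老师", "学生", "电脑", "大学", "秋季")

# toPinyin() romanizes each string; syllables are run together without tones,
# matching the style of the Pinyin column in the table
pinyin <- toPinyin(words)
pinyin
```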
| Chinese Words | Pinyin | English Translation |
|---|---|---|
| 爸爸 | baba | father |
| 妈妈 | mama | mother |
| 圣奥拉夫 | shengaolafu | Saint Olaf |
| 老师 | laoshi | teacher |
| 学生 | xuesheng | student |
| 电脑 | diannao | computer |
| 大学 | daxue | university |
| 秋季 | qiuji | fall (season) |
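The Traditional-character vector printed next can be obtained with `toTrad()`, which converts Simplified characters to Traditional ones. This is a sketch that reuses the same hand-typed `words` vector.

```r
library(tmcn)

words <- c("爸爸", "妈妈", "圣奥拉夫", "老师", "学生", "电脑", "大学", "秋季")

# Simplified -> Traditional; characters written the same way in both
# scripts (爸, 秋, 季) come back unchanged
trad <- toTrad(words)
trad
```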
The same words in Traditional characters:

```
[1] "爸爸"     "媽媽"     "聖奧拉夫" "老師"     "學生"     "電腦"     "大學"
[8] "秋季"
```
| Simplified | Traditional | English Translation |
|---|---|---|
| 妈妈 | 媽媽 | mother |
| 电脑 | 電腦 | computer |
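The conversion also runs in the other direction. With `rev = TRUE`, `toTrad()` maps Traditional characters back to Simplified ones. The sketch below uses the two rows of the table above.

```r
library(tmcn)

trad <- c("媽媽", "電腦")

# rev = TRUE: Traditional -> Simplified
simp <- toTrad(trad, rev = TRUE)
simp

# converting back to Traditional should recover the original strings here
identical(toTrad(simp), trad)
```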
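Character-level details for the first five characters: toneless Pinyin (`py0`), toned readings (`py`), the radical, and the radical's stroke count:

```
# A tibble: 5 × 5
  value py0   py          Radical Stroke_Num_Radical
  <chr> <chr> <chr>       <chr>                <int>
1 爸    ba    bà          父                       4
2 妈    ma    mā          女                       3
3 圣    sheng shènɡ       土                       3
4 奥    ao    ào yù       大                       3
5 拉    la    lá là lǎ lā 扌                       3
```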
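A table like this can plausibly be built by joining characters against tmcn's `GBK` character dataset. This sketch assumes several things: that `GBK` keys each character in a column named `GBK` with one row per character, and that it carries the `py0`, `py`, `Radical` and `Stroke_Num_Radical` columns seen above. The `chars` tibble is hypothetical input.

```r
library(tmcn)
library(dplyr)
library(tibble)

# GBK: character dictionary shipped with tmcn (column names assumed as above)
data(GBK)

# hypothetical input: the characters to look up
chars <- tibble(value = c("爸", "妈", "圣", "奥", "拉"))

# attach each character's readings and radical information
char_info <- chars |>
  left_join(GBK, by = c("value" = "GBK")) |>
  select(value, py0, py, Radical, Stroke_Num_Radical)
char_info
```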


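Finally, we tokenize chapter 5 of Journey to the West, whose text is in Traditional characters. We convert the tokens to Simplified characters, drop stop words, and draw a word cloud of the 50 most frequent words:

```r
library(tmcn)       # toTrad(), STOPWORDS
library(tidytext)   # unnest_tokens()
library(dplyr)      # mutate(), anti_join(), count(), slice_head()
library(wordcloud2) # wordcloud2()

# journey_west (not shown): the chapter text, in a column named after the chapter title
journey_west_token <- journey_west |>
  unnest_tokens(word, "第五回 亂蟠桃大聖偷丹 反天宮諸神捉怪", token = "words")

journey_dfs <- journey_west_token |>
  # convert to Simplified first, since tmcn's STOPWORDS list is assumed to be Simplified
  mutate(simplified = toTrad(word, rev = TRUE)) |>
  anti_join(STOPWORDS, by = c("simplified" = "word")) |>
  count(simplified, sort = TRUE) |>
  slice_head(n = 50) |>
  data.frame()

# word cloud of the 50 most frequent words (first column words, second column counts)
wordcloud2(
  journey_dfs,
  size = 1.2,
  shape = "cardioid",
  minSize = 15
)
```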
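Because `journey_west` is not shown, here is a self-contained sketch of the conversion-and-count steps of the same pipeline. It runs on a tiny hypothetical token table (`toy_tokens`) instead of the real chapter, so the Traditional-to-Simplified merge can be checked by hand.

```r
library(tmcn)
library(dplyr)
library(tibble)

# hypothetical, already-tokenized input standing in for journey_west_token
toy_tokens <- tibble(word = c("大聖", "天宮", "大聖", "蟠桃"))

toy_counts <- toy_tokens |>
  mutate(simplified = toTrad(word, rev = TRUE)) |> # Traditional -> Simplified
  count(simplified, sort = TRUE)                   # most frequent first
toy_counts
```

Counting after the conversion means that Traditional and Simplified spellings of the same word would be tallied together.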