akc classifies keywords automatically through community detection, and different algorithms can be used for the clustering. In this vignette, a benchmark shows how the various algorithms differ across networks of several sizes.
First, we’ll load the needed packages.
library(akc)
library(dplyr)
Then, we prepare the needed data. The built-in data table bibli_data_table
is used here.
bibli_data_table %>%
  keyword_clean() %>%
  keyword_merge() -> clean_data
Next, we set up the combinations of network size and community detection algorithm to be tested:
# Three network sizes, matching the three sizes per algorithm in the results table
c(100, 200, 300) -> topn_sample
# Collect the names of all group_* functions in akc, minus three that are excluded
ls("package:akc") %>%
  stringr::str_extract("^group.+") %>%
  na.omit() %>%
  setdiff(c("group_biconnected_component",
            "group_components",
            "group_optimal")) -> com_detect_fun_list
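The benchmark runs every algorithm at every network size. As a rough illustration of that design (not part of the original code), the grid of runs can be enumerated with base R's expand.grid(). The three algorithm names are a hand-picked subset of those in the results table, the three sizes are an assumption, and `algos`, `sizes` and `runs` are hypothetical names.

```r
# Illustrative subset of algorithm names (taken from the results table).
algos <- c("group_fast_greedy", "group_louvain", "group_walktrap")
# Assumed network sizes (the `top` argument passed to keyword_group()).
sizes <- c(100, 200, 300)

# One row per (algorithm, size) pair, i.e. one benchmark run each.
runs <- expand.grid(com_detect_fun = algos, topn = sizes,
                    stringsAsFactors = FALSE)
nrow(runs)  # 9 runs: 3 algorithms x 3 sizes
```

A nested for loop over the two vectors visits the same pairs; the data frame form is only convenient for inspecting the design before running it.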
Finally, we’ll implement the computation and record the results.
# Accumulator: one row of results per run
all = tibble()
for(i in com_detect_fun_list){
  for(j in topn_sample){
    # Time the grouping step; na.omit() drops any NA timing entries
    system.time({
      clean_data %>%
        keyword_group(top = j,com_detect_fun = get(i)) %>%
        as_tibble -> grouped_network_table
    }) %>% na.omit -> time_info
    # Number of nodes and number of groups in the network
    grouped_network_table %>% nrow -> node_no
    grouped_network_table %>% distinct(group) %>% nrow -> group_no
    # Mean and standard deviation of the group sizes
    grouped_network_table %>%
      count(group) %>%
      summarise(mean(n)) %>%
      .[[1]] -> group_avg_node_no
    grouped_network_table %>%
      count(group) %>%
      summarise(sd(n)) %>%
      .[[1]] -> group_sd_node_no
    c(com_detect_fun = i,
      topn = j,
      node_no = node_no,group_no = group_no,
      avg = group_avg_node_no,
      sd = group_sd_node_no,time_info[1:3]) %>%
      bind_rows(all,.) -> all
  }
}
# Convert to numeric, keep one row per algorithm and network size, and relabel
res = all %>%
  mutate_at(2:9,function(x) as.numeric(x) %>% round(2)) %>%
  distinct(com_detect_fun,node_no,.keep_all = T) %>%
  select(-topn,-contains("self")) %>%
  setNames(c("com_detect_fun","No. of total nodes","No. of total groups",
             "Average node number in each group","Standard deviation of node number",
             "Computer running time for keyword_group function"))
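The timing step relies on system.time(), which returns a named numeric vector rather than a single number. The sketch below, which times a cheap stand-in task instead of keyword_group(), shows the names involved and why dropping the columns containing "self" leaves only the elapsed time. The names `timing` and `elapsed` are hypothetical.

```r
# Time a cheap stand-in task; in the benchmark this would be keyword_group().
timing <- system.time({
  x <- sort(runif(1e5))
})

# system.time() returns a named numeric vector of class "proc_time":
# user.self, sys.self, elapsed, user.child, sys.child.
names(timing)

# Keep only the wall-clock time, in seconds.
elapsed <- unname(timing["elapsed"])
round(elapsed, 2)
```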
The results are displayed in the following table. The running time is the elapsed time in seconds reported by system.time().
knitr::kable(res)
com_detect_fun | No. of total nodes | No. of total groups | Average node number in each group | Standard deviation of node number | Computer running time for keyword_group function |
---|---|---|---|---|---|
group_edge_betweenness | 103 | 36 | 2.86 | 9.17 | 0.50 |
group_edge_betweenness | 207 | 68 | 3.04 | 12.53 | 2.98 |
group_edge_betweenness | 326 | 89 | 3.66 | 13.12 | 10.03 |
group_fast_greedy | 103 | 5 | 20.60 | 8.17 | 0.17 |
group_fast_greedy | 207 | 5 | 41.40 | 24.36 | 0.18 |
group_fast_greedy | 326 | 6 | 54.33 | 34.77 | 0.19 |
group_infomap | 103 | 1 | 103.00 | NA | 0.17 |
group_infomap | 207 | 4 | 51.75 | 94.83 | 0.22 |
group_infomap | 326 | 6 | 54.33 | 114.98 | 0.34 |
group_label_prop | 103 | 1 | 103.00 | NA | 0.16 |
group_label_prop | 207 | 1 | 207.00 | NA | 0.17 |
group_label_prop | 326 | 1 | 326.00 | NA | 0.18 |
group_leading_eigen | 103 | 4 | 25.75 | 9.57 | 0.17 |
group_leading_eigen | 207 | 5 | 41.40 | 19.19 | 0.18 |
group_leading_eigen | 326 | 7 | 46.57 | 35.15 | 0.22 |
group_louvain | 103 | 5 | 20.60 | 12.14 | 0.16 |
group_louvain | 207 | 8 | 25.88 | 14.11 | 0.17 |
group_louvain | 326 | 9 | 36.22 | 19.08 | 0.18 |
group_spinglass | 103 | 5 | 20.60 | 5.13 | 1.66 |
group_spinglass | 207 | 8 | 25.88 | 13.38 | 4.04 |
group_spinglass | 326 | 8 | 40.75 | 12.07 | 7.30 |
group_walktrap | 103 | 103 | 1.00 | 0.00 | 0.16 |
group_walktrap | 207 | 207 | 1.00 | 0.00 | 0.17 |
group_walktrap | 326 | 326 | 1.00 | 0.00 | 0.17 |
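The NA entries in the standard deviation column appear exactly where an algorithm returned a single group. A minimal sketch with made-up group labels (not data from the benchmark) shows why: sd() needs at least two values, so one group size yields NA. The names `groups`, `sizes` and `single` are hypothetical.

```r
# Hypothetical group labels for six keywords.
groups <- c(1, 1, 1, 2, 2, 3)
sizes <- as.vector(table(groups))   # group sizes: 3, 2, 1
mean(sizes)  # 2
sd(sizes)    # 1

# With a single group there is only one size, so sd() is undefined.
single <- as.vector(table(rep(1, 103)))  # one group of 103 nodes
mean(single)  # 103
sd(single)    # NA
```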
The session information is shown below:
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=Chinese (Simplified)_China.936
#> [3] LC_MONETARY=Chinese (Simplified)_China.936
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.936
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.29 R6_2.5.1 jsonlite_1.8.0 magrittr_2.0.2
#> [5] evaluate_0.15 highr_0.9 stringi_1.7.6 rlang_1.0.2
#> [9] cli_3.2.0 rstudioapi_0.13 jquerylib_0.1.4 bslib_0.3.1
#> [13] rmarkdown_2.13 tools_4.1.3 stringr_1.4.0 xfun_0.30
#> [17] yaml_2.3.5 fastmap_1.1.0 compiler_4.1.3 htmltools_0.5.2
#> [21] knitr_1.37 sass_0.4.0