Digital Humanities-Driven Core Author Discovery and Topic Mining Research (Postprint)
Wu Shuai, Yang Xiuzhang, Ren Tianshu, Liu Jianyi
Submitted 2025-08-14 | ChinaXiv: chinaxiv-202508.00216

Abstract

[Purpose/Significance] Driven by digital humanities, this study statistically analyzes, at the macro level and from the researchers' perspective, the papers published in Studies on "A Dream of Red Mansions" over the past forty years. At the micro level, it applies data mining techniques to identify potential research areas in Redology, with the aim of supporting the further development of Redology research.

[Method/Process] This study primarily employs bibliometric statistics and topic mining methods for analysis. First, core research authors in Redology over the forty-year period are identified through bibliometric statistics, followed by topic mining of publications in the Redology field. The research hotspots of Redology over the past forty years are investigated from two aspects: core author discovery and topic evolution analysis.

[Results/Conclusion] Redology research can be broadly divided into four domains: character relationship research, social system research, current academic discussions, and version speculation of A Dream of Red Mansions.

Wu Shuai, Yang Xiuzhang, Ren Tianshu, et al. Research on Core Author Discovery and Topic Mining Driven by Digital Humanities[J]. Journal of Literature and Data Science, 2023, 5(1): 53-64.

Vol. 5 No. 1 March 2023

Research on Core Author Discovery and Topic Mining Driven by Digital Humanities

Wu Shuai¹,², Yang Xiuzhang², Ren Tianshu², Liu Jianyi²

(1. College of Information Management, Nanjing Agricultural University, Nanjing 210003, China;
2. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China)

Abstract: [Purpose/Significance] Driven by digital humanities, this study conducts statistical analysis of the 40-year achievements of Studies on "A Dream of Red Mansions" from a researcher's perspective at the macro level, and integrates data mining techniques to analyze potential research fields in Redology at the micro level, thereby better promoting the development of Redology research. [Method/Process] The study primarily employs bibliometric statistics and topic mining methods. First, we quantitatively identify core authors in Redology research over the past four decades, then perform topic mining on Redology achievements. Research hotspots in 40 years of Redology are examined through both core author discovery and topic evolution analysis. [Result/Conclusion] Redology research can be broadly divided into four domains: character relationship studies, social system studies, contemporary academic discussions, and version speculation of A Dream of Red Mansions.

Keywords: Studies on "A Dream of Red Mansions"; Topic evolution; Digital humanities; Topic mining; Core author

A Dream of Red Mansions, as the foremost of China's Four Great Classical Novels, represents the pinnacle of ancient Chinese chapter-novel literature and has been adapted into television dramas nine times. It serves as a crucial foundation for studying classical Chinese literature[1]. With the development of artistic culture, Redology research and literary criticism have proliferated in recent years, with scholars analyzing the work from diverse perspectives, creating a vibrant academic landscape characterized by "a hundred flowers blooming and a hundred schools of thought contending"[2].

This research is supported by the Guizhou Provincial Science and Technology Program Project "Research on the Rescue and Collation of Shui Ethnic Literature and Endangered Shui Script Based on Big Data and Image Recognition" (Project No.: Qiankehe Basic [2020]1Y279) and the Guizhou University of Finance and Economics Project "Research on Shui Knowledge Graph Construction Empowered by AI Big Data and Statistical Recognition of Endangered Shui Script" (Project No.: 2021KYQN03).

[About the Authors] Wu Shuai (ORCID: 0000-0002-1162-4308), male, teaching assistant, Ph.D. candidate, research interests: bibliometrics and information resource management, Email: 472191973@qq.com; Yang Xiuzhang (ORCID: 0000-0001-9648-9506), male, teaching assistant, Ph.D. candidate, research interests: digital humanities and topic mining, Email: 1455136241@qq.com (corresponding author); Ren Tianshu (ORCID: 0000-0001-5930-3653), female, master's student, research interests: information resource management and bibliometrics, Email: 413076113@qq.com; Liu Jianyi (ORCID: 0000-0003-1693-6631), male, master's student, research interests: bibliometrics and topic mining, Email: 1105053117@qq.com.

Journal papers indexed in the Chinese Social Sciences Citation Index (CSSCI) play a guiding role in their respective disciplines, representing academic achievements with strong scholarly value, novel research, and high creativity. Studies on "A Dream of Red Mansions", as the only Redology journal included in the CSSCI database, has published numerous papers over its 40-year history, serving as the primary communication platform for Redology research[3]. Since its inception, the journal has maintained a rigorous academic stance, balancing scholarly excellence with accessibility, achieving both high academic standards and considerable readability, thus earning favor among Redology researchers and literature enthusiasts alike. As an important carrier for exchange and dissemination of Redology research, Studies on "A Dream of Red Mansions" has effectively promoted the development of Redology. With internet technology applied to journal publishing, research achievements have emerged continuously, with scholars expressing diverse viewpoints and employing various methods to study A Dream of Red Mansions, gradually transforming Redology into a highly integrated academic field.

Redology researchers often hold different interpretations of A Dream of Red Mansions, making it difficult to precisely reflect contemporary research themes. To address this limitation, this study examines 5,582 journal papers from Studies on "A Dream of Red Mansions" indexed in CNKI between April 2, 1979, and April 2, 2019, employing bibliometric statistics and topic mining methods. First, we quantitatively identify core authors in 40 years of Redology research, then conduct topic mining on Redology achievements, exploring research hotspots from both core author discovery and topic evolution perspectives.

1 Related Research

Studies on "A Dream of Red Mansions" serves as the primary academic journal for A Dream of Red Mansions research, publishing content related to ideological studies, artistic value, historical materials, Redology research, author history, and cultural relics verification. It enjoys high academic reputation in both Redology and classical literature research fields, representing the professional level and academic quality of A Dream of Red Mansions research and providing important reference value.

1.1 Traditional Literature Research Status

In the big data era, data mining technology has developed rapidly, yielding numerous academic research achievements. However, relatively few studies domestically and internationally have employed data mining[4] and machine learning[5] algorithms to deeply mine journal literature. Pei Jie[6] used traditional bibliometric methods to explore typical characteristics and development patterns in Japanese translation studies of A Dream of Red Mansions. Zhang Qingshan et al.[7] employed traditional bibliometric methods to clarify the disciplinary nature, scope, and framework of Redology. Gao Huaisheng[8] used traditional bibliometric methods to review the development of Redology research, finding that each stage of Redology's development has relied on literature. Sun Weike et al.[9] systematically reviewed 2017 Redology research achievements, discovering that Redology research emphasizes the integration of multiple methods, effectively promoting field integration and interdisciplinary cross-fertilization.

Traditional literature research relies mainly on reading the original text, attending expert lectures, reading core literature, and participating in forums, with core literature reading being the most common approach. Core literature is typically located by keyword search and filtered by download volume; this approach is narrow and cannot readily surface the deeper themes of Redology research. Data mining methods have rarely been applied to the hot topics and temporal development of Studies on "A Dream of Red Mansions".

1.2 Topic Mining Literature Research Status

As big data technology has developed, a growing number of scholars have recognized the latent value of data and have combined data mining or machine learning methods to draw conclusions from large bodies of literature. Shen Lin[10] systematically established the typology and combination patterns of furniture in A Dream of Red Mansions by summarizing the text. Wu Di et al.[11] examined in depth the materials related to A Dream of Red Mansions collected in Xiangyan Congshu to better understand the dissemination of Redology research. Chen Xiao[12] examined the image world of A Dream of Red Mansions in the Qing Dynasty, exploring connections between text and images to broaden the scope of Redology data mining. Cai Yongming et al.[13] proposed a CA-LDA model for topic analysis of Chinese short texts, which raises the probability that words sharing the same collocation relationships are assigned to the same topic and offers a new method for short-text literature data. Wu Shuai et al.[14] explored the development of library and information science using data measurement and social network analysis. Yang Xiuzhang et al.[15-16] examined the development of Qingshui River basin literature using bibliometric and social network analysis methods, and discovered core authors using composite indices and knowledge graphs.

Employing data mining methods to deeply excavate literature data can reveal potential value. This study uses bibliometric statistics and topic mining methods to explore 40 years of Studies on "A Dream of Red Mansions" journal papers, which to some extent reflects the professional level and academic quality of A Dream of Red Mansions research.

2 Research Framework

This study aims to analyze 5,582 papers published in Studies on "A Dream of Red Mansions" over 40 years, mining high-citation papers, core authors, major research institutions, and core topics. The specific analysis process consists of four steps, as shown in Figure 1 [FIGURE:1].

(1) Using the Selenium module in Python to custom-crawl Studies on "A Dream of Red Mansions" journal papers indexed in CNKI, saving them as CSV files.
(2) Preprocessing the relevant paper data, including data cleaning, relationship extraction, and outlier handling, then saving as CSV files.
(3) Identifying core authors based on citation counts and publication volume, including pre-selecting core author candidates using Price's Law and selecting core authors using composite indices.
(4) Conducting topic mining on 40 years of Studies on "A Dream of Red Mansions" papers, including temporal topic evolution analysis, co-word network analysis, and social network analysis.

2.1 Data Collection

This study aims to deeply mine core authors and core topics of Studies on "A Dream of Red Mansions" journal papers indexed in CNKI. Using the Selenium module in Python, we crawled papers from the journal between April 2, 1979, and April 2, 2019. The crawled fields include: paper title, author, publication date, citation count, download count, keywords, and abstract.
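The crawled fields listed above can be written to CSV as in the following minimal sketch. It covers only the storage step: the Selenium navigation is omitted because the paper does not describe the page structure it targeted. The class name `PaperRecord`, its field names, and the sample record are illustrative assumptions, not the authors' code or data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class PaperRecord:
    """One crawled paper; field names are illustrative assumptions."""
    title: str
    authors: str      # raw author string as shown on the page
    pub_date: str     # e.g. "1979-04-02"
    citations: int
    downloads: int
    keywords: str     # raw keyword string, e.g. "Baoyu;Daiyu"
    abstract: str


def write_records(records, handle):
    """Write the records to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(PaperRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


# Hypothetical record, used only to show the file layout.
sample = [
    PaperRecord("Example title", "Author A;Author B", "1979-04-02",
                3, 120, "Baoyu;Daiyu", "Example abstract")
]
buffer = io.StringIO()
write_records(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to a file instead of `io.StringIO` only requires passing a handle opened with `open(path, "w", newline="", encoding="utf-8")`.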

2.2 Data Cleaning

Some data in CNKI-indexed Studies on "A Dream of Red Mansions" papers is incomplete, requiring preprocessing to standardize data formats. Our preprocessing includes data cleaning, outlier detection and handling, and related numerical processing.
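A minimal sketch of such preprocessing follows. The specific rules (dropping records without a title, treating missing counts as 0, splitting authors and keywords on semicolons, and deduplicating by title and first author) are assumptions chosen for illustration; the paper does not state the exact rules it applied.

```python
def clean_record(raw):
    """Return a normalized copy of one raw record, or None if it is unusable."""
    title = (raw.get("title") or "").strip()
    if not title:
        return None  # assumed rule: a paper without a title cannot be analyzed

    def to_int(value):
        # Missing or non-numeric counts are treated as 0 (an assumption).
        text = str(value or "").strip()
        return int(text) if text.isdigit() else 0

    def split_list(value):
        # Accept either ASCII or full-width semicolons as separators.
        text = (value or "").replace("；", ";")
        return [part.strip() for part in text.split(";") if part.strip()]

    return {
        "title": title,
        "authors": split_list(raw.get("authors")),
        "keywords": split_list(raw.get("keywords")),
        "citations": to_int(raw.get("citations")),
        "downloads": to_int(raw.get("downloads")),
    }


def clean_all(raw_records):
    """Clean every record and drop duplicates by (title, first author)."""
    seen, cleaned = set(), []
    for raw in raw_records:
        record = clean_record(raw)
        if record is None:
            continue
        first_author = record["authors"][0] if record["authors"] else ""
        key = (record["title"], first_author)
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned
```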

3 Core Author Group Discovery

While big data provides massive diversified information, it also brings information overload challenges, particularly acute in academic research. As online submission replaces traditional methods, academic outputs grow rapidly, making precise identification of core authors increasingly difficult. Core authors constitute the solid foundation of disciplinary research[17], determining research direction and academic output quality. Traditional identification methods rely solely on publication volume while ignoring paper quality. Therefore, this study employs a method based on Price's Law and composite indices to identify core authors in Studies on "A Dream of Red Mansions". We first identify core author candidates using Price's Law, then select core authors using composite indices based on publication volume and citation counts.

3.1 Price's Law Analysis

We combine first authors' publication volume and citation counts from Studies on "A Dream of Red Mansions" to screen core author candidates using Price's Law. The specific procedures are as follows:

(1) Determine minimum citation count: The most-cited paper in Studies on "A Dream of Red Mansions" is Wang Jinbo's 2010 publication "The Overlooked First Complete English Translation of A Dream of Red Mansions: An Introduction to Father Bonsall's English Translation," cited 71 times, denoted as $N_{c\max}$. Using Price's minimum citation formula (1), $M_c = 0.749\sqrt{N_{c\max}}$, we obtain $M_c = 0.749\sqrt{71} \approx 6.31$, which rounds up to 7. Authors with 7 or more cumulative citations therefore qualify as core author candidates.

(2) Determine minimum publication count: The most prolific author in Studies on "A Dream of Red Mansions" is Feng Qiyong, with 124 publications, denoted as $N_{p\max}$. Using Price's minimum publication formula (2), $M_p = 0.749\sqrt{N_{p\max}}$, we obtain $M_p = 0.749\sqrt{124} \approx 8.34$, which rounds up to 9. Authors with 9 or more publications therefore qualify as core author candidates.

(3) Core author candidate confirmation: We screen authors from Studies on "A Dream of Red Mansions" meeting formula (1) or formula (2), perform deduplication, and ultimately select 363 qualified core author candidates who published 2,957 papers (52.97% of total indexed papers) with 10,324 citations (66.41% of total citations).
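The screening steps above can be sketched as follows. The sketch assumes the usual form of Price's Law, a threshold of 0.749 times the square root of the maximum count, rounded up to the next integer; this reproduces the thresholds of 7 and 9 reported above. The function names and the tiny `author_stats` example are hypothetical.

```python
import math

PRICE_COEFFICIENT = 0.749


def price_threshold(n_max):
    """Smallest integer count that is >= 0.749 * sqrt(n_max)."""
    return math.ceil(PRICE_COEFFICIENT * math.sqrt(n_max))


def select_candidates(author_stats, n_cmax, n_pmax):
    """Keep authors meeting the citation OR the publication threshold.

    author_stats maps author name -> (publications, total citations).
    Using a dict keyed by author also removes duplicates.
    """
    m_c = price_threshold(n_cmax)
    m_p = price_threshold(n_pmax)
    return sorted(
        name
        for name, (pubs, cites) in author_stats.items()
        if cites >= m_c or pubs >= m_p
    )


m_c = price_threshold(71)    # most-cited paper: 71 citations
m_p = price_threshold(124)   # most prolific author: 124 publications

# Hypothetical authors, for illustration only.
candidates = select_candidates(
    {"a": (1, 7), "b": (9, 0), "c": (8, 6)}, n_cmax=71, n_pmax=124
)
```

Author "c" is excluded because it falls just short of both thresholds, which shows that the two criteria are combined with a logical OR rather than summed.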

3.2 Composite Index Selection

From the 363 core author candidates identified by Price's Law, we set a composite index threshold of 2 to select 35 core authors from Studies on "A Dream of Red Mansions". The specific steps are:

(1) Determine average publication volume: The total number of publications by the 363 core author candidates from Studies on "A Dream of Red Mansions" is denoted as $X_{total}$, and the number of candidates as $n$. Using average publication formula (3), $\bar{x} = X_{total}/n$, we calculate the average publication volume $\bar{x}$.

(2) Determine average citation count: The total citations of papers by the 363 core author candidates is denoted as $Y_{total}$, and the number of candidates as $n$. Using average citation formula (4), $\bar{y} = Y_{total}/n$, we calculate the average citation count $\bar{y}$.

(3) Composite index selection: Using the average publication volume $\bar{x}$ and average citation count $\bar{y}$ of core author candidates, we calculate each candidate's composite index score with formula (5), $score_i = \frac{1}{2}\left(\frac{x_i}{\bar{x}} + \frac{y_i}{\bar{y}}\right)$, where $x_i$ is the total number of publications of candidate $i$ and $y_i$ is their total citations. With a composite index threshold of 2, we select 35 core authors from Studies on "A Dream of Red Mansions".
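The composite index can be checked against the figures reported for the three top-ranked authors. The sketch below assumes an equally weighted form, half of (publications divided by mean publications, plus citations divided by mean citations). This form is inferred from the reported numbers, not quoted from the paper. Each author's total citations are back-calculated from publications times average citations per paper, so they are approximate.

```python
N_CANDIDATES = 363
MEAN_PUBS = 2957 / N_CANDIDATES     # average publications per candidate
MEAN_CITES = 10324 / N_CANDIDATES   # average citations per candidate


def composite_index(pubs, cites, mean_pubs=MEAN_PUBS, mean_cites=MEAN_CITES):
    """Equally weighted composite of relative output and relative impact (assumed form)."""
    return 0.5 * (pubs / mean_pubs + cites / mean_cites)


# (publications, back-calculated total citations) from the reported averages
authors = {
    "Feng Qiyong": (124, round(124 * 1.89)),
    "Hong Tao": (25, round(25 * 12.92)),
    "Hu Wenbin": (50, round(50 * 4.18)),
}

scores = {name: composite_index(p, c) for name, (p, c) in authors.items()}
core_authors = [name for name, s in scores.items() if s >= 2]  # threshold of 2
```

Under these assumptions the three scores come out within 0.01 of the reported composite indices (11.72, 7.21, and 6.74), which supports the assumed form. Hong Tao ranks above Hu Wenbin despite having half as many papers because his total citations are considerably higher.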

Table 1 [TABLE:1] presents the 35 core authors selected through composite index analysis. The top author is Feng Qiyong with 124 publications, average citations per paper of 1.89, composite index of 11.72, and most-cited work "Interpreting A Dream of Red Mansions" with 14 citations. The second is Hong Tao with 25 publications, average citations of 12.92, composite index of 7.21, and most-cited work "A Dream of Red Mansions Translation and East-West Culture/Language" with 58 citations. The third is Hu Wenbin with 50 publications, average citations of 4.18, composite index of 6.74, and most-cited work "A Dream of Red Mansions and Chinese Name Culture" with 60 citations.

Ranking by average citations per paper shows eight core authors averaging 7 or more citations per paper. Hong Tao ranks first (12.92 citations), followed by Wang Jinbo (10.42), Liu Yongliang (9.33), Rao Daoqing (7.88), Duan Jiangli (7.63), Chen Weizhao (7.59), Yu Xiaohong (7.36), and Mei Xinlin (7.00).

4 Literature Topic Mining

Keywords represent the core content of journal papers, roughly reflecting research themes, methods, and hot topics. Conducting topic mining on keywords from Studies on "A Dream of Red Mansions" papers can clarify major research directions, methods, and hot topics in the field. Our topic mining includes temporal topic evolution analysis, co-word network analysis, and social network analysis.

4.1 Temporal Topic Evolution Analysis

CiteSpace temporal sequence topic evolution analysis examines topic development along a temporal axis. Based on 5,582 papers from Studies on "A Dream of Red Mansions" over 40 years, we generated the temporal topic evolution analysis shown in Figure 2 [FIGURE:2]. Each node represents a topic, and connections between nodes indicate co-occurrence relationships. The timeline spans 1979 to 2019.

By examining the temporal distribution of word frequency together with keywords whose frequency changes sharply, we identify emerging themes and development trends across periods. Core themes include "Liu Xinwu," "Daiyu," "Yihong Gongzi," "version," "author," "tombstone," and "social scientist." Overall, Chinese Redology research over four decades has expanded from isolated points to lines to a broad plane, drawing not only on deep literature mining but also on archaeological research and film and television production. This breadth enables a more objective and accurate reconstruction of the novel's themes, historical context, and authorial tendencies, and provides a theoretical foundation for the further development of Redology.
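The idea of tracking keyword frequency over time can be illustrated with a simple stand-in. This is not CiteSpace's algorithm: it only counts keywords per fixed-length period and computes a relative change between two periods. The period length, the function names, and the test data are assumptions.

```python
from collections import Counter, defaultdict


def keyword_counts_by_period(papers, period_length=10, start_year=1979):
    """Count papers per keyword in each period.

    papers: iterable of (year, [keywords]); returns {period_start: Counter}.
    A keyword is counted once per paper, even if it is listed twice.
    """
    counts = defaultdict(Counter)
    for year, keywords in papers:
        offset = (year - start_year) // period_length
        period_start = start_year + offset * period_length
        counts[period_start].update(set(keywords))
    return dict(counts)


def change_rate(counts, keyword, earlier, later):
    """Relative change in frequency between two periods; None without a baseline."""
    before = counts.get(earlier, Counter())[keyword]
    after = counts.get(later, Counter())[keyword]
    if before == 0:
        return None
    return (after - before) / before
```

A keyword whose change rate is large and positive between consecutive periods is a candidate emerging theme; one with a negative rate is receding.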

4.2 Co-word Network Analysis

Using Python, we constructed a keyword co-occurrence matrix for the 5,582 papers from Studies on "A Dream of Red Mansions". When two keywords appear in the same paper, they are considered to co-occur, and the weight of the edge between them is increased by 1; otherwise no edge exists and the weight is 0, as shown in formula (6). In the co-occurrence matrix, a higher co-occurrence frequency indicates that two keywords are more closely related thematically; a frequency of zero indicates no relationship.
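A minimal sketch of this counting rule follows. It stores the matrix sparsely as a mapping from unordered keyword pairs to counts, so pairs that never co-occur simply have weight 0. The function name and the optional `stopwords` parameter (for excluding keywords from the analysis) are assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists, stopwords=frozenset()):
    """Count, for each unordered keyword pair, the papers in which both appear."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate within a paper and sort so each pair has one canonical order.
        unique = sorted(set(keywords) - stopwords)
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts
```

Because `Counter` returns 0 for missing keys, looking up a pair that never co-occurs yields weight 0 without storing a full matrix.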

To better reflect core Redology research content, we removed these keywords: "Hongloumeng" (our research theme), "Cao Xueqin" (the author), "Hongloumeng Xuekan" (our target journal), and "zhanghui novel" (the novel's form). Based on keyword co-occurrence analysis, we derived Table 2 [TABLE:2] showing high-frequency co-occurring terms.

The top 20 co-occurrences are: "Baoyu" and "Daiyu" (582 times), "Jia Baoyu" and "Yihong" (369 times), "Daiyu" and "Baochai" (184 times), "Baoyu" and "Baochai" (178 times), "Baoyu" and "Sister Feng" (121 times), "Baoyu" and "Grandmother Jia" (102 times), "Daiyu" and "Sister Feng" (88 times), "Baoyu" and "Jia Mansion" (83 times), "Baoyu" and "Qingwen" (83 times), "Daiyu" and "Grandmother Jia" (77 times), "Jiaxu manuscript" and "Gengchen manuscript" (76 times), "Grand View Garden" and "Baoyu" (70 times), "Grandmother Jia" and "Sister Feng" (66 times), "manuscript" and "version" (65 times), "Jia Mansion" and "Daiyu" (65 times), "Baoyu" and "Jia Zheng" (62 times), "Baoyu" and "Gengchen manuscript" (55 times), "Mr." and "Redology" (53 times), "Baoyu" and "Lady Wang" (52 times), and "Grandmother Jia" and "Lady Wang" (51 times).

The high-frequency co-occurrences reveal two main research directions: character relationship studies and A Dream of Red Mansions version studies. Character relationships center on "Baoyu" and "Daiyu," with other characters associated with them. Version studies focus primarily on the Jiaxu and Gengchen manuscripts.

4.3 Social Network Analysis

Social network analysis behaves much like a clustering algorithm: it distinguishes strong from weak relationships and represents them visually as a knowledge graph. Nodes represent entities and edges represent the connections between them. The layout places closely related nodes in the same region and pushes sparsely connected nodes to the periphery, so that core nodes can be identified at a glance.

Because numerous scattered nodes with edge weights of 1, 2, and 3 clutter the network, we applied Price's Law for node selection using formula (7), $M_f = 0.749\sqrt{N_{f\max}}$, where $M_f$ is the minimum frequency for high-frequency co-occurring terms and $N_{f\max}$ is the maximum co-occurrence frequency. With $N_{f\max} = 582$, $M_f = 0.749\sqrt{582} \approx 18.07$, which rounds up to 19, so we set a co-occurrence threshold of 19 for social network analysis, generating the thematic keyword co-occurrence knowledge graph shown in Figure 3 [FIGURE:3]. This graph contains 54 core nodes and 108 relationship edges, with a modularity coefficient of 0.549, indicating a clear community structure.
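The edge filtering step can be sketched as below. The threshold follows the same Price's Law form (0.749 times the square root of the maximum frequency, rounded up), and a weighted degree is used as a simple indicator of core nodes. The sketch does not compute modularity or a graph layout, and the low-frequency pair in the example is hypothetical.

```python
import math
from collections import defaultdict


def price_min_frequency(max_frequency):
    """Price's Law threshold, rounded up to the next integer."""
    return math.ceil(0.749 * math.sqrt(max_frequency))


def filter_edges(pair_counts, min_frequency):
    """Keep only keyword pairs whose co-occurrence count reaches the threshold."""
    return {pair: n for pair, n in pair_counts.items() if n >= min_frequency}


def weighted_degree(edges):
    """Sum of edge weights per node; high values suggest core nodes."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


pair_counts = {
    ("Baoyu", "Daiyu"): 582,               # reported in the text
    ("Daiyu", "Baochai"): 184,             # reported in the text
    ("Grandmother Jia", "Lady Wang"): 51,  # reported in the text
    ("Granny Liu", "Jia Mansion"): 3,      # hypothetical low-frequency pair
}
threshold = price_min_frequency(max(pair_counts.values()))
core_edges = filter_edges(pair_counts, threshold)
degrees = weighted_degree(core_edges)
```

Filtering before building the graph removes nodes whose only edges are weak, which is why the hypothetical "Granny Liu" pair contributes no node to the filtered network.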

The keyword relationship graph reveals four main modules in Studies on "A Dream of Red Mansions" research: character relationship studies, social system studies, contemporary academic discussions, and version speculation of A Dream of Red Mansions. Character relationship studies focus on "Baoyu" and "Daiyu," with other characters associated with them (Granny Liu shows relatively isolated relationships). Social system studies center on "feudal society," reflecting the novel's creative historical context. Contemporary academic discussions feature Mr. Feng Qiyong as a Redology representative, who published 124 papers in the journal, with his representative work "Interpreting A Dream of Red Mansions." Version speculation focuses primarily on the Zhiyanzhai Re-Commented Story of the Stone (also known as Story of the Stone) from the collection of Xu Ye (also known as Songge), a top scholar and Grand Secretary in the late Qing Dynasty, alongside some scholars' focus on Jiaxu, Gengchen, Chengjia, and Chengyi manuscripts.

5 Conclusion

With the development of art and culture, Redology research and literary criticism have proliferated in recent years, with scholars analyzing the work from diverse perspectives and creating a vibrant academic landscape. Studies on "A Dream of Red Mansions", as the primary academic journal for Redology, has published numerous papers over 40 years and serves as the field's main channel of communication. However, in-depth Redology studies often rely on content analysis and personal interpretation, which yields relatively narrow results. It is therefore necessary to take full account of both the substance of A Dream of Red Mansions and core scholars' understanding of the original work.

This study employs bibliometric statistics and topic mining methods, first quantifying core authors in 40 years of Redology research, then conducting topic mining on Redology achievements to examine research hotspots through core author discovery and topic evolution analysis. Results indicate that 40 years of Redology research focuses on character relationships, social systems, contemporary academic discussions, and version speculation. These findings primarily identify core research themes and developmental trajectories for Studies on "A Dream of Red Mansions", providing editorial standards and thematic suggestions for the journal's editorial board, and offering better research direction guidance for Redology scholars.

References

[1] Wen Qingxin. As a literary phenomenon: The productive critical reception of modern "'A Dream of Red Mansions'-ization"[J]. Chinese Literature Research, 2021(2): 155-162.
[2] Zhao Jianzhong. Reflections on the "author" of A Dream of Red Mansions and "Cao Studies" on the centennial of "New Redology"[J]. Studies on Ming-Qing Fiction, 2021(1): 4-24.
[3] Wang Hui. Laying foundations and upholding integrity—Review of the "40th Anniversary Symposium of the Institute of Redology Studies and the Founding of Studies on 'A Dream of Red Mansions'"[J]. Studies on "A Dream of Red Mansions", 2020(1): 8-16.
[4] Yang Xiuzhang, Wu Shuai, Xia Huan, et al. Research on China's film industry in 2019 from a big data perspective[J]. Film Literature, 2020(23).
[5] Wu Shuai. Identification of frontier topics in scientific research[D]. Guiyang: Guizhou University of Finance and Economics, 2021.
[6] Pei Jie. Dream crossing to Japan[D]. Shanghai: Shanghai International Studies University, 2020.
[7] Zhang Qingshan, Qiao Fujin, Miao Huaiming, et al. Discussion on Redology philology[J]. Journal of China University of Mining and Technology (Social Sciences Edition), 2016, 18(5): 89-96.
[8] Gao Huaisheng. Academic review of the "High-end Forum on A Dream of Red Mansions Philology: Historical Review and Future Prospects"[J]. Journal of Henan Institute of Education (Philosophy and Social Sciences Edition), 2016, 35(3): 3-11.
[9] Sun Weike, He Weiguo, Hu Qing, et al. 2017 annual research report on Chinese Redology development[C]//. 2017 Annual Report on Chinese Art Development Research, 2018: 363-385.
[10] Shen Lin. Research on furniture categories in the Cheng Jia edition of A Dream of Red Mansions[D]. Changsha: Central South University of Forestry and Technology, 2021.
[11] Wu Di, Wu Jiaru. Textual research on A Dream of Red Mansions materials collected in Xiangyan Congshu[J]. Studies on "A Dream of Red Mansions", 2017(6): 175-189.
[12] Chen Xiao. The image world of A Dream of Red Mansions in the Qing Dynasty[D]. Hangzhou: China Academy of Art, 2012.
[13] Cai Yongming, Chang Qing. Chinese short text topic analysis based on co-word network LDA model[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(3): 305-317.
[14] Wu Shuai, Ren Tianshu, Liu Jianyi, et al. Exploration of library and information science development based on data measurement and social network analysis[J]. Information Research, 2022(1): 28-40.
[15] Yang Xiuzhang, Wu Shuai, Xia Huan, et al. Exploration of Qingshui River basin culture based on bibliometrics and social network analysis[J]. Modern Computer, 2019(35): 19-26, 37.
[16] Yang Xiuzhang. Research on bibliometric analysis and knowledge graph of Shui ethnic literature[J]. Modern Computer (Professional Edition), 2019(1): 25-32.
[17] Yang Xiuzhang, Xia Huan, Yu Xiaomin, et al. Analysis of core author groups in Shui ethnic literature based on composite index and knowledge graph[J]. Computer Era, 2019(4): 13-17.
