Mainstream Media Short Video Data Analysis and Visualization Platform Design
Zhang Fan, Liu Wei, Meng Jing
(Hebei Daily Newspaper Group, Shijiazhuang, Hebei 050022)
Abstract
[Purpose] This paper introduces the construction of a mainstream media short video data analysis and visualization platform at Hebei Daily Newspaper Group, aiming to explore and analyze the dissemination mechanisms and evaluation systems of short videos in mainstream media, thereby promoting the digital transformation of traditional media. By studying the propagation patterns of short videos, it provides media organizations with a basis for communication strategies and content optimization.
[Method] Employing data analysis, data mining, data visualization, and other technical means, and combining them with the platform's practical requirements, the system analyzes the dissemination effects and influence of short video content by collecting and processing massive amounts of short video data, thereby exploring the patterns and trends of short video propagation.
[Results/Conclusion] An integrated data analysis system was developed that combines short video monitoring, propagation prediction, and daily ranking displays, enabling comprehensive monitoring of trending short videos and high-impact accounts. Through data modeling and algorithmic analysis, a scientific evaluation standard for short videos was constructed, with some capability to predict propagation power. The daily ranking gives creators timely, specific feedback and improves management and operational efficiency. The system offers mainstream media more intelligent and precise support for short video content, driving innovation and development in the new media era.
Keywords: short video; dissemination mechanism; evaluation system; data analysis; data visualization
Classification Code: G223
Document Code: A
Article ID: 1671-0134(2025)04-69-05
DOI: 10.19483/j.cnki.11-4653/n.2025.04.013
Citation Format: Zhang Fan, Liu Wei, Meng Jing. Design of Mainstream Media Short Video Data Analysis and Visualization Platform [J]. China Media Technology, 2025, 32(4): 69-73.
Introduction
Mainstream media outlets such as China Media Group, Xinhua News Agency, and People's Daily have actively promoted short video content innovation across various domains. As President Xi Jinping emphasized, "Wherever the people are, that's where the focus of propaganda and ideological work should be. Cyberspace has become a new space for people's production and life, and it should also become a new space for our Party to build consensus" [1]. The Decision of the Central Committee of the Communist Party of China on Further Deepening Reform Comprehensively to Advance Chinese Modernization proposes to "build a working mechanism and evaluation system adapted to all-media production and communication, and promote systematic reform of mainstream media." It also stresses "exploring effective mechanisms for the integration of culture and technology," pointing out that technological integration is the necessary path for media industry development and providing new directions for innovation in the media sector, highlighting the strategic value of culture-technology convergence.
As short videos have become increasingly important in the media industry, they have emerged as a key force driving the digital transformation of traditional media. According to the latest China Internet Network Development Statistics Report, by December 2024, the total number of Chinese internet users had exceeded 1.1 billion. Additionally, QuestMobile data from September 2023 shows that short video platforms continue to grow rapidly despite their large monthly active user base, reflecting that short videos have become a mainstream channel for information dissemination and entertainment, with growing significance.
In response to this industry transformation, short video data analysis and propagation prediction have become important tools for improving media decision-making efficiency. However, the challenges of massive data volume, complex structures, and rapid changes require building more efficient short video data monitoring and analysis platforms to help transform unstructured data into valuable decision-making insights.
This paper aims to introduce the design and application of the mainstream media short video data analysis and visualization platform at Hebei Daily Newspaper Group. The core objective is to enhance the dissemination efficiency and management capabilities of short video content through data analysis, data mining, and visualization technologies. Specifically, the system consists of four main subsystems: a data capture and storage subsystem, a short video monitoring and evaluation system, a short video propagation prediction system, and an own media daily ranking system. The data capture and storage subsystem is responsible for real-time crawling and storage of short video platform data, providing a data foundation for subsequent analysis. The short video monitoring and evaluation system focuses on displaying the propagation patterns of trending short videos across platforms. The short video propagation prediction system employs negative binomial regression and other techniques to forecast propagation trends to a certain extent. The own media daily ranking system generates daily rankings based on content performance, helping content creators and decision-makers understand real-time popularity and user feedback. The construction of this platform provides more intelligent and precise short video content management and propagation support for mainstream media, promoting innovation and development in the new media era [2-4].
Fund Project: 2024 Shijiazhuang Science and Technology Plan Project "Multimodal-Based Mainstream Media Short Video Propagation Data Analysis and Visualization System" (Project No.: 2457906501).
1.1 System Overall Design
This system decomposes the short video data analysis and decision-making process into multiple independent yet closely interconnected subsystems. The system architecture adopts a modular design, with each subsystem sharing data and collaborating through interfaces to achieve efficient information flow and processing [5-8].
The system architecture can be divided into four main layers: data acquisition layer, data processing layer, analysis and prediction layer, and decision support layer.
Data Acquisition Layer: Responsible for obtaining raw data from short video platforms. This layer includes functions such as crawler technology, API interface calls, data cleaning, and storage to ensure data timeliness and accuracy.
Data Processing Layer: Processes, stores, and manages collected data through big data technologies. Database management systems are used to store data by category, providing support for subsequent analysis and processing.
Analysis and Prediction Layer: Employs data mining and machine learning algorithms for in-depth analysis of data, generating visual charts and propagation trend predictions.
Decision Support Layer: Transforms analysis results into actionable decision-making information and visualization dashboards, providing data support for short video content producers and managers to help them formulate more effective strategies.
The system's primary data sources include multiple short video platforms such as Douyin, Kuaishou, WeChat Channels, Bilibili, Xiaohongshu, and Weibo, with a focus on mainstream media accounts across these platforms. The system adopts a multi-source data acquisition strategy: for platforms that provide open APIs, data is accessed directly through those interfaces; for platforms without open APIs, customized crawler programs are employed within the constraints of platform rules; for platforms where direct data acquisition is impractical, partnerships with third-party data service providers are established. All collected data is stored in a centralized data warehouse, which provides the foundation for subsequent analysis and prediction.
1.3 System Platform and Integration
The system employs a microservices architecture built on Dubbo and Spring Boot, which allows each module to be deployed independently. Each microservice focuses on a distinct business domain and interacts with the others through RESTful APIs. The frontend sends data to the backend via these APIs; the backend processes it and stores it in MySQL, while Spark handles big data computation and writes its results to Hive. Apache Kafka is used for message streaming so that newly collected data reaches downstream services with low latency.
The frontend uses Vue.js with ECharts for visualization and Axios for asynchronous communication. The backend is based on Spring Boot and Dubbo, where Spring Security handles authentication, MyBatis manages data persistence, and Shiro is used for permission management. Service scheduling and management are handled by Dubbo to ensure high system availability. For deployment, the system uses Docker containerization and relies on Kubernetes for cluster management and automated scaling. Frontend-backend communication employs SSL/TLS encryption, and sensitive information such as passwords is encrypted for storage. The overall architecture combines efficiency, security, and scalability, supporting high concurrency and big data processing. Meanwhile, a comprehensive logging and monitoring system ensures stable platform operation and facilitates timely identification and resolution of potential issues.
1.4 Key Technologies
1.4.1 Frontend Technology
Frontend technology is crucial for system visualization and user experience, particularly in short video data analysis where presenting clear, intuitive trends, rankings, and user feedback is key to success. Both Vue.js and React are widely used frameworks in modern frontend development for building complex, dynamic user interfaces. For short video data visualization systems, choosing these frameworks enables more efficient construction of highly interactive and responsive frontends. ECharts is a powerful open-source visualization library supporting various chart types such as line charts, bar charts, and heatmaps. In short video data analysis, it can help implement video propagation trend charts, daily ranking displays, and user interaction analysis [9-10].
1.4.2 Data Processing Technology
Hive is a data warehouse infrastructure built on Hadoop that provides an SQL-like query language, HiveQL, for working with large-scale datasets. Its design goal is to simplify big data processing so that users without programming experience can query data using SQL-like syntax. Apache Spark is a fast, general-purpose big data processing engine that supports both batch and stream processing. Unlike Hadoop's MapReduce, which writes intermediate results to disk between stages, Spark keeps intermediate data in memory where possible, thereby reducing disk I/O and typically achieving faster computation than MapReduce [11-12].
1.4.3 Negative Binomial Regression
Negative binomial regression is a generalized linear model for count data, that is, non-negative integer outcomes such as the number of views, likes, or shares. It extends Poisson regression with a dispersion parameter, so it can accommodate overdispersed data in which the variance exceeds the mean. This is typical of propagation metrics, whose distributions are heavily right-skewed because a small number of videos attract most of the attention. The technique is widely applied in data analysis for short video platforms and social media, particularly for predicting short video propagation power and trends [13-16].
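For reference, one common parameterization of negative binomial regression (often called the NB2 form) is written out below. The notation is introduced here for illustration and does not come from the platform itself: y_i is a count outcome for video i (for example, its view count), x_ij are its feature values, beta_j are regression coefficients, and alpha is the dispersion parameter.

```latex
\log \mu_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \qquad
\mathbb{E}[y_i \mid x_i] = \mu_i, \qquad
\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha \mu_i^{2}
```

As alpha approaches 0 the variance approaches the mean and the model reduces to Poisson regression. Because of the log link, each coefficient acts multiplicatively on the expected count: a one-unit increase in x_ij multiplies mu_i by exp(beta_j).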
1.4.4 OpenCV
OpenCV is an open-source computer vision library widely used in image and video processing, machine learning, and other fields. It provides a powerful set of tools for image processing, object detection, feature matching, and deep learning integration. In this system, it is primarily used to automatically extract visual features from short videos [17-18].
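To make the idea of a visual feature concrete, the following sketch computes two simple measures on a grayscale frame represented as a nested list: contrast as the standard deviation of pixel intensities, and sharpness as the variance of a 4-neighbour Laplacian response. These definitions are assumptions chosen for illustration, not the platform's actual feature definitions, and the code uses only the Python standard library so it can run without OpenCV. In practice the same sharpness measure is usually obtained with OpenCV as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from statistics import pstdev, pvariance


def contrast(gray):
    """RMS contrast: population standard deviation of all pixel intensities."""
    pixels = [p for row in gray for p in row]
    return pstdev(pixels)


def sharpness(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Blurry frames have weak edges, so their Laplacian responses cluster
    near zero and the variance is small.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


# Two synthetic 4x4 frames: a uniform grey frame and a checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
```

The uniform frame scores zero on both measures, while the checkerboard scores high on both, which matches the intuition that high-contrast, sharply edged frames produce larger values.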
2.1 Data Capture and Storage Subsystem
The data capture and storage subsystem forms the foundation of the entire system, responsible for real-time crawling and storage of data from major short video platforms and social media. It aims to ensure timely and accurate data acquisition. The captured data includes video titles, links, cover images, publishing accounts, publication time, view counts, likes, comments, shares, and more. After data crawling, standardization is performed across different platforms.
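The cross-platform standardization step can be sketched as a per-platform field mapping into one shared record type. Everything named here is hypothetical: the platform keys, the raw field names, and the `VideoRecord` schema (which covers only a subset of the captured fields) are placeholders, since real platform payloads differ.

```python
from dataclasses import dataclass


@dataclass
class VideoRecord:
    """Unified record shared by all platforms (illustrative schema)."""
    platform: str
    title: str
    url: str
    account: str
    published_at: str
    views: int
    likes: int
    comments: int
    shares: int


# Hypothetical raw field names per platform; real payloads will differ.
FIELD_MAPS = {
    "platform_a": {
        "title": "video_title", "url": "video_url", "account": "author",
        "published_at": "created", "views": "plays",
        "likes": "likes", "comments": "comments", "shares": "shares",
    },
    "platform_b": {
        "title": "caption", "url": "link", "account": "user_name",
        "published_at": "timestamp", "views": "view_count",
        "likes": "like_count", "comments": "reply_count", "shares": "forward_count",
    },
}
TEXT_FIELDS = ("title", "url", "account", "published_at")
COUNT_FIELDS = ("views", "likes", "comments", "shares")


def normalize(platform: str, raw: dict) -> VideoRecord:
    """Map one raw platform payload onto the unified record.

    Missing text fields become empty strings and missing counts become 0,
    so downstream comparisons never see platform-specific gaps.
    """
    mapping = FIELD_MAPS[platform]
    text = {f: str(raw.get(mapping[f]) or "") for f in TEXT_FIELDS}
    counts = {f: int(raw.get(mapping[f]) or 0) for f in COUNT_FIELDS}
    return VideoRecord(platform=platform, **text, **counts)
```

Keeping the mapping in data rather than in code means that when a platform renames a field, only its entry in `FIELD_MAPS` needs to change.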
Because platforms adjust their data and interfaces to varying degrees over time, a data monitoring and alarm mechanism is necessary to keep data acquisition timely and accurate. The system monitors crawling frequency, success rates, and platform response status, and it raises an alarm to notify administrators when anomalies occur, such as crawling failures or platform interface updates.
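A minimal sketch of such a health check is shown below. The `CrawlStats` structure, the 90% success-rate threshold, and the use of HTTP status codes as the "platform response status" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CrawlStats:
    """Crawl counters for one platform over a monitoring window."""
    platform: str
    attempts: int
    successes: int
    last_status: int  # HTTP status code of the most recent response


def check_crawl_health(stats: CrawlStats, min_success_rate: float = 0.9) -> list:
    """Return a list of alert messages; an empty list means healthy."""
    alerts = []
    if stats.attempts == 0:
        # No attempts at all suggests the crawler itself has stalled.
        alerts.append(f"{stats.platform}: no crawl attempts in window")
        return alerts
    rate = stats.successes / stats.attempts
    if rate < min_success_rate:
        alerts.append(
            f"{stats.platform}: success rate {rate:.0%} below {min_success_rate:.0%}"
        )
    if stats.last_status >= 400:
        # Client or server errors often indicate an interface change or a block.
        alerts.append(f"{stats.platform}: last response status {stats.last_status}")
    return alerts
```

In a deployment, the returned messages would be forwarded to whatever notification channel the administrators use; that wiring is outside the scope of this sketch.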
2.2 Short Video Monitoring and Evaluation System
The short video monitoring and evaluation system is a comprehensive analysis tool designed to track and evaluate the dissemination effects and influence of trending short video content and influential accounts published by mainstream media. It primarily includes four functional modules: propagation heat ranking, comprehensive account propagation analysis, media influence ranking, and cross-platform media work difference analysis.
2.2.1 Propagation Heat Ranking
By configuring weights for interaction metrics such as likes, shares, and comments, the system calculates a comprehensive index for each short video work. This index is used to display the top 50 works across platforms, with the ranking refreshed every 10 minutes to ensure it reflects the latest trends and popular content. Works with significant changes in their comprehensive index are specially marked to facilitate timely capture of their dynamic changes. Additionally, the system calls sentiment analysis and text semantic matching interfaces to deeply analyze the emotional tone of videos. Works with strong emotional expression are marked to help users identify content that may trigger strong emotional resonance. Meanwhile, identical works appearing across multiple platforms are distinguished through labeling, alerting users to monitor the work's performance and reception on different platforms.
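The weighted index and top-50 ranking described above might look like the following sketch. The weight values, the 50% change threshold, and the dictionary keys are illustrative assumptions; the platform's configured weights are not given in the text.

```python
# Assumed weights per interaction metric; the real values are configurable.
WEIGHTS = {"likes": 1, "comments": 2, "shares": 3}


def comprehensive_index(video, weights=WEIGHTS):
    """Weighted sum of a video's interaction metrics."""
    return sum(weight * video[metric] for metric, weight in weights.items())


def top_works(videos, previous_index=None, n=50, change_threshold=0.5, weights=WEIGHTS):
    """Rank videos by comprehensive index and flag large relative changes.

    previous_index maps a video id to its index at the last refresh.
    A work is flagged when its index moved by at least change_threshold
    (as a fraction of the previous value) since that refresh.
    """
    previous_index = previous_index or {}
    ranked = sorted(videos, key=lambda v: comprehensive_index(v, weights), reverse=True)
    result = []
    for video in ranked[:n]:
        index = comprehensive_index(video, weights)
        prev = previous_index.get(video["id"])
        flagged = bool(prev) and abs(index - prev) / prev >= change_threshold
        result.append({"id": video["id"], "index": index, "flagged": flagged})
    return result


videos = [
    {"id": "a", "likes": 100, "comments": 10, "shares": 5},
    {"id": "b", "likes": 10, "comments": 50, "shares": 50},
]
ranking = top_works(videos, previous_index={"a": 50})
```

Here video "b" ranks first with an index of 260, and video "a" (index 135, up from 50) is flagged as having changed significantly. Calling `top_works` on a schedule, for example every 10 minutes, with the previous run's indices reproduces the refresh behaviour described in the text.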
2.2.2 Comprehensive Account Propagation Analysis
This module displays changes in the number of works, follower counts, and average comprehensive index of short video platform accounts within custom time periods. The average comprehensive index is calculated as the sum of comprehensive indices for all works published by an account in a day divided by the number of works published that day. These data trends are presented in chart form to help users easily identify outstanding accounts during specific time periods and promptly detect fluctuations in their propagation power.
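The average comprehensive index is a per-account, per-day mean, which can be computed as in this short sketch. The tuple layout of the input is an assumption made for illustration.

```python
from collections import defaultdict


def daily_average_index(works):
    """Average comprehensive index per (account, day).

    works is an iterable of (account, day, comprehensive_index) tuples.
    The result maps (account, day) to the sum of that day's indices
    divided by the number of works the account published that day.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (account, day) -> [sum, count]
    for account, day, index in works:
        bucket = totals[(account, day)]
        bucket[0] += index
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


works = [
    ("account_1", "2025-01-01", 10),
    ("account_1", "2025-01-01", 30),
    ("account_1", "2025-01-02", 8),
]
averages = daily_average_index(works)
```

For the sample data, the account averages 20.0 on the first day (two works) and 8.0 on the second (one work); plotting these values over a custom time range gives the trend chart the module describes.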
2.2.3 Media Influence Ranking
The media influence ranking provides users with a more comprehensive perspective by showing the integrated performance of the same media across multiple platforms. Specifically, the ranking combines key metrics such as the number of works published within a specific time period, follower counts, and the comprehensive influence index of works. The work comprehensive influence index is calculated as the sum of comprehensive indices for all works in that period divided by the total number of works, aiming to present the media's integrated influence across different platforms. This module helps users understand the media's overall performance across platforms, providing cross-platform data comparison and analysis, enabling accurate assessment of media dissemination effectiveness across different channels.
2.2.4 Cross-Platform Media Work Difference Analysis
By comparing the performance of the same type of work across different platforms, this module helps users deeply analyze dissemination effect differences between platforms. Through charts and visualized data, it clearly presents works' interaction situations and audience preferences on each platform. This module not only helps media identify which platforms are suitable for different content but also provides in-depth insights into content propagation differences across platforms. For creators, this feature can provide decision-making support for developing cross-platform content strategies, helping them precisely reach target audiences.
2.3 Short Video Propagation Prediction System
Propagation prediction systems typically employ regression analysis, time series forecasting, deep learning, and other machine learning technologies to build different types of prediction models. Among them, regression analysis models predict future video propagation effects by analyzing the relationship between historical video data and propagation volume. Considering the cost, quantity, and features of historical effective samples, this system selects negative binomial regression to build the prediction model.
To model the historical samples, features are extracted from each short video across 35 dimensions covering visual, auditory, and publication features. After regression analysis, 35 indicator items are ultimately selected for constructing the prediction model, including contrast, clarity, sharpness, dominant color, dominant color proportion, music type, presence of human voice, presence of dialect, content features, regional features, production features, emotional features, whether the video was published on a workday, publication time period grouping, video duration, video duration grouping, account follower count, whether the video includes celebrities, whether an anchor appears on screen, and more. The regression coefficients estimated for these indicators are then used as the parameters of the current regression model.
When users upload a video for analysis and fill in basic classification information, relevant analysis interfaces for visual, auditory, and publication features are triggered to extract features from the short video data, which are then fed into the regression model to obtain propagation predictions and provide optimization suggestions across the 35 indicator dimensions.
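The final scoring step can be sketched as follows, assuming the log link that is standard for negative binomial regression. The intercept, the three feature names, and their coefficients are made-up placeholders for illustration; they are not the fitted values of the platform's model, which uses 35 indicators.

```python
import math

# Hypothetical parameters for illustration only, not the platform's fitted values.
INTERCEPT = 5.0
COEFFICIENTS = {
    "published_on_workday": 0.2,
    "has_human_voice": 0.5,
    "duration_minutes": -0.1,
}


def predict_mean(features, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Expected propagation count under a log link.

    Computes exp(intercept + sum of coefficient * feature value).
    Features missing from the input are treated as 0.
    """
    linear = intercept + sum(
        beta * features.get(name, 0.0) for name, beta in coefficients.items()
    )
    return math.exp(linear)


def rate_ratios(coefficients=COEFFICIENTS):
    """Multiplicative effect on the expected count of a one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}
```

A rate ratio above 1 marks a feature that raises the expected count and a ratio below 1 marks one that lowers it, which is one simple way to turn the fitted coefficients into per-indicator optimization suggestions.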
2.4 Own Media Daily Ranking System
The own media daily ranking system is a data analysis tool designed specifically for creators, helping them understand how video content on the organization's own media accounts performs. Because these accounts have authorized data access, the available analytical data extends far beyond front-end metrics such as likes, shares, and comments. With support from short video platform data interfaces, the system can also access multi-dimensional back-end data including completion rate, 5-second completion rate, cover click-through rate, 2-second bounce rate, average playback duration, homepage visits, follower growth, shares to chat and Moments, and more. With this feature, creators can clearly see how each video performs in terms of interaction on the platform. The system also displays information about outstanding creators within the media organization to motivate continuous improvement. Decision-makers can use the same analysis to quickly understand the organization's current publishing activity, enabling better-informed content decisions that support creator growth and content optimization.
3. Platform Construction Key Points
3.1 Timeliness and Standardization of Data Acquisition
In the current short video ecosystem, content updates and propagation are extremely rapid, and platform interfaces and presentation forms continue to evolve. Only through real-time monitoring, adjusting storage methods according to interface changes, and timely collecting dynamic data from each platform can we accurately grasp content popularity changes and trends. To ensure both data integrity and platform-specific characteristics, we need to establish strongly consistent data storage rules during the acquisition process, especially concerning the standardization of comparative metrics. The core objective of data standardization is to transform the data structures and presentation forms of various platforms into a unified standard format to facilitate subsequent analysis and modeling. This not only reduces errors caused by different data structures but also improves data analysis efficiency and accuracy.
3.2 Timeliness of Propagation Prediction Models
In the process of building prediction models, most features can be efficiently extracted through automated technologies such as computer vision and sentiment analysis. However, for more complex dimensions involving video content classification, commentary style analysis, and shot transition patterns, manual annotation remains necessary to ensure data accuracy. Therefore, prediction model parameters need regular updates to adapt to the continuously evolving trends in short video content. With ongoing technological advancement and algorithm optimization, achieving comprehensive automation of feature extraction is key to prediction model upgrading. At that point, stored data can directly drive real-time model updates, thereby improving prediction accuracy and adaptability, enabling the model to more precisely reflect the dynamic changes in short video propagation power.
3.3 High Availability and Scalability of the System
The volume of stored short video data is enormous, especially during traffic peaks or when hot events occur, when the system needs to rapidly process multiple video data streams. Distributed architecture and load balancing can improve the system's concurrent processing capacity, ensuring efficient response even when data surges. As short video platforms develop, new functional requirements and data sources may continue to emerge. The system's flexibility and scalability can support future demand changes, preventing architectural limitations during business expansion. The design considers modular architecture to support subsequent function expansion. Through loose coupling design, the system can quickly integrate new data sources and functions without affecting existing capabilities.
3.4 Multi-Dimensional and Intuitive Analysis Presentation
For data visualization systems, providing user-selectable and clearly understandable analysis results is a core requirement. Short video propagation effects are multi-dimensional, and presenting them in various forms can help users analyze video performance from multiple angles and make more precise decisions. Different dimensional displays can enrich analysis results and enhance decision support accuracy. Visualization tools such as charts, heatmaps, and scatter plots can be used to display different dimensional data. Clear interface design not only improves user operation efficiency but also reduces information overload, enabling users to focus on the most critical data. The interface design should be as concise as possible, avoiding excessive redundant information. Through data charts, pie charts, bar charts, and other methods, complex data is presented in ways that allow users to quickly identify underlying trends and patterns.
Conclusion
This work designed and developed an integrated data analysis and visualization system that combines short video monitoring, propagation prediction, and own media daily ranking displays, enabling comprehensive monitoring of trending short videos and high-impact accounts from mainstream media. By combining data modeling and algorithmic analysis, the platform has established a scientific and reasonable short video evaluation standard with some capability to predict short video propagation. In addition, the platform's daily ranking function gives creators rapid, specific feedback, improving short video management and operational efficiency. The system provides mainstream media with more intelligent and precise support for short video content management and propagation, driving innovation and development in the new media era.
References
[1] Xi Jinping. Accelerating Media Convergence Development and Building an All-Media Communication Landscape [J]. Qiushi, 2019(6): 4-8.
[2] Zhao Bing. Enhancing the "Presence Capability" of Mainstream Media on Mobile Internet with Internet Thinking [J]. Chinese Journalist, 2024(2): 12-15.
[3] Zhao Bing. Grasping Communication Patterns in the All-Media Era and Striving to Improve Positive Propaganda Quality and Efficiency [J]. News Front, 2024(6): 4-8.
[4] Zhang Yuhao. Practice and Reflection on Driving Communication Power Improvement Through All-Media Production and Communication Mechanisms—Taking Hebei Daily Newspaper Group's Communication Power Dashboard as an Example [J]. News and Writing, 2025(1): 9-11.
[5] Li Jinling, Yuan Xin, Yang Biao. Design and Implementation of an Infectious Disease Data Visualization Platform Based on Web Technology [J]. Computer Applications and Software, 2023(10): 101-106, 173.
[6] Zhang Fan, Du Yaru, Zhang Xindong, et al. Design of an Integrated Business and Finance Platform in a Media Convergence Environment [J]. China Media Technology, 2024(4): 84-88.
[7] Shen Enya. Big Data Visualization Technology and Application [J]. Science & Technology Review, 2020(3): 68-83.
[8] Li Xiangming. Research on Data Visualization of Finance Companies Under the Background of Treasury System Construction [J]. Finance & Accounting, 2023(15): 64-66.
[9] Zhao Jun. Implementation of Epidemic Statistics Charts Based on Vue.js [J]. Computer Programming Skills & Maintenance, 2020(3): 144-147.
[10] Li Ping, Li Yong, Fan Quanrun. Design and Implementation of a University Teaching Status Data Visualization Analysis Platform [J]. Experimental Technology and Management, 2020(5): 46-51, 10.
[11] Li Junli. Parallelization of Mutual Information Calculation for Categorical Data Under the Spark Platform [J]. Computer Engineering and Applications, 2021(7): 95-100.
[12] Wang Yanyan, Wang Yanning, Liu Jiaxin, Ren Jiadong. Research on Port Logistics Big Data Application Based on Hadoop [J]. Journal of Yanshan University, 2023(3): 216-220, 228.
[13] Zhu Maoran, Ma Xiaoyi, Gao Song, et al. The Impact of Emotional Disagreement on Social Media Information Re-Dissemination—Taking Weibo as an Example [J]. Journal of Intelligence, 2024(5): 143-151.
[14] Peng Yanni. Research on Key Technologies for Time-Event Sequence Data Visualization [D]. Changsha: Central South University, 2022.
[15] Huang Yangkun, Chen Changfeng. Automation of Visual Communication and Aesthetic Construction of National Image—A Computational Aesthetics Study Based on Twitter Social Bots [J]. Modern Communication—Journal of Communication University of China, 2023(8): 96-104.
[16] Sun Zhenghui, Zheng Jianping, Wang Youwei. Research on the Impact of Short Video Visual, Auditory, and Content Features on E-Commerce Marketing Effectiveness [J]. Journal of Marketing Science, 2023(4): 1-21.
[17] Li Xianfeng, Xu Sen, Hua Yiming. Maritime Video Ship Fire Smoke Detection Technology Based on OpenCV Computer Vision [J]. Ship Science and Technology, 2021(22): 202-204.
[18] Guan Lu, Zhou Baohua. Application of Computer Vision Technology in Journalism and Communication Research [J]. Contemporary Communication, 2022(3): 20-26.
Author Biographies:
Zhang Fan (1991—), male, from Meixian County, Shaanxi Province, holds a master's degree, works in the Technical Support and Development Department of Hebei Daily Newspaper Group as a senior engineer, with research interests in media convergence, data analysis, and informatization; Liu Wei (1980—), male, from Ningjin County, Shandong Province, holds a bachelor's degree, works at Hebei Daily Newspaper Group as a chief editor, with research interests in media convergence; Meng Jing (1993—), female, from Xingtai City, Hebei Province, holds a master's degree, works in the Editor-in-Chief Office of Hebei Daily Newspaper Group as an assistant editor, with research interests in media convergence, data analysis, and informatization.
(Responsible Editor: Li Yansong)