From Correlation to Causation: Integrated Causal Inference and Precision Intervention Pathways for Construction Workers' Mental Health Risks
Ling Zhicheng, Yu-Han Liu, Juan Wang, Huanglong
Submitted 2025-11-03 | ChinaXiv: chinaxiv-202511.00044 | Mixed source text

Abstract

Background: Construction workers face severe mental health challenges, with the prevalence of anxiety, depression, stress, and burnout being significantly higher than in other industries. However, existing research is mostly limited to a "problem-oriented" approach, relying excessively on cross-sectional data and traditional statistical methods. These approaches struggle to effectively handle high-dimensional, non-linear data and often overlook individual heterogeneity, leading to weak causal inference. This study aims to overcome these limitations by constructing an innovative analytical framework, providing a more robust scientific basis for mental health interventions among construction workers.

Methods: This study proposes a four-in-one analytical paradigm of "data-driven, causal inference, heterogeneity identification, and robustness assessment." A total of 1,000 employees were randomly sampled from the Fourth Engineering Office of the Second Aviation Engineering Bureau of China Communications Construction, resulting in 912 valid questionnaires. In the data analysis, Elastic Net regression was first employed for high-dimensional variable reduction and feature selection. Second, Double Machine Learning (DML) combined with XGBoost was introduced for causal effect estimation to address confounding bias. Subsequently, Latent Profile Analysis (LPA) was utilized to identify heterogeneous subgroups of mental health. Finally, Negative Control Outcomes (NCO) and E-value analysis were conducted to assess the robustness of the causal inference. All statistical analyses were performed using R 4.5.1 software and relevant core analysis packages, with the threshold for statistical significance set at $P < 0.05$.

Results: The study identified that factors such as smoking, gender, sense of meaning in work, emotional self-regulation, workplace ostracism, emotional exhaustion, alcohol dependence, and cynicism have significant causal effects on depression, anxiety, and stress. Latent Profile Analysis categorized construction workers into three groups: high-risk (5.7%), medium-risk (32.6%), and mentally healthy (61.7%), with the high-risk group exhibiting significantly higher levels of anxiety, depression, and stress. E-value analysis results indicated that the causal effect models possess moderate to high robustness (E-values ranging from 1.21 to 1.78).

Conclusion and Significance: This study successfully constructed and applied an integrated analytical framework, greatly enhancing the strength of evidence for causal inference from cross-sectional observational data. The findings not only reveal key risk and protective factors affecting the mental health of construction workers but also identify heterogeneous groups with different mental health risks. These findings provide a direct scientific basis for developing precise and efficient mental health intervention strategies, particularly emphasizing the importance of targeting the "emotional exhaustion-cynicism" core pathway and enhancing protective factors such as "emotional self-regulation" and "sense of meaning in work," which holds significant practical guidance for improving the mental health of construction workers.

Full Text

Preamble

From Association to Causation: Integrated Causal Inference and Precision Intervention Paths for Construction Workers' Mental Health Risks

An Empirical Study Based on a Four-in-One Framework and Cross-Sectional Data

Abstract

The mental health of construction workers is a critical factor influencing both occupational safety and industry sustainability. While existing research has identified numerous factors associated with mental health, most studies rely on traditional correlation analysis, which fails to capture the complex causal mechanisms required for effective intervention. This study proposes an integrated causal inference framework—the "Four-in-One" model—to transition from identifying associations to establishing causal pathways. Utilizing cross-sectional data from a large-scale survey of construction workers, we employ advanced machine learning and structural equation modeling to map the causal landscape of mental health risks. Our findings reveal specific high-leverage intervention points, moving beyond broad correlations to precise, actionable paths. This research provides a theoretical and empirical basis for developing targeted mental health support systems within the construction industry, shifting the focus from reactive management to proactive, precision-based prevention.

1. Introduction

The construction industry is characterized by high physical demands, hazardous working environments, and significant psychological pressure. Consequently, construction workers face disproportionately high risks of mental health issues compared to other labor sectors. Despite a growing body of literature exploring these risks, the field remains dominated by associational studies. While these studies are valuable for identifying potential risk factors, they often fall short of explaining the underlying causal structures. In the context of occupational health, understanding "what causes what" is essential for designing interventions that are both efficient and effective.

This study addresses this gap by integrating causal inference methodologies with a comprehensive "Four-in-One" framework. This framework considers individual, organizational, environmental, and social dimensions as an interconnected system. By applying this model to cross-sectional data, we aim to disentangle the web of correlations and identify the primary drivers of mental health risks among construction workers.

2. Theoretical Framework: The Four-in-One Model

To systematically analyze the mental health of construction workers, we propose a "Four-in-One" integrated framework. This model posits that mental health outcomes are not the result of isolated factors but are shaped by the synergy of four distinct domains:

  1. Individual Factors: Including demographic characteristics, psychological resilience, and personal coping mechanisms.
  2. Organizational Factors: Encompassing management styles, safety climate, workload, and job security.
    3.

1 School of Humanities and Management, Wannan Medical College

Nantong University Xinglin College

Background

Construction workers face severe mental health challenges, with the prevalence of anxiety, depression, stress, and burnout significantly higher than in other industries. Existing research remains largely limited by an over-reliance on cross-sectional data and traditional statistical methods. These conventional approaches struggle to effectively process high-dimensional, non-linear data and often overlook individual heterogeneity, resulting in weak causal inference. This study aims to overcome these limitations by constructing an innovative analytical framework, thereby providing a more robust scientific basis for mental health interventions among construction workers.

Methods

This study proposes a "four-in-one" analytical paradigm of data-driven selection, causal inference, heterogeneity identification, and robustness assessment. A total of 1,000 employees were randomly sampled from the Fourth Engineering Office of the Second Aviation Engineering Bureau of China Communications Construction Company, yielding 912 valid questionnaires for data analysis.

The analytical framework consists of four primary stages. First, Elastic Net regression was employed for high-dimensional variable reduction and feature selection. Second, Double Machine Learning (DML) using the XGBoost algorithm was introduced to estimate causal effects while addressing confounding bias. Third, Latent Profile Analysis (LPA) was utilized to identify heterogeneity in mental health patterns. Finally, negative control analysis and E-value analysis were conducted to evaluate the robustness of the causal inferences. All statistical analyses were performed using R version 4.5.1 and its core analysis packages, with the threshold for statistical significance set at $P < 0.05$.

Results

The study identified that factors such as smoking, sense of work meaning, emotional self-regulation, and alcohol dependence exert significant causal effects on depression, anxiety, and stress. Latent profile analysis categorized construction workers into three distinct groups: a high-risk group (32.6%), a borderline at-risk group (5.7%), and a mentally healthy group (61.7%), with the high-risk group exhibiting significantly higher levels of anxiety, depression, and stress.

E-value analysis results indicate that the causal effect models possess moderate to high robustness.

In conclusion, this study successfully constructed and applied an integrated analytical framework, significantly enhancing the strength of evidence for causal inference derived from cross-sectional observational data. The findings not only reveal key risk and protective factors affecting the mental health of construction workers but also identify heterogeneous groups with varying mental health risk profiles. These discoveries provide a direct scientific basis for formulating precise and efficient mental health intervention strategies. Specifically, the study emphasizes the importance of targeting core pathways and enhancing protective factors such as emotional self-regulation, offering vital practical guidance for improving the mental health of construction workers.

As the cornerstone of global economic and social development, the construction industry shapes the contours of modern civilization. Behind this civilization lies a vast population often overlooked by society; though contributors to the advancement of civilization, they labor long-term in extremely harsh physical and psychological environments, enduring pressures unimaginable to the average person. With the general increase in social attention toward mental health issues, the psychological well-being of construction workers, a unique group, is gradually moving from the periphery into the spotlight, becoming an undeniable public health and social issue.

Existing research reveals a concerning picture: the mental health status of construction workers is generally poor, with stress levels and resulting issues such as burnout and substance abuse (especially alcohol dependence) occurring at rates significantly higher than in many other industries. According to the "Blue Book on Occupational Mental Health," a substantial proportion of construction workers exhibit anxiety and depressive tendencies, while the rate of occupational burnout has reached an alarming level \cite{2017}. Behind these cold statistics lies the immense suffering endured by countless individuals at physiological, psychological, and social functional levels, as well as the profound impact on personal quality of life, family harmony, and the safety of the entire industry.

Mental health problems not only severely impair workers' cognitive functions, emotional regulation, and interpersonal skills, increasing the risk of various psychosomatic diseases, but also directly lead to lack of concentration and increased operational errors, serving as a major trigger for safety accidents \cite{2004}. Related studies indicate that over 80% of safety accidents in the construction industry are directly related to workers' unsafe behaviors, which are often closely linked to psychological exhaustion and burnout caused by long-term harsh working conditions and high-intensity labor. Deeply exploring the internal mechanisms, key risk factors, and complex interactions of construction workers' mental health issues is not only a matter of humanitarian concern but also an urgent requirement for ensuring sustainable industry development and maintaining public safety \cite{2000}.

Although the severity of these issues has become increasingly prominent and academic research has made some progress, the field remains in a relatively primary stage of exploration. The knowledge system and evidence base have not yet reached a state of stability or maturity. While existing research reveals the prevalence of these problems, it also exposes several profound limitations that constitute weaknesses in the current chain of evidence, making it fragile when facing complex real-world problems \cite{Samuel2022}.

First, regarding research content, the vast majority of studies focus excessively on measuring negative psychological states like anxiety and stress, while paying little attention to protective psychological factors and traditional lifestyle habits. Although this helps identify high-risk individuals, it fails to depict a complete picture of worker mental health or provide effective guidance for intervention development from a positive promotion perspective.

Methodologically, cross-sectional designs dominate, which possess inherent deficiencies in explaining causal effects. More critically, existing research paradigms are generally traditional, relying heavily on correlation analyses (such as regression) and simple mediation models. These methods prove inadequate when processing high-dimensional, multi-collinear, and complex data prevalent among construction workers \cite{Chernozhukov2018}. They struggle to effectively address model specification bias or accurately estimate the causal effects of key risk factors while strictly controlling for confounding variables.

Traditional methods also tend to treat the study population as a homogeneous group, masking the significant heterogeneity among individuals. In the construction worker population, these differences likely lead to vastly different psychological reactions when facing the same stressors. Ignoring this heterogeneity not only limits the depth of research but also renders interventions based on such findings ineffective in practice \cite{Samuel2025}.

Furthermore, there is a lack of systematic assessment regarding the sensitivity of research conclusions to unobserved confounding factors. This results in many seemingly significant findings remaining at the level of association, with reliability that may be fragile; the presence of certain unmeasured key variables could potentially overturn entire conclusions. These deep-seated limitations collectively result in a fragile evidence chain in current construction worker mental health research, which is insufficient to support scientific, precise, and effective intervention strategies \cite{Fagbenro2024}.

To address these issues, the fundamental solution lies in a profound methodological innovation. It is essential to construct a more robust, rigorous, and comprehensive analytical framework. We propose and systematically explain a new paradigm.

This "four-in-one" integrated analytical paradigm addresses specific weaknesses in current research at each stage. Its integrated application will collectively push the field toward a higher level of scientific evidence. Simultaneously, through causal effect estimation and in-depth study of heterogeneity, this research will further refine and supplement theoretical studies and provide preliminary guidance for precision interventions in the field of construction worker mental health.

Methods

This study was conducted in collaboration with the Fourth Engineering Office of the Second Aviation Engineering Bureau of China Communications Construction. Between [Month] and [Month], 1,000 full-time employees (representing [Percentage] of the total workforce) were randomly sampled via an online questionnaire system. Following rigorous data cleaning, which excluded incomplete responses, abnormal response patterns, and invalid entries, 912 valid questionnaires were retained (a valid response rate of 91.2%), including [Number] male participants. To ensure adherence to ethical standards and maintain data quality, we held briefing sessions before the survey. A background-blind sampling method was employed, followed by secondary briefings for the selected employees to confirm informed consent and to check that they understood the questionnaire correctly.

Regarding the analytical strategy, this study constructed and implemented an integrated analytical paradigm. The process begins with a data-driven approach, utilizing Elastic Net regression for feature selection to prevent model specification bias. Subsequently, the study transitions to causal inference, employing Double Machine Learning (DML) models to estimate the causal effects of key factors on depression, anxiety, and stress after controlling for high-dimensional confounders. This is followed by a robustness assessment, which introduces negative control variables selected via network centrality indices and E-value analysis to quantify the potential impact of unmeasured confounding. Finally, the analysis concludes with heterogeneity identification: Latent Profile Analysis (LPA) is used to identify high-risk subgroups beneath the average treatment effects and to characterize their unique risk and protective factor profiles.

2.2 Algorithm Selection

To avoid model misspecification and achieve data-driven feature selection, this study employs Elastic Net regression to perform dimensionality reduction on high-dimensional variables. By utilizing joint $L_1$ and $L_2$ regularization, this method enables simultaneous variable selection and coefficient shrinkage.
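For reference, the joint penalty can be written out explicitly. The form below is the Gaussian objective used by the glmnet package; the symbols $\lambda$ (overall penalty strength) and $\alpha$ (mixing weight between the two penalties) are notation introduced here for illustration, and the source does not report the values selected for either.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\left[\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
$$

Setting $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge regression; intermediate values keep the lasso's ability to set coefficients exactly to zero while stabilizing the selection among correlated predictors.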

This process yields a refined and robust set of features, establishing a solid foundation for subsequent causal inference.

To develop more causally interpretable explanations within cross-sectional data, this study introduces the Double Machine Learning (DML) model, a frontier approach in econometrics.

The DML framework is designed to address the core challenge of observational studies: confounding bias. Its fundamental principle involves using machine learning models to capture the components of the treatment variable and the outcome variable that can be explained by confounding variables. By removing these components from the original variables, the model regresses the remaining, relatively exogenous variation to obtain more robust estimates of causal effects. This approach allows us to utilize algorithms such as XGBoost to flexibly capture complex relationships between variables without pre-specifying linear forms, providing an ideal tool for processing the high-dimensional and non-linear data characteristic of construction worker research.
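The partialling-out logic described above can be sketched in a few lines for the partially linear model $Y = \theta D + g(X) + \varepsilon$, $D = m(X) + v$. The Python sketch below is an illustration under stated assumptions, not the authors' pipeline (which used the R package DoubleML with XGBoost): ordinary least squares stands in for the XGBoost nuisance learner to keep the example dependency-free, the data are simulated, and the names `fit_predict_ols` and `dml_partial_out` are hypothetical.

```python
import numpy as np

def fit_predict_ols(X_train, y_train, X_test):
    """Stand-in nuisance learner (assumption): least squares with an intercept."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def dml_partial_out(y, d, X, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta and its standard error."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Remove the part of the outcome and of the treatment explained by X,
        # using models fitted on the other folds only (cross-fitting).
        y_res[test_idx] = y[test_idx] - fit_predict_ols(X[train], y[train], X[test_idx])
        d_res[test_idx] = d[test_idx] - fit_predict_ols(X[train], d[train], X[test_idx])
    # Regress the outcome residual on the treatment residual.
    theta = (d_res @ y_res) / (d_res @ d_res)
    score = (y_res - theta * d_res) * d_res
    se = np.sqrt(np.mean(score**2) / n) / np.mean(d_res**2)
    return theta, se

# Simulated data (not the study's data): the true effect of d on y is 0.5.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 5))
d = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.2]) + rng.normal(size=n)
y = 0.5 * d + X @ np.array([1.0, 0.5, -0.7, 0.2, 0.0]) + rng.normal(size=n)

theta_hat, se_hat = dml_partial_out(y, d, X)
lo, hi = theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat
print(f"theta = {theta_hat:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Swapping `fit_predict_ols` for a flexible learner such as gradient-boosted trees is what lets the same estimator handle non-linear confounding; cross-fitting keeps the overfitting of that learner from contaminating the estimate of $\theta$.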

To reveal potential heterogeneous subgroups underlying the average treatment effect, this study employs Latent Profile Analysis (LPA). Based on individuals' response patterns across mental health indicators, the model identifies mutually exclusive latent classes using a probabilistic framework, thereby achieving population segmentation and providing a targeted basis for precision interventions. Unlike traditional clustering methods, this probabilistic model identifies latent class structures through maximum likelihood estimation and assigns each individual a posterior probability of class membership, resulting in statistically verifiable classification results.
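As a rough illustration of how posterior class-membership probabilities arise, the sketch below fits a Gaussian mixture with profile-specific means and variances and zero within-profile covariances by expectation-maximization. It is a minimal Python sketch under stated assumptions, not the authors' tidyLPA analysis: the data are simulated, the initialization (ordering respondents by total score) is an assumption chosen for determinism, and `fit_lpa` is a hypothetical name.

```python
import numpy as np

def fit_lpa(X, k, n_iter=200):
    """EM for a k-profile mixture; indicators are independent within a profile."""
    n, _ = X.shape
    # Assumed initialisation: sort respondents by total score, split into k blocks.
    order = np.argsort(X.sum(axis=1))
    means = np.array([X[idx].mean(axis=0) for idx in np.array_split(order, k)])
    variances = np.tile(X.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability of each profile for each respondent.
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        log_dens = -0.5 * (diff2 / variances + np.log(2 * np.pi * variances)).sum(axis=2)
        log_joint = log_dens + np.log(weights)
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_joint)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and variances.
        nk = post.sum(axis=0) + 1e-12
        weights = nk / n
        means = (post.T @ X) / nk[:, None]
        variances = (post.T @ X**2) / nk[:, None] - means**2 + 1e-6
    return weights, means, variances, post

# Simulated scores on three indicators (not the study's data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(300, 3)),
    rng.normal(2.0, 0.5, size=(200, 3)),
    rng.normal(4.0, 0.5, size=(100, 3)),
])
weights, means, variances, post = fit_lpa(X, k=3)
labels = post.argmax(axis=1)  # modal assignment; `post` keeps the uncertainty
print(np.round(weights, 3))
```

Each row of `post` sums to one and gives the respondent's probability of belonging to each profile; hard labels are obtained only at the end by taking the most probable profile. Like any EM fit, the result can depend on the starting values, so several initializations are normally compared in practice.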

2.3 Sensitivity Analysis

Even when employing cutting-edge causal inference methods, conclusions remain dependent on the strong assumption of "no unmeasured confounding." However, in the social sciences, unobserved or unmeasured confounding factors are often unavoidable. To systematically evaluate the potential bias introduced by such unmeasured confounding, this study introduces two complementary tools: negative control estimation and E-value analysis.

Negative control variables are defined as variables that, theoretically, have no causal relationship with either the exposure or the outcome. This study innovatively quantifies the selection process for these variables using two thresholds: first, the comprehensive centrality of candidate variables within the network is calculated to identify key nodes; second, the correlation coefficients between these candidates and depression, anxiety, or stress must be lower than a specified threshold. Only variables that simultaneously satisfy the criteria of "high confounding potential" and "low causal association" serve as negative controls. If the model detects a significant effect on such a variable, it indicates the presence of uncontrolled confounding. Conversely, the E-value provides a more direct quantitative standard, defined as the minimum strength of association that an unmeasured confounder would need to have with both the exposure and the outcome to fully explain away the observed effect.

A larger E-value indicates that a higher intensity of unmeasured confounding would be required to overturn the conclusions, thereby suggesting that the results are more robust.
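For an effect expressed as a risk ratio $RR$, the E-value defined above has the closed form $E = RR + \sqrt{RR\,(RR - 1)}$ when $RR \ge 1$; for a protective effect ($RR < 1$) the ratio is inverted first. The short Python sketch below only illustrates this arithmetic. The study itself used the R package EValue, the source does not state how its effect estimates were converted to the risk-ratio scale, and `e_value` is a hypothetical helper name.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale (hypothetical helper)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula applies symmetrically.
    rr = max(rr, 1.0 / rr)
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(2.0), 3))  # 2 + sqrt(2) -> 3.414
print(round(e_value(0.5), 3))  # same as RR = 2 -> 3.414
print(e_value(1.0))            # a null effect needs no confounding -> 1.0
```

An E-value of 3.414 for $RR = 2$ means that an unmeasured confounder would have to be associated with both the exposure and the outcome by a risk ratio of about 3.4 each, beyond the measured covariates, to explain the observed effect away.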

2.4 Scale Selection

Data analysis for this study was conducted using R version 4.5.1. Continuous variables are expressed as means, while categorical variables are presented as frequencies or percentages. Advanced statistical analyses were performed using core packages including glmnet, DoubleML, simex, EValue, and tidyLPA. For all statistical tests, a p-value of less than 0.05 ($P < 0.05$) was considered statistically significant.

Descriptive statistics of demographic variables were computed for the total study population. The sample is predominantly male (86.62%). The age distribution is relatively balanced, with those under 40 years old accounting for 52.08% and those aged 40 and above accounting for 47.92%. Work experience is spread across three bands: fewer than 10 years (28.73%), 10–20 years (37.72%), and over 20 years (33.55%). Regarding behavioral and health characteristics, a majority of participants report alcohol consumption (56.47%). The incidence of work-related injuries is relatively low (18.86%), while the COVID-19 infection rate is high (81.47%). Furthermore, the vast majority of the sample consists of migrant workers (92.87%). Overall, these data are consistent with a construction workforce composed primarily of male migrant workers.

During the feature engineering stage, a Pearson correlation heatmap [FIGURE:1] was generated to encompass various dimensions, including individual traits and organizational contexts. This heatmap quantifies the direction and strength of linear associations between variables through a color gradient, allowing for an intuitive analysis of correlation patterns among features. This approach not only facilitates the identification of key features highly correlated with the target variables but also provides a visual basis for detecting multicollinearity. These insights guide feature selection and the construction of derived features, ensuring the effectiveness and parsimony of the feature set while laying the foundation for subsequent model interpretation and robustness verification.

Correlation Heatmap

3.2.2 Elastic Net Regression

In this study, embedded feature selection was performed using Elastic Net regression. Following parameter tuning via 10-fold cross-validation, 11 non-zero coefficients were identified, with their absolute values serving as a measure of feature importance. Emotional self-regulation, sense of meaning at work, workplace well-being, alcohol consumption status, and the use of emotions were identified as the primary negative contributors. Conversely, alcohol dependence and organizational identification constituted the first tier of positive contributors. All variables with non-zero coefficients were retained as the feature set for subsequent analysis.

Feature engineering...

3.3 Causal Effect Estimation

This study employs Double Machine Learning (DML) with XGBoost as the base learner. After fitting the model using the optimal parameters (Table [TABLE:N]), the results (Figure [FIGURE:N]) indicate that after partialling out high-dimensional confounding factors, variables such as sense of meaning in work and emotional self-regulation demonstrate significant effects.

Eight independent variables, including cynicism, show significant causal effect estimates in the models for depression, anxiety, and stress, and the total average effects across the causal pathways are relatively strong.

Table [TABLE:N]. Hyperparameter configuration: depth, nthread, nrounds, folds.

The causal inference results for the significant variables are as follows: the total average effect for emotional self-regulation lies in $[-0.201, -0.025]$, and for sense of meaning in work in $[-0.146, -0.004]$. Other coefficients include $[0.022, 0.042]$, $[0.028, 0.207]$, $[0.087, 0.239]$, $[0.098, 0.307]$, $[0.048, 0.365]$, and $[0.141, 0.283]$.

Regarding the selection of covariates for the DML analysis, network centrality analysis shows that self-emotional evaluation has a high standardized composite centrality and is correlated with multiple confounders among all variable nodes (see Supplementary Table [TABLE:N]). Based on prior evidence, it has no theoretical causal path to the outcomes, and its correlation coefficient with them is less than $0.1$, which satisfies the negative control assumption. Consequently, it alone was included in the final covariate set to absorb residual confounding.

3.4.2 Sensitivity Analysis

Negative Control Outcomes

In the field of causal inference and observational studies, negative control outcomes (NCOs) serve as a critical methodological tool for detecting and mitigating the effects of unmeasured confounding. While traditional approaches often rely on the assumption of "no unmeasured confounders" (exchangeability), this assumption is frequently violated in real-world data. Negative control outcomes provide a robust framework for sensitivity analysis and bias correction by identifying variables that are associated with the treatment-assignment process or the confounding factors, but are known—based on prior scientific knowledge—not to be causally affected by the treatment under investigation.

Conceptual Framework

The fundamental logic of a negative control outcome is that if a treatment appears to have a statistically significant "effect" on an outcome where no such biological or physical mechanism exists, that effect must be a manifestation of bias. This bias typically arises from unmeasured confounding or selection bias that affects both the primary outcome of interest and the negative control. By observing the relationship between the treatment and the NCO, researchers can qualitatively assess the presence of confounding or, in more advanced applications, quantitatively adjust the primary effect estimate.

Methodological Applications

The use of negative control outcomes can be categorized into two primary levels of rigor:

  1. Bias Detection: At this level, NCOs are used as a diagnostic tool. If the estimated effect of the treatment on the NCO is non-zero, the researcher concludes that the study design is susceptible to confounding. This serves as a "sanity check" for the validity of the primary findings.
  2. Bias Correction and Identification: Recent developments in causal inference, such as the proximal causal inference framework, utilize NCOs (often in conjunction with negative control exposures) to formally identify and estimate the causal effect of interest. By leveraging the association between the treatment and the NCO, it is possible to analytically subtract the confounding bias from the primary treatment effect estimate, provided certain structural assumptions are met.
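The bias-detection use can be illustrated with a small simulation. In the Python sketch below, which is an illustration under stated assumptions rather than the study's analysis, an unmeasured confounder `u` drives both the treatment and a negative control outcome that the treatment does not affect. A plain linear regression stands in for the DML model, all data are simulated, and `treatment_coef_ci` is a hypothetical helper name.

```python
import numpy as np

def treatment_coef_ci(outcome, treatment, covariates, z=1.96):
    """OLS coefficient of `treatment`, adjusted for `covariates`, with a normal-approximation CI."""
    A = np.column_stack([np.ones(len(outcome)), treatment, covariates])
    coef, *_ = np.linalg.lstsq(A, outcome, rcond=None)
    resid = outcome - A @ coef
    sigma2 = resid @ resid / (len(outcome) - A.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(A.T @ A)[1, 1])
    return coef[1], (coef[1] - z * se, coef[1] + z * se)

# Simulated data (not the study's data).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 1))                       # measured covariate
u = rng.normal(size=n)                            # unmeasured confounder
treatment = 0.5 * x[:, 0] + u + rng.normal(size=n)
nco = 0.3 * x[:, 0] + u + rng.normal(size=n)      # no treatment term by construction

# Analysis as it would be run in practice: u is not available.
est_unadj, ci_unadj = treatment_coef_ci(nco, treatment, x)
bias_flag = not (ci_unadj[0] <= 0.0 <= ci_unadj[1])

# Oracle analysis, possible only in a simulation: adjust for u as well.
est_adj, ci_adj = treatment_coef_ci(nco, treatment, np.column_stack([x, u]))

print(f"unadjusted: {est_unadj:.3f}, flagged = {bias_flag}")
print(f"adjusted for u: {est_adj:.3f}")
```

A non-zero "effect" on the negative control outcome in the first analysis signals residual confounding; once the confounder is adjusted for, the estimate returns to approximately zero.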

Selection Criteria for Negative Control Outcomes

For an outcome to serve as a valid negative control, it must satisfy several key criteria:
- No Causal Link: There must be a strong theoretical or empirical basis to assert that the treatment does not cause the negative control outcome.
- Shared Confounding: The NCO should be influenced by the same unmeasured confounders that affect the primary outcome of interest.
- Measurement Comparability: Ideally, the NCO should be measured using the same data sources and methods

Analysis

Self-emotional evaluation was employed as the negative control variable to support the validity of the model. Overall, the sensitivity analysis indicates that the causal inferences regarding most psychological and behavioral variables are reliable. Across the effect models used in this study, the E-values ranged from 1.21 to 1.78, indicating a moderate-to-high level of robustness.

3.5 Latent Profile Analysis

This study utilized the Anxiety, Depression, and Stress subscales of the DASS-21 scale as outcome variables. We evaluated the optimal number of profiles for the Latent Profile Analysis (LPA) by comprehensively examining common fit indices, with a particular focus on the Entropy value. Based on these criteria, the three-profile model was found to provide the best fit. As shown in the clustering plot ([FIGURE:N]), the boundaries between groups are distinct, indicating that the clustering results possess high discriminative power and provide a comprehensive classification of the sample.
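The Entropy index referred to above is commonly computed as the relative entropy of the posterior class-membership probabilities. Assuming that standard definition applies here (the source does not give the formula), with $\hat{p}_{ik}$ the posterior probability that respondent $i$ belongs to profile $k$, $n$ respondents, and $K$ profiles:

$$
E_K = 1 - \frac{\sum_{i=1}^{n}\sum_{k=1}^{K} -\hat{p}_{ik}\,\ln \hat{p}_{ik}}{n \ln K}
$$

Values close to 1 indicate that most respondents are assigned to a single profile with near certainty, while values close to 0 indicate that the profiles are poorly separated.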

3.5.2 Profile Structure Characterization

Results and Discussion

Through inter-group comparisons ([TABLE:1]), this study found that the mean depression and stress scores of the high-risk group (32.6% of the sample) were significantly higher than the overall average. A second group, comprising approximately 5.7% of the sample, scored within the borderline range and requires close monitoring. In contrast, the mentally healthy group (61.7% of the sample) had mean scores below the overall average on all dimensions. Factor identification yielded six high-risk factors, including alcohol dependence and workplace ostracism. Conversely, self-efficacy motivation, sense of work meaning, supervisor support, and work well-being were identified as significant protective factors for the population ([FIGURE:Class]).

Core Profile Factors

By constructing and applying an integrated analytical paradigm that combines cutting-edge econometrics with machine learning methods, this study conducted an in-depth causal exploration and heterogeneity analysis of mental health risk factors among Chinese construction workers based on cross-sectional survey data. The results not only validate and supplement core theories in the field of occupational health but, more importantly, provide a solid scientific basis and clear target pathways for precision interventions.

The core finding of this study, that emotional exhaustion has a significant causal effect on depression, anxiety, and stress, provides strong empirical evidence for understanding the psychological consequences of burnout. This deeply reinforces the Job Demands-Resources (JD-R) model and Conservation of Resources (COR) theory \cite{Ling2025}. When work demands continuously deplete employees' resources without effective replenishment, it leads to emotional exhaustion. The continuous loss of resources further triggers defensive "depersonalization," an emotional detachment strategy adopted to preserve remaining psychological resources. Our causal estimates, obtained while strictly controlling for confounding variables, quantify this pathological pathway from burnout to psychological distress. This confirms that emotional exhaustion is not only a core dimension of burnout but also a key causal antecedent inducing generalized negative emotions \cite{Fernandez2000}.

Moving beyond the traditional singular focus on risk factors, this study identified the significant protective roles of work meaning and emotional self-regulation through causal inference. This provides empirical support for COR theory, which emphasizes that individuals not only strive to protect existing resources but also actively invest in and acquire new resources (e.g., psychological capital, positive work experiences) to cope with stress and threats. As a key intrinsic resource, a sense of work meaning helps workers find a sense of value and belonging in harsh environments, thereby buffering the impact of external pressure \cite{Steger2012}. Meanwhile, emotional self-regulation is a dynamic personal resource used to manage emotional responses, directly reducing the risk of anxiety and depression \cite{Gross2009}. This study not only validates the core mechanisms of COR theory but also constructs a more comprehensive "Risk-Protection Balance Model" by integrating both types of factors, addressing the insufficient discussion of the causal role of protective factors in existing occupational health literature.

The high-risk factors identified in the latent profiles overlap substantially with the risk factors that show significant causal effects in the overall model. This overlap has practical significance: the factors that drive up the average risk of the population are the same problems that are concentrated among extreme high-risk individuals. This convergent evidence strengthens the credibility of these factors as core intervention targets. Similarly, the protective factors identified in the latent profiles are highly consistent with the protective causal variables in the global model, further confirming the critical role of enhancing these resources in building psychological resilience.

This overlap and interaction provide a direct blueprint for stratified intervention strategies [FIGURE:61]. For the "Mental Health Group" (61.7%), the focus should be on universal prevention, such as promoting work meaning and emotional regulation skills through corporate culture and team building. The "At-Risk Group" (5.7%) should receive selective prevention, including mental health screening and stress management workshops to prevent them from sliding into high-risk states. For the critical "High-Risk Group" (32.6%), indicated prevention and early treatment must be implemented, providing one-on-one psychological counseling, Employee Assistance Programs (EAP), and comprehensive interventions such as job adjustments.

Following the framework of Mrazek and Haggerty (1994), this risk-stratified intervention model allows limited public health resources to be directed to where they are needed most, thereby maximizing intervention effectiveness.

The most prominent contribution of this study lies at the methodological level. To address the fundamental challenge of causal inference with cross-sectional data, we proposed a "four-in-one" framework. This provides a systematic and highly robust solution for processing complex observational data, the advancement and universality of which warrant further discussion.

Regarding its advancement, the integrated analytical paradigm proposed in this study contributes methodologically by systematically advancing data analysis from descriptive association to robust causal inference and precise population segmentation. Data-driven Elastic Net screening avoids subjective model specification bias at the source, laying an objective foundation for factor identification \cite{Hastie2005}. The Double Machine Learning (DML) model controls for high-dimensional confounding through flexible machine learning algorithms, enabling us to obtain effect estimates approaching causality from cross-sectional data, which significantly enhances the theoretical value of the findings. More importantly, Latent Profile Analysis (LPA) identifies heterogeneous sub-populations with distinct internal risk patterns, transforming "precision health" from a concept into an actionable practical goal. The built-in robustness assessment framework goes beyond routine analysis by actively challenging and quantifying the stability of the conclusions against unmeasured confounding, greatly enhancing the scientific credibility of our findings. Furthermore, the quantitative indicator screening scheme proposed for covariate selection is also original \cite{Steger2012}.

In terms of universality, this framework is not limited to construction workers or the field of mental health. Any field involving complex interactions where randomized controlled trials are difficult to conduct (such as education economics, policy evaluation, the causes of social inequality, or chronic disease risk exploration) can benefit from this paradigm. It allows for the extraction of higher-level causal evidence from existing observational data, saving research costs, improving the quality of preliminary studies, and breaking down research barriers to support scientific decision-making and precision intervention.
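The robustness assessment described above can be illustrated with the E-value, which expresses how strongly an unmeasured confounder would have to be associated with both exposure and outcome, on the risk-ratio scale, to fully explain away an observed association. The sketch below is written in Python for illustration only (the study's analyses were run in R) and uses the closed-form expression of VanderWeele and Ding (2017). The function names `e_value` and `smd_to_rr` and the example inputs are hypothetical and are not taken from this study. The conversion from a standardized mean difference $d$ to an approximate risk ratio, $RR \approx \exp(0.91 d)$, is an approximation proposed for continuous outcomes; whether it was applied here is an assumption.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # The closed form is defined for RR >= 1, so protective
    # estimates (RR < 1) are inverted before applying it.
    rr = max(rr, 1.0 / rr)
    return rr + math.sqrt(rr * (rr - 1.0))


def smd_to_rr(d: float) -> float:
    """Approximate risk ratio implied by a standardized mean difference d."""
    return math.exp(0.91 * d)


if __name__ == "__main__":
    # Hypothetical inputs for illustration, not estimates from this study.
    for rr in (0.5, 1.0, 2.0):
        print(f"RR = {rr:.2f} -> E-value = {e_value(rr):.3f}")
    d = 0.30
    rr_from_d = smd_to_rr(d)
    print(f"d = {d:.2f} -> approx. RR = {rr_from_d:.3f}, "
          f"E-value = {e_value(rr_from_d):.3f}")
```

An E-value close to 1 means that even weak unmeasured confounding could account for the estimate, whereas larger values indicate that only a strong confounder could do so.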

Limitations and Future Outlook

Despite this progress in methodology and application, the study has limitations that point toward future research directions. The first and most fundamental is the cross-sectional design. Although methods such as DML strengthen causal inference by adjusting for measured confounders, they cannot substitute for longitudinal data in establishing the direction of causation. Future research should establish prospective cohorts and conduct at least two waves of longitudinal surveys to verify the temporal ordering of the causal paths identified here.
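The point about causal direction can be made concrete with a short derivation. It is a sketch that assumes the DML step follows the standard partially linear model of Chernozhukov et al. (2018); the exact score function used in this study is not stated here, and the symbols $Y$, $D$, $X$, $\theta_0$, $g_0$, $m_0$, $\hat{\ell}$, and $\hat{m}$ are introduced only for illustration. With outcome $Y$ (e.g., a depression score), exposure $D$ (e.g., emotional exhaustion), and measured covariates $X$:

$$
Y = \theta_0 D + g_0(X) + U, \quad E[U \mid X, D] = 0, \qquad D = m_0(X) + V, \quad E[V \mid X] = 0 .
$$

Given cross-fitted nuisance estimates $\hat{\ell}(X) \approx E[Y \mid X]$ and $\hat{m}(X) \approx E[D \mid X]$, the partialling-out estimator is

$$
\hat{\theta} = \frac{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)\bigl(Y_i - \hat{\ell}(X_i)\bigr)}{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)^{2}} .
$$

The numerator is symmetric in $D$ and $Y$: exchanging their roles yields a coefficient with the same numerator and therefore the same sign. The estimator thus adjusts for confounding by the measured covariates $X$, but which variable is the cause must be settled by temporal ordering or theory, not by the cross-sectional estimate itself.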

Secondly, the data are derived from self-reports, which may be subject to common method bias and recall bias, especially regarding sensitive issues such as alcohol use. Future studies could consider introducing objective indicators, such as monitoring physiological stress markers (e.g., heart rate variability, cortisol levels) via wearable devices, to provide a more comprehensive health assessment.

Finally, the development and evaluation of intervention measures are the ultimate goals. While this study identified intervention targets, the next step is to design specific intervention programs based on these targets and evaluate their cost-effectiveness through randomized controlled trials (RCTs) or quasi-experimental designs. This will strengthen the evidence for action and truly translate research findings into practical outcomes that improve the psychological well-being of construction workers.

Author Contribution Statement:
Ling Zhicheng completed the methodological definition, data analysis, visualization, and original draft writing. Liu Yuhan completed data analysis and original draft writing. Wang Juan completed data analysis and original draft writing. Huang Long completed data collection, methodological definition, final manuscript review, and funding acquisition.

Ethical Statement:

This study follows the Declaration of Helsinki and has been approved by the Ethics Review Committee of Wannan Medical College. All participants provided informed consent before the survey. The research process strictly adhered to confidentiality principles, and data were anonymized to ensure that the privacy and rights of participants were protected.

References:
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.

Fagbenro, Sunindijo, Illankoon, Frimpong (2024). Importance Prefabrication Easing Construction Workers Experience Mental Health Stressors. International Journal of Environmental Research and Public Health.

Fernandez, Castilla, Moore (2000). Social Capital at Work: Networks and Employment at a Phone Center. American Journal of Sociology.

Ling, Zhang, Zhang, Zhang, Huang (2025). Construction workers depression, anxiety, stress, factors China: cross-sectional study. Journal of Global Health.

Samuel, Yosia, Changxin, Frimpong (2022). Domains Psychosocial Factors Affecting Young Construction Workers: Systematic Review. Buildings.

Samuel, Yosia, Changxin, Frimpong, Ayirebi, Kolawole (2025). scoping review research mental health conditions among young construction workers. Construction Innovation.

Steger, Duffy (2012). Measuring Meaningful Work: The Work and Meaning Inventory (WAMI). Journal of Career Assessment.

Hastie (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B (Statistical Methodology).

(2010). Mental health levels of underground coal miners (01).

(2009). Mental health status of printing factory workers exposed to n-hexane (03), 222-224+226.

(2017). Structural equation model analysis of the impact of psychosocial factors on the mental health of enterprise workers. China Occupational Medicine (02), 188-192+197.

(2004). A controlled study on the mental health of laid-off workers from state-owned enterprises. China Journal of Health Psychology, 301-302+274.

Chen Qiuzhu & Guo Wenbin (2000). Investigation of the mental health status of laid-off workers. Journal of Health Psychology.

Scale Details, Cut-off Values, or Clinical Grading (Cronbach's $\alpha$): Depression, Anxiety, and Stress Scale (DASS-21) for symptoms of depression, anxiety, and stress; Insomnia Severity Index (ISI) for insomnia severity (0-4); Alcohol Dependence Scale (0-3); Maslach Burnout Inventory-General Survey (MBI-GS), a 7-point frequency scale (0-6).

The Work-Family Conflict Scale (1-5) assesses the degree of interference between professional and domestic life; higher scores indicate more pronounced conflict between work and family responsibilities.

The General Self-Efficacy Scale (GSES) (1-4) is employed to measure an individual's overall belief in their own competence, where higher scores reflect a stronger sense of general self-efficacy. Conversely, on the resource scarcity and psychological stress scale (1-7), higher scores indicate that the individual perceives a greater degree of resource depletion and experiences higher levels of psychological pressure.

Workplace ostracism is measured on a 5-point frequency scale (1-5); higher scores indicate a more severe degree of perceived workplace ostracism.

Standardized Network Centrality Coefficients; Betweenness Centrality; Closeness Centrality; Eigenvector Centrality; Composite Centrality; Workplace Well-being; Sense of Meaning at Work; Motivation to Prove Self-Competence; Work Self-Efficacy; Career Centrality; Proactive Personality; Self-Emotion Appraisal; Work-Family Conflict; Proving Self-Competence; Self-Emotion Regulation; Others' Emotion Appraisal; Unethical Pro-organizational Behavior (UPB); Number of Workplace Injuries.
