Abstract
Theoretical research on risky decision-making relies primarily on backward inference from behavioral outcomes and self-report data and lacks direct observation of the decision-making process, which constrains explanations of its underlying mechanisms and the development of effective behavioral interventions. Artificial intelligence large language models (LLMs) provide a pathway to overcome these limitations. This paper systematically examines the potential of LLMs to simulate risky decision-making through three studies: conducting single-play and repeated-play gambles with ChatGPT-3.5/4 and DeepSeek-R1 and generating decision rationales; employing GPT-4o for inductive thematic analysis (ITA) of these rationales; constructing a technical pathway for LLM-generated decision-strategy texts; and applying it to decision interventions. Findings: (1) ChatGPT-3.5/4 reproduced the typical human choice patterns in single-play (more risk-averse) versus repeated-play (more risk-seeking) gambles; (2) LLMs distinguished the logic of single-play versus repeated-play gambles and correctly applied normative and descriptive theories, respectively, to generate corresponding strategies, which received high acceptance; (3) intervention texts generated by LLMs from different strategies effectively influenced people's inherent risky decision preferences in medical, financial, content creation, and e-commerce marketing contexts. This research systematically validates LLMs' capability to simulate behavioral preferences and their understanding of decision-making, constructs a new paradigm for decision interventions based on generative AI, and provides theoretical and practical foundations for AI-assisted high-stakes decision-making.
Full Text
Large Language Models Capable of Distinguishing "Single-Play vs. Repeated-Play Gambles": Understanding and Intervening in Risky Decision-Making
Lei Zhou¹, Litong Li¹, Xu Wang¹, Huafeng Ou¹, Qianyu Hu¹, Aimei Li², Chenyang Gu¹
¹School of Management, Jinan University, Guangzhou 510632
²School of Management, Jinan University, Guangzhou 510632
(Special Issue on Psychology and Artificial Intelligence Development)
Abstract
Risky decision-making (RC) is a common and important form of decision-making in daily life. Its theoretical development primarily follows two major paths: normative theory and descriptive theory. The paradigms of single-play and repeated-play gambles provide an effective framework for distinguishing between these theories. However, prior research lacks direct observation of the decision-making process, which limits the deep understanding of individual behavior and hinders the development of effective behavioral interventions. In recent years, large language models (LLMs) have demonstrated highly human-like characteristics by not only simulating human preferences in behavioral performance but also exhibiting similar reasoning pathways. This offers a promising solution to the aforementioned limitations. This study, grounded in the classic RC paradigms of single versus repeated gambles, investigates the capability of LLMs to simulate and understand risk preferences and decision-making processes. Specifically, this study explores the potential of LLMs' understanding of decision strategies to generate intervention texts and evaluates their effectiveness in influencing human decisions.
This work comprises three studies. In Study 1, GPT-3.5 and GPT-4 were employed to simulate human responses to gambling decisions under nine probability conditions (with constant expected value), which generated a total of 3,600 responses across single and repeated gamble scenarios. In Study 2, LLM-generated strategies were constructed through a three-stage process (decision rationale extraction, strategy generation and quality evaluation), then human participants were required to complete decision-making tasks in two experiments: Experiment 1 replicated the medical/financial scenarios (N = 349, N_male = 174, M_age = 21.79) of Sun et al. (2014) in a 2 (context: medical vs. financial) × 2 (application frequency: single vs. repeated) within-subjects design, and Experiment 2 examined digital contexts with a 2 (context: content creation vs. e-commerce marketing) × 2 (frequency: single vs. repeated) mixed design (context as between subjects). Subsequently, DeepSeek-R1 was used to perform the same tasks and generate strategy texts through the three-stage process. Finally, participants were instructed to evaluate their acceptance of the LLM-generated strategies. Study 3 extended the Study 2 methodology to determine whether the LLM-generated intervention texts could reverse participants' classic choice preference across the single versus repeated gamble scenarios. The Study 2 experimental contexts (Experiment 1: medical vs. financial, N = 460, N_male = 205, M_age = 21.80; Experiment 2: content creation vs. e-commerce marketing, N = 240, N_male = 106, M_age = 29.12) were mirrored in Study 3, in which strategically designed intervention texts were presented during the decision-making tasks to test their capacity to modify participants' inherent risk preference between the single and repeated gamble conditions and evaluate the persuasive efficacy of LLM-generated strategies on human decision biases.
Study 1 shows that the LLMs (GPT-3.5 and GPT-4) can successfully replicate the typical human pattern of risk aversion in single-play scenarios and risk seeking in repeated-play scenarios, though both models demonstrated an overall stronger tendency toward risk seeking compared with human participants. Study 2 demonstrates that human participants preferred low-EV certain options in single-play contexts and high-EV risky options in repeated-play contexts in both experiments. The participants also showed high agreement with the strategies generated by the LLMs in different scenarios. Study 3 confirms that the LLM-generated intervention texts can significantly influence participants' choice tendency in all four scenarios, with strong intervention effects observed in the single-play contexts. The LLM intervention strategies are characterized by reliance on expected value computations (normative) when promoting risky choices and emphasis on certainty and robustness (descriptive) when promoting safe choices.
In summary, this study demonstrates that (1) LLMs can effectively simulate context-dependent human preferences in risky choice, particularly the shift from risk aversion in single plays to risk seeking in repeated plays; (2) LLMs can distinguish between the logic underlying single and repeated gambles and apply normative and descriptive reasoning accordingly to externalize decision strategies; and (3) the decision strategies extracted from LLM-generated reasoning can be used to construct effective intervention texts that can alter human preferences in classic risk decision tasks, thereby validating the feasibility and effectiveness of an LLM-based cognitive intervention pathway. This study offers a new technological paradigm for AI-assisted decision intervention and expands the application boundary of LLMs to human cognitive process modeling and regulation.
Keywords: risk decision-making, single- vs. repeated-play gambles, large language models, decision strategy, intervention
1 Introduction
Risky decision-making is a core topic in behavioral research, crucial for human survival and development. Its theoretical development primarily follows two major paths: normative theory and descriptive theory \cite{Benartzi_Thaler_1999}. The paradigms of single-play and repeated-play gambles provide an effective method for distinguishing between these two theoretical approaches. However, existing theoretical research mainly relies on reverse inference from behavioral outcomes and self-report data, lacking direct observation of the decision-making process. This limitation constrains the explanatory power of underlying mechanisms and hinders the development of effective behavioral intervention strategies.
In recent years, artificial intelligence (AI), particularly large language models (LLMs), has offered new perspectives for risky decision-making research. LLMs can simulate human decision-making preferences and externalize abstract decision strategies. However, this raises important questions: Are AI-generated decision strategies acceptable to humans? Can these strategies effectively intervene in human decision-making? Addressing these questions not only reveals AI's explanatory power regarding human decision preferences, providing new evidence for decision model construction, but also offers theoretical foundations for AI-based decision nudging.
This paper conducts three studies using LLMs to systematically examine their simulation capabilities for risky decision-making preferences and process mechanisms, investigate their understanding of human decision strategies, construct methods for generating intervention texts targeting single-play and repeated-play risk preferences using LLMs, and validate intervention effectiveness through experiments.
1.1 Behavioral Differences in Risky Decision-Making Between Single-Play and Repeated-Play Gambles
Risky decision-making refers to decisions where options have multiple outcomes with known probabilities \cite{Kahneman_Tversky_1979}. Theoretical development has bifurcated into normative and descriptive paths: normative models are built on mathematical expectation, following the "weighted sum" principle of expected value theory \cite{VonNeumann_Morgenstern_1947}, whereas descriptive models adopt psychological perspectives, focusing on the biases and heuristics that shape actual decision-making behavior \cite{Hertwig_Erev_2009}.
Real-world risky decisions can be categorized as single-play or repeated-play scenarios. For instance, entrepreneurs may encounter one or multiple similar investment opportunities within a period, and doctors may treat one or multiple patients with similar conditions \cite{Redelmeier_Tversky_1992}. Research has found that people often exhibit preference reversals between single-play and repeated-play contexts \cite{Anderson_etal_2023}. For example, Sun et al. (2014) designed a financial emergency management task where participants chose between: Option A, which guaranteed ¥5,000; and Option B, which offered a 0.1% probability of obtaining ¥6,000,000 and a 99.9% probability of obtaining ¥0. Participants made choices in both single-play (implementing the plan once) and repeated-play (implementing the plan 100 times) contexts. Results showed that people preferred the certain Option A in single-play gambles but favored the risky Option B in repeated-play gambles. This choice preference difference has been repeatedly validated across multiple studies \cite{Lopes_1996, Kumar_Lim_2008, Barberis_Huang_2009}.
Single-play and repeated-play choice preferences can be explained by descriptive and normative theories, respectively. According to classic normative theory—Expected Utility Theory—in repeated-play gambles, individuals tend to calculate expected values by weighting outcomes by their probabilities and select the option with the maximum expected value \cite{VonNeumann_Morgenstern_1947}. In single-play gambles, however, people's choices often violate expected utility assumptions and align more closely with descriptive theories. For example, Myopic Loss Aversion Theory proposes that in single-play contexts, individuals' excessive sensitivity to loss leads them to avoid risk and choose certain options \cite{Tversky_BarHillel_1983}. Similarly, the equate-to-differentiate theory \cite{Li_2004} suggests that decision-makers equate dimensions with small differences (e.g., probability) while basing decisions only on dimensions with large differences (e.g., outcome magnitude).
Traditional risky decision-making research suffers from several limitations: First, it primarily relies on reverse inference from behavioral outcomes, lacking systematic direct observation of decision processes; second, exploration of decision strategies depends on self-report data and choice results, leading to insufficient explanatory power regarding underlying mechanisms. These constraints not only limit theoretical depth in explaining real-world complex decision contexts \cite{Kelton_etal_2010, Morreale_etal_2018} but also hinder the development of effective behavioral intervention strategies.
1.2 Artificial Intelligence Facilitating Decision Theory Research
AI development and application provide new perspectives and approaches for psychological research. As the most advanced AI tool, LLMs have demonstrated "human-like" characteristics across multiple domains \cite{Salles_etal_2020, 焦丽颖等_2025}. At the behavioral level, LLMs are often directly employed as experimental subjects. For instance, Qin et al. (2024) successfully replicated 22 social psychology experiments, including leadership behavior and workplace interactions, using ChatGPT with 93.2% accuracy; Argyle et al. (2023) found that GPT-3 can simulate different human subpopulations in social science research, generating responses highly similar to humans in form, semantics, and structure; Strachan et al. (2024) noted that LLMs' theory-of-mind capabilities are comparable to humans. At the process mechanism level, multiple studies have found that LLMs' feature extraction pathways and hierarchical processing mechanisms converge with brain neural response patterns \cite{Mischler_etal_2024}, demonstrating similarities between LLMs' language processing mechanisms and the human brain.
In decision-making research, LLMs can simulate human decision preferences. For example, Zhao et al. (2024) found that ChatGPT-4 can adjust its risky decision-making behavior under emotional arousal, showing lower risk preference especially when fear is induced; Aher et al. (2023) found that ChatGPT exhibits human-like cooperative behavior in prisoner's dilemma, ultimatum, and trust games. Additionally, studies have found LLMs show high anthropomorphism in voting behavior, economic games, and classic social experiments \cite{Dillion_etal_2023}. Jones and Steinhardt (2022) revealed that LLMs are susceptible to irrelevant information and initial inputs when generating open-ended text, leading to output deviations similar to human framing and anchoring effects. Park et al. (2024) noted that LLMs align with humans in the "less-is-better effect" experiment, preferring lower-priced goods.
Furthermore, LLMs demonstrate "human-like" qualities at the decision process level. For instance, research has found that LLMs engage in chain-of-thought reasoning by re-reading context when generating words, similar to humans using sticky notes to aid thinking or iteratively refining arguments when solving math problems or writing essays \cite{Hagendorff_etal_2023}. Moreover, as model complexity increases, LLMs can reduce reasoning biases, exhibiting System 2-like thinking \cite{Wei_etal_2022}. Additionally, Binz and Schulz (2023) revealed through two-step task experiments that GPT-3 exhibits human-like model-based reinforcement learning characteristics, adjusting behavioral strategies based on environmental states and reward information to maximize rewards. Similarly, Webb et al. (2023) found that GPT-3 demonstrates reasoning abilities comparable to or even stronger than humans across four analogy reasoning tasks, showing certain similarities in analogical reasoning and abstract pattern recognition. Goli and Singh (2024) found that GPT, like humans, considers multiple factors in intertemporal decision-making, particularly regarding risk and uncertainty, opportunity costs and investment choices, and urgency and immediate needs.
In summary, LLMs can simulate both human decision preferences and process mechanisms, thereby helping reveal key steps in the cognitive "black box." Additionally, this capability facilitates transforming abstract theories into concrete strategies, making decision logic more tangible. These advantages provide new pathways to overcome traditional decision theory research limitations.
1.3 Artificial Intelligence Intervening in Decision-Making Behavior
Leveraging their powerful content generation and multimodal interaction capabilities, LLMs can effectively intervene in people's behavior and improve decision quality \cite{Gupta_etal_2024, 章彦博等_2025}. In medical decision-making, for example, AI chat assistants like ChatGPT have been validated to significantly improve vaccine health literacy and vaccination confidence, effectively correct public cognitive biases about vaccines, and generate high-quality public health communication or vaccine advocacy materials \cite{Altay_etal_2023, Deiana_etal_2023, Karinshak_etal_2023, Xia_etal_2025}. Moreover, LLM interventions can affect not only behavior but also reshape individuals' cognitive processing pathways: For instance, Ferguson et al. (2023) found that GPT-3-generated subtle sexism explanation texts can significantly adjust people's understanding and expression, including language style and argumentation angles.
Compared with traditional intervention methods, LLM interventions offer several advantages: (1) Higher personalization: Traditional interventions typically adopt static "one-size-fits-all" strategies, easily disrupted by individual differences and contextual factors \cite{Jin_Han_2014, Pavey_Churchill_2014, Peters_Levin_2008}; LLMs can dynamically adjust strategies based on individuals' behavioral and psychological characteristics, achieving personalized precision interventions \cite{Xia_etal_2025}. (2) Higher intelligence: Traditional interventions mainly rely on manual design and preset information, lacking the ability to integrate complex contextual information and respond in real-time; LLMs can integrate multi-source data in real-time and automatically generate more targeted and persuasive texts \cite{Choi_etal_2025}. (3) More stable effects: The applicability and stability of traditional interventions remain controversial. For example, Peng et al. (2021) noted that negative framing better enhances vaccination willingness, while Liu et al. (2019) argued that positive framing is more persuasive; LLMs acquire strong reasoning capabilities through large-scale corpus training \cite{deVarda_etal_2025, Strachan_etal_2024}, generating intervention texts with strong readability and logical coherence, demonstrating stable cross-contextual applicability.
However, although LLMs can generate intervention texts based on instructions, their effectiveness is highly dependent on input quality due to their probabilistic word-output mechanism \cite{Lin_2024, Lin_2025, deKok_2025, Khalid_Witmer_2025}. Additionally, when lacking sufficient context or background information, LLMs are prone to generating incorrect or fabricated information, requiring supplementation with examples, instructions, or fine-tuning to incorporate professional knowledge \cite{Lin_2023}.
To overcome these limitations, LLMs can be employed for Inductive Thematic Analysis (ITA) to improve intervention text quality \cite{Zhang_etal_2024}. Compared with traditional ITA methods (e.g., manual extraction, LDA, and BERTopic), LLM-based ITA offers lower costs, stronger contextual understanding, and more effective processing of unstructured data \cite{Prescott_etal_2024}. Currently, this approach shows promising application prospects in healthcare \cite{Deiner_etal_2024}, education \cite{Katz_Fleming_etal_2024}, transportation \cite{Carvalho_etal_2024}, and customer satisfaction research \cite{Dai_etal_2023}.
Thus, artificial intelligence, particularly LLMs, serves as an important tool for decision support, demonstrating significant decision optimization effects across multiple domains.
1.4 Research Objectives
In summary, traditional risky decision-making research lacks systematic observation of decision processes, limiting theoretical development and behavioral intervention effectiveness. Applying artificial intelligence helps clarify strategies and logic behind decisions and generates intervention plans to assist decision-making. Based on the classic single-play/repeated-play gamble paradigm, this study first reveals LLMs' simulation and understanding capabilities regarding risky decision preferences and processes, then explores how to apply LLMs' understanding of decision strategies to generate decision intervention texts, and validates their intervention effects on human decision-making.
This paper includes three studies: Study 1 examines LLMs' (ChatGPT-3.5/4) performance in simulating human decisions through classic single-play and repeated-play gambling tasks. Study 2 first constructs a method for LLMs to generate decision strategy texts: using DeepSeek-R1 to generate decision rationales for single-play and repeated-play gambles, then employing GPT-4o for inductive thematic analysis (ITA) to build a technical pathway for generating LLM decision strategy texts, which is then applied to decision intervention. Study 3, based on Study 2's methodology, tests whether LLM-generated strategy texts can effectively influence people's inherent risk decision preferences in medical, financial, content creation, and e-commerce marketing contexts.
This study selected GPT-4, GPT-3.5, and DeepSeek-R1 as research objects based on the following considerations: (1) These models demonstrate superior performance in semantic understanding and logical reasoning \cite{Hebenstreit_etal_2024, Lehr_etal_2024, DeepSeekAI_etal_2025}, representing advanced levels of current closed-source and Chinese open-source models. (2) The GPT series and DeepSeek originate from Western and Chinese cultural backgrounds, respectively, providing a cross-cultural perspective for examining LLMs' risky decision-making behavior and enhancing the generalizability of research conclusions.
The research aims to address three core questions:
Question 1. Can LLMs simulate human choice patterns in single-play and repeated-play gambles?
Question 2. Can LLMs distinguish the logic behind single-play and repeated-play gambles and correctly apply normative and descriptive theories to generate corresponding strategies?
Question 3. Can LLM-generated strategy texts effectively intervene in humans' inherent choice preferences in risky decision-making?
2 Study 1: LLMs Simulating Human Risky Decision Preferences
This study aims to evaluate LLMs' (GPT-3.5 and GPT-4) risky decision-making performance in single-play versus repeated-play contexts, addressing two core questions: (1) Do LLMs exhibit behavioral preference reversals between single-play and repeated-play contexts like humans? (2) How similar are LLMs' decision-making performance and human decision outcomes?
2.1 Method
2.1.1 Experimental Subjects
This study selected two representative LLMs in the field—OpenAI's GPT-3.5 and GPT-4—as experimental subjects, accessed via API interfaces \cite{Mei_etal_2024} to ensure data independence and parameter consistency. The following measures were taken to guarantee research accuracy and stability: (1) Standardized parameter settings, balancing experimental standardization and comparability; (2) Standardized prompt design, adopting a unified three-part prompt structure (system/user/assistant) \cite{Chen_etal_2023} to avoid output biases due to prompt differences; (3) Multiple independent measurements: each decision task was independently repeated (100 runs per task; see Section 2.1.2) \cite{Strachan_etal_2024}. All experimental procedures strictly followed OpenAI's usage protocols and ethical guidelines, ensuring that: (1) only task-relevant content was generated; (2) data privacy and security were protected; (3) the data obtained were used solely for academic research purposes and not disclosed to any third-party institutions or organizations.
2.1.2 Experimental Design and Materials
This study employed Samuelson's (1963) classic "fallacy of large numbers" gambling paradigm to manipulate single-play and repeated-play risky decision-making contexts. The paradigm shows that when individuals face an attractive gamble, they typically refuse to participate when it is offered once but tend to accept when it is offered repeatedly. The specific tasks were as follows:
(1) Single-play: Imagine a gambling game where players have a 50% probability of winning ¥2000 and a 50% probability of losing ¥500. Suppose you have the opportunity to participate in this gamble once. Would you choose to participate? Please answer "participate" or "not participate."
(2) Repeated-play: Imagine a gambling game where players have a 50% probability of winning ¥2000 and a 50% probability of losing ¥500. Suppose you have the opportunity to gamble five times. Would you choose to participate? Please answer "participate" or "not participate."
The experiment used a 2 (model type: GPT-3.5 vs. GPT-4) × 2 (gamble type: single vs. repeated) between-subjects design. To compare behavioral performance across probability conditions, the total expected value of each gamble was fixed at ¥750 (the gain outcome's expected value was ¥1000 and the loss outcome's -¥250). Nine decision tasks with different risk levels but equal expected values were created by adjusting probability and value parameters (see Table 1 [TABLE:1]).
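For illustration, the nine task parameter sets follow directly from these constraints. The following minimal Python sketch (variable names are ours) reproduces the Table 1 values:

```python
# Minimal sketch: derive the nine equal-EV tasks from the stated constraints.
# The gain outcome's EV is fixed at 1000 yuan, the loss outcome's at -250 yuan,
# so every task has a total EV of +750 yuan.
GAIN_EV, LOSS_EV = 1000, 250

def half_up(x: float) -> int:
    """Round half away from zero, e.g. 312.5 -> 313."""
    return int(x + 0.5)

for i in range(1, 10):
    p_gain = i / 10                    # 10%, 20%, ..., 90%
    p_loss = 1 - p_gain
    gain = half_up(GAIN_EV / p_gain)   # e.g. 50% -> +2000 yuan
    loss = half_up(LOSS_EV / p_loss)   # e.g. 50% -> -500 yuan
    print(f"Task {i}: win {gain} ({p_gain:.0%}) vs. lose {loss} ({p_loss:.0%})")
```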
Considering GPT's sensitivity to order \cite{Lu_etal_2022, CodaForno_etal_2023}, the presentation order of options was balanced (e.g., the order of "gain/loss" and "participate/not participate"). Therefore, each gamble task had four presentation formats. Additionally, to test the consistency of GPT's decision-making behavior, each decision task was simulated 100 times (i.e., 25 queries per presentation format), ultimately creating 2 models × 2 gamble contexts × 9 decision tasks × 100 repetitions = 3,600 conversation windows. Strachan et al. (2024) noted that LLMs do not retain historical information across different conversation windows; moreover, Mei et al. (2024) found that LLMs can not only simulate human behavioral patterns and thinking styles but also exhibit individual differences similar to humans. Based on this, this study treated these 3,600 conversation windows as 3,600 independent behavioral samples while maintaining consistent prompt structures.
Table 1 Study 1 experimental task parameters

Gamble task | Gain amount (¥) | Gain probability (%) | Loss amount (¥) | Loss probability (%)
1 | +10000 | 10 | -278 | 90
2 | +5000 | 20 | -313 | 80
3 | +3333 | 30 | -357 | 70
4 | +2500 | 40 | -417 | 60
5 | +2000 | 50 | -500 | 50
6 | +1667 | 60 | -625 | 40
7 | +1429 | 70 | -833 | 30
8 | +1250 | 80 | -1250 | 20
9 | +1111 | 90 | -2500 | 10

Note: In each task, the loss amount equals ¥250 divided by the loss probability (rounded to the nearest yuan), so that the expected value of the loss outcome is fixed at -¥250 and the total expected value of every task is ¥750.

2.1.3 Experimental Procedure
The study adopted a zero-shot prompt design to minimize interference from preset information on model decisions. To ensure experimental accuracy: First, each interaction created an independent session to exclude contextual residual effects. Second, experimental prompts adopted a standard three-part framework (system/user/assistant) \cite{Chen_etal_2023}:
(1) System: Used to set model operation instructions. This study required GPT to simulate human thinking processes and answer decision tasks using chain-of-thought methods \cite{Wei_etal_2022}.
During preliminary testing, various prompt structures were attempted (e.g., concise command type: "Please judge whether to participate in this gamble"; or open-ended type: "What would you choose?"). Results showed such prompts easily triggered model refusals, one-sided responses, or lack of decision rationales. Therefore, the prompt structure was improved to "role-playing + output decision rationale" settings, effectively guiding models to adopt "chain-of-thought" methods to generate responses with clear logic, which showed superior performance in response stability, task focus, and explanatory power. Additionally, existing research indicates that prompts with role-setting and reasoning requirements better elicit LLMs' step-by-step reasoning processes analogous to "System 2," thereby improving output logic and normativity \cite{Wei_etal_2022, Khalid_Witmer_2025}. Based on empirical results and theoretical evidence, we uniformly adopted this prompt structure as the experimental standard input to ensure comparability across models and tasks. The study's prompt was:
"You will conduct multiple rounds of decision tasks and must provide an answer in each round. If you do not provide an answer, I will assume you chose randomly. I want you to make decisions like a human. Please wait for my prompt, then tell me your answer and briefly explain your decision rationale."
(2) User: Randomly assigned a specific gamble task presenting the gambling scenario, requiring the model to make a "participate" or "not participate" judgment.
(3) Assistant: The model generates response text (decision outcome) based on the prompt.
All outputs were automatically processed via Python programs. Following Goli and Singh's (2024) data processing standards, responses that refused to answer (1 instance, e.g., "As an AI, I do not have personal opinions or emotions") or did not follow expected response formats (12 instances, e.g., "Is this gamble conducted individually or multiple times?") were excluded. Valid data totaled 3,587 instances, with an overall validity rate of 99.64%. Subsequently, the proportion of "participate" choices under each condition was calculated as the core indicator for measuring LLMs' risk preference and behavioral patterns. A prompt example is shown in Figure 1 [FIGURE:1]; others are detailed in Supplementary Material A1.
Additionally, to ensure experimental standardization and comparability, this study referenced multiple LLM decision-making studies with similar themes and tasks \cite{Ding_etal_2024, Goli_Singh_2024, Karinshak_etal_2024, Arora_etal_2024} and uniformly set model parameters as follows: (1) The max_tokens parameter was set to 256 to ensure content generation completeness and model reasoning capability; (2) The temperature parameter was set to 1 to balance output diversity and content quality; (3) Frequency penalty and presence penalty parameters were canceled to avoid artificial interference with model output distribution. These parameter configurations aimed to maximally preserve models' natural generation characteristics, ensuring experimental results authentically reflected GPT models' behavioral performance in risky decision tasks \cite{Chen_etal_2023}.
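Taken together, the three-part prompt structure and the parameter settings above correspond to an API call like the following sketch (the openai Python SDK is assumed; the prompt wording is abridged from the materials above):

```python
# Sketch of one independent conversation window (openai Python SDK assumed;
# prompt wording abridged from the paper's materials).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You will conduct multiple rounds of decision tasks and must provide "
    "an answer in each round. If you do not provide an answer, I will "
    "assume you chose randomly. I want you to make decisions like a human. "
    "Please wait for my prompt, then tell me your answer and briefly "
    "explain your decision rationale."
)
TASK_PROMPT = (
    "Imagine a gambling game where players have a 50% probability of "
    "winning 2000 yuan and a 50% probability of losing 500 yuan. Suppose "
    "you have the opportunity to participate in this gamble once. Would "
    "you choose to participate? Please answer 'participate' or 'not "
    "participate'."
)

response = client.chat.completions.create(
    model="gpt-4",          # or "gpt-3.5-turbo"
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": TASK_PROMPT},
    ],
    max_tokens=256,         # completeness of the generated rationale
    temperature=1,          # diversity vs. content quality, per the text
)
print(response.choices[0].message.content)  # decision + brief rationale
```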
Figure 1 Study 1 prompt example
2.2 Results and Discussion
2.2.1 Similarity Between GPT and Human Choice Preferences
The study employed a binomial logistic regression model to examine the effects of model type (GPT-4 vs. GPT-3.5), gamble context (single vs. repeated), and their interaction on GPT behavior ("participate" vs. "not participate"). In the model, "GPT-4," "repeated gamble," and "participate" were coded as 1, while "GPT-3.5," "single gamble," and "not participate" were coded as 0.
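This specification corresponds to a standard binomial logistic regression; a minimal Python sketch follows (statsmodels assumed; the file and column names are hypothetical):

```python
# Sketch of the binomial logistic regression with the coding described above
# (statsmodels assumed; file and column names are hypothetical).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study1_responses.csv")
df["model"] = (df["model_name"] == "gpt-4").astype(int)         # GPT-4 = 1
df["repeated"] = (df["gamble_type"] == "repeated").astype(int)  # repeated = 1
df["participate"] = (df["choice"] == "participate").astype(int) # participate = 1

fit = smf.logit("participate ~ repeated * model", data=df).fit()
print(fit.summary())       # beta, SE, Wald z, and p for each term
print(np.exp(fit.params))  # Exp(beta), the odds ratios reported in Table 2
```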
Regression results are shown in Table 2 [TABLE:2]. The main effect of gamble context was significant (β = 0.597, SE = 0.159, Wald χ²(1) = 14.17, p < 0.001, Exp(β) = 1.82), indicating significant differences in GPT's decision-making behavior between single-play and repeated-play contexts. Specifically, GPT's "participate" proportion in repeated-play contexts (89.56%) was significantly higher than in single-play contexts (80.35%), demonstrating that LLMs exhibit human-like preference reversal: relatively more risk-averse in single plays and more risk-seeking in repeated plays. The main effect of model type was significant (β = –0.815, SE = 0.124, Wald χ²(1) = 43.09, p < 0.001, Exp(β) = 0.44), with GPT-3.5's participation rate (89.37%) higher than GPT-4's (80.54%), indicating GPT-4 tends to choose "not participate" in decisions, being relatively more conservative. This result may stem from OpenAI's introduction of "Rule-Based Reward Models" in recent model training \cite{Achiam_etal_2023}, which provides additional rewards for models' refusal responses to potential risks, thereby guiding them to avoid high-risk outputs. Consequently, GPT-4 produces relatively fewer "dangerous answers" (i.e., high-risk options). Similarly, technical reports note that GPT-4 may be overly cautious due to high safety weighting, potentially reinforcing its conservative behavioral patterns in high-risk decision contexts \cite{Achiam_etal_2023, Yuan_etal_2023}. Moreover, the interaction between gamble context and model type was not significant (β = 0.251, SE = 0.202, Wald χ²(1) = 1.55, p = 0.213, Exp(β) = 1.29), indicating that gamble context's effect on risk preference remained consistent between GPT-3.5 and GPT-4. LLMs' average choice proportions under each probability condition are detailed in Figure 2 [FIGURE:2]A.
In summary, these results demonstrate that GPT exhibits human-like behavioral preference reversal effects between single-play and repeated-play contexts, being more risk-averse in single plays than repeated plays. Additionally, GPT-4 is more risk-averse than GPT-3.5.
Table 2 Study 1 regression analysis results
Variable | β | SE | 95% CI | Wald χ² | Exp(β) | p
Gamble frequency (repeated = 1, single = 0) | 0.597 | 0.159 | [0.289, 0.911] | 14.17 | 1.82 | <0.001
Model type (GPT-4 = 1, GPT-3.5 = 0) | -0.815 | 0.124 | [-1.061, -0.574] | 43.09 | 0.44 | <0.001
Gamble frequency × Model type | 0.251 | 0.202 | [-0.149, 0.646] | 1.55 | 1.29 | 0.213

2.2.2 Differences Between GPT and Human Choice Preferences
Using classic experimental data as reference, data from GPT-3.5 and GPT-4 under equivalent probability conditions were selected to compare differences between GPT and human choice preferences.
Based on Redelmeier and Tversky's (1992) classic findings (N = 185), when facing the gamble task "50% chance to win $2000, 50% chance to lose $500," 43% of participants chose to participate in a one-time gamble, while 63% were willing to participate in five identical gambles (p < 0.005). Chi-square comparisons between GPT-generated data and these results under the same probability conditions (see Figure 2B) showed that LLMs' participation proportions were significantly higher than humans' in both single-play and repeated-play contexts (single-play: humans: 43%; GPT-3.5: 84.85%, χ²(1, N = 284) = 45.75, p < 0.001; GPT-4: 100%, χ²(1, N = 283) = 75.06, p < 0.001; repeated-play: humans: 63%; GPT-3.5: 95.92%, χ²(1, N = 284) = 14.55, p < 0.001; GPT-4: 100.00%, χ²(1, N = 283) = 36.06, p < 0.001).
These results indicate that although GPT exhibits human-like behavioral patterns in single-play and repeated-play tasks, its risk-seeking willingness is significantly higher than humans'.
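For reference, each comparison amounts to a 2 × 2 chi-square test. In the sketch below (scipy assumed), the cell counts are reconstructed from the reported percentages and sample sizes; they recover the reported χ²(1, N = 284) = 45.75 for the GPT-3.5 single-play case.

```python
# Sketch of the human-vs-GPT comparison as a 2x2 chi-square test
# (scipy assumed; counts reconstructed from the reported percentages).
from scipy.stats import chi2_contingency

human_yes, human_n = 80, 185  # ~43% of Redelmeier & Tversky's (1992) sample
gpt_yes, gpt_n = 84, 99       # 84.85% of the valid GPT-3.5 single-play runs

table = [
    [human_yes, human_n - human_yes],  # participate / not participate
    [gpt_yes, gpt_n - gpt_yes],
]
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(f"chi2({dof}, N = {human_n + gpt_n}) = {chi2:.2f}, p = {p:.3g}")
```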
Figure 2 Proportions of LLMs and humans choosing to participate in gambles in single-play and repeated-play contexts in Study 1. Panel A shows GPT (3.5/4) average participation rates across all probability conditions; Panel B shows GPT (3.5/4) participation rates at 50% probability, along with human data from Redelmeier and Tversky (1992) under equivalent probability conditions. In each bar group, from left to right represents GPT-3.5, GPT-4, and human groups, respectively.
3 Study 2: Method Construction and Acceptance Testing for LLM-Generated Strategy Texts
Study 2 introduces the LLM DeepSeek-R1 to explore how to generate decision strategies for single-play and repeated-play gambles based on its reasoning content and tests human acceptance of these strategies. The study first constructs a method for LLMs to generate single-play and repeated-play decision strategies. Then, using Sun et al.'s (2014) medical and financial contexts (Experiment 1) and self-designed emerging digital platform contexts (content creation and e-commerce marketing, Experiment 2), human participants replicated classic single-play/repeated-play findings and evaluated their acceptance of LLM-generated strategy texts.
3.1 Method Construction for LLM-Generated Strategy Texts
This study established a three-stage standardized generation process to construct strategy texts:
(1) Decision rationale acquisition and strategy theme extraction: First, the DeepSeek-R1 model completed single-play and repeated-play gamble tasks to obtain decision rationales under different gamble conditions. Second, refusals or format-noncompliant content were excluded, retaining only complete, clearly structured texts for subsequent analysis. Next, GPT-4o was called to conduct inductive thematic analysis (ITA) to extract core strategy themes embedded in decision rationale texts. This analysis process strictly followed Khalid and Witmer's (2025) four-step prompt engineering framework (task setting, structure design, result validation, result iteration). In prompt design, GPT-4o was designated as a "professional text analysis expert," with its task clearly defined as "inducting strategy themes from decision rationale texts," requiring output containing three components: "theme title, brief definition, representative statements."
(2) Strategy text generation: Based on extracted strategy themes, combined with experimental contexts (e.g., medical/financial/content creation/e-commerce marketing × single/repeated gamble) and target options (certain/risky), structured prompts guided GPT-4o to generate strategy texts. Prompts consisted of three parts: 1) specific context description; 2) target option setting; 3) intended strategy theme (e.g., "expected value maximization" or "heuristic decision-making"). To ensure output consistency and normativity, each strategy text was limited to 100–150 characters and uniformly adopted neutral, concise, and persuasive language style. Strategy texts served as materials for subsequent intervention research.
(3) Content evaluation: Strategy text quality was structurally scored across five dimensions: "rationality," "professionalism," "logicality," "readability," and "persuasiveness."
The standardized generation process is schematically shown in Figure 3 [FIGURE:3].
Figure 3 LLM decision strategy generation flowchart
3.1.1 Decision Rationale Acquisition and Strategy Theme Extraction
First, DeepSeek-R1 repeatedly executed single-play and repeated-play gamble tasks (design identical to Study 1) and generated reasoning processes. After obtaining decision rationales from its reasoning process, texts were grouped by gamble type (single vs. repeated) and inductive thematic analysis (ITA) was performed separately to generate decision strategies, systematically identifying dimensions relied upon by LLMs in different decision contexts.
The selection of DeepSeek-R1's reasoning process for strategy generation was based on the following reasons \cite{DeepSeekAI_etal_2025}: First, compared with other mainstream LLMs (e.g., Claude-3.5, GPT-4o, OpenAI o1-mini/o1-1217), DeepSeek-R1 demonstrates superior performance across multiple standardized reasoning tasks, including mathematics, programming, and knowledge Q&A (detailed data in Supplementary Material E Table S1). Second, DeepSeek-R1 gradually developed clearer, more structured reasoning processes through reinforcement learning during training. Finally, DeepSeek-R1 shows outstanding generalization capability and language generation quality, particularly ranking first in multiple Chinese evaluation benchmarks. In summary, DeepSeek-R1 exhibits strong reasoning and text generation capabilities, with particularly robust Chinese processing ability, better meeting ITA requirements for semantic completeness and logical clarity.
To ensure scientific rigor and reproducibility of ITA analysis, this study followed Khalid and Witmer's (2025) four-step prompt engineering (PE) framework: (1) Select single or multiple prompt strategies based on the analysis task; (2) Create specific prompt templates containing task instructions, analysis examples, and response instructions; (3) Execute these prompts and validate consistency with analysis objectives; (4) Evaluate and iteratively optimize model-generated results to ensure generated theme coding meets expected analysis standards.
In implementation, prompts for calling GPT-4o to conduct ITA mainly consisted of three parts: (1) Role setting: Instructing GPT-4o to act as a professional text analysis expert for thematic analysis; (2) Experimental content: Briefly explaining experimental purpose and design; (3) Output requirements: Requiring GPT-4o to output analysis results in three aspects: theme title, brief definition, and representative examples. A prompt example is shown in Figure 4 [FIGURE:4].
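In schematic form, such a three-part prompt can be assembled as follows (wording abridged and names ours; the exact prompt appears in Figure 4):

```python
# Schematic assembly of the three-part ITA prompt for GPT-4o
# (wording abridged; the exact prompt is shown in Figure 4).
ROLE_SETTING = (
    "You are a professional text analysis expert. Conduct an inductive "
    "thematic analysis of the texts provided."
)
EXPERIMENT_CONTEXT = (
    "The texts are decision rationales generated by a large language model "
    "in single-play (or repeated-play) gamble tasks."
)
OUTPUT_REQUIREMENTS = (
    "For every theme, output three components: theme title, brief "
    "definition, and representative statements."
)

def build_ita_prompt(rationales: list[str]) -> str:
    """Combine role setting, experimental content, and output requirements."""
    corpus = "\n".join(f"- {text}" for text in rationales)
    return "\n\n".join([ROLE_SETTING, EXPERIMENT_CONTEXT, corpus,
                        OUTPUT_REQUIREMENTS])
```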
Figure 4 Study 2 inductive thematic analysis prompt example
3.1.2 Strategy Text Generation
Based on strategy dimensions obtained from inductive thematic analysis, the study employed GPT-4o to generate strategy texts corresponding to four decision contexts (medical, financial, content creation, e-commerce marketing) under single-play and repeated-play conditions.
Decision context description: Experiment 1 adopted medical and financial decision contexts adapted from Sun et al. (2014). These two contexts have high universality and practical importance in real life and are the most widely used and representative context types in risky decision-making research \cite{Reeck_etal_2022, Shahid_etal_2019, Popovic_etal_2019, Weber_etal_2002, Scarffe_etal_2024}. Materials were strictly translated following Brislin's (1986) translation-back-translation procedure: (1) Initial translation: Three business administration graduate students independently translated original English materials into Chinese; (2) Synthesis and coordination: All authors and three translators compared three translation drafts with the original, discussing and revising discrepancies sentence by sentence to form a Chinese draft; (3) Back-translation: One English major graduate student back-translated the Chinese draft into English; (4) Back-translation consistency evaluation: Two domain experts with Western academic backgrounds independently rated semantic consistency between back-translated and original English materials sentence by sentence using a 5-point Likert scale (1 = completely inconsistent, 5 = completely consistent), with an average score of 4.83, indicating high semantic consistency. Experiment 2 self-designed two innovative decision contexts: (1) "Content creation" context, simulating decision-making in short-video topic selection strategies; (2) "E-commerce marketing" context, simulating e-commerce operators' marketing strategy decisions. These two contexts represent emerging digital platform-specific risky decision contexts to expand external validity. Context descriptions are in Supplementary Material B1.
During text generation, the study strictly followed de Kok's (2025) four principles for LLM text analysis: (1) Clearly define research tasks and identify required information to avoid errors or hallucinations due to lack of context; (2) Balance performance and cost by selecting appropriate models and usage methods (e.g., zero-shot, few-shot, or fine-tuning); (3) Design high-quality prompts to ensure clear instructions, standardized formats, and easily parseable outputs; (4) Assess construct validity through systematic validation of model outputs using human-annotated samples to ensure credibility and usability.
Based on these factors, the study used three-element prompts to generate decision strategies: (1) Task content: Require GPT to write a persuasive message encouraging individuals to choose a specific option in a given context based on previously extracted theme content; (2) Theme content: Provide previously extracted gamble strategy content; (3) Task context: Medical, financial, content creation, and e-commerce marketing, testing text intervention adaptability and generalizability across domains. Specific prompt details are in Supplementary Material B2.
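Analogously, the three-element generation prompt can be sketched as follows (hypothetical wording and names; the exact prompts are in Supplementary Material B2):

```python
# Schematic three-element prompt for strategy-text generation
# (hypothetical wording; exact prompts are in Supplementary Material B2).
def build_strategy_prompt(context: str, target_option: str, theme: str) -> str:
    return (
        f"Task: write a persuasive message of 100-150 characters that "
        f"encourages the decision maker to choose the {target_option} "
        f"option, in a neutral, concise, and persuasive style.\n"
        f"Strategy theme: {theme}\n"    # e.g. 'expected value maximization'
        f"Decision context: {context}"  # medical / financial / creation / marketing
    )

prompt = build_strategy_prompt("medical", "risky", "expected value maximization")
```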
3.1.3 Content Evaluation
The quality of strategy texts generated for each context was evaluated through the following process: Via the Credamo platform (https://www.credamo.com/), 76 participants were recruited (33 for medical/finance contexts, 43 for content creation/e-commerce marketing contexts). After excluding 5 unqualified participants who failed attention checks or had response times beyond ±3 standard deviations \cite{Tabachnick_Fidell_2007, 王伊萌等_2023}, the final valid sample included 71 participants (medical/finance: 11 males, 19 females, mean age = 29.83; content creation/e-commerce marketing: 18 males, 23 females, mean age = 31.61). Detailed demographic distributions are in Supplementary Material E Table S2. Each participant read all eight categories of GPT-4o-generated strategy texts (presentation order balanced using Latin square design) and rated content across five dimensions (rationality, professionalism, logicality, readability, persuasiveness \cite{Xia_etal_2025}). Example: "I believe this text content performs very well in professionalism." Ratings used a 7-point Likert scale (1 = very poor, 7 = very good).
3.1.4 Results and Discussion
Inductive thematic analysis results showed that LLMs exhibited four core strategy dimensions in single-play and repeated-play contexts, respectively. Single-play strategies focused on: (1) Expected value and mathematical rational decision-making; (2) Risk tolerance and decision preferences; (3) Loss aversion and psychological effects; (4) High-return and low-probability events. Repeated-play strategies emphasized: (1) Expected return maximization; (2) Risk-return balance; (3) Cumulative effects of repeated gambles; (4) High returns covering low risks (see Supplementary Material A2).
Content evaluation results (one-sample t-tests) showed that for both medical/finance and content creation/e-commerce marketing contexts, all eight strategy texts generated by GPT-4o scored significantly above the midpoint of 3.5 across all dimensions (ps < 0.001; statistics detailed in Supplementary Material E Table S3). This indicates that GPT-generated strategy texts received high evaluations in rationality, professionalism, logicality, readability, and persuasiveness, laying the foundation for subsequent intervention research.
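Each evaluation test is a one-sample t-test against the 3.5 criterion; a minimal sketch with placeholder data (not the study's ratings):

```python
# Minimal sketch of the one-sample t-test against the 3.5 criterion
# (scipy assumed; the ratings array is a placeholder, not real data).
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(2024)
ratings = rng.integers(4, 8, size=71)  # placeholder 7-point scores, one dimension

t_stat, p_value = ttest_1samp(ratings, popmean=3.5, alternative="greater")
print(f"t({ratings.size - 1}) = {t_stat:.2f}, p = {p_value:.3g}")
```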
Through systematic analysis of LLMs' decision rationales under single-play and repeated-play contexts, the study extracted LLMs' strategy dimensions under different conditions and generated corresponding decision strategies: In single-play gambles, individuals prefer certain options, and LLMs' strategies emphasize the importance of "certainty dimension," arguing that even if the risky option has higher expected value, its uncertainty may lead to more severe consequences; in repeated-play gambles, individuals prefer risky options, and LLMs' strategies highlight rational judgment based on expected value calculation, reminding individuals to weight and sum according to expected value maximization principles rather than over-focusing on probability differences between options. Medical context tasks and corresponding generated decision strategies are shown in Figure 5 [FIGURE:5]; financial, content creation, and e-commerce marketing context tasks and strategies are in Supplementary Materials B3.2, B3.3, and B3.4.
This phenomenon indicates that GPT can concretize decision strategies humans employ in such tasks: When choosing certain options, individuals tend to rely on descriptive theories like the equate-to-differentiate decision model, making decisions based on comparing dimensions with larger differences (probability levels); when choosing risky options, individuals align more with normative theory assumptions, adopting expected utility theory frameworks to calculate optimal decisions through weighted summation of option probabilities and outcomes.
These results reveal LLMs' strategic tendencies under different gamble frequencies, demonstrating their ability to clearly distinguish the process mechanisms behind different choice preferences in single-play versus repeated-play contexts and correctly apply descriptive and normative theoretical logic frameworks to characterize these internal mechanisms.
Figure 5 Study 2 experimental task and LLM-generated decision strategy example
3.2 Testing Human Acceptance of LLM-Generated Strategies
This study includes two experiments. Experiment 1 used college students to replicate Sun et al.'s (2014) medical and finance single-play and repeated-play experiments, presenting participants with corresponding LLM-generated strategy texts to examine human acceptance of LLM decision strategies. Experiment 2, based on self-designed content creation and e-commerce marketing contexts, further tested Experiment 1's results and included participants with actual work experience to verify result robustness and expand external validity.
3.2.1 Experiment 1
Experimental design and sample: This experiment followed Sun et al.'s (2014) design, replicating their single-play/repeated-play tasks in medical and financial gain-frame contexts, using a 2 (context: medical vs. financial) × 2 (gamble frequency: single vs. repeated) within-subjects design.
Sample size was calculated using power analysis methods proposed by Pawel et al. (2023). First, effect sizes were derived from Sun et al.'s (2014) reported repeated-measures ANOVA main-effect F values: medical context F(1,113) = 12.48 (Cohen's f ≈ 0.33), financial context F(1,113) = 37.59 (Cohen's f ≈ 0.57). Using G*Power 3.1 with α = 0.05 and power = 0.80, the minimum required sample sizes were 21 and 9 for the two contexts, respectively. However, because publication bias may inflate effect sizes in original research \cite{OpenScienceCollaboration_2015}, Simonsohn's (2015) "small telescopes" method was applied, which sets the replication sample at 2.5 times the original study's sample size (i.e., 285 participants) to enhance statistical power and replication validity. This study therefore recruited 360 participants via Credamo, approximately 3.2 times the original sample size (N = 114). Based on Sun et al.'s (2014) demographics, two recruitment conditions were set: (1) age between 18–25; (2) current student status. After excluding 11 unqualified participants with response times beyond ±3 standard deviations \cite{Tabachnick_Fidell_2007, 王伊萌等_2023}, the final valid sample included 349 participants (174 males, 175 females, mean age = 21.79, SD = 1.88). Detailed demographic distributions are in Supplementary Material E Table S4.
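The effect-size conversion used above follows Cohen's f = sqrt(F × df1 / df2); a worked check that recovers the reported values:

```python
# Worked check of the effect-size conversion: Cohen's f = sqrt(F * df1 / df2).
from math import sqrt

def cohens_f(F: float, df1: int, df2: int) -> float:
    return sqrt(F * df1 / df2)

print(round(cohens_f(12.48, 1, 113), 3))  # medical:   0.332, i.e. f ~ 0.33
print(round(cohens_f(37.59, 1, 113), 3))  # financial: 0.577, i.e. f ~ 0.57
```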
Materials and procedure: Following Sun et al. (2014), participants completed two tasks in a manager role: choosing disease response plans in the medical context and financial emergency plans in the financial context. Each context included a low-EV certain Option A and a high-EV risky Option B, distinguished as single-play (implement once) and repeated-play (implement 100 times) gamble conditions.
The experiment included two phases: (1) Decision phase: Participants rated their choice tendency on a 4-point scale (1 = very likely to choose Option A, 2 = likely to choose Option A, 3 = likely to choose Option B, 4 = very likely to choose Option B); (2) Strategy evaluation phase: Based on participants' choices (Option A/B), corresponding LLM-generated strategy texts were presented, and similarity between the texts and their own decision thinking was rated on a 7-point scale (1 = "not similar at all," 7 = "very similar"). Finally, demographic information was collected. Experimental materials are detailed in Supplementary Material C.
Results and discussion: To test the effects of gamble frequency and context on participants' risky decisions, the ordinal package \cite{Christensen_2023} was used in R to build a Cumulative Link Mixed Model (CLMM). The model used participants' 4-point scale choices as the dependent variable, with gamble frequency (repeated vs. single), context (financial vs. medical), and their interaction as fixed effects, and participants as random intercepts and random slopes. Regression coefficients and statistics are in Table 3 [TABLE:3].
Results showed a significant interaction between gamble frequency and context (β = 0.766, SE = 0.265, p = 0.004). Simple effects analysis using the emmeans package \cite{Lenth_2025} revealed (see Figure 6 [FIGURE:6]) that in both medical and financial contexts, participants significantly preferred risky options in repeated-play versus single-play gambles (medical: β = 1.97, SE = 0.175, z = 11.245, p < 0.001; financial: β = 2.73, SE = 0.256, z = 10.671, p < 0.001). This result indicates that in repeated gambles, people prefer high-EV risky options, while in single gambles they prefer low-EV certain options, successfully replicating the original study's key findings.
Table 3 CLMM regression results for effects of gamble frequency, context, and their interaction on risky decision tendency
Variable | β | SE | 95% CI | z | p
Gamble frequency (repeated = 1, single = 0) | 1.97 | 0.175 | [1.624, 2.309] | 11.245 | <0.001
Context (financial = 1, medical = 0) | -1.83 | 0.241 | [-2.303, -1.354] | -7.594 | <0.001
Gamble frequency × Context | 0.766 | 0.265 | [0.246, 1.286] | 2.891 | 0.004

To test acceptance of LLM-generated decision strategies, similarity ratings (7-point scale) between LLM-generated texts and participants' own decision thinking were analyzed. One-sample t-test results showed similarity ratings significantly above the midpoint of 3.5 across all conditions (ps < 0.001; detailed results in Supplementary Material E Table S5), indicating model-generated strategy texts were highly similar to human thinking patterns in corresponding contexts.
Figure 6 Choice distributions in medical and financial contexts under single-play and repeated-play conditions in Study 2 Experiment 1. The vertical axis uses a 4-point scale where Option A represents the certain option and Option B the risky option. "1" = "very likely to choose Option A," "2" = "likely to choose Option A," "3" = "likely to choose Option B," "4" = "very likely to choose Option B." Boxplot horizontal lines from top to bottom represent upper quartile, median, and lower quartile; overlapping lines indicate identical values; white dots are means (M), error bars are standard errors (SE). In each pair, left and right distributions represent single-play and repeated-play conditions, respectively.
In summary, the study successfully replicated Sun et al.'s (2014) core finding that individuals prefer high-EV risky options in repeated gambles and low-EV certain options in single gambles. Additionally, LLM-generated decision strategies received high acceptance, demonstrating that LLMs can accurately distinguish decision logic differences between single-play and repeated-play contexts and effectively externalize human decision strategies in such tasks.
3.2.2 Experiment 2
To enhance external validity, Experiment 2 added "content creation" and "e-commerce marketing" contexts to the original "medical" and "financial" contexts and included participants with actual work experience to further expand the applicability of findings across different decision types and target populations.
Experimental design and sample: This experiment used a 2 (context: content creation vs. e-commerce marketing) × 2 (gamble frequency: single vs. repeated) mixed design, with "context" as a between-subjects variable. Based on G*Power v3.1 \cite{Faul_etal_2007}, with medium effect size (Cohen's f = 0.25), α = 0.05, and power = 0.80, minimum required sample size was 34. Via Credamo, 220 participants were recruited. After excluding 10 unqualified participants who failed attention checks or had response times beyond ±3 standard deviations, the final valid sample included 210 participants (92 males, 118 females), mean age = 28.76 (SD = 9.65). Detailed demographic distributions are in Supplementary Material E Table S6.
Materials and procedure: (1) Content creation context: Simulated content creators' strategic choices in short-video topic selection (choose stable gain of 300 followers [certain option] vs. 40% probability of gaining 1000 followers [risky option]); (2) E-commerce marketing context: Simulated e-commerce operators' marketing strategy decisions (choose stable monthly income of ¥20,000 [certain option] vs. 30% probability of monthly income of ¥100,000 [risky option]). Both contexts included single-play and repeated-play conditions, with overall task framework and procedure consistent with Experiment 1.
Results and discussion: CLMM analysis used participants' 4-point scale choices as the dependent variable, with gamble frequency (repeated vs. single), context (content creation vs. e-commerce marketing), and their interaction as fixed effects, and participants as random intercepts and random slopes. Regression coefficients and statistics are in Table 4 [TABLE:4].
Results showed the gamble frequency × context interaction was not significant (p = 0.993). Simple effects analysis using the emmeans package \cite{Lenth_2025} revealed (see Figure 8 [FIGURE:8]) that in both content creation and e-commerce marketing contexts, participants significantly preferred high-EV risky options under repeated-play conditions (content creation: β = 0.631, SE = 0.253, z = 2.498, p = 0.013, 95% CI = [0.135, 1.127]; e-commerce marketing: β = 0.635, SE = 0.263, z = 2.415, p = 0.016, 95% CI = [0.119, 1.151]). This result, consistent with Experiment 1, shows individuals prefer low-EV certain options in single gambles and high-EV risky options in repeated gambles.
Table 4 CLMM regression results for effects of gamble frequency, context, and their interaction on risky decision tendency
Variable | β | SE | 95% CI | z | p
Gamble frequency (repeated=1, single=0) | 0.635 | 0.263 | [0.120, 1.151] | 2.415 | 0.016
Context (creation=1, marketing=0) | 0.619 | 0.271 | [0.085, 1.153] | 2.283 | 0.022
Gamble frequency × Context | -0.004 | 0.360 | [-0.713, 0.706] | -0.009 | 0.993

Acceptance tests for LLM-generated strategy texts (one-sample t-tests, ps < 0.001; detailed results in Supplementary Material E Table S7) showed similarity ratings significantly above 3.5 across all conditions. Consistent with Experiment 1, this indicates that GPT-4o-generated strategy texts were highly consistent with human thinking patterns in content logic.
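The acceptance test itself is a one-sample t-test of the similarity ratings against the reference value 3.5; a minimal R sketch, assuming a numeric vector ratings for one condition:

```r
# One-sided one-sample t-test: are similarity ratings above 3.5?
t.test(ratings, mu = 3.5, alternative = "greater")
```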
Figure 8 Choice distributions in content creation and e-commerce marketing contexts under single-play and repeated-play conditions in Study 2 Experiment 2. The vertical axis uses a 4-point scale where Option A represents the certain option and Option B the risky option. "1" = "very likely to choose Option A," "2" = "likely to choose Option A," "3" = "likely to choose Option B," "4" = "very likely to choose Option B." Boxplot horizontal lines from top to bottom represent upper quartile, median, and lower quartile; overlapping lines indicate identical values; white dots are means (M), error bars are standard errors (SE). In each pair, left and right distributions represent single-play and repeated-play conditions, respectively.
4 Study 3: Intervention Effects of LLM-Generated Strategies on Human Decision-Making
Building on Study 2's methodology, Study 3 further tested whether LLM-generated strategy texts targeting different contexts and options could intervene in people's inherent choice preferences in single-play and repeated-play gambles, conducting two experiments in the "medical/financial" and "content creation/e-commerce marketing" context pairs.
4.1 Experiment 1
4.1.1 Experimental Design and Sample
A 2 (text type: control vs. intervention) × 2 (context: medical vs. financial) × 2 (gamble frequency: single vs. repeated) mixed design was used, with "context" and "gamble frequency" as within-subjects variables. The presentation order of the medical/financial contexts and the single/repeated gambles was counterbalanced across participants. Similar to Study 2, 490 participants were recruited via Credamo across the two text-type conditions, which followed identical procedures. After excluding 30 unqualified participants with response times beyond ±3 standard deviations \cite{Tabachnick_Fidell_2007, 王伊萌等_2023}, the final valid sample included 460 participants (205 males, 255 females), mean age = 21.80 (SD = 1.94). Detailed demographic distributions are in Supplementary Material E Table S8.
4.1.2 Materials and Procedure
The experiment continued Study 2's framework, using single-play and repeated-play tasks in medical and financial contexts. The main difference from Study 2 was that participants were randomly assigned to an intervention or a control group: after reading the context and option descriptions, participants read GPT-4o-generated intervention texts (intervention group) or control texts (control group), then made their choice. All texts were presented without indicating their source. Finally, participants provided basic demographic information.
Intervention texts were based on LLM-generated strategy content from Study 2 and were designed to guide participants away from their inherent preferences: (1) in single-play gambles, to guide people toward risky options; (2) in repeated-play gambles, to guide people toward certain options. Accordingly, intervention texts in single-play gambles used normative strategies from repeated-play contexts, emphasizing expected value and rational calculation, whereas intervention texts in repeated-play gambles used descriptive strategies from single-play contexts, highlighting dimensional dominance (i.e., certainty and long-term stability). Control texts contained only objective descriptions of the target options.
Example for "medical context—single-play" condition:
Intervention text: "Choosing Option B allows rational decision-making based on expected value calculation. Although Option B carries high risk (3/5 probability of saving no one), it offers a 2/5 probability of saving all 600 lives, with expected value = 2/5 × 600 = 240 lives, significantly higher than Option A's 200 lives. Despite the risk, Option B's expected benefit is more desirable; rational decision-makers should consider expected value rather than single-outcome fluctuations."
Control text: "Under Option B implementation, two outcomes will occur according to probability distribution: when the 40% probability event occurs, all 600 people will be successfully rescued; when the 60% probability event occurs, no rescue will be achieved."
Specific tasks and intervention texts are detailed in Supplementary Material D.
4.1.3 Results and Discussion
To test text type effects on participants' risky decisions, CLMM analysis used participants' 4-point scale choices as the dependent variable, with text type (intervention vs. control), gamble frequency (repeated vs. single), context (financial vs. medical), and their interactions as fixed effects, plus by-participant random intercepts and random slopes. Regression coefficients and statistics are in Table 5 [TABLE:5].
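Under the same assumed data layout as the Study 2 sketch, extending the model to the three-factor design is direct; the factor name text below is an assumed label for the intervention/control variable:

```r
# Text type, gamble frequency, context, and all interactions as fixed effects;
# by-participant random intercepts and slopes for the within-subject factor.
m3 <- clmm(choice ~ text * frequency * context + (1 + frequency | subject),
           data = dat, link = "logit")
summary(m3)

# Intervention vs. control within each frequency-by-context cell.
emmeans(m3, pairwise ~ text | frequency * context)
```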
Results showed the text intervention effect differed significantly between single-play and repeated-play gambles (β = -1.840, SE = 0.284, z = -6.480, p < 0.001, 95% CI = [-2.397, -1.284]), with context showing no significant moderating effect (p = 0.735). Further simple effects analysis (see Figure 10 [FIGURE:10]) revealed that in single-play gambles, intervention-group participants significantly preferred risky options compared with the control group (medical: β = 1.072, SE = 0.180, z = 5.940, p < 0.001; financial: β = 1.168, SE = 0.241, z = 4.845, p < 0.001, 95% CI = [0.696, 1.640]). In repeated-play gambles, intervention-group participants significantly preferred certain options (medical: β = -0.768, SE = 0.204, z = -3.760, p < 0.001, 95% CI = [-1.169, -0.368]; financial: β = -0.893, SE = 0.240, z = -3.724, p < 0.001, 95% CI = [-1.362, -0.423]). These results indicate that in single-play conditions, LLM intervention significantly enhanced preference for high-EV risky options, while in repeated-play conditions it had the opposite effect, making people more inclined to choose certain options.
Table 5 CLMM regression results for effects of text type, gamble frequency, context, and their interactions on risky decision tendency
Variable | β | SE | 95% CI | z | p
Text type (intervention=1, control=0) | 1.072 | 0.180 | [0.718, 1.426] | 5.940 | <0.001
Gamble frequency (repeated=1, single=0) | 1.501 | 0.202 | [1.103, 1.898] | 7.430 | <0.001
Context (financial=1, medical=0) | -1.518 | 0.234 | [-1.974, -1.062] | -6.487 | <0.001
Text type × Gamble frequency | -1.840 | 0.284 | [-2.397, -1.284] | -6.480 | <0.001
Text type × Context | 0.096 | 0.283 | [-0.458, 0.650] | 0.339 | 0.735
Gamble frequency × Context | 0.503 | 0.287 | [-0.061, 1.067] | 1.752 | 0.080
Text type × Gamble frequency × Context | -0.220 | 0.382 | [-0.972, 0.532] | -0.576 | 0.565

Figure 10 Choice distributions in medical and financial contexts under single-play and repeated-play conditions in Study 3 Experiment 1. In Panels A and B, the vertical axis uses a 4-point scale where Option A represents the certain option and Option B the risky option. "1" = "very likely to choose Option A," "2" = "likely to choose Option A," "3" = "likely to choose Option B," "4" = "very likely to choose Option B." Boxplot horizontal lines from top to bottom represent upper quartile, median, and lower quartile; overlapping lines indicate identical values; white dots are means (M), error bars are standard errors (SE). In each pair, left and right distributions represent control and intervention groups, respectively.
4.2 Experiment 2
4.2.1 Experimental Design and Sample
A 2 (text type: control vs. intervention) × 2 (context: content creation vs. e-commerce marketing) × 2 (gamble frequency: single vs. repeated) mixed design was used, with "text type" and "context" as between-subjects variables and "gamble frequency" as a within-subjects variable. Based on G*Power v3.1 \cite{Faul_etal_2007}, with medium effect size (Cohen's f = 0.25), α = 0.05, and power = 0.80, minimum required sample size was 48. Via Credamo, 245 participants were recruited. After excluding 5 unqualified participants who failed attention checks or had response times beyond ±3 standard deviations, the final valid sample included 240 participants (106 males, 134 females), mean age = 29.12 (SD = 9.19). Detailed demographic distributions are in Supplementary Material E Table S9.
4.2.2 Materials and Procedure
The experimental procedure remained consistent with Experiment 1. Intervention texts were all generated by GPT-4o; related materials and questionnaire content are detailed in Supplementary Materials B and D.
4.2.3 Results and Discussion
CLMM analysis used participants' 4-point scale choices as the dependent variable, with text type (intervention vs. control), gamble frequency (repeated vs. single), context (content creation vs. e-commerce marketing), and their interactions as fixed effects, plus by-participant random intercepts and random slopes. Regression coefficients and statistics are in Table 6 [TABLE:6].
Results showed the three-way interaction of text type × gamble frequency × context was not significant (p = 0.916); the two-way interaction of text type × gamble frequency was significant (β = -2.101, SE = 0.487, z = -4.314, p < 0.001, 95% CI = [-3.056, -1.147]). Further simple effects analysis (see Figure 11 [FIGURE:11]) revealed that in single-play gambles, intervention-group participants preferred risky options compared with the control group (content creation: β = 0.696, SE = 0.325, z = 2.143, p = 0.032, 95% CI = [0.059, 1.332]; e-commerce marketing: β = 0.961, SE = 0.335, z = 2.867, p = 0.004, 95% CI = [0.304, 1.619]). In repeated-play gambles, intervention-group participants significantly preferred certain options (content creation: β = -1.335, SE = 0.333, z = -4.008, p < 0.001, 95% CI = [-1.988, -0.682]; e-commerce marketing: β = -1.140, SE = 0.348, z = -3.273, p = 0.001, 95% CI = [-1.822, -0.457]). These results indicate that in single-play conditions, LLM intervention significantly enhanced preference for high-EV risky options, while in repeated-play conditions it prompted preference for certain options.
In summary, across medical, financial, content creation, and e-commerce marketing contexts, LLM-generated intervention texts can intervene in people's inherent choice preferences in both single-play and repeated-play gambles.
Table 6 CLMM regression results for effects of text type, gamble frequency, context, and their interactions on risky decision tendency
Variable | β | SE | 95% CI | z | p
Text type (intervention=1, control=0) | 0.696 | 0.325 | [0.059, 1.332] | 2.143 | 0.032
Gamble frequency (repeated=1, single=0) | 0.819 | 0.350 | [0.133, 1.505] | 2.340 | 0.019
Context (creation=1, marketing=0) | 0.917 | 0.336 | [0.260, 1.574] | 2.730 | 0.006
Text type × Gamble frequency | -2.101 | 0.487 | [-3.056, -1.147] | -4.314 | <0.001
Text type × Context | -0.265 | 0.464 | [-1.176, 0.645] | -0.571 | 0.568
Gamble frequency × Context | -0.533 | 0.474 | [-1.457, 0.390] | -1.125 | 0.261
Text type × Gamble frequency × Context | 0.044 | 0.414 | [-1.235, 1.375] | 0.106 | 0.916

Figure 11 Choice distributions in content creation and e-commerce marketing contexts under single-play and repeated-play conditions in Study 3 Experiment 2. In Panels A and B, the vertical axis uses a 4-point scale where Option A represents the certain option and Option B the risky option. "1" = "very likely to choose Option A," "2" = "likely to choose Option A," "3" = "likely to choose Option B," "4" = "very likely to choose Option B." Boxplot horizontal lines from top to bottom represent upper quartile, median, and lower quartile; overlapping lines indicate identical values; white dots are means (M), error bars are standard errors (SE). In each pair, left and right distributions represent control and intervention groups, respectively.
5 General Discussion
Based on the single-play/repeated-play gamble paradigm, this study systematically examined LLMs' simulation validity for human risky decision-making preferences and process mechanisms, constructed methods for LLM-generated strategy texts, tested human acceptance of these texts, and validated their effectiveness in intervening in human decision-making. Findings show: LLMs (GPT-3.5/4) can replicate human typical choice patterns of greater risk aversion in single plays and greater risk seeking in repeated plays; LLMs (GPT-4o, DeepSeek-R1) can distinguish the internal logic of decision preference differences between single-play and repeated-play contexts, accurately applying descriptive and normative theories to generate differential strategies that receive high human acceptance; in medical, financial, content creation, and e-commerce marketing contexts, LLM-generated strategy texts can effectively intervene in people's inherent choice preferences in single-play and repeated-play gambles.
5.1 LLMs' Simulation of Risky Decision Preferences
This study found, based on the single-play/repeated-play gamble paradigm, that LLMs can replicate human choice patterns in single-play and repeated-play contexts, though with some differences in detail. Specifically, when facing gambles with equal expected values, LLMs' participation rates in single-play contexts were significantly lower than in repeated-play contexts \cite{Benartzi_Thaler_1999, Kumar_Lim_2008}. However, similar to Chen et al.'s (2023) findings, both LLMs showed higher participation rates (mean ≈ 95%) than humans (mean ≈ 53%; \cite{Redelmeier_Tversky_1992}) in both single-play and repeated-play contexts, indicating stronger risk-seeking tendencies. This result may originate from the models' training corpora. The GPT series' training data derive in part from Common Crawl, Wikipedia, and investment forums \cite{Brown_etal_2020}, which contain not only human subjective cognition of risk and reward but also objective risk-reward regularities. When learning the former, models may acquire human-like risk preferences; but because they cannot truly experience the negative emotions of losses, learning from objective regularities may produce stronger risk-seeking preferences than humans show.
By revealing similarities and differences between LLMs and humans in risky decision-making, this study theoretically helps distinguish data-driven from biologically-cognitive risky decision mechanisms, providing evidence for establishing AI decision evaluation frameworks. Practically, it helps optimize human-AI collaboration models to achieve complementary advantages. Additionally, this study suggests that when directly adopting LLM recommendations in high-risk domains like finance and healthcare, risk calibration mechanisms (e.g., threshold control or human review) should be established.
5.2 LLMs' Understanding of Human Risky Decision Strategies
This study systematically tested LLMs' understanding and generation capabilities regarding human risky decision strategies across four contexts (medical, financial, content creation, e-commerce marketing) using the single-play/repeated-play paradigm. Here, "understanding" refers to LLMs' ability to distinguish single-play/repeated-play characteristics and generate matching behavioral preferences and strategy texts based on corresponding theoretical frameworks (descriptive vs. normative). This reflects LLMs' capacity to reason about the internal mechanisms of human information processing (including logical reasoning, common-sense application, and causal inference) \cite{Mischler_etal_2024, Wei_etal_2022}, rather than merely "simulating" human decision patterns at the behavioral surface \cite{Argyle_etal_2023, Binz_Schulz_2023}.
In repeated-play gambles, LLMs captured human preferences for high-EV risky options, generating normative theory-framed strategies emphasizing rational decision-making through probability-weighted summation. In single-play gambles, LLMs identified human preferences for certain options, generating strategies focusing on "certainty" and "robustness" dimensions, aligning with descriptive theory heuristics \cite{Brandstätter_etal_2006} and equate-to-differentiate theory assumptions \cite{Li_2004}: ignoring/"equating" dimensions with small differences and making decisions based on key dimension comparisons. This finding not only validates Sun et al.'s (2014) theoretical hypothesis that humans employ different decision strategies in single-play and repeated-play contexts but also reveals that LLMs can explain different behavioral logics in these contexts.
Through LLM strategy externalization, this study provides direct computational modeling evidence for risky decision-making cognitive processes, helping researchers clarify internal logic of risky decisions, reveal traditionally unobservable decision black boxes, and establish theoretical and methodological foundations for using AI to intervene in risky decision-making.
5.3 LLM Intervention in Risky Decision-Making
This study innovatively leveraged LLMs' content generation capabilities to construct a risky decision intervention method based on strategy texts. Through single-play and repeated-play experiments in medical, financial, content creation, and e-commerce marketing contexts, the study systematically validated this method's effectiveness and mechanism.
The study established a three-stage strategy/intervention text generation process and applied it to decision intervention: (1) Use DeepSeek-R1 and GPT-4o to obtain decision rationales and apply ITA to extract core strategy themes; (2) Generate strategy texts based on these themes; (3) Evaluate the quality of the generated texts. The intervention logic was based on the "preference reversal" principle: guide people to choose high-EV risky options in single-play gambles and low-EV certain options in repeated-play gambles. Results showed that compared to control groups, LLM-generated texts significantly changed people's inherent decision preferences across four contexts:
In single-play gambles, intervention texts emphasizing "expected return maximization" align with expected value theory's core logic \cite{Pascal_1670}, guiding decision-makers to rationally calculate based on expected value. Correspondingly, in repeated-play gambles, intervention texts—consistent with heuristic decision-making \cite{Brandstätter_etal_2006} and equate-to-differentiate theory \cite{Li_2004} logic—highlighted potential losses of risky options ("relatively large risk of loss each time") and stability of certain options ("ensuring stable returns each time"), guiding participants to simplify decisions based on key dimension (e.g., probability) differences.
Notably, LLM intervention effects varied across gamble contexts. For example, the magnitude of choice proportion changes between intervention and control groups for target options differed between single-play and repeated-play contexts. This suggests that the impact degree of LLM-generated intervention texts on decision behavior may be moderated by individual characteristics or involvement in specific decision contexts.
In summary, traditional risky decision interventions have often used repeated-play decision logic (normative "weighted summation" strategies) to guide single-play decision behavior, leading to (1) a separation between normative and descriptive theories, and (2) failure to effectively intervene in real-life risky decision-making. This study found that LLMs can deeply understand the decision logics of single-play and repeated-play contexts and generate strategy texts that are highly consistent with human thinking and highly accepted, and that these texts are also highly effective as behavioral interventions.
5.4 Research Contributions
This study is the first to systematically validate LLMs' ability to simulate behavioral preferences, understand decision-making, and intervene in human behavior through generated texts within the single-play/repeated-play gamble paradigm, offering three theoretical contributions:
First, by comparing LLMs' and humans' choice preferences in single-play and repeated-play risky decision tasks, the study systematically tested LLMs' simulation validity for human risky decision behavior. Its importance lies in: (1) Revealing that LLMs can simultaneously capture normative theory's rational calculation principles and descriptive theory's experiential decision modes, providing unique experimental techniques for testing decision theory divergence points; (2) LLMs can replicate human behavioral preference reversal in single-play and repeated-play gambles even without human cognition and emotion, suggesting at the cognitive level that human single-play and repeated-play risky decision processing may be driven by linguistic patterning and semantic association inherent in language systems \cite{Suri_etal_2024}.
Second, this study deeply explored LLMs' reasoning mechanisms in single-play and repeated-play risky decision tasks: LLMs exhibit human-like dual-mode reasoning mechanisms in risky decision-making, aligning more with normative theory's rational calculation in repeated-play contexts while shifting to descriptive theory heuristics in single-play contexts. This dynamic switching capability enables AI to reproduce the dialectical unity of human rationality and intuition. Currently, as AI model capabilities continue advancing, evaluation methods relying solely on task performance can no longer meet needs for understanding internal mechanisms. Parsing LLMs' reasoning paths is crucial for revealing "how" they make human-like decisions, while also providing new evidence for theoretical modeling in risky decision-making.
Finally, this study innovatively constructed a closed-loop intervention approach: "LLM simulation of decision-making → generation of decision rationales → inductive thematic analysis → generation of context-specific intervention texts," and validated its effectiveness in changing human risk preferences. This finding (1) expands research dimensions of human-AI interaction in behavioral decision theory, revealing that in single-play and repeated-play risky decisions, linguistic patterns and associations can trigger corresponding human decision responses and influence decision processes and outcomes even without traditional cognitive and emotional processes; (2) provides new directions for establishing reasonable "human-LLM" interaction rules. Through textual analysis of LLM-generated decision rationales, key linguistic features influencing human single-play and repeated-play decision preferences can be parsed, helping prompt designers \cite{Khalid_Witmer_2025} optimize interaction strategies to guide LLMs in generating more practical and reliable outputs.
Existing LLM decision-making research mostly relies on classic decision paradigms, focusing on testing LLM-human similarity at the behavioral level. In contrast, this study, based on the classic "single-play/repeated-play gamble" paradigm in risky decision-making, not only examined LLMs' human-like performance and mechanisms but also explored their understanding and intervention effectiveness on human risky decision behavioral mechanisms through a relatively complete and systematic research path.
Practically, this study first reveals that LLMs can clearly and consistently present complex strategies. This advantage enables them to overcome traditional interview limitations and serve as a "theory construction tool," potentially allowing researchers to efficiently explore decision mechanisms through "structured interviews" with LLMs to advance theoretical development. Second, it provides important references for developing generative AI-based behavioral nudging technologies, offering new ideas for correcting irrational decision behaviors and guiding intelligent assisted decision system design in financial investment, healthcare, e-commerce marketing, and other fields.
5.5 Limitations and Future Directions
This study has several limitations.
First, this study only used ChatGPT and DeepSeek models in Chinese contexts, without systematically comparing differences among various LLMs in simulation and strategy generation, nor exploring potential cultural differences across languages. Since models differ in training corpora \cite{Demszky_etal_2023, Thapa_Adhikari_2023} and alignment strategies \cite{Zhao_etal_2023}, their performance on the same task varies. The study used Claude and LLaMA models to replicate some experimental results for cross-model robustness testing, but conclusions may still suffer from model-specific bias. Future research could introduce mainstream and emerging models like Claude, Doubao, and Tongyi Qianwen to explore risky decision-making differences across languages and test conclusion generalizability.
Second, LLM intervention effects may be influenced by context, individual differences (e.g., personality traits, decision-making styles), and cultural backgrounds. For example, neuroticism in the Big Five personality traits and collectivist/individualist backgrounds may moderate risk tendencies \cite{Xia_etal_2019, Liu_etal_2021}, thereby affecting intervention effectiveness. Additionally, although LLMs offer personalization and intelligence advantages over traditional interventions, this study did not directly compare their effects. Future research could further optimize LLM text generation algorithms by controlling or integrating more individual characteristic information to improve intervention precision and ecological validity, advancing personalized decision support systems. Moreover, it is necessary to systematically compare LLM and traditional information framing intervention effects under identical task conditions to define their unique value and advantages.
Third, LLM intervention effects may stem from strategy mechanisms embedded in texts (e.g., expected utility, equate-to-differentiate theory mechanisms) or text effects (e.g., language professionalism, conciseness, emotional tone). While this study provided some evidence for strategy mechanism intervention effects, it could not exclude text effect influences. Future research could manipulate variables like language style or emotional tone with more rigorous experimental designs to reveal internal mechanisms of LLM text intervention effects.
6 Conclusion
This study first explored LLMs' simulation of risky decision-making choice preferences and process mechanisms and their understanding of human decision strategies, then investigated how to use LLMs to generate intervention texts affecting individual single-play and repeated-play risk preference choices, and validated intervention effectiveness. Findings: (1) LLMs can simulate human choice preference difference patterns in single-play and repeated-play gambles but show overall higher risk preference levels than humans; (2) LLMs can accurately distinguish decision logic differences between single-play and repeated-play contexts and apply descriptive and normative theories to generate corresponding strategies that receive high human acceptance; (3) Strategy texts extracted from LLM decision processes through inductive thematic analysis can effectively intervene in people's inherent choice preference patterns in single-play and repeated-play gambles.
References
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., ... McGrew, B. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774. https://doi.org/10.48550/arXiv.2303.08774
Aher, G. V., Arriaga, R. I., & Kalai, A. T. (2023). Using large language models to simulate multiple humans and replicate human subject studies. In Proceedings of the 40th International Conference on Machine Learning (pp. 337–371). PMLR. https://proceedings.mlr.press/v202/aher23a.html
Altay, S., Hacquin, A. S., Chevallier, C., & Mercier, H. (2023). Information delivered by a chatbot has a positive impact on COVID-19 vaccines attitudes and intentions. Journal of Experimental Psychology: Applied, 29(1), 52–62. https://doi.org/10.1037/xap0000400
Anderson, M. A. B., Cox, D. J., & Dallery, J. (2023). Effects of economic context and reward amount on delay and probability discounting. Journal of the Experimental Analysis of Behavior, 120(2). https://doi.org/10.1002/jeab.868
Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J., Rytting, C., & Wingate, D. (2023). Out of one, many: Using language models to simulate human samples. Political Analysis, 31(3). https://doi.org/10.1017/pan.2023.2
Arora, C., Sayeed, A. I., Licorish, S., Wang, F., & Treude, C. (2024). Optimizing large language model hyperparameters for text generation. arXiv preprint arXiv:2408.10577. https://doi.org/10.48550/arXiv.2408.10577
Barberis, N., & Huang, M. (2009). Preferences with frames: A new utility specification that allows for the framing of risks. Journal of Economic Dynamics and Control, 33(8). https://doi.org/10.1016/j.jedc.2009.01.009
Benartzi, S., & Thaler, R. H. (1999). Risk aversion or myopia? Choices in repeated gambles and retirement investments. Management Science, 45(3), 364–381. https://doi.org/10.1287/mnsc.45.3.364
Binz, M., & Schulz, E. (2023). Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences, 120(6), e2218523120. https://doi.org/10.1073/pnas.2218523120
Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs. Psychological Review, 113(2), 409–432. https://doi.org/10.1037/0033-295X.113.2.409
Brislin, R. W. (1986). The wording and translation of research instruments. In W. J. Lonner & J. W. Berry (Eds.), Field methods in cross-cultural research (pp. 137–164). Sage Publications.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://doi.org/10.48550/arXiv.2005.14165
Carvalho, T., Negm, H., & El-Geneidy, A. (2024). A comparison of the results from artificial intelligence-based and human-based transport-related thematic analysis. Findings. https://doi.org/10.32866/001c.94401
Chen, Y., Liu, T. X., Shan, Y., & Zhong, S. (2023). The emergence of economic rationality of GPT. Proceedings of the National Academy of Sciences, 120(51), e2316205120. https://doi.org/10.1073/pnas.2316205120
Choi, S., Kang, H., Kim, N., & Kim, J. (2025). How does artificial intelligence improve human decision-making? Evidence from the AI-powered Go program. Strategic Management Journal, 46(6), 1523–1554. https://doi.org/10.1002/smj.3694
Christensen, R. H. B. (2023). ordinal: Regression models for ordinal data (R package version 2023.12-4.1) [Computer software]. https://CRAN.R-project.org/package=ordinal
Coda-Forno, J., Witte, K., Jagadish, A. K., Binz, M., Akata, Z., & Schulz, E. (2023). Inducing anxiety in large language models increases exploration bias. arXiv preprint arXiv:2304.11111. https://doi.org/10.48550/arXiv.2304.11111
Dai, S. C., Xiong, A., & Ku, L. W. (2023). LLM-in-the-loop: Leveraging large language model for thematic analysis. arXiv preprint arXiv:2310.15100. https://doi.org/10.48550/arXiv.2310.15100
de Kok, T. (2025). ChatGPT for textual analysis? How to use generative LLMs in accounting research. Management Science. https://doi.org/10.1287/mnsc.2023.03253
de Varda, A. G., Saponaro, C., & Marelli, M. (2025). High variability in LLMs' analogical reasoning. Nature Human Behaviour, 9(7), 1339–1341. https://doi.org/10.1038/s41562-025-02224-3
DeepSeek-AI, Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., … Bi, X. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948. https://doi.org/10.48550/arXiv.2501.12948
Deiana, G., Dettori, M., Arghittu, A., Azara, A., Gabutti, G., & Castiglia, P. (2023). Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines, 11(7), 1217. https://doi.org/10.3390/vaccines11071217
Deiner, M. S., Honcharov, V., Li, J., Mackey, T. K., Porco, T. C., & Sarkar, U. (2024). Large language models can enable inductive thematic analysis of a social media corpus in a single prompt: human validation study. JMIR Infodemiology, 4(1), e59641. https://doi.org/10.2196/59641
Demszky, D., Yang, D., Yeager, D. S., Bryan, C. J., Clapper, M., Chandhok, S., ... Pennebaker, J. W. (2023). Using large language models in psychology. Nature Reviews Psychology, 2(11). https://doi.org/10.1038/s44159-023-00241-5
Dillion, D., Tandon, N., Gu, Y., & Gray, K. (2023). Can AI language models replace human participants? Trends in Cognitive Sciences, 27(7), 597–600. https://doi.org/10.1016/j.tics.2023.04.008
Ding, Y., Zhang, L. L., Zhang, C., Xu, Y., Shang, N., Xu, J., Yang, F., & Yang, M. (2024). Longrope: Extending LLM context window beyond 2 million tokens. arXiv preprint arXiv:2402.13753. https://doi.org/10.48550/arXiv.2402.13753
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Ferguson, S. A., Aoyagui, P. A., & Kuzminykh, A. (2023). Something borrowed: Exploring the influence of AI-generated explanation text on the composition of human explanations. In Extended Abstracts of the 2023 Conference on Human Factors in Computing Systems (pp. 1–7). https://doi.org/10.1145/3544549.3585727
Goli, A., & Singh, A. (2024). Frontiers: Can large language models capture human preferences? Marketing Science, 43(4), 709–722. https://doi.org/10.1287/mksc.2023.0306
Grossmann, I., Feinberg, M., Parker, D. C., Christakis, N. A., Tetlock, P. E., & Cunningham, W. A. (2023). AI and the transformation of social science research. Science, 380(6650). https://doi.org/10.1126/science.adi1778
Gupta, R., Nair, K., Mishra, M., Ibrahim, B., & Bhardwaj, S. (2024). Adoption and impacts of generative artificial intelligence: Theoretical underpinnings and research agenda. International Journal of Information Management Data Insights, 4(1), 100232. https://doi.org/10.1016/j.jjimei.2024.100232
Hagendorff, T., Fabi, S., & Kosinski, M. (2023). Thinking fast and slow in large language models. Nature Computational Science, 3(10), 833–838. https://doi.org/10.1038/s43588-023-00527-x
Hebenstreit, K., Praas, R., Kiesewetter, L. P., & Samwald, M. (2024). A comparison of chain-of-thought reasoning strategies across datasets and models. PeerJ Computer Science, e1999. https://doi.org/10.7717/peerj-cs.1999
Hertwig, R., & Erev, I. (2009). The description–experience gap in risky choice. Trends in Cognitive Sciences, 13(12), 517–523. https://doi.org/10.1016/j.tics.2009.09.004
Jiao, L., Li, C., Chen, Z., Xu, H., & Xu, Y. (2025). When AI "possesses" personality: Roles of good and evil personalities influence moral judgment in large language models. Acta Psychologica Sinica, 57(6), 929–946. [焦丽颖, 李昌锦, 陈圳, 许恒彬, 许燕. (2025). 当 AI"具有"人格:善恶人格角色对大语言模型道德判断的 影响. 心理学报, 57(6), 929–946.]
Jin, H. J., & Han, D. H. (2014). Interaction between message framing and consumers' prior subjective knowledge regarding food safety issues. Food Policy, 44, 95–102. https://doi.org/10.1016/j.foodpol.2013.10.007
Jones, E., & Steinhardt, J. (2022). Capturing failures of large language models via human cognitive biases. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arxiv.2202.12299
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 363–391. https://doi.org/10.2307/1914185
Karinshak, E., Hu, A., Kong, K., Rao, V., Wang, J., Wang, J., & Zeng, Y. (2024). LLM-globe: A benchmark evaluating cultural values embedded in LLM output. arXiv preprint arXiv:2411.06032. https://doi.org/10.48550/arXiv.2411.06032
Karinshak, E., Liu, S. X., Park, J. S., & Hancock, J. T. (2023). Working with AI to persuade: Examining a large language model's ability to generate pro-vaccination messages. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–29. https://doi.org/10.1145/3579592
Katz, A., Fleming, G. C., & Main, J. (2024). Thematic analysis with open-source generative AI and machine learning: A new method for inductive qualitative codebook development. arXiv preprint arXiv:2410.03721. https://doi.org/10.48550/arXiv.2410.03721
Kelton, A. S., Pennington, R. R., & Tuttle, B. M. (2010). The effects of information presentation format on judgment and decision making: A review of the information systems research. Journal of Information Systems, 24(2), 79–105. https://doi.org/10.2308/jis.2010.24.2.79
Khalid, M. T., & Witmer, A. P. (2025). Prompt engineering for large language model-assisted inductive thematic analysis. arXiv preprint arXiv:2503.22978. https://doi.org/10.48550/arXiv.2503.22978
Kumar, A., & Lim, S. S. (2008). How do decision frames influence the stock investment choices of individual investors? Management Science, 54(6), 1052–1064. https://doi.org/10.1287/mnsc.1070.0845
Lehr, S. A., Caliskan, A., Liyanage, S., & Banaji, M. R. (2024). ChatGPT as research scientist: Probing GPT's capabilities as a research librarian, research ethicist, data generator, and data predictor. Proceedings of the National Academy of Sciences, 121(35), e2404328121. https://doi.org/10.1073/pnas.2404328121
Lenth, R. V. (2025). emmeans: Estimated marginal means, aka least-squares means (R package version 1.11.0) [Computer software]. https://doi.org/10.32614/CRAN.package.emmeans
Li, S. (2004). A behavioral choice model when computational ability matters. Applied Intelligence, 20(2), 147–163. https://doi.org/10.1023/B:APIN.0000013337.01711.c7
Lim, S., & Schmälzle, R. (2024). The effect of source disclosure on evaluation of AI-generated messages. Computers in Human Behavior: Artificial Humans, 2(1), 100058. https://doi.org/10.1016/j.chbah.2024.100058
Lin, Z. (2023). Why and how to embrace AI such as ChatGPT in your academic life. Royal Society Open Science, 10(8), 230658. https://doi.org/10.1098/rsos.230658
Lin, Z. (2024). How to write effective prompts for large language models. Nature Human Behaviour, 8(4), 611–615. https://doi.org/10.1038/s41562-024-01847-2
Lin, Z. (2025). Techniques for supercharging academic writing with generative AI. Nature Biomedical Engineering, 9(4), 426-431. https://doi.org/10.1038/s41551-024-01185-8
Liu, N., Zhou, L., Li, A. M., Hui, Q. S., Zhou, Y. R., & Zhang, Y. Y. (2021). Neuroticism and risk-taking: the role of competition with a former winner or loser. Personality and Individual Differences, 179, 110917. https://doi.org/10.1016/j.paid.2021.110917
Liu, S. X., Yang, J. Z., & Chu, H. R. (2019). Now or future? Analyzing the effects of message frame and format in motivating Chinese females to get HPV vaccines for their children. Patient Education and Counseling, 102(1), 61−67. https://doi.org/10.1016/j.pec.2018.09.005
Lopes, L. L. (1996). When time is of the essence: Averaging, aspiration, and the short run. Organizational Behavior and Human Decision Processes, 65(3), 179–189. https://doi.org/10.1006/obhd.1996.0017
Lu, J., Chen, Y., & Fang, Q. (2022). Promoting decision satisfaction: The effect of the decision target and strategy on process satisfaction. Journal of Business Research. https://doi.org/10.1016/j.jbusres.2021.10.056
Mei, Q., Xie, Y., Yuan, W., & Jackson, M. O. (2024). A turing test of whether AI chatbots are behaviorally similar to humans. Proceedings of the National Academy of Sciences, 121(9), e2313925121. https://doi.org/10.1073/pnas.2313925121
Mischler, G., Li, Y. A., Bickel, S., Mehta, A. D., & Mesgarani, N. (2024). Contextual feature extraction hierarchies converge in large language models and the brain. Nature Machine Intelligence, 6(10), 1467–1477. https://doi.org/10.1038/s42256-024-00925-4
Morreale, A., Stoklasa, J., Collan, M., & Lo Nigro, G. (2018). Uncertain outcome presentations bias decisions: experimental evidence from Finland and Italy. Annals of Operations Research, 268(1–2), 259–272. https://doi.org/10.1007/s10479-016-2349-3
Mozikov, M., Severin, N., Bodishtianu, V., Glushanina, M., Baklashkin, M., Savchenko, A. V., & Makarov, I. (2024). The good, the bad, and the hulk-like GPT: Analyzing emotional decisions of large language models in cooperation and bargaining games. arXiv preprint arXiv:2406.03299. https://arxiv.org/abs/2406.03299
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Pascal, B. (1670). Pensées (W. F. Trotter, Trans.). Retrieved October 2012, http://oregonstate.edu/instruct/phl302/texts/pascal/pensees-contents.html
Park, P. S. (2024). Diminished diversity-of-thought in a standard large language model. Behavior Research Methods, 56(6), 5754–5770. https://doi.org/10.3758/s13428-023-02307-x
Pavey, L., & Churchill, S. (2014). Promoting the avoidance of high-calorie snacks: Priming autonomy moderates message framing effects. PLOS ONE, 9(7), e103892. https://doi.org/10.1371/journal.pone.0103892
Pawel, S., Consonni, G., & Held, L. (2023). Bayesian approaches to designing replication studies. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000604
Peng, L., Guo, Y., & Hu, D. (2021). Information framing effect on public's intention to receive the COVID-19 vaccination in China. Vaccines, 9(9), 995. https://doi.org/10.3390/vaccines9090995
Peters, E., & Levin, I. P. (2008). Dissecting the risky-choice framing effect: Numeracy as an individual-difference factor in weighting risky and riskless options. Judgment and Decision Making, 3(6), 435–448. https://doi.org/10.1017/s1930297500000012
Popovic, N. F., Pachur, T., & Gaissmaier, W. (2019). The gap between medical and monetary choices under risk persists in decisions for others. Journal of Behavioral Decision Making, 32(4). https://doi.org/10.1002/bdm.2121
Prescott, M. R., Yeager, S., Ham, L., Saldana, C. D. R., Serrano, V., Narez, J., Paltin, D., Delgado, J., Moore, D. J., & Montoya, J. (2024). Comparing the efficacy and efficiency of human and generative AI: Qualitative thematic analyses. JMIR AI, 3(1), e54482. https://doi.org/10.2196/54482
Qin, X., Huang, M., & Ding, J. (2024). AITurk: Using ChatGPT for social science research. PsyArXiv. https://doi.org/10.31234/osf.io/xkd23
Redelmeier, D. A., & Tversky, A. (1992). On the framing of multiple prospects. Psychological Science, 3(3), 191–193. https://doi.org/10.1111/j.1467-9280.1992.tb00025.x
Reeck, C., Mullette-Gillman, O. A., McLaurin, R. E., & Huettel, S. A. (2022). Beyond money: Risk preferences across both economic and non-economic contexts predict financial decisions. PLOS ONE, 17(12), e0279125. https://doi.org/10.1371/journal.pone.0279125
Salles, A., Evers, K., & Farisco, M. (2020). Anthropomorphism in AI. AJOB Neuroscience, 11(2), 88–95. https://doi.org/10.1080/21507740.2020.1740350
Samuelson, P. A. (1963). Risk and uncertainty: A fallacy of large numbers. Scientia, 98, 108–113.
Scarffe, A., Coates, A., Brand, K., & Michalowski, W. (2024). Decision threshold models in medical decision making: A scoping literature review. BMC Medical Informatics and Decision Making, 24(1), 273. https://doi.org/10.1186/s12911-024-02681-2
Shahid, N., Rappon, T., & Berta, W. (2019). Applications of artificial neural networks in health care organizational decision-making: A scoping review. PLOS ONE, 14(2), e0212356. https://doi.org/10.1371/journal.pone.0212356
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569. https://doi.org/10.1177/0956797614567341
Strachan, J. W. A., Albergo, D., Borghini, G., Pansardi, O., Scaliti, E., Gupta, S., Saxena, K., Rufo, A., Panzeri, S., Manzi, G., Graziano, M. S. A., & Becchio, C. (2024). Testing theory of mind in large language models and humans. Nature Human Behaviour, 8(7), 1285–1295. https://doi.org/10.1038/s41562-024-01882-z
Sun, H. Y., Rao, L. L., Zhou, K., & Li, S. (2014). Formulating an emergency plan based on expectation-maximization is one thing, but applying it to a single case is another. Journal of Risk Research, 17(7), 785–814. https://doi.org/10.1080/13669877.2013.816333
Suri, G., Slater, L. R., Ziaee, A., & Nguyen, M. (2024). Do large language models show decision heuristics similar to humans? A case study using GPT-3.5. Journal of Experimental Psychology: General, 153(4), 1066–1075. https://doi.org/10.1037/xge0001547
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Allyn & Bacon.
Thapa, S., & Adhikari, S. (2023). ChatGPT, Bard, and large language models for biomedical research: Opportunities and pitfalls. Annals of Biomedical Engineering, 51(12). https://doi.org/10.1007/s10439-023-03284-0
Tversky, A., & Bar-Hillel, M. (1983). Risk: The long and the short. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(4), 713–717. https://doi.org/10.1037/0278-7393.9.4.713
Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd rev. ed.). Princeton University Press.
Wang, Y., Zhang, J., Wang, F., Xu, W., & Liu, W. (2023). Do not think any virtue trivial, and thus neglect it: Mindfulness and wisdom—Serial mediating role of social mindfulness and perspective taking. Acta Psychologica Sinica, 55(4), 626–641. https://doi.org/10.3724/SP.J.1041.2023.00626
Webb, T., Holyoak, K. J., & Lu, H. (2023). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9), 1526–1541. https://doi.org/10.48550/arXiv.2212.09196
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. V., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://doi.org/10.48550/arxiv.2201.11903
Xia, D., Li, Y., He, Y., Zhang, T., Wang, Y., & Gu, J. (2019). Exploring the role of cultural individualism and collectivism on public acceptance of nuclear energy. Energy Policy. https://doi.org/10.1016/j.enpol.2019.05.014
Xia, D., Song, M., & Zhu, T. (2025). A comparison of the persuasiveness of human and ChatGPT generated pro-vaccine messages. Frontiers in Public Health. https://doi.org/10.3389/fpubh.2024.1515871
Yuan, Y., Jiao, W., Wang, W., Huang, J. T., He, P., Shi, S., & Tu, Z. (2023). GPT-4 is too smart to be safe: Stealthy chat with LLMs via cipher. arXiv preprint arXiv:2308.06463. https://doi.org/10.48550/arXiv.2308.06463
Zhang, J., Li, H. A., & Allenby, G. M. (2024). Using text analysis in parallel mediation analysis. Marketing Science, 43(5), 953–970. https://doi.org/10.1287/mksc.2023.0045
Zhang, Y., Huang, F., Mo, L., Liu, X., & Zhu, T. (2025). Suicidal ideation data augmentation and recognition technology based on large language models. Acta Psychologica Sinica, 57(6), 987–1000.
Zhao, F., Yu, F., & Shang, Y. (2024). A new method supporting qualitative data analysis through prompt generation for inductive coding. 2024 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), 164–169. https://doi.org/10.1109/IRI62200.2024.00043
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223. https://doi.org/10.48550/arXiv.2303.18223
Supplementary Materials
A Inductive Thematic Analysis Prompt Design and Results
A1 Inductive Thematic Analysis Prompt Design
You are a professional text analysis expert. I conducted an experiment and generated some decision rationale content. I need you to analyze it according to my requirements. My experimental content is as follows.
Research purpose: This study explores whether LLMs exhibit decision phenomena consistent with humans or show different decision results when conducting single-play and repeated-play gamble decision tasks. Through this research, we aim to reveal LLMs' decision-making mechanisms in such tasks and provide guidance for future use of LLMs to assist human decision-making.
Research design: This study selected DeepSeek as the research object. When interacting with the LLM, each interface contains three roles: system, user, and assistant. System messages carry operation commands, user messages carry the direct queries, and assistant messages carry the model's replies.
System message (Human): You will conduct multiple rounds of decision tasks and must provide an answer in each round. If you do not provide an answer, I will assume you chose randomly. I want you to make decisions like a human. Please wait for my prompt, then tell me your answer and briefly explain your decision rationale.
Decision task: This study compares decision performance in single-play and repeated-play contexts using the classic gambling game: Imagine a gambling game with 50% probability of winning ¥2000 and 50% probability of losing ¥500.
- Single-play: Now, suppose you have the opportunity to participate in this gamble once. Would you choose to participate? Please answer "participate" or "not participate."
- Repeated-play: Now, suppose you have the opportunity to gamble five times. Would you choose to participate? Please answer "participate" or "not participate."
Next, I will provide you with DeepSeek's decision rationale content from single-play/repeated-play gambles. Please analyze the decision rationale text and extract the thematic dimensions explaining why it made that decision. Please follow these requirements:
- Ensure themes are unique, non-overlapping, and collectively provide broad understanding of the dataset.
- Ensure analysis covers different dimensions such as risk preference, return expectation, etc., ensuring themes clearly reflect DeepSeek's considerations in single-play/repeated-play decision-making.
Next, I will provide decision rationales for your analysis.
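For concreteness, the interaction described above (system/user/assistant roles, repeated decision prompts) can be scripted against an OpenAI-compatible chat endpoint. The R sketch below is illustrative, not the study's actual code; the base URL, model name, and response layout are assumptions based on the public DeepSeek API:

```r
library(httr2)

# Hypothetical one-shot chat call to an OpenAI-compatible endpoint.
ask_llm <- function(user_prompt,
                    system_prompt,
                    model = "deepseek-reasoner",             # assumed R1 model name
                    base_url = "https://api.deepseek.com") { # assumed endpoint
  resp <- request(paste0(base_url, "/chat/completions")) |>
    req_auth_bearer_token(Sys.getenv("DEEPSEEK_API_KEY")) |>
    req_body_json(list(
      model = model,
      messages = list(
        list(role = "system", content = system_prompt),
        list(role = "user",   content = user_prompt)
      )
    )) |>
    req_perform() |>
    resp_body_json()
  resp$choices[[1]]$message$content  # OpenAI-style response schema
}

# System message and single-play task from A1 (abbreviated here).
system_msg <- "You will conduct multiple rounds of decision tasks ..."
task <- "Imagine a gambling game with 50% probability of winning 2000 yuan
and 50% probability of losing 500 yuan. Suppose you can participate once.
Answer 'participate' or 'not participate' and explain your rationale."

# Collect repeated decision rationales for the single-play condition.
rationales <- replicate(100, ask_llm(task, system_msg))
```

The same helper pattern could then submit the A1 analysis prompt, with the collected rationales appended, to GPT-4o for the inductive thematic analysis stage.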
A2 Inductive Thematic Analysis Results
A2.1 Strategy Themes Extracted in Single-Play Condition
1. Theme: Expected Value and Mathematical Rational Decision-Making
Expected value serves as a decision basis, measuring long-term gains or losses per gamble participation. Based on probability and return calculations, rational decisions should rely on expected value rather than single-gamble fluctuations, choosing options that maximize expected value.
Representative examples:
- "Expected return calculation is 0.1×10000 - 0.9×278 = 749.8 yuan, positive long-term return. Despite 90% loss probability, high-return mathematical expectation supports rational decision-making."
- "Expected return is 0.2×5000 - 0.8×313 = 749.6 yuan. Although 80% loss probability of 313 yuan seems risky, significant positive expected value means stable returns from long-term participation."
2. Theme: Risk Tolerance and Decision Preferences
Decision-makers' risk tolerance determines willingness to accept potential losses in single gambles. Risk-preferring individuals may accept short-term fluctuations for long-term returns, while risk-averse individuals may avoid high-probability losses.
Representative examples:
- "Despite 90% probability of losing 278 yuan, expected value calculation is 0.1×10000 - 0.9×278 = 749.8 yuan, significantly higher than loss. Rational decision-makers would participate if they can tolerate 278 yuan loss."
- "Despite 80% probability of losing 313 yuan, positive expected value means rational decisions should consider mathematical expectation rather than short-term fluctuations."
3. Theme: Loss Aversion and Psychological Effects
According to prospect theory in behavioral economics, decision-makers' sensitivity to losses typically exceeds sensitivity to equivalent gains. This loss aversion effect may lead decision-makers to avoid high-risk gambles even when expected value is positive.
Representative examples:
- "Expected value is: 0.9×1111 - 0.1×2500 = 749.9 yuan. Despite 10% loss probability of 2500 yuan, psychological loss aversion may cause participants to avoid this gamble."
- "Although expected value is positive, single gamble's high failure probability (90%) and potential loss (278 yuan) may trigger risk-averse emotional responses, ultimately leading to non-participation."
4. Theme: High Return and Low-Probability Events
In some gamble tasks, despite high loss probability, low-probability high returns (e.g., large bonuses) may attract risk-preferring participants. The appeal of high-return low-probability events can outweigh short-term losses.
Representative examples:
- "Expected value is 0.1×10000 - 0.9×278 = 749.8 yuan. Despite 90% probability of losing 278 yuan, 10% high return (10000 yuan) is sufficient to attract risk-preferring participants."
- "Although 80% probability of losing 313 yuan, 20% probability of gaining 5000 yuan can cover risk, with positive expected value supporting participation."
These thematic dimensions clearly reflect DeepSeek's considerations when making single-play gamble decisions, including expected value calculation, risk tolerance, loss aversion psychological effects, and reactions to high-return low-probability events. These factors jointly shape its decision-making process, forming a rational expected-value-based decision pattern.
A2.2 Strategy Themes Extracted in Repeated-Play Condition
1. Expected Return Maximization
Rational decision-makers base decisions on expected value calculations, choosing to participate when expected value is positive to obtain long-term returns.
Representative examples:
- "This gamble's expected value is 0.1×10000 - 0.9×278 ≈ 749.8 yuan per play. Five plays' total expected return is 3749 yuan, positive long-term return."
- "Each gamble's expected return is 0.2×5000 - 0.8×313 ≈ 749.6 yuan, with five plays' total expected return reaching 3748 yuan."
2. Risk-Return Balance
Decision-makers evaluate risk (loss probability) and return (potential gain) in gambles, deciding whether to participate based on risk tolerance.
Representative examples:
- "Despite 59% probability of losing all five plays (1390 yuan loss), 41% probability of at least one win offers opportunity to cover cumulative losses with single high return."
- "Although each gamble has 70% probability of 357 yuan loss, 30% win rate corresponds to 3333 yuan high return, making single expected value as high as 750 yuan."
3. Cumulative Effects of Repeated Gambles
Expected returns from multiple independent gambles tend toward long-term stability through the law of large numbers. Despite possible short-term fluctuations, long-term returns are more secure.
Representative examples:
- "Five independent gambles' expected total return is about 3750 yuan, with 83.2% probability of at least one win in ten plays, making overall expected return significantly positive."
- "Although each gamble has 80% high probability of 313 yuan loss, five independent gambles' total expected return is 3748 yuan (single expected value 749.6 yuan). Despite 32.8% probability of losing all five (total loss 1565 yuan), 67.2% probability of at least one win."
4. High Returns Covering Low Risks
Although single gambles have high loss probability, high-return probability is sufficient to cover potential losses in repeated gambles, especially when one win can cover multiple losses.
Representative examples:
- "Each gamble's expected return is 0.1×10000 - 0.9×278 = 749.8 yuan, with five plays' total expected return 3749 yuan. Although 59% probability of losing all five plays, as long as risk tolerance allows, positive expected value still supports participation long-term."
- "Each gamble's expected return is 0.3×3333 - 0.7×357 ≈ 750 yuan, with five plays' total expected return reaching 3750 yuan. Despite 70% single loss probability, 30% high return (3333 yuan) can cover multiple losses (357 yuan), and one win in five independent trials can achieve net profit."
These dimensions comprehensively cover DeepSeek's key considerations in decision-making: expected value maximization, risk-return balance, long-term stability of repeated gambles, and strategies where high-return opportunities cover low risks. These dimensions jointly help DeepSeek make rational decisions in repeated-play contexts.
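The "losing all plays" and "at least one win" figures quoted in the representative examples (reproduced verbatim above) follow from independence: with per-play loss probability q and n plays,

```latex
P(\text{lose all } n \text{ plays}) = q^{n}, \qquad
P(\text{at least one win}) = 1 - q^{n}
```

For q = 0.9 and n = 5 this gives 0.9^5 ≈ 0.590 and 1 - 0.590 ≈ 0.410, matching the quoted 59% and 41%; for q = 0.8, 0.8^5 ≈ 0.328 and 1 - 0.328 ≈ 0.672, matching the quoted 32.8% and 67.2%.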
B Experimental Contexts and Strategy Generation Prompts
B1 Experimental Contexts
B1.1 Medical Context
B1.1.1 Medical Context Task
Please imagine you are an emergency management decision-maker. A disease outbreak is expected to cause 600 deaths. You can choose either of two emergency plans to respond to the disease. Suppose the following are scientifically estimated outcomes of these two plans:
- If Plan A is adopted, 200 lives will be saved.
- If Plan B is adopted, there is a 2/5 probability of saving all 600 lives, and a 3/5 probability that no one can be saved.
B1.1.2 Single-Play Condition Question
Suppose the plan you choose will be applied only once. Which plan would you prefer?
B1.1.3 Repeated-Play Condition Question
Suppose the plan you choose will be applied 100 times. Which plan would you prefer?
B1.2 Financial Context
B1.2.1 Financial Context Task
To handle a certain financial emergency, there are two optional emergency plans. Suppose the scientifically estimated outcomes of these two plans are:
- If Plan A is adopted, you will receive ¥5,000.
- If Plan B is adopted, there is a 0.1% probability of receiving ¥6,000,000 and a 99.9% probability of receiving nothing.
B1.2.2 Single-Play Condition Question
Suppose the plan you choose will be applied only once. Which plan would you prefer?
B1.2.3 Repeated-Play Condition Question
Suppose the plan you choose will be applied 100 times. Which plan would you prefer?
B1.3 Content Creation Context
B1.3.1 Content Creation Context Task
Suppose you are a self-media account creator currently preparing new video releases. To increase follower count, you have two production plan options for new video content. For each new video, the implementation effects of the two plans are:
- If Plan A is adopted, the account will gain 300 followers.
- If Plan B is adopted, there is a 40% probability of gaining 1000 followers, but also a 60% probability of gaining no followers.
B1.3.2 Single-Play Condition Question
Suppose you can only release 1 new video. You need to choose one of the two plans to implement for this 1 video. Which plan would you prefer?
B1.3.3 Repeated-Play Condition Question
Suppose you can release 100 new videos. You need to choose one of the two plans to implement for all 100 videos. Which plan would you prefer?
B1.4 E-commerce Marketing Context
B1.4.1 E-commerce Marketing Context Task
Suppose you currently work at an e-commerce company responsible for daily operations of the company's e-commerce stores. To increase store revenue, the company has formulated two different marketing plans. For each store, the implementation effects of the two plans are:
- If Plan A is adopted, each store will stably earn ¥20,000 monthly.
- If Plan B is adopted, each store has a 30% probability of earning ¥100,000 monthly, but also a 70% probability of earning nothing.
B1.4.2 Single-Play Condition Question
Suppose you currently only operate 1 store. You need to choose one of the two plans to implement for this 1 store. Which plan would you prefer?
B1.4.3 Repeated-Play Condition Question
Suppose you currently operate 100 stores. You need to choose one of the two plans to implement for all 100 stores. Which plan would you prefer?
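All four task contexts in B1 share the same structure: a certain Plan A versus a risky Plan B whose losing branch yields nothing, with Plan B carrying the higher expected value. The following minimal sketch (ours; the dictionary layout is illustrative) makes the single-application comparison explicit:

# Expected values of the two options in each B1 context.
contexts = {
    "medical (B1.1)":    (200,    2 / 5, 600,       "lives"),
    "financial (B1.2)":  (5_000,  0.001, 6_000_000, "yuan"),
    "creation (B1.3)":   (300,    0.4,   1_000,     "followers"),
    "e-commerce (B1.4)": (20_000, 0.3,   100_000,   "yuan/month"),
}

for name, (ev_a, p_win, gain, unit) in contexts.items():
    ev_b = p_win * gain  # the losing branch pays 0 in every context
    print(f"{name}: EV(A) = {ev_a:,} vs EV(B) = {ev_b:,.0f} {unit}")
# medical: 200 vs 240; financial: 5,000 vs 6,000;
# creation: 300 vs 400; e-commerce: 20,000 vs 30,000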
B2 Strategy Text Generation Prompt Design
B2.1 Single-Play Condition: Prompts to Encourage Risky Option Choice
Please write a persuasive message (≤150 characters) encouraging individuals to choose Option B in the task context, focusing on single-application scenarios without mentioning multiple applications.
(1) Medical context
Theme content: Present A2.1 content
Task context: Present B1.1.1 + B1.1.2 content
(2) Financial context
Theme content: Present A2.1 content
Task context: Present B1.2.1 + B1.2.2 content
(3) Content creation context
Theme content: Present A2.1 content
Task context: Present B1.3.1 + B1.3.2 content
(4) E-commerce marketing context
Theme content: Present A2.1 content
Task context: Present B1.4.1 + B1.4.2 content
B2.2 Repeated-Play Condition: Prompts to Encourage Certain Option Choice
Please write a persuasive message (≤150 characters) encouraging individuals to choose Option A in the task context, focusing on repeated-application scenarios without mentioning single applications.
(1) Medical context
Theme content: Present A2.2 content
Task context: Present B1.1.1 + B1.1.3 content
(2) Financial context
Theme content: Present A2.2 content
Task context: Present B1.2.1 + B1.2.3 content
(3) Content creation context
Theme content: Present A2.2 content
Task context: Present B1.3.1 + B1.3.3 content
(4) E-commerce marketing context
Theme content: Present A2.2 content
Task context: Present B1.4.1 + B1.4.3 content
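Each B2 prompt concatenates a fixed instruction, the A2 theme content, and the B1 task context. A minimal sketch of how this assembly could be scripted is shown below; the helper and the placeholder strings are hypothetical, as the study does not specify an implementation:

# Hypothetical prompt assembly for the B2 design (illustrative only).
INSTRUCTION = {
    "single":   ("Please write a persuasive message (≤150 characters) encouraging "
                 "individuals to choose Option B in the task context, focusing on "
                 "single-application scenarios without mentioning multiple applications."),
    "repeated": ("Please write a persuasive message (≤150 characters) encouraging "
                 "individuals to choose Option A in the task context, focusing on "
                 "repeated-application scenarios without mentioning single applications."),
}

def build_prompt(frequency, theme_content, task, question):
    """Join instruction (B2) + theme content (A2) + task context (B1)."""
    return "\n\n".join([
        INSTRUCTION[frequency],
        "Theme content:\n" + theme_content,          # A2.1 or A2.2 text
        "Task context:\n" + task + "\n" + question,  # e.g. B1.1.1 + B1.1.2
    ])

# Medical context, single-play condition (placeholders stand in for the materials):
print(build_prompt("single", "<A2.1 themes>", "<B1.1.1 task>", "<B1.1.2 question>"))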
B3 GPT-4o Generated Strategy Texts
B3.1 Medical Context
B3.1.1 Single-Play: Encouraging Certain Option A
In this situation, Plan A is calculated to save 200 lives with certainty, ensuring a stable outcome. Although Plan B offers high return potential (possibly saving all 600 lives), its 60% failure probability means no one would be saved. Choosing Plan A is therefore the rational decision: it rests on higher certainty, maximizes expected life-saving benefit, reduces potential risk, and conforms to expected value maximization principles.
B3.1.2 Single-Play: Encouraging Risky Option B
Choosing Plan B allows rational decision-making based on expected value calculation. Although Plan B carries high risk (3/5 probability of saving no one), it offers a 2/5 probability of saving all 600 lives, with expected value = 2/5×600 = 240 lives, significantly higher than Plan A's 200 lives. Despite the risk, Plan B's expected benefit is more desirable; rational decision-makers should consider expected value rather than single-outcome fluctuations.
B3.1.3 Repeated-Play: Encouraging Certain Option A
In repeated-application contexts, Plan A is more rational. Choosing Plan A each time saves 200 lives, and the cumulative life-saving effect is more stable over the long term. Although Plan B has high-return potential, the risk of loss on each application is substantial, and the expected effect may not materialize in repeated applications. Choosing Plan A maximizes long-term expected benefits, ensuring stable returns each time and avoiding the uncertainty of probability fluctuations.
B3.1.4 Repeated-Play: Encouraging Risky Option B
In repeated-application contexts, choosing Plan B maximizes expected benefits. Despite the 3/5 probability of saving no one, the 2/5 probability of saving all 600 lives brings huge positive returns. Based on expected value calculation, Plan B's expected benefit is 2/5×600 = 240 lives, while Plan A's fixed benefit is only 200 lives. Choosing Plan B can secure long-term life-saving benefits through repeated applications, despite short-term fluctuations.
B3.2 Financial Context
B3.2.1 Single-Play: Encouraging Certain Option A
Reason for choosing Plan A: Although Plan B may bring extremely high returns, its extremely low probability (0.1%) means getting nothing in most cases. In contrast, Plan A provides a certain ¥5,000 return, ensuring stable income and avoiding the psychological pressure and potential losses of excessive risk. Therefore, from an expected value perspective, Plan A is the rational decision preference.
B3.2.2 Single-Play: Encouraging Risky Option B
Based on mathematically rational expected value decision-making, choosing Plan B is more advantageous. Although Plan B's high-return probability is extremely low, its expected return is 0.001 × 6,000,000 = 6,000 yuan, significantly higher than Plan A's 5,000 yuan. Therefore, based on the expected return maximization principle, Plan B offers greater potential returns, suitable for decision-makers willing to accept minimal risk for high returns.
B3.2.3 Repeated-Play: Encouraging Certain Option A
When facing financial emergencies, choosing Plan A ensures you receive a stable ¥5,000 return each time, while Plan B, despite its extremely low-probability high returns, yields nothing in most cases. Considering long-term stability and risk control, Plan A provides clear returns, consistent with rational decision-making based on expected value maximization. Therefore, choosing Plan A helps maintain stable financial status over the long term.
B3.2.4 Repeated-Play: Encouraging Risky Option B
Choosing Plan B will bring higher long-term returns in repeated applications. Although each play has only a 0.1% success probability, a single success yields a substantial ¥6 million, sufficient to cover losses from multiple failures. Through repeated applications, Plan B's expected benefits will significantly exceed Plan A's fixed ¥5,000. As the number of applications increases, the probability of success will gradually manifest, providing higher expected returns for rational decision-makers over the long term.
B3.3 Content Creation Context
B3.3.1 Single-Play: Encouraging Certain Option A
In scenarios with only one choice opportunity, Plan A brings 300 followers with 100% probability, stable return, zero risk, making it more rational. In contrast, Plan B, despite high-return potential, has 60% failure probability meaning likely getting nothing. When unable to bear failure loss yet wanting to ensure growth, choosing A better conforms to mathematical rationality and robust decision-making principles.
B3.3.2 Single-Play: Encouraging Risky Option B
Although Plan B carries some risk, its expected benefit is significantly higher than Plan A. Calculation shows Plan B's expected follower increase is 0.4×1000 + 0.6×0 = 400 people, exceeding Plan A's 300 people. This means that in the current single-release opportunity context, choosing B better maximizes potential benefits, bringing greater exposure and growth opportunities to the account, making it a more rational decision.
B3.3.3 Repeated-Play: Encouraging Certain Option A
In contexts requiring repeated video releases, choosing Plan A is wiser. Each video stably brings 300 followers, meaning 100 videos will accumulate 30,000 followers without the burden of uncertain fluctuations. Plan B's expected value of 400 followers per video seems higher, but its 60% failure probability will frequently bring the disappointment of "zero growth" and increase psychological pressure. In repeated applications, stable returns outweigh uncertain gambles; choosing A offers a dual guarantee of rationality and robustness.
B3.3.4 Repeated-Play: Encouraging Risky Option B
Across multiple releases, choosing Plan B will significantly enhance the account's overall growth potential. Although a single video has a 60% probability of no follower gain, the 40% success rate brings 1000 followers, for a per-video expected value of 400. Releasing 100 videos yields an expected gain of about 40,000 followers, far exceeding Plan A's 30,000. In long-term operations, high returns offset the failed attempts; rational choices should be based on expected value maximization and cumulative effects, making Plan B the optimal solution.
B3.4 E-commerce Marketing Context
B3.4.1 Single-Play: Encouraging Certain Option A
Choosing Plan A is the optimal choice for rational decision-makers in a single task. Although Plan B's high returns seem attractive, its 70% probability of zero returns carries high risk. In contrast, Plan A provides a stable, predictable ¥20,000 income without huge uncertainty. When responsible for only 1 store, rationality should prioritize guaranteed returns to avoid a fatal blow to overall performance from a single failed decision.
B3.4.2 Single-Play: Encouraging Risky Option B
Choosing Plan B is a rational and worthwhile decision. Although 70% probability of no income exists, 30% high return (¥100,000) makes Plan B's single expected return as high as ¥30,000, far exceeding Plan A's fixed ¥20,000. When currently only responsible for 1 store, choosing the option with higher expected value is expected to maximize benefits in one decision, representing smart operators' judgment.
B3.4.3 Repeated-Play: Encouraging Certain Option A
In repeated application contexts, Plan A's stable returns far surpass Plan B's volatile results. Operating 100 stores, choosing Plan A brings a stable ¥2 million monthly income; with Plan B, the expected return of 100×(0.3×¥100,000) = ¥3 million seems higher, but its 70% failure probability brings huge uncertainty and income fluctuations. Rational decision-making should be based on expected value while also considering risk controllability. Choosing Plan A better supports stable enterprise operation and long-term development.
B3.4.4 Repeated-Play: Encouraging Risky Option B
When operating 100 stores over the long term, choosing Plan B has the more rational advantage. Although the monthly failure probability is 70%, each store's expected return is 0.3×¥100,000 = ¥30,000, far exceeding Plan A's ¥20,000. Across repeated applications, the expected value stabilizes under the law of large numbers, achieving higher total income overall. Only some stores need to achieve high returns to cover the losses at other stores, making it a rational choice for profit maximization.
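The 100-application figures quoted in the repeated-play texts above can likewise be checked. The sketch below (our illustration; the helper name is not from the paper) computes the certain total, the expected risky total, and the chance of at least one success over 100 independent applications. Note that in the financial context the expected total still favors B (¥600,000 vs ¥500,000), yet a 0.1%-probability win remains unlikely even across 100 applications.

def totals_over_n(ev_a, p_win, gain, n=100):
    """Certain total, expected risky total, and P(at least one success) over n plays."""
    return n * ev_a, n * p_win * gain, 1 - (1 - p_win) ** n

for label, args, unit in [
    ("creation (B3.3.3/4)",   (300, 0.4, 1_000),         "followers"),
    ("e-commerce (B3.4.3/4)", (20_000, 0.3, 100_000),    "yuan"),
    ("financial (B3.2.3/4)",  (5_000, 0.001, 6_000_000), "yuan"),
]:
    total_a, total_b, p_any = totals_over_n(*args)
    print(f"{label}: A = {total_a:,.0f}, B = {total_b:,.0f} {unit}, "
          f"P(≥1 success) ≈ {p_any:.1%}")
# creation:   30,000 vs 40,000 followers, P ≈ 100.0%
# e-commerce: 2,000,000 vs 3,000,000 yuan, P ≈ 100.0%
# financial:  500,000 vs 600,000 yuan,    P ≈ 9.5%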
B4 Control Group Texts
B4.1 Medical Context
B4.1.1 Certain Option A Control Text
Under Plan A implementation, scientific assessment indicates that after intervention measures, 200 lives are expected to be saved. This number reflects the plan's potential effect in responding to disease outbreaks.
B4.1.2 Risky Option B Control Text
Under Plan B implementation, two outcomes will occur according to the probability distribution: when the 40%-probability event occurs, all 600 people will be successfully rescued; when the 60%-probability event occurs, no one will be rescued.
B4.2 Financial Context
B4.2.1 Certain Option A Control Text
Under Plan A implementation, a guaranteed ¥5,000 will be obtained. This amount is fixed and unaffected by probabilistic factors. This number reflects the effect of choosing Plan A.
B4.2.2 Risky Option B Control Text
When adopting Plan B, there is a 0.1% probability of obtaining ¥6 million and a 99.9% probability of no financial gain. This assessment result is based on probability calculations of event occurrence.
B4.3 Content Creation Context
B4.3.1 Certain Option A Control Text
Under Plan A implementation, new video releases can bring 300 followers. This follower count is a determined value unaffected by other factors.
B4.3.2 Risky Option B Control Text
Under Plan B implementation, new video releases have a 40% probability of gaining 1000 followers and a 60% probability of gaining no followers.
B4.4 E-commerce Marketing Context
B4.4.1 Certain Option A Control Text
Under Plan A implementation, each store can obtain stable ¥20,000 monthly income. This value remains constant across months with fixed return performance.
B4.4.2 Risky Option B Control Text
Under Plan B implementation, each store has a 30% probability of obtaining ¥100,000 monthly income and a 70% probability of obtaining no income.
C Study 2 Experimental Procedures and Materials
C1 Study 2 Experiment 1 Procedure Schematic
The experimental procedure is as follows: First, participants read instructions and confirm voluntary participation. Then they enter decision contexts. Decision contexts consist of four types ("medical/finance × single/repeated gamble") that participants complete in random order. Based on their choices in each task, the questionnaire presents a corresponding GPT-generated strategy text, requiring participants to rate its similarity to their actual thinking process. Finally, participants fill in basic demographic information. The procedure is shown in Figure S1.
Figure S1 Study 2 Experiment 1 questionnaire flow schematic
C2 Study 2 Experiment 1 Materials
C2.1 Instructions
Hello! We are the Behavioral Decision-Making Research Team from Guangdong University of Technology's School of Management. Thank you very much for participating in this experiment!
Please carefully read the instructions and complete the choice tasks or electronic questionnaires as required. There are no right or wrong answers to any task; please respond according to your actual situation.
You will receive compensation after the experiment. All compensation will be distributed uniformly after completing all tasks. You have the right to withdraw from the experiment at any time without consequences. However, withdrawing midway means you will not receive corresponding compensation.
Information collected in this experiment will only be used for scientific research. We will keep your data confidential. Public publications will not contain your identity information.
Informed consent: If you continue to the next page to start answering, it means:
- You have read the above information
- You voluntarily agree to participate and will take this experiment seriously
Otherwise, please close the link to exit this experiment.
C2.2 Decision Tasks
(Present tasks for participants to decide, using a 4-point Likert scale: 1 = very likely to choose Option A, 2 = likely to choose Option A, 3 = likely to choose Option B, 4 = very likely to choose Option B. For specific content, see Supplementary Material B1, covering "medical/finance contexts × single/repeated gambles")
C2.3 Strategy Acceptance Evaluation
Task example: Medical context × single-play gamble
In the disease problem (1) you just completed, you chose Option A:
Please imagine you are an emergency management decision-maker. A disease outbreak is expected to cause 600 deaths. You can choose either of two emergency plans to respond to the disease. Suppose the following are scientifically estimated outcomes of these two plans:
- If Plan A is adopted, 200 lives will be saved.
- If Plan B is adopted, there is a 2/5 probability of saving all 600 lives, and a 3/5 probability that no one can be saved.
Suppose the plan you choose will be applied only once. Which plan would you prefer?
How similar do you think the following decision rationale is to your actual thinking process when answering the disease problem?
"In this situation, Plan A's expected value calculation saves 200 lives, significantly ensuring a stable outcome. Although Plan B offers high return potential (possibly saving 600 lives), its 60% failure probability means no one can be saved. Therefore, choosing Plan A's rational decision is based on higher certainty, maximizing expected life-saving benefits, reducing potential risk, and conforming to expected value maximization principles."
(1 = very dissimilar, 7 = very similar)
☐ 1 ☐ 2 ☐ 3 ☐ 4 ☐ 5 ☐ 6 ☐ 7
C2.4 Demographic Information Collection
1. Your gender: ☐ Male ☐ Female
2. Your age: ____ years (please enter integer)
3. Your monthly disposable income: ☐ Below ¥1,000 ☐ ¥1,001-1,500 ☐ ¥1,501-2,000 ☐ Above ¥2,001
4. Please select your education level: ☐ Junior high or below ☐ High school/technical secondary/vocational ☐ Junior college ☐ Bachelor's ☐ Master's or above
5. Your major category: ☐ Science/engineering ☐ Economics/management ☐ Arts ☐ Medicine ☐ Law ☐ Other
C3 Study 2 Experiment 2 Procedure Schematic
The experimental procedure is as follows: First, participants read instructions and confirm voluntary participation. Then they enter decision contexts. Decision contexts consist of four types ("content creation/e-commerce marketing × single/repeated gamble"), with context as between-subjects design and gamble frequency as within-subjects design. Participants are randomly assigned to either context and complete single-play and repeated-play gambles in random order. Based on their choices in each task, the questionnaire presents a corresponding GPT-generated strategy text, requiring participants to rate its similarity to their actual thinking process. Finally, participants fill in basic demographic information. The questionnaire flow is shown in Figure S2.
Figure S2 Study 2 Experiment 2 questionnaire flow schematic
C4 Study 2 Experiment 2 Materials
C4.1 Instructions
(Content identical to Supplementary Material C2.1, omitted here)
C4.2 Decision Tasks
(Present tasks for participants to decide. For specific content, see Supplementary Material B1, covering "content creation/e-commerce marketing contexts × single/repeated gambles")
C4.3 Strategy Acceptance Evaluation
Task example: Content creation context × single-play gamble
In the creation problem (1) you just completed, you chose Option A:
Suppose you are a self-media account creator currently preparing new video releases. To increase follower count, you have two production plan options for new video content. For each new video, the implementation effects of the two plans are:
- If Plan A is adopted, the account will gain 300 followers.
- If Plan B is adopted, there is a 40% probability of gaining 1000 followers, but also a 60% probability of gaining no followers.
Suppose you can only release 1 new video. You need to choose one of the two plans to implement for this 1 video. Which plan would you prefer?
How similar do you think the following decision rationale is to your actual thinking process when answering the creation problem?
"In scenarios with only one choice opportunity, Plan A brings 300 followers with 100% probability, stable return, zero risk, making it more rational. In contrast, Plan B, despite high-return potential, has 60% failure probability meaning likely getting nothing. When unable to bear failure loss yet wanting to ensure growth, choosing A better conforms to mathematical rationality and robust decision-making principles."
(1 = very dissimilar, 7 = very similar)
☐ 1 ☐ 2 ☐ 3 ☐ 4 ☐ 5 ☐ 6 ☐ 7
C4.4 Demographic Information Collection
1. Your gender: ☐ Male ☐ Female
2. Your age: ____ years (please enter integer)
3. Have you graduated: ☐ Yes ☐ No (if "No," skip to "6. Please select your education level")
4. Your work experience: ☐ 2 years or less ☐ 3-5 years ☐ 6 years or more
5. Your current/last job's enterprise nature: ☐ Government/public institution ☐ State-owned enterprise ☐ Wholly foreign-owned enterprise ☐ Joint venture ☐ Other (please specify): __
6. Please select your education level: ☐ Junior high or below ☐ High school/technical secondary/vocational ☐ Junior college ☐ Bachelor's ☐ Master's or above
7. Your major category: ☐ Science/engineering ☐ Economics/management ☐ Arts ☐ Medicine ☐ Law ☐ Other
D Study 3 Experimental Procedures and Materials
D1 Study 3 Experiment 1 Procedure Schematic
Participants first read the context. The questionnaire then presents control or intervention group texts based on their group assignment. Participants make choices after reading the texts. The procedure is shown in Figure S3.
Figure S3 Study 3 Experiment 1 questionnaire flow schematic
D2 Study 3 Experiment 1 Materials
D2.1 Instructions
(Content identical to Supplementary Material C2.1, omitted here)
D2.2 Decision Tasks
Example context: Medical context × single-play gamble (intervention direction: encourage risky Option B)
Please imagine you are an emergency management decision-maker. A disease outbreak is expected to cause 600 deaths. You can choose either of two emergency plans to respond to the disease. Suppose the following are scientifically estimated outcomes of these two plans:
- If Plan A is adopted, 200 lives will be saved.
- If Plan B is adopted, there is a 2/5 probability of saving all 600 lives, and a 3/5 probability that no one can be saved.
Control group text (between-subjects):
Under Plan B implementation, two outcomes will occur according to the probability distribution: when the 40%-probability event occurs, all 600 people will be successfully rescued; when the 60%-probability event occurs, no one will be rescued.
Intervention group text (between-subjects):
Choosing Plan B allows rational decision-making based on expected value calculation. Although Plan B carries high risk (3/5 probability of saving no one), it offers a 2/5 probability of saving all 600 lives, with expected value = 2/5×600 = 240 lives, significantly higher than Plan A's 200 lives. Despite the risk, Plan B's expected benefit is more desirable; rational decision-makers should consider expected value rather than single-outcome fluctuations.
Suppose the plan you choose will be applied only once. Which plan would you prefer?
D2.3 Demographic Information Collection
(Content identical to Supplementary Material C2.4, omitted here)
D3 Study 3 Experiment 2 Procedure Schematic
This experiment uses a between-subjects design (control vs. intervention group) to test whether LLM-generated strategy texts can significantly influence individual decision-making behavior, adding the "content creation" and "e-commerce marketing" contexts to enhance generalizability and external validity. Participants first read the context. The questionnaire then presents the control or intervention group text according to group assignment. Participants make their choices after reading the text. The procedure is shown in Figure S4.
Figure S4 Study 3 Experiment 2 questionnaire flow schematic
D4 Study 3 Experiment 2 Materials
D4.1 Instructions
(Content identical to Supplementary Material C2.1, omitted here)
D4.2 Decision Tasks
Example context: Content creation context × single-play gamble (intervention direction: encourage risky Option B)
Suppose you are a self-media account creator currently preparing new video releases. To increase follower count, you have two production plan options for new video content. For each new video, the implementation effects of the two plans are:
- If Plan A is adopted, the account will gain 300 followers.
- If Plan B is adopted, there is a 40% probability of gaining 1000 followers, but also a 60% probability of gaining no followers.
Control group text (between-subjects):
Under Plan B implementation, new video releases have a 40% probability of gaining 1000 followers and a 60% probability of gaining no followers.
Intervention group text (between-subjects):
Although Plan B carries some risk, its expected benefit is significantly higher than Plan A. Calculation shows Plan B's expected follower increase is 0.4×1000 + 0.6×0 = 400 people, exceeding Plan A's 300 people. This means that in the current single-release opportunity context, choosing B better maximizes potential benefits, bringing greater exposure and growth opportunities to the account, making it a more rational decision.
Suppose you can only release 1 new video. You need to choose one of the two plans to implement for this 1 video. Which plan would you prefer?
D4.3 Demographic Information Collection
(Content identical to Supplementary Material C4.4, omitted here)
E Additional Data and Analysis Results
Table S1 Comparison of DeepSeek-R1 with other mainstream models
Models compared: Claude-3.5-Sonnet-1022, GPT-4o, DeepSeek-R1, OpenAI o1-mini, OpenAI o1-1217
Evaluation benchmarks (metric): AIME2024 (Pass@1); MATH-500 (Pass@1); LiveCodeBench (Pass@1-COT); Codeforces (Percentile); AlpacaEval2.0 (LC-winrate); ArenaHard (GPT-4-1106); CLUEWSC (EM); C-Eval (EM)
Table S2 Demographic distribution of participants in content evaluation
Age Group | %
19 and below | 6.67/7.32
20-22 | 26.67/17.07
23-25 | 56.67/70.73
26 and above | 10.00/4.88
Work Experience | %
2 years or less | 20.00/17.65
3-5 years | 15.00/20.59
6 years or more | 65.00/61.76
Education Level | %
College or below | 6.67/14.63
Bachelor's | 63.33/56.10
Master's or above | 30.00/29.27
Note: "Work experience" excludes student participants. In each column, "/" separates the medical/finance group from the content creation/e-commerce group.
Table S3 Content evaluation results
Dimension | t | p
Overall rating | 39.48/68.91 | <0.001/<0.001
Rationality | 19.81/32.72 | <0.001/<0.001
Professionalism | 20.14/32.44 | <0.001/<0.001
Logicality | 16.52/43.91 | <0.001/<0.001
Readability | 16.39/42.64 | <0.001/<0.001
Persuasiveness | 17.64/43.27 | <0.001/<0.001
Note: In each column, "/" separates medical/finance context results from content creation/e-commerce context results.
Table S4 Demographic distribution of participants in Study 2 Experiment 1
Age Group (%): 19 and below; 20-22; 23-25; 26 and above
Monthly Disposable Income (%): Below ¥1,000; ¥1,001-1,500; ¥1,501-2,000; Above ¥2,001
Education Level (%): High school/technical/vocational; College; Bachelor's; Master's or above
Table S5 Strategy acceptance ratings in Study 2 Experiment 1
Condition | t(df) | p | Cohen's d
Medical-single | t(348) = 63.04 | <0.001 |
Medical-repeated | t(348) = 37.42 | <0.001 |
Finance-single | t(348) = 36.00 | <0.001 |
Finance-repeated | t(348) = 45.87 | <0.001 |
Overall | t(348) = 46.16 | <0.001 |
Table S6 Demographic distribution of participants in Study 2 Experiment 2
Age Group (%): 19 and below; 20-22; 23-25; 26 and above
Work Experience (%): 2 years or less; 3-5 years; 6 years or more
Education Level (%): College or below; Bachelor's; Master's or above
Note: "Work experience" excludes 1/3 student participants.
Table S7 Strategy acceptance ratings in Study 2 Experiment 2
Condition | t(df) | p | Cohen's d
Creation-single | t(209) = 41.78 | <0.001 |
Creation-repeated | t(104) = 23.12 | <0.001 |
Marketing-single | t(104) = 19.64 | <0.001 |
Marketing-repeated | t(104) = 25.14 | <0.001 |
Overall | t(104) = 29.60 | <0.001 |
Table S8 Demographic distribution of participants in Study 3 Experiment 1
Age Group (%): 19 and below; 20-22; 23-25; 26 and above
Monthly Disposable Income (%): Below ¥1,000; ¥1,001-1,500; ¥1,501-2,000; Above ¥2,001
Education Level (%): High school/technical/vocational; College; Bachelor's; Master's or above
Table S9 Demographic distribution of participants in Study 3 Experiment 2
Age Group (%): 19 and below; 20-22; 23-25; 26 and above
Work Experience (%): 2 years or less; 3-5 years; 6 years or more
Education Level (%): College or below; Bachelor's; Master's or above
Note: "Work experience" excludes 1/3 student participants.