The Hallucination Challenge in Large Model Agents: Causes, Risks, and Countermeasures (Postprint)
Xu Qi, Sun Zhipu
Submitted 2025-07-09 | ChinaXiv: chinaxiv-202507.00319

Abstract

Purpose: The hallucination and risk issues of large model agents are becoming increasingly prominent, and an in-depth analysis of their causes, risk manifestations, and countermeasures holds significant theoretical and practical importance.

Methods: Addressing the theoretical and practical needs in the field of journalism and communication, this study is primarily based on interdisciplinary literature research and theoretical analysis.

Results: Agent hallucination refers to a series of unavoidable errors at the model generation level, where generated content is illogical or unfaithful to the provided source content. These can be mainly categorized into two types: factual hallucination and faithfulness hallucination. The former includes factual errors, fabrication, and omission, while the latter encompasses inconsistency in intent, context, and logic. In downstream applications, hallucination risks widely exist in tasks such as machine translation, question answering systems, dialogue, summarization, knowledge graphs, and visual question answering, manifesting as translation deviation, incomplete answers, information distortion, etc., thereby jeopardizing content authenticity and accuracy.

Conclusion: To address the hallucination challenge, the media industry must first strengthen risk awareness and technological literacy at the cognitive level. Technically, Retrieval-Augmented Generation (RAG) and factual decoding strategies can be adopted. Procedurally, human-machine collaboration workflows should be improved, and verification and multi-dimensional evaluation systems enhanced to balance agent effectiveness and reliability.

Full Text

Preamble

The Hallucination Challenge in Large Model-Based AI Agents: Causes, Risks, and Countermeasures

Qi Xu, Zhipu Sun
(State Key Laboratory of Media Convergence and Communication, New Media Research Institute, Communication University of China, Beijing 100024)

Abstract

Purpose: As hallucinations and associated risks in large model-based AI agents become increasingly prominent, a thorough analysis of their causes, risk manifestations, and countermeasures holds significant theoretical and practical importance. Method: Addressing theoretical and application needs in journalism and communication, this study draws primarily on interdisciplinary literature review and theoretical analysis. Results: AI agent hallucination refers to a series of inevitable generation errors at the model layer where outputs become illogical or unfaithful to source content. These are mainly categorized into factual hallucinations and faithfulness hallucinations. The former includes factual errors, fabrication, and neglect, while the latter encompasses inconsistencies in intent, context, and logic. In downstream applications, hallucination risks pervade tasks such as machine translation, question answering, dialogue systems, summarization, knowledge graphs, and visual question answering, manifesting as translation deviations, incomplete responses, information distortion, and other issues that jeopardize content authenticity and accuracy. Conclusion: To address the hallucination challenge, the media industry must first strengthen risk awareness and technical literacy at the cognitive level. Technically, retrieval-augmented generation and factuality-enhanced decoding strategies should be employed. At the process level, human-machine collaboration workflows must be improved, with enhanced verification and multi-dimensional evaluation systems to balance agent effectiveness and reliability.

Keywords: AI agent; large model; hallucination; intelligent media; intelligent communication
Classification Code: G222
Document Code: A
Article ID: 1671-0134(2025)05-07-08
DOI: 10.19483/j.cnki.11-4653/n.2025.05.001
Citation Format: Xu Q, Sun Z. The Hallucination Challenge in Large Model-Based AI Agents: Causes, Risks, and Countermeasures [J]. China Media Technology, 2025, 32(5): 7-14.

1. Problem Statement: The "Hallucination" of the AI Agent Brain

When asked to explain routing issues around Xi'an's Anding Gate, the high-performance DeepSeek-R1 model fabricated a "silent zone" concept from the Xi'an Historical and Cultural City Protection Plan and even invented a vibration control standard (GB/T 5845-2019) \cite{1}. This phenomenon—where large models generate text that is illogical or unfaithful to provided source content—is termed "hallucination," a persistent problem in large models \cite{2}. In Vectara HHEM tests, DeepSeek-V3 exhibited a hallucination rate of 3.9%, while DeepSeek-R1's rate reached 14.3% \cite{3}. Although large models represent a leap in natural language capabilities, testers note that they produce more subtle errors in practical use \cite{4}. Behind enhanced fluency and logical reasoning lies a more insidious cognitive risk: hallucinations have evolved from easily detectable commonsense errors and fabricated citations to less perceptible forms such as invented technical jargon, forged document numbers, and cross-disciplinary knowledge pastiche.
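For readers outside computer science, the arithmetic behind such leaderboard figures is straightforward: a set of source documents is summarized by the model, an automatic judge flags summaries that are unsupported by their sources, and the hallucination rate is the flagged fraction. The Python sketch below illustrates only this general recipe; it is not Vectara's actual HHEM pipeline, and the `toy_judge` stand-in is purely hypothetical.

```python
from typing import Callable, List, Tuple

def hallucination_rate(
    pairs: List[Tuple[str, str]],                # (source document, model-generated summary)
    is_consistent: Callable[[str, str], bool],   # judge: is the summary supported by the source?
) -> float:
    """Fraction of generated summaries flagged as unsupported by their sources."""
    if not pairs:
        return 0.0
    flagged = sum(1 for src, summ in pairs if not is_consistent(src, summ))
    return flagged / len(pairs)

# Toy judge for illustration only: treat a summary as consistent if every sentence
# shares at least a few words with the source. A real benchmark would use a trained
# consistency model rather than word overlap.
def toy_judge(source: str, summary: str) -> bool:
    src_words = set(source.lower().split())
    return all(
        len(set(sent.lower().split()) & src_words) >= 3
        for sent in summary.split(".") if sent.strip()
    )
```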

For the media industry, most research on hallucination focuses on peripheral issues such as cognitive risks \cite{5}, cultural consequences \cite{6}, and misinformation governance \cite{7}, rather than the essential nature of hallucinations. This study introduces computer science perspectives to advance the epistemology of hallucination, offering a more comprehensive scientific understanding of the problem and recommendations for how media practitioners should understand and address it in practice.

2.1 Architecture Traceability: The Behavioral Decision-Making Mechanism of Large Model-Based AI Agents

Large models, also known as foundation models, are large-scale models typically built on deep neural network architectures \cite{8}, with billions to hundreds of billions of parameters pretrained on massive, diverse data. Supported by substantial computational resources, they learn extensive knowledge and exhibit strong generalization, yielding versatile models that handle multiple tasks, including natural language processing and question answering, with considerable accuracy. Large models are essentially large language models (LLMs) with language as their output modality. While early work focused on language understanding and reasoning, recent efforts have developed multimodal large language models (MLLMs) that accept images, video, audio, and text to solve more complex understanding and reasoning tasks \cite{9}.

AI agents represent the concrete implementation of large models at the application layer, with the large model serving as the "brain" or controller in agent architectures \cite{10}. Large models guide the various agent components during perception, decision-making, and action, enabling not only natural language dialogue but also strong adaptability and generalization in multi-task scenarios and novel situations, allowing highly autonomous and flexible behavioral decisions \cite{11}. Large models specialized for different modalities have different processing emphases, offering possibilities for expanding agent capabilities. For instance, large vision-language models (LVLMs) leverage large-scale image-text pretraining to directly match any given image and text for zero-shot prediction \cite{12}, providing agents with language-aligned universal visual encoders and zero-shot visual recognition capabilities.
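To make the "brain/controller" role concrete, the following minimal sketch shows a perception-decision-action loop in which a large model decides each step. All names (`Observation`, `Action`, `llm_decide`, `run_agent`) are illustrative placeholders rather than any specific agent framework's API; the point is simply that whatever the controller hallucinates propagates into the agent's actions.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Observation:
    text: str            # e.g., a user request or a tool result

@dataclass
class Action:
    name: str            # e.g., "search", "translate", "answer"
    argument: str

def run_agent(
    llm_decide: Callable[[List[Observation]], Action],  # the large model as "brain"/controller
    perceive: Callable[[], Observation],                 # perception component
    act: Callable[[Action], Observation],                # action/tool component
    max_steps: int = 5,
) -> List[Action]:
    """Minimal perception-decision-action loop with an LLM as controller."""
    history: List[Observation] = [perceive()]
    taken: List[Action] = []
    for _ in range(max_steps):
        action = llm_decide(history)      # a hallucinated decision here propagates downstream
        taken.append(action)
        if action.name == "answer":       # terminate when the controller decides to respond
            break
        history.append(act(action))       # feed tool results back into the next decision
    return taken
```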

Consequently, AI agent hallucinations are essentially extensions of large model hallucinations. Agents inherit both the generative capabilities and the hallucination tendencies of their underlying models.

2.2 Defining Hallucination: Model Endogeneity and Manifestations

The term "hallucination" originates in pathology and psychology, where it is defined as a perceptual experience that feels real: "a perception experienced by a conscious individual in the absence of appropriate external stimulation" \cite{13}. The untrue or meaningless text produced by large models shares characteristics with this psychological phenomenon, which led computer scientists to adopt "hallucination" to describe quality-level errors in model generation.

Specifically, AI agent hallucination refers to inevitable generation errors at the model layer where outputs become illogical or unfaithful to provided source content \cite{14}. More concretely, hallucinations should be understood as fictional, misleading, or fabricated details, facts, or claims generated by large models rather than authentic or reliable text \cite{15}. Hallucinations are model outputs that conflict with constraints, are incorrect, involve flawed reasoning about generated text, produce unsubstantiated or mis-cited claims, deviate from expected deployment behavior, or are entirely task-irrelevant yet syntactically plausible (i.e., sounding coherent).

Notably, the term "hallucination" is not universally accepted. Some scholars argue that these errors, and indeed the overall behavior of large language models, are better understood through Frankfurt's concept of "bullshit" \cite{17}: an indifference to the truthfulness of the output. This suggests "hallucination" may not be an irreplaceable concept.

2.3 Classification Overview: Analyzing Factual and Faithfulness Hallucinations

Hallucination has long been a concern in traditional natural language generation (NLG) tasks, but the problem is more complex in large models. Generally, NLG hallucinations fall into two main types: intrinsic hallucinations (contradicting source content) and extrinsic hallucinations (unverifiable from source content) \cite{18}. Extrinsic hallucinations are those for which we can neither find supporting evidence in the source nor assert that they are wrong \cite{19}.

For large models, this classification, focused on NLP scenarios, inadequately captures their multifunctionality and task complexity, revealing limitations of task-specific taxonomies. This study adopts Huang et al.'s context-based classification \cite{20}, dividing hallucinations into factual hallucination and faithfulness hallucination, while incorporating fact neglect in multimodal models \cite{21} into the factual category.

Factual hallucination refers to generated content that contradicts or cannot be verified against real-world facts, comprising factual errors, fabrication, and neglect. Factual errors stem from mistakes in capturing, storing, and expressing factual knowledge, where outputs contradict real-world information: specifically, event information errors (incorrect constituent elements) and event relationship errors (incorrect relationships between elements, such as mismatches between events and their spatiotemporal contexts). Fabrication involves outputs that cannot be validated against real-world knowledge and have no basis in reality, including overgeneralizations that lack universal validity owing to subjective bias. Fact neglect occurs when multimodal models ignore portions of the original input, such as omitting descriptions of the disaster environment when captioning disaster news imagery.

Faithfulness hallucination refers to outputs inconsistent with user instructions or internal logic, subdivided into intent inconsistency, context inconsistency, and logical inconsistency. Intent inconsistency means outputs deviate from user instructions (excluding safety-motivated deviations). For example, a model might mistakenly perform question answering when translation was intended. Context inconsistency means outputs contradict user-provided context. Logical inconsistency refers to internal contradictions in model outputs, commonly observed in reasoning tasks where steps contradict each other or the final answer.
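For reference, this two-level taxonomy can be written down as a small data structure. The sketch below simply encodes the categories described in this section as Python enums; the structure is an illustrative convenience, not part of the cited taxonomy itself.

```python
from enum import Enum

class FactualHallucination(Enum):
    FACTUAL_ERROR = "output contradicts real-world facts (event information or relationship errors)"
    FABRICATION = "output cannot be validated against real-world knowledge"
    FACT_NEGLECT = "multimodal output omits part of the original input"

class FaithfulnessHallucination(Enum):
    INTENT_INCONSISTENCY = "output deviates from the user's instruction"
    CONTEXT_INCONSISTENCY = "output contradicts user-provided context"
    LOGICAL_INCONSISTENCY = "output contradicts itself, e.g., across reasoning steps"
```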

Cross-modally, despite LLMs' enhanced language capabilities, LVLMs still exhibit "object hallucinations" \cite{29}—common in visual question answering, image captioning, and report generation—where generated descriptions mismatch actual visual content, revealing persistent alignment gaps between visual information and text.

3. Risk Landscape: Hallucination Penetration in Downstream Applications

In downstream applications, AI agents face various tasks, notably machine translation, question answering, dialogue systems, summarization, knowledge graphs, and visual question answering. Hallucinations manifest differently across tasks, with risks lurking beneath.

In machine translation, hallucinations appear as translation deviation (complete thematic divergence from source text while remaining fluent), over-generation (producing excessive, unnecessary content making translations verbose and complex), or translation failure (inability to generate reasonable translations due to input complexity or model limitations) \cite{22}.

In question answering, models rely on external knowledge and memorized prompts. When this knowledge is flawed or recall is insufficient, models provide incomplete yet plausible answers \cite{23}. Without available information, models still attempt to answer, producing inaccurate or partial responses \cite{24}. Without accurate, reliable, and accessible sources for stored memory, models may generate answers based on erroneous or outdated information that is difficult to verify.

In dialogue, conversational models primarily mimic data distribution characteristics rather than generate faithful outputs. This means models may simply copy or recite training patterns without truly understanding context. Due to discourse phenomena, some models produce "uncooperative" responses \cite{25}, outputting complete evidentiary texts instead of precise answers, or introducing informational biases and inaccurate details \cite{26}.

In summarization, while LLM-generated summaries are fluent, they often lack faithful representation of original documents, underperforming traditional summarization models in human evaluations. Summaries may distort existing information or introduce extraneous information absent from the source \cite{27}.

In knowledge graph construction and generation, models do not merely cover the input information but may also incorporate redundant details from internal memory, causing knowledge hallucinations \cite{28}. Users must distinguish "correctly generated knowledge" from "knowledge hallucinations."

Object hallucinations are common in cross-modal tasks like visual question answering, indicating that even with strong language generation, models struggle to align visual and textual information, generating descriptions inconsistent with actual image content \cite{29}.

For content production, hallucinations mean constant vigilance against untruthful generation. For users, hallucination propagation poses potential knowledge dissemination risks.

4. Mechanism Analysis: Triple Defects in Data, Training, and Inference

Hallucinations stem from inherent generation mechanisms and knowledge update difficulties in models, reflecting LLMs' fundamental limitations in knowledge storage, fact verification, and logical reasoning rather than application-level design flaws. Hallucinations can emerge at every stage: data mismatches, errors, and biases plant seeds; training objective design, knowledge boundaries, and inadequate human feedback adaptation amplify them; decoding strategies, attention mechanisms, and logical inference capabilities manifest them in final outputs.

4.1 Data Processing Bias: Root Causes of Hallucination

Large models learn knowledge through statistical patterns in massive multi-source pretraining data. While data forms the foundation of model capabilities, its sheer scale from internet sources makes quality assurance impossible, inevitably causing models to absorb and reproduce untrue information. Variations in training data scale and coverage, plus difficulties in acquiring specialized knowledge, all create vulnerabilities \cite{30}.

Source-target mismatch is a primary issue. In large dataset construction, "source" (input information) and "target" (expected output) are key concepts. Mismatch means models learn inaccurate associations, generating hallucinations during text generation \cite{31}. Additionally, repeated examples in pretraining corpora cause models to memorize and reproduce phrases, creating inappropriate "repetitive" hallucinations in downstream tasks \cite{32}.

Innate divergence in data also causes hallucinations. In NLG tasks, especially open-domain dialogue systems requiring natural, fluent, and engaging responses, models are allowed to generate diverse replies that may include subjective opinions, chitchat, or content lacking precise factual support. This task characteristic frees models from strict factual alignment with source information \cite{33}.

Pretraining data from the internet inevitably contains misinformation and biases \cite{34}. Neural networks' tendency to memorize training data means models may recall false information and output false statements, causing imitative falsehoods \cite{35}. Social biases embedded in social media platforms may also be learned and propagated, with certain biases related to gender, nationality, etc., closely linked to hallucinations.

Training data cannot cover all domains or latest information (e.g., recent scientific findings, legal texts), leaving models ill-equipped for long-tail domains or cutting-edge topics. When users ask beyond models' known scope, they may fabricate answers \cite{36}.

Inferior alignment data also affects hallucination \cite{37}. Foundation models undergo supervised fine-tuning (SFT) for downstream applications. If alignment data lacks quality or is overly complex and diverse, hallucinations intensify. Moreover, models forced to "learn" new knowledge during SFT may misalign with existing knowledge boundaries, causing factually misplaced generation \cite{38}.

4.2 Training Mechanism Defects: Inherent Limitations in Capability Acquisition

Limitations in pretraining, supervised fine-tuning, and reinforcement learning also induce hallucinations.

Autoregressive language model constraints and exposure bias significantly impact generation quality. GPT-style models use causal autoregressive prediction, where each word is predicted based only on preceding words. This makes it difficult to capture relationships between distant words, causing logical coherence issues \cite{39}. As sequence length increases, attention mechanisms may disperse, destabilizing long-range dependency reasoning.

Exposure bias arises from training-inference inconsistency, causing cumulative errors and hallucinations \cite{40}. During training, models use teacher-forcing maximum likelihood estimation (MLE), predicting each word based on ground-truth prefix sequences. During inference, predictions are based on model-generated sequences. This discrepancy can cause model-generated sequences to diverge from training-time sequences, creating cumulative errors. Once a wrong token is generated, subsequent content may compound the error in a "snowball effect," gradually deviating from truth \cite{41}.
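The training/inference mismatch can be made explicit in a few lines: during training the next token is scored against the ground-truth prefix (teacher forcing), whereas at inference each token is conditioned on the model's own previous outputs, so a single early error changes the context for everything that follows. The sketch below assumes only a generic `next_token_dist` function standing in for an autoregressive model; all names are illustrative.

```python
import math
import random
from typing import Callable, Dict, List

# next_token_dist maps a prefix (list of tokens) to a probability distribution over
# the vocabulary; it stands in for any autoregressive language model.
NextTokenDist = Callable[[List[str]], Dict[str, float]]

def teacher_forced_nll(model: NextTokenDist, target: List[str]) -> float:
    """Training-style scoring: every prediction is conditioned on the TRUE prefix."""
    nll = 0.0
    for t in range(1, len(target)):
        p = model(target[:t]).get(target[t], 1e-12)  # probability assigned to the true next token
        nll -= math.log(p)
    return nll

def free_running_generate(model: NextTokenDist, prompt: List[str], steps: int) -> List[str]:
    """Inference-style generation: each prediction is conditioned on the MODEL'S OWN prefix,
    so one early sampling mistake changes the context for every later step (exposure bias)."""
    seq = list(prompt)
    for _ in range(steps):
        dist = model(seq)
        tokens, probs = zip(*dist.items())
        seq.append(random.choices(tokens, weights=probs, k=1)[0])
    return seq
```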

SFT knowledge boundaries and over-fitting on new knowledge also cause hallucinations. SFT often requires models to output content beyond their original knowledge boundaries through human instructions. If models cannot effectively absorb this new knowledge, they may fabricate facts. Traditional training data cannot cover all domains, and SFT typically demands answers to every instruction without encouraging expressions of uncertainty \cite{42}. This lack of rejection mechanisms means when questions exceed model knowledge, models prefer fabricating answers over refusing, triggering frequent hallucinations.

In reinforcement learning from human feedback (RLHF), even if models internally judge an answer as potentially wrong, they may still output content contradicting their internal judgment to please human evaluators and obtain higher rewards—exhibiting sycophancy \cite{43}.

4.3 Inference Strategy Limitations: Dynamic Instability in Generation

The decoding stage involves models predicting probable words based on input prompts, generating answers token-by-token. To enhance diversity and creativity, randomness is often introduced (e.g., top-k sampling, temperature) \cite{44}. While helpful for diversity, this correlates positively with hallucination risk.
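As a concrete illustration of these two randomness controls, the sketch below applies temperature scaling and top-k truncation to raw logits before sampling; lower temperature and smaller k concentrate probability on the most likely tokens, while higher values increase diversity and, per the argument above, hallucination risk. This is a generic sketch, not any particular model's decoder.

```python
import math
import random
from typing import Dict

def sample_next_token(logits: Dict[str, float], temperature: float = 1.0, top_k: int = 50) -> str:
    """Temperature scaling followed by top-k truncation and sampling."""
    # 1. Temperature: divide logits by T (< 1 sharpens, > 1 flattens the distribution).
    scaled = {tok: logit / max(temperature, 1e-6) for tok, logit in logits.items()}
    # 2. Top-k: keep only the k highest-scoring tokens.
    kept = dict(sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k])
    # 3. Softmax over the kept tokens (subtract the max for numerical stability).
    m = max(kept.values())
    weights = {tok: math.exp(v - m) for tok, v in kept.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # 4. Sample one token from the truncated, renormalized distribution.
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]
```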

Attention mechanisms in long sequences often focus on local text, neglecting global context. Since models learn that local information is typically more important for next-word prediction, they may ignore overall input context when generating long texts, causing instruction forgetting or information confusion \cite{45}. Models may generate seemingly correct but factually baseless content based on local fluency.

The softmax bottleneck also contributes to hallucinations. When a target output has multiple valid answers, the true next-token distribution may have multiple peaks. However, the combination of softmax with distributed word embeddings limits the model's ability to express such a multimodal distribution, so it may allocate probability poorly among the reasonable answers and select inappropriate words, leading to distorted or hallucinated content \cite{46}.
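In the notation commonly used for this argument \cite{46}, the point can be stated compactly; the formulation below is a standard restatement rather than a derivation specific to this paper.

```latex
% Standard softmax language-model parameterization: h_c is the context vector,
% e_w the output embedding of word w, and d their shared dimension.
P(w \mid c) = \frac{\exp\!\left(h_c^{\top} e_w\right)}{\sum_{w'} \exp\!\left(h_c^{\top} e_{w'}\right)}
% Stacking \log P(w \mid c) over all contexts c and words w gives a matrix
% A = H E^{\top} + \text{row-wise normalization constants}, \qquad \operatorname{rank}(A) \le d + 1,
% so when the true conditional distributions require higher rank (many contexts, each with
% several distinct valid continuations), the model cannot match them exactly.
```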

Even when LLMs possess necessary knowledge, complex reasoning tasks like multi-hop question answering may exceed their reasoning limits, causing incorrect answers \cite{47}. Models may correctly answer "A is B" but fail to logically infer "B is A," or miss intermediate connections in multi-hop reasoning, causing factual deviations. Overly complex or diverse instruction designs (e.g., multiple constraints) also significantly increase hallucination probability.

5. Countermeasures: Cognitive-Technical-Process Synergy

Hallucination poses a critical challenge for the media industry in the intelligent era, but through proper cognitive reframing, process optimization, technical selection, and research exploration, media organizations can balance leveraging intelligent technology while ensuring content authenticity and credibility.

5.1 Cognitive Elevation: Risk Awareness and Technical Literacy

The media industry must treat hallucination as an inherent risk in intelligent technology application and adopt corresponding risk management strategies. For AI agent model layers, large model data security, model security, and technical architecture security are critical, requiring standard cybersecurity practices and large model-specific protections \cite{48}. Data security involves robust measures against data poisoning and privacy leakage. Model security requires monitoring for subtle errors that are logically coherent but factually fabricated. Infrastructure security demands protection through firewalls, encryption, and physical safeguards against cyber and physical threats \cite{49}.

Media practitioners need fundamental technical literacy to understand agent workings, use cases, capabilities, and limitations, minimizing hallucinations in practice. Currently, practitioners must master prompt engineering with appropriate constraints, avoid overly complex reasoning, and maintain critical thinking to rigorously verify agent-generated content.

5.2 Technical Correction: Enhanced Generation and Dynamic Fact Constraints

Hallucinations relate to data, training, and inference and are partly unavoidable \cite{50}: LLMs compress trillions of word relationships into billions of parameters, with inevitable information loss. The same three stages that cause the problem, however, also point to where solutions can be applied.

At the data level, filtering can remove errors, biases, and inaccuracies, reducing misinformation in pretraining data. Model editing techniques can correct internal knowledge to prevent error solidification. Retrieval-Augmented Generation (RAG) combines external knowledge bases during generation, providing factual grounding.
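A minimal sketch of the RAG idea follows: retrieve the passages most similar to the question from an external knowledge base and instruct the model to answer only from that evidence. The `embed` and `generate` callables and the knowledge-base format are placeholders, not a specific library's API.

```python
from typing import Callable, List, Sequence, Tuple

def retrieve(query_vec: Sequence[float],
             kb: List[Tuple[Sequence[float], str]],   # (embedding, passage) pairs
             top_k: int = 3) -> List[str]:
    """Return the top_k passages most similar to the query (dot-product similarity)."""
    scored = sorted(kb, key=lambda item: -sum(q * v for q, v in zip(query_vec, item[0])))
    return [passage for _, passage in scored[:top_k]]

def rag_answer(question: str,
               embed: Callable[[str], Sequence[float]],   # text -> vector (placeholder)
               generate: Callable[[str], str],            # LLM call (placeholder)
               kb: List[Tuple[Sequence[float], str]]) -> str:
    """Ground the generation step in retrieved evidence instead of parametric memory alone."""
    evidence = retrieve(embed(question), kb)
    prompt = (
        "Answer using ONLY the evidence below; say 'not found' if it is insufficient.\n\n"
        + "\n".join(f"- {p}" for p in evidence)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```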

At the training level, improved pretraining processes and optimized objectives can reduce exposure bias and hallucination risks from long-tail knowledge and vague concepts. During SFT, task and instruction design should adapt models to new information without encouraging fabrication. For RLHF, training strategies should prevent models from becoming overconfident and untruthful to please evaluators.

At the inference level, factuality-enhanced decoding strategies control sampling temperature and constrain probability distributions to ensure outputs align with facts. Faithfulness-enhanced decoding strengthens faithful expression of input instructions and context, avoiding logical or informational deviation.
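One simple way to realize "control sampling temperature and constrain probability distributions" is to decode with a low temperature and discard the low-probability tail before sampling, as in the nucleus-style sketch below. This is a minimal illustration of the general idea, not the specific factuality- or faithfulness-enhanced decoding algorithms proposed in the literature.

```python
import math
from typing import Dict

def constrained_distribution(logits: Dict[str, float],
                             temperature: float = 0.3,
                             top_p: float = 0.8) -> Dict[str, float]:
    """Low temperature plus nucleus (top-p) truncation: probability mass is
    concentrated on the best-supported tokens before sampling or argmax."""
    m = max(logits.values())
    weights = {t: math.exp((v - m) / temperature) for t, v in logits.items()}
    total = sum(weights.values())
    # Sort tokens by probability, then keep the smallest set whose mass reaches top_p.
    probs = dict(sorted(((t, w / total) for t, w in weights.items()),
                        key=lambda kv: kv[1], reverse=True))
    kept, cum = {}, 0.0
    for tok, p in probs.items():
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}
```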

5.3 Process Safeguards: Human Patching and Human-Machine Collaboration

Understanding agents' strengths and limitations reveals they are not omnipotent. The media industry must redefine intelligent technology's role. AI technology remains imperfect and requires human intervention and patching—Human Fix \cite{51}—to function properly. Automation depends on operators' rich experience and knowledge within specific social contexts, often overlooked yet essential for error correction, contextual interpretation, and decision-making.

Future research should explore more human-machine collaboration theories and methods \cite{52} to ensure accuracy and credibility. Current evaluation metrics struggle to comprehensively capture hallucinations. Future systems should be more nuanced and multi-dimensional, assessing not only surface grammatical fluency but also semantic consistency and factual accuracy.

In news writing, for example, agents' creative capabilities have surpassed traditional template-based generation for structured news, approaching professional journalist levels. However, unlike human creativity, large model generation is probability-based, with data as its foundation and hallucination as an inherent risk. This justifies concerns about repetitive work replacement \cite{53} while highlighting the "value-added" core of journalism: in-depth reporting and fact-checking. The industry should treat agents as assistive tools for productivity and creative support, with human journalists maintaining deep involvement and gatekeeping for critical reporting and fact-checking.

The hallucination challenge in large model-based AI agents reveals the deep contradiction between AI's "capability leap" and "trustworthy deployment." Its essence lies in the systematic coupling of data bias, training defects, and inference instability—not "program bugs." While current technical approaches like RAG and dynamic decoding constraints can mitigate some hallucinations, complete elimination remains constrained by models' inherent probabilistic generation logic and open-domain task complexity. Paradoxically, hallucination is a double-edged sword: some scholars describe large model creativity as generating tokens that are both original and diverse while maintaining contextual plausibility \cite{54}, suggesting hallucination may enhance creativity by exploring beyond the most probable token sequences. Hallucination may also inspire new ideas and perspectives, becoming a "collaborative creative partner" \cite{55}. This underscores the need for continued research on understanding hallucination and its cognitive impacts to move toward trustworthy human-machine symbiosis.

References

\cite{1} Juemingzi. DeepSeek is Building a "Hallucination Great Wall" on the Chinese Internet [EB/OL]. (2025-02-07) [2025-04-25]. https://mp.weixin.qq.com/s/aMy99RcCq62D9JvTgTUi7A.

\cite{2} Kalai A T, Vempala S S. Calibrated language models must hallucinate[C]. Proceedings of the 56th Annual ACM Symposium on Theory of Computing, 2024: 160-171.

\cite{3} Vectara. DeepSeek-R1 hallucinates more than DeepSeek-V3[EB/OL]. (2025-01-30) [2025-04-25]. https://www.vectara.com/blog/deepseek-r1-hallucinates-more-than-deepseek-v3.

\cite{4} Jones N. AI hallucinations can't be stopped—but these techniques can limit their damage[J]. Nature, 2025, 637(8047): 778-780.

\cite{5} Zhang Zheng, Liu Chenxu. Large Model Hallucinations: Cognitive Risks and Co-Governance Possibilities in Human-Machine Communication[J]. Journal of Soochow University (Philosophy & Social Sciences Edition), 2024, 45(5): 171-180.

\cite{6} Jing Yulun, Zhang Dianyuan. The Manufacturing Logic of Generative AI Illusions and Their Cultural Consequences of Hyperreality Construction[J]. Journal of Shandong Normal University (Social Sciences Edition), 2024, 69(5): 113-126.

\cite{7} Zhang Xinsheng, Wang Runzhou, Ma Yulong. Research on Challenges, Opportunities, and Strategies for Misinformation Governance in the AIGC Context[J/OL]. Information Science, 1-23[2025-06-05]. http://kns.cnki.net/kcms/detail/22.1264.G2.20241111.1002.024.html.

\cite{8} Chakraborty N, Ornik M, Driggs-Campbell K. Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art[J]. ACM Computing Surveys, 2025, 52(7): 1-35.

\cite{9} Wu J, Gan W, Chen Z, et al. Multimodal large language models: A survey[C]. 2023 IEEE International Conference on Big Data. IEEE, 2023: 2247-2256.

\cite{10} Xi Z, Chen W, Guo X, et al. The rise and potential of large language model based agents: A survey[J]. Science China Information Sciences, 2025, 68(2): 101-121.

\cite{11} Gong R, Huang Q, Ma X, et al. MindAgent: Emergent Gaming Interaction[C]. Findings of the Association for Computational Linguistics: NAACL 2024, 2024: 3154-3165.

\cite{12} Zhang J, Huang J, Jin S, Lu S. Vision-language models for vision tasks: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(8): 5271-5291.

\cite{13} El-Mallakh R S, Walker K L. Hallucinations, pseudohallucinations, and parahallucinations[J]. Psychiatry: Interpersonal and Biological Processes, 2010, 73(1): 34-42.

\cite{14} Chakraborty N, Ornik M, Driggs-Campbell K. Hallucination detection in foundation models for decision-making: A flexible definition and review of the state of the art[J]. ACM Computing Surveys, 2025, 52(7): 1-35.

\cite{15} Sahoo P, Meharia P, Ghosh A, et al. A Comprehensive Survey of Hallucination in Large Language, Image, Video and Audio Foundation Models[C]. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024: 11709-11724.

\cite{16} Chen X, Wang C, Xue Y, et al. Unified Hallucination Detection for Multimodal Large Language Models[C]. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 1: 3235-3252.

\cite{17} Hicks M T, Humphries J, Slater J. ChatGPT is bullshit[J]. Ethics and Information Technology, 2024, 26(2): 1-11.

\cite{18} Huang L, Yu W, Ma W, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions[J]. ACM Transactions on Information Systems, 2025, 43(2): 1-55.

\cite{19} Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation[J]. ACM Computing Surveys, 2023, 55(12): 1-38.

\cite{20} Huang L, Yu W, Ma W, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions[J]. ACM Transactions on Information Systems, 2025, 43(2): 1-55.

\cite{21} Chen X, Wang C, Xue Y, et al. Unified Hallucination Detection for Multimodal Large Language Models[C]. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. 2024, 1: 3235-3252.

\cite{22} Guerreiro N M, Alves D M, Waldendorf J, et al. Hallucinations in large multilingual translation models[J]. Transactions of the Association for Computational Linguistics, 2023, 11: 1500-1517.

\cite{23} Zheng L, Chiang W L, Sheng Y, et al. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena[J]. Advances in Neural Information Processing Systems, 2023, 36: 49025-49043.

\cite{24} Adlakha V, Ghader B P, Lu X H, et al. Evaluating correctness and faithfulness of instruction-following models for question answering[J]. Transactions of the Association for Computational Linguistics, 2024, 12: 681-699.

\cite{25} Dziri N, Milton S, Yu M, et al. On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?[C]. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 5271-5285.

\cite{26} Das S, Saha S, Srihari R K. Diving Deep into Modes of Fact Hallucinations in Dialogue Systems[C]. Findings of the Association for Computational Linguistics: EMNLP 2022, 2022: 684-699.

\cite{27} Qiu Y, Ziser Y, Korhonen A, et al. Detecting and Mitigating Hallucinations in Multilingual Summarisation[C]. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023: 8914-8932.

\cite{28} Yuan S, Faerber M. Evaluating Generative Models for Graph-to-Text Generation[C]. Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 2023: 1256-1264.

\cite{29} Li Y, Du Y, Zhou K, et al. Evaluating Object Hallucination in Large Vision-Language Models[C]. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023: 292-305.

\cite{30} Liu Zeyuan, Wang Pengjiang, Song Xiaobin, et al. A Survey on Hallucination Problems in Large Language Models[J]. Journal of Software, 2025, 36(3): 1152-1185.

\cite{31} Lebret R, Grangier D, Auli M. Neural Text Generation from Structured Data with Application to the Biography Domain[C]. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016: 1203-1213.

\cite{32} Lee K, Ippolito D, Nystrom A, et al. Deduplicating Training Data Makes Language Models Better[C]. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022, 1: 8424-8445.

\cite{33} Rashkin H, Reitter D, Tomar G S, et al. Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features[C]. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021, 1: 704-718.

\cite{34} Das B C, Amini M H, Wu Y. Security and privacy challenges of large language models: A survey[J]. ACM Computing Surveys, 2025, 57(6): 1-39.

\cite{35} Lin S, Hilton J, Evans O. TruthfulQA: Measuring How Models Mimic Human Falsehoods[C]. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022, 1: 3214-3252.

\cite{36} Kasai J, Sakaguchi K, Le Bras R, et al. RealTime QA: What's the Answer Right Now?[J]. Advances in Neural Information Processing Systems, 2023, 36: 49025-49043.

\cite{37} Paullada A, Raji I D, Bender E M, et al. Data and its (dis)contents: A survey of dataset development and use in machine learning research[J]. Patterns, 2021, 2(11): 100227.

\cite{38} Gekhman Z, Yona G, Aharoni R, et al. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?[C]. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024: 12446-12467.

\cite{39} Bhattacharya P, Prasad V K, Verma A, et al. Demystifying ChatGPT: An in-depth survey of OpenAI's robust large language models[J]. Archives of Computational Methods in Engineering, 2024: 1-44.

\cite{40} Wang C, Sennrich R. On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation[C]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 3544-3552.

\cite{41} Zhang M, Press O, Merrill W, et al. How Language Model Hallucinations Can Snowball[C]. International Conference on Machine Learning, 2024: 59670-59684.

\cite{42} Yang Y, Chern E, Qiu X, et al. Alignment for honesty[J]. Advances in Neural Information Processing Systems, 2024, 37: 63565-63598.

\cite{43} Cotra A. Why AI alignment could be hard with modern deep learning[EB/OL]. (2021-09-21) [2025-04-25]. Cold Takes. https://www.cold-takes.com/why-ai-alignment-could-be-hard-with-modern-deep-learning/.

\cite{44} Fan A, Lewis M, Dauphin Y. Hierarchical Neural Story Generation[C]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, 1: 889-898.

\cite{45} Alves D, Guerreiro N, Alves J, et al. Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning[C]. Findings of the Association for Computational Linguistics: EMNLP 2023, 2023: 11127-11144.

\cite{46} Yang Z, Dai Z, Salakhutdinov R, et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model[C]. International Conference on Learning Representations, 2018: 1-18.

\cite{47} Yuan Y, Wang W, Guo Q, et al. Does ChatGPT Know That It Does Not Know? Evaluating the Black-Box Calibration of ChatGPT[C]. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024: 5191-5201.

\cite{48} Tihanyi N, Bisztray T, Ferrag M A, et al. How secure is AI-generated code: a large-scale comparison of large language models[J]. Empirical Software Engineering, 2025, 30(2): 1-42.

\cite{49} Quan Hui. Impact, Convergence, and Collaboration: Discussion on ChatGPT's Influence on the Media Industry[J]. China Radio & TV Academic Journal, 2023(09): 17-21.

\cite{50} Jones N. AI hallucinations can't be stopped—but these techniques can limit their damage[J]. Nature, 2025, 637(8047): 778-780.

\cite{51} Katzenbach C, Pentzold C, Otero P V. Smoothing out smart tech's rough edges: Imperfect automation and the human fix[J]. Human-Machine Communication, 2024, 7: 1-23.

\cite{52} Guo Quanzhong, Su Liurunwei, Peng Zitao. Report on Large Model Applications in the Media Industry 2023-2024[J]. China Media Technology, 2025(1): 6-10.

\cite{53} Li Zitian. Instrumental Benefits and Systemic Risks: Journalists' Perceptions of AI News Technology[J]. Journalism University, 2022(11): 29-42+117.

\cite{54} Lee M. A mathematical investigation of hallucination and creativity in GPT models[J]. Mathematics, 2023, 11(10): 2242.

\cite{55} Huang L, Yu W, Ma W, et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions[J]. ACM Transactions on Information Systems, 2025, 43(2): 1-55.

Author Biographies:
Qi Xu (1982—), female, Associate Researcher and Master's Supervisor at the State Key Laboratory of Media Convergence and Communication, New Media Research Institute, Communication University of China. Research interests: intelligent communication, media convergence, digital humanities, and new media.
Sun Zhipu (2001—), male, Master's student. Research interests: intelligent media, human-machine communication, media convergence.

(Editor: Li Jing)
