Knowledge Graphs
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2025-02-21 | Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs | Tingting Chen et.al. | 2502.15224v1 | null |
2025-02-21 | Scale-Free Graph-Language Models | Jianglin Lu et.al. | 2502.15189v1 | null |
2025-02-20 | A Socratic RAG Approach to Connect Natural Language Queries on Research Topics with Knowledge Organization Systems | Lew Lefton et.al. | 2502.15005v1 | null |
2025-02-20 | GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Jianwen Luo et.al. | 2502.14848v1 | null |
2025-02-20 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2502.14802v1 | link |
2025-02-20 | Plan-over-Graph: Towards Parallelable LLM Agent Schedule | Shiqi Zhang et.al. | 2502.14563v1 | link |
2025-02-20 | Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization | Ran Ding et.al. | 2502.14456v1 | link |
2025-02-20 | Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection | Han Zhang et.al. | 2502.14932v1 | null |
2025-02-20 | Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment | Jiaxi Li et.al. | 2502.14275v1 | null |
2025-02-20 | Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering | Rongzhi Zhu et.al. | 2502.14245v1 | null |
2025-02-20 | NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM | Jiayin Lan et.al. | 2502.14192v1 | null |
2025-02-19 | Object-centric Binding in Contrastive Language-Image Pretraining | Rim Assouel et.al. | 2502.14113v1 | null |
2025-02-19 | Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | Cole Gawin et.al. | 2502.14086v1 | null |
2025-02-19 | Neurosymbolic artificial intelligence via large language models and coherence-driven inference | Steve Huntsman et.al. | 2502.13953v1 | null |
2025-02-19 | Complex Ontology Matching with Large Language Model Embeddings | Guilherme Sousa et.al. | 2502.13619v1 | null |
2025-02-19 | Are Large Language Models In-Context Graph Learners? | Jintang Li et.al. | 2502.13562v1 | null |
2025-02-19 | Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs | Yushi Feng et.al. | 2502.13555v1 | link |
2025-02-19 | PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference | Burc Gokden et.al. | 2502.13502v1 | link |
2025-02-19 | Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction | Yanbang Sun et.al. | 2502.13412v1 | null |
2025-02-19 | Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval | Aditya Sharma et.al. | 2502.13369v1 | null |
2025-02-19 | Craw4LLM: Efficient Web Crawling for LLM Pretraining | Shi Yu et.al. | 2502.13347v1 | link |
2025-02-18 | K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction | Tassallah Abdullahi et.al. | 2502.13344v1 | link |
2025-02-18 | Grounding LLM Reasoning with Knowledge Graphs | Alfonso Amayuelas et.al. | 2502.13247v1 | null |
2025-02-18 | Learning to Defer for Causal Discovery with Imperfect Experts | Oscar Clivio et.al. | 2502.13132v1 | null |
2025-02-18 | Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | Markus J. Buehler et.al. | 2502.13025v1 | null |
2025-02-18 | Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge | Mohammad Reza Rezaei et.al. | 2502.13010v1 | null |
2025-02-18 | R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs | Sumin Jo et.al. | 2502.12767v1 | null |
2025-02-18 | PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths | Boyu Chen et.al. | 2502.14902v1 | null |
2025-02-18 | Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research | Xiang Liu et.al. | 2502.12669v1 | null |
2025-02-18 | G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation | Yuhan Li et.al. | 2502.12586v1 | link |
2025-02-17 | A-MEM: Agentic Memory for LLM Agents | Wujiang Xu et.al. | 2502.12110v1 | link |
2025-02-17 | KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs | Qi Zhao et.al. | 2502.12029v1 | null |
2025-02-17 | Atom of Thoughts for Markov LLM Test-Time Scaling | Fengwei Teng et.al. | 2502.12018v1 | null |
2025-02-17 | Generating Text from Uniform Meaning Representation | Emma Markle et.al. | 2502.11973v1 | link |
2025-02-17 | GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs | Yi Fang et.al. | 2502.11925v1 | null |
2025-02-17 | Exploring LLM-based Student Simulation for Metacognitive Cultivation | Haoxuan Li et.al. | 2502.11678v1 | null |
2025-02-17 | Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering | Runxuan Liu et.al. | 2502.11491v1 | null |
2025-02-17 | GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion | Kangyang Luo et.al. | 2502.11471v1 | null |
2025-02-16 | Large Language-Geometry Model: When LLM meets Equivariance | Zongzhao Li et.al. | 2502.11149v2 | null |
2025-02-16 | Beyond Pairwise: Global Zero-shot Temporal Graph Generation | Alon Eirew et.al. | 2502.11114v1 | null |
2025-02-16 | Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications | Alexandru Lecu et.al. | 2502.11108v1 | link |
2025-02-16 | Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data Selection | Yang Zhao et.al. | 2502.11062v1 | null |
2025-02-16 | CounterBench: A Benchmark for Counterfactuals Reasoning in Large Language Models | Yuefei Chen et.al. | 2502.11008v1 | null |
2025-02-16 | RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation | Pengcheng Jiang et.al. | 2502.10996v1 | link |
2025-02-15 | Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia | Rohith Perumandla et.al. | 2502.10896v1 | null |
2025-02-15 | Evaluating improvements on using Large Language Models (LLMs) for property extraction in the Open Research Knowledge Graph (ORKG) | Sandra Schaftner et.al. | 2502.10768v1 | null |
2025-02-15 | K-Edit: Language Model Editing with Contextual Knowledge Awareness | Elan Markowitz et.al. | 2502.10626v1 | null |
2025-02-15 | ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis | Xueshen Li et.al. | 2502.10620v1 | null |
2025-02-14 | GraphiT: Efficient Node Classification on Text-Attributed Graphs with Prompt Optimized LLMs | Shima Khoshraftar et.al. | 2502.10522v1 | null |
2025-02-14 | Do Large Language Models Reason Causally Like Us? Even Better? | Hanna M. Dettki et.al. | 2502.10215v1 | null |
2025-02-14 | Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages | Daniil Gurgurov et.al. | 2502.10140v1 | null |
2025-02-14 | Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models | Chenrui Tie et.al. | 2502.10090v1 | null |
2025-02-14 | Decision Information Meets Large Language Models: The Future of Explainable Operations Research | Yansen Zhang et.al. | 2502.09994v1 | null |
2025-02-14 | KGGen: Extracting Knowledge Graphs from Plain Text with Language Models | Belinda Mo et.al. | 2502.09956v1 | null |
2025-02-14 | ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation | Shu Wang et.al. | 2502.09891v1 | null |
2025-02-13 | Visual Graph Question Answering with ASP and LLMs for Language Parsing | Jakob Johannes Bauer et.al. | 2502.09211v1 | null |
2025-02-12 | Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data | Doudou Zhou et.al. | 2502.08547v1 | null |
2025-02-12 | Trustworthy GNNs with LLMs: A Systematic Review and Taxonomy | Ruizhan Xue et.al. | 2502.08353v1 | null |
2025-02-12 | Graph Foundation Models for Recommendation: A Comprehensive Survey | Bin Wu et.al. | 2502.08346v3 | null |
2025-02-12 | Self-Evaluation for Job-Shop Scheduling | Imanol Echeverria et.al. | 2502.08684v1 | null |
2025-02-12 | Improving Existing Optimization Algorithms with LLMs | Camilo Chacón Sartori et.al. | 2502.08298v1 | null |
2025-02-12 | LLM4GNAS: A Large Language Model Based Toolkit for Graph Neural Architecture Search | Yang Gao et.al. | 2502.10459v1 | null |
2025-02-12 | ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning | Vy Vo et.al. | 2502.08148v1 | null |
2025-02-12 | Neuro-Conceptual Artificial Intelligence: Integrating OPM with Deep Learning to Enhance Question Answering Quality | Xin Kang et.al. | 2502.09658v1 | null |
2025-02-12 | GCoT: Chain-of-Thought Prompt Learning for Graphs | Xingtong Yu et.al. | 2502.08092v1 | null |
2025-02-12 | Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach | Régnier Avice et.al. | 2502.10453v1 | link |
2025-02-11 | Deep Semantic Graph Learning via LLM based Node Enhancement | Chuanqi Shi et.al. | 2502.07982v1 | null |
2025-02-10 | Cardiverse: Harnessing LLMs for Novel Card Game Prototyping | Danrui Li et.al. | 2502.07128v1 | null |
2025-02-10 | GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units | Arghadip Das et.al. | 2502.06921v2 | link |
2025-02-10 | Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language | Zhiqiang Zhong et.al. | 2502.06634v1 | null |
2025-02-10 | KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment | Yuxing Lu et.al. | 2502.06472v1 | link |
2025-02-10 | RoToR: Towards More Reliable Responses for Order-Invariant Inputs | Soyoung Yoon et.al. | 2502.08662v1 | null |
2025-02-10 | K-ON: Stacking Knowledge On the Head Layer of Large Language Model | Lingbing Guo et.al. | 2502.06257v1 | null |
2025-02-10 | LegalViz: Legal Text Visualization by Text To Diagram Generation | Eri Onami et.al. | 2502.06147v2 | null |
2025-02-09 | Deconstructing Depression Stigma: Integrating AI-driven Data Collection and Analysis with Causal Knowledge Graphs | Han Meng et.al. | 2502.06075v1 | null |
2025-02-09 | LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification | Shubham Kumar Nigam et.al. | 2502.05836v1 | null |
2025-02-08 | LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning | Hanqing Yang et.al. | 2502.05453v1 | null |
2025-02-08 | SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation | Xingtong Yu et.al. | 2502.05424v1 | null |
2025-02-08 | Graph-based Molecular In-context Learning Grounded on Morgan Fingerprints | Ali Al-Lawati et.al. | 2502.05414v1 | null |
2025-02-08 | Knowledge Graph-Guided Retrieval Augmented Generation | Xiangrong Zhu et.al. | 2502.06864v1 | link |
2025-02-07 | Can Large Language Models Understand Intermediate Representations? | Hailong Jiang et.al. | 2502.06854v1 | null |
2025-02-07 | GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Yang Zhou et.al. | 2502.05252v1 | link |
2025-02-07 | Causality can systematically address the monsters under the bench(marks) | Felix Leeb et.al. | 2502.05085v1 | null |
2025-02-07 | Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures | Tushar Pandey et.al. | 2502.05078v1 | link |
2025-02-07 | Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics | Hussam Ghanem et.al. | 2502.05239v1 | null |
2025-02-07 | Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research | Junde Wu et.al. | 2502.04644v1 | link |
2025-02-07 | Position-aware Automatic Circuit Discovery | Tal Haklay et.al. | 2502.04577v1 | link |
2025-02-06 | Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems | Shangbin Feng et.al. | 2502.04510v1 | null |
2025-02-06 | MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Xuejiao Zhao et.al. | 2502.04413v1 | link |
2025-02-06 | Ontology-Guided, Hybrid Prompt Learning for Generalization in Knowledge Graph Question Answering | Longquan Jiang et.al. | 2502.03992v1 | link |
2025-02-06 | Multimodal Medical Code Tokenizer | Xiaorui Su et.al. | 2502.04397v2 | null |
2025-02-06 | Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents | Chenyang Shao et.al. | 2502.04392v1 | null |
2025-02-06 | Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models | Rui Cai et.al. | 2502.03715v1 | null |
2025-02-05 | A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | Yiye Chen et.al. | 2502.03450v1 | null |
2025-02-05 | SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs | Ben Liu et.al. | 2502.03283v2 | null |
2025-02-05 | Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | Daniil Laptev et.al. | 2502.03032v2 | null |
2025-02-05 | A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs | Bradley P. Allen et.al. | 2502.02896v1 | null |
2025-02-05 | Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization | Chanhui Lee et.al. | 2502.02810v1 | null |
2025-02-05 | Leveraging the true depth of LLMs | Ramón Calvo González et.al. | 2502.02790v1 | null |
2025-02-04 | Modular Training of Neural Networks aids Interpretability | Satvik Golechha et.al. | 2502.02470v2 | null |
Abstracts
Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs
2502.15224v1 by Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, Dianbo Liu
Given the remarkable performance of Large Language Models (LLMs), an important question arises: Can LLMs conduct human-like scientific research, discover new knowledge, and act as an AI scientist? Scientific discovery is an iterative process that demands efficient knowledge updating and encoding. It involves understanding the environment, identifying new hypotheses, and reasoning about actions; however, no standardized benchmark specifically designed for scientific discovery exists for LLM agents. In response to these limitations, we introduce a novel benchmark, Auto-Bench, that encompasses the aspects necessary to evaluate LLMs for scientific discovery in both natural and social sciences. Our benchmark is based on the principles of causal graph discovery. It challenges models to uncover hidden structures and make optimal decisions, which includes generating valid justifications. By engaging interactively with an oracle, the models iteratively refine their understanding of the underlying interactions (chemical and social) through strategic interventions. We evaluate state-of-the-art LLMs, including GPT-4, Gemini, Qwen, Claude, and Llama, and observe a significant performance drop as problem complexity increases, which suggests an important gap between machine and human intelligence that future development of LLMs needs to take into consideration.
Scale-Free Graph-Language Models
2502.15189v1 by Jianglin Lu, Yixuan Liu, Yitian Zhang, Yun Fu
Graph-language models (GLMs) have demonstrated great potential in graph-based semi-supervised learning. A typical GLM consists of two key stages: graph generation and text embedding, which are usually implemented by inferring a latent graph and finetuning a language model (LM), respectively. However, the former often relies on artificial assumptions about the underlying edge distribution, while the latter requires extensive data annotations. To tackle these challenges, this paper introduces a novel GLM that integrates graph generation and text embedding within a unified framework. Specifically, for graph generation, we leverage an inherent characteristic of real edge distributions, the scale-free property, as a structural prior. We unexpectedly find that this natural property can be effectively approximated by a simple k-nearest neighbor (KNN) graph. For text embedding, we develop a graph-based pseudo-labeler that utilizes scale-free graphs to provide complementary supervision for improved LM finetuning. Extensive experiments on representative datasets validate our findings on the scale-free structural approximation of KNN graphs and demonstrate the effectiveness of integrating graph generation and text embedding with a real structural prior. Our code is available at https://github.com/Jianglin954/SFGL.
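To make the KNN-graph idea in the abstract above concrete, here is a minimal sketch of building a k-nearest-neighbor graph from text embeddings and counting in-degrees, which is the quantity one would inspect when checking for a heavy-tailed (scale-free-like) degree distribution. This is not the authors' SFGL code: the cosine similarity measure, the random toy embeddings, and the names `knn_graph` and `in_degree_counts` are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_graph(embeddings, k):
    """Map each node index to the indices of its k most similar other nodes."""
    graph = {}
    for i, u in enumerate(embeddings):
        # Score every other node against node i; self-matches are excluded.
        scored = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        scored.sort(reverse=True)
        graph[i] = [j for _, j in scored[:k]]
    return graph


def in_degree_counts(graph):
    """Count how often each node is chosen as somebody's neighbor."""
    return Counter(j for neighbors in graph.values() for j in neighbors)


# Hypothetical stand-in for real text embeddings.
random.seed(0)
toy_embeddings = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
toy_graph = knn_graph(toy_embeddings, k=3)
toy_in_degrees = in_degree_counts(toy_graph)
```

Every node has exactly `k` outgoing edges, so any skew shows up only in the in-degree counts; with real embeddings one would plot that distribution on log-log axes rather than rely on a toy sample like this one.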
A Socratic RAG Approach to Connect Natural Language Queries on Research Topics with Knowledge Organization Systems
2502.15005v1 by Lew Lefton, Kexin Rong, Chinar Dankhara, Lila Ghemri, Firdous Kausar, A. Hannibal Hamdallahi
In this paper, we propose a Retrieval Augmented Generation (RAG) agent that maps natural language queries about research topics to precise, machine-interpretable semantic entities. Our approach combines RAG with Socratic dialogue to align a user's intuitive understanding of research topics with established Knowledge Organization Systems (KOSs). The proposed approach will effectively bridge "little semantics" (domain-specific KOS structures) with "big semantics" (broad bibliometric repositories), making complex academic taxonomies more accessible. Such agents have the potential for broad use. We illustrate with a sample application called CollabNext, which is a person-centric knowledge graph connecting people, organizations, and research topics. We further describe how the application design has an intentional focus on HBCUs and emerging researchers to raise visibility of people historically rendered invisible in the current science system.
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks
2502.14848v1 by Jianwen Luo, Yiming Huang, Jinxiang Meng, Fangyu Lei, Shizhu He, Xiao Liu, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu
Large Language Models (LLMs) have shown great promise in tool-making, yet existing frameworks often struggle to efficiently construct reliable toolsets and are limited to single-task settings. To address these challenges, we propose GATE (Graph-based Adaptive Tool Evolution), an adaptive framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenarios. We evaluate GATE on open-ended tasks (Minecraft), agent-based tasks (TextCraft, DABench), and code generation tasks (MATH, Date, TabMWP). Our results show that GATE achieves up to 4.3x faster milestone completion in Minecraft compared to the previous SOTA, and provides an average improvement of 9.23% over existing tool-making methods in code generation tasks and 10.03% in agent tasks. GATE demonstrates the power of adaptive evolution, balancing tool quantity, complexity, and functionality while maintaining high efficiency. Code and data are available at https://github.com/ayanami2003/GATE.
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
2502.14802v1 by Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, Yu Su
Our ability to continuously acquire, organize, and leverage knowledge is a key feature of human intelligence that AI systems must approximate to unlock their full potential. Given the challenges in continual learning with large language models (LLMs), retrieval-augmented generation (RAG) has become the dominant way to introduce new information. However, its reliance on vector retrieval hinders its ability to mimic the dynamic and interconnected nature of human long-term memory. Recent RAG approaches augment vector embeddings with various structures like knowledge graphs to address some of these gaps, namely sense-making and associativity. However, their performance on more basic factual memory tasks drops considerably below standard RAG. We address this unintended deterioration and propose HippoRAG 2, a framework that outperforms standard RAG comprehensively on factual, sense-making, and associative memory tasks. HippoRAG 2 builds upon the Personalized PageRank algorithm used in HippoRAG and enhances it with deeper passage integration and more effective online use of an LLM. This combination pushes this RAG system closer to the effectiveness of human long-term memory, achieving a 7% improvement in associative memory tasks over the state-of-the-art embedding model while also exhibiting superior factual knowledge and sense-making memory capabilities. This work paves the way for non-parametric continual learning for LLMs. Our code and data will be released at https://github.com/OSU-NLP-Group/HippoRAG.
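The abstract above names Personalized PageRank as the algorithm underlying HippoRAG. As a rough illustration of that algorithm only, the sketch below runs power iteration with a restart distribution concentrated on a set of seed nodes. It is not the HippoRAG 2 implementation: the toy graph, the seed choice, the damping value of 0.85, and the function name `personalized_pagerank` are assumptions, and the abstract does not say how the real system builds its graph or picks seeds.

```python
def personalized_pagerank(adjacency, seeds, damping=0.85, iterations=100):
    """Power-iteration Personalized PageRank.

    adjacency maps each node to a list of out-neighbors (every neighbor must
    also be a key). seeds is the non-empty set of nodes the walk restarts from.
    """
    nodes = list(adjacency)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        # Start each round with the restart mass, then add the propagated mass.
        updated = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = adjacency[n]
            if out:
                share = damping * rank[n] / len(out)
                for m in out:
                    updated[m] += share
            else:
                # Dangling node: send its mass back to the restart distribution.
                for m in nodes:
                    updated[m] += damping * rank[n] * restart[m]
        rank = updated
    return rank


# Hypothetical four-node chain a - b - c - d, with the query matching node "a".
toy_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
toy_scores = personalized_pagerank(toy_adjacency, seeds={"a"})
```

Nodes close to the seed accumulate more score than distant ones, which is the property that makes the method useful for ranking graph nodes relative to a query.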
Plan-over-Graph: Towards Parallelable LLM Agent Schedule
2502.14563v1 by Shiqi Zhang, Xinbei Ma, Zouying Cao, Zhuosheng Zhang, Hai Zhao
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. However, the challenges of parallel scheduling remain under-explored. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then takes this task graph as input and generates a plan for parallel execution. To enhance planning capability on complex, scalable graphs, we design an automated and controllable pipeline to generate synthetic graphs and propose a two-stage training scheme. Experimental results show that our plan-over-graph method significantly improves task performance on both API-based LLMs and trainable open-sourced LLMs. By normalizing complex tasks as graphs, our method naturally supports parallel execution, demonstrating global efficiency. The code and data are available at https://github.com/zsq259/Plan-over-Graph.
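The step from a task graph to a parallel plan, as described in the abstract above, can be illustrated with a plain dependency-layering routine. In the paper an LLM produces the plan; the sketch below is only a classical baseline (Kahn-style topological layering) that shows what a parallel schedule over a task graph looks like. The function name `parallel_batches` and the toy travel tasks are hypothetical.

```python
def parallel_batches(dependencies):
    """Group subtasks into batches that can run in parallel.

    dependencies maps each subtask to the set of subtasks it must wait for.
    Every prerequisite must itself appear as a key.
    """
    remaining = {task: set(deps) for task, deps in dependencies.items()}
    batches = []
    while remaining:
        # A subtask is ready once all of its prerequisites have been scheduled.
        ready = sorted(task for task, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("task graph contains a cycle or a missing prerequisite")
        batches.append(ready)
        for task in ready:
            del remaining[task]
        for deps in remaining.values():
            deps.difference_update(ready)
    return batches


# Hypothetical decomposition of a textual task into subtasks.
toy_tasks = {
    "book_flight": set(),
    "book_hotel": set(),
    "plan_itinerary": {"book_flight", "book_hotel"},
    "pack": {"plan_itinerary"},
}
toy_plan = parallel_batches(toy_tasks)
```

Here the two booking subtasks have no prerequisites and land in the same batch, so they could run concurrently, while the remaining subtasks follow in dependency order.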
Narrative-Driven Travel Planning: Geoculturally-Grounded Script Generation with Evolutionary Itinerary Optimization
2502.14456v1 by Ran Ding, Ziyu Zhang, Ying Zhu, Ziqian Kong, Peilan Xu
To enhance tourists' experiences and immersion, this paper proposes a narrative-driven travel planning framework called NarrativeGuide, which generates a geoculturally-grounded narrative script for travelers, offering a novel, role-playing experience for their journey. In the initial stage, NarrativeGuide constructs a knowledge graph for attractions within a city, then configures the worldview, character setting, and exposition based on the knowledge graph. On this foundation, it draws on the knowledge graph to generate an independent scene unit for each attraction. During the itinerary planning stage, NarrativeGuide models narrative-driven travel planning as an optimization problem, utilizing a genetic algorithm (GA) to refine the itinerary. Before evaluating the candidate itinerary, transition scripts are generated for each pair of adjacent attractions, which, along with the scene units, form a complete script. The weighted sum of script coherence, travel time, and attraction scores is then used as the fitness value to update the candidate solution set. Experimental results across four cities, i.e., Nanjing and Yangzhou in China, Paris in France, and Berlin in Germany, demonstrate significant improvements in narrative coherence and cultural fit, alongside a notable reduction in travel time and an increase in the quality of visited attractions. Our study highlights that incorporating external evolutionary optimization effectively addresses the limitations of large language models in travel planning. Our code is available at https://github.com/Evan01225/Narrative-Driven-Travel-Planning.
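The fitness value described in the abstract above is a weighted sum of script coherence, travel time, and attraction scores. A minimal sketch of such a score and of ranking candidate itineraries by it follows. The abstract gives neither the weights nor the sign convention, so the equal default weights, the choice to subtract travel time (treating it as a cost), the assumption that all three inputs are already normalized to comparable ranges, and the names `itinerary_fitness` and `toy_candidates` are all illustrative assumptions rather than the authors' method.

```python
def itinerary_fitness(coherence, travel_time, attraction_score,
                      weights=(1.0, 1.0, 1.0)):
    """Weighted-sum fitness: reward coherence and attraction quality,
    penalize travel time (assumed sign convention)."""
    w_coherence, w_time, w_attraction = weights
    return (w_coherence * coherence
            - w_time * travel_time
            + w_attraction * attraction_score)


# Hypothetical candidates: (name, coherence, normalized travel time, attraction score).
toy_candidates = [
    ("route_a", 0.8, 0.5, 0.6),
    ("route_b", 0.6, 0.2, 0.7),
    ("route_c", 0.9, 0.9, 0.5),
]

# Rank candidates from fittest to least fit, as a GA selection step might.
toy_ranked = sorted(
    toy_candidates,
    key=lambda c: itinerary_fitness(c[1], c[2], c[3]),
    reverse=True,
)
```

In a genetic algorithm this score would drive selection of the candidate solution set between generations; crossover and mutation over attraction orderings are omitted here.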
Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection
2502.14932v1 by Han Zhang, Langshi Zhou, Hanfang Yang
Extensive research has investigated the integration of large language models (LLMs) with knowledge graphs to enhance the reasoning process. However, understanding how models perform reasoning utilizing structured graph knowledge remains underexplored. Most existing approaches rely on LLMs or retrievers to make binary judgments regarding the utilization of knowledge, which is too coarse. Meanwhile, there is still a lack of feedback mechanisms for reflection and correction throughout the entire reasoning path. This paper proposes an Active self-Reflection framework for knowledge Graph reasoning (ARG), introducing for the first time an end-to-end training approach to achieve iterative reasoning grounded on structured graphs. Within the framework, the model leverages special tokens to actively determine whether knowledge retrieval is necessary, performs reflective critique based on the retrieved knowledge, and iteratively reasons over the knowledge graph. The reasoning paths generated by the model exhibit high interpretability, enabling deeper exploration of the model's understanding of structured knowledge. Ultimately, the proposed model achieves outstanding results compared to existing baselines in knowledge graph reasoning tasks.
Fact or Guesswork? Evaluating Large Language Model's Medical Knowledge with Structured One-Hop Judgment
2502.14275v1 by Jiaxi Li, Yiwei Wang, Kai Zhang, Yujun Cai, Bryan Hooi, Nanyun Peng, Kai-Wei Chang, Jin Lu
Large language models (LLMs) have been widely adopted in various downstream task domains. However, their ability to directly recall and apply factual medical knowledge remains under-explored. Most existing medical QA benchmarks assess complex reasoning or multi-hop inference, making it difficult to isolate LLMs' inherent medical knowledge from their reasoning capabilities. Given the high-stakes nature of medical applications, where incorrect information can have critical consequences, it is essential to evaluate how well LLMs encode, retain, and recall fundamental medical facts. To bridge this gap, we introduce the Medical Knowledge Judgment (MKJ), a dataset specifically designed to measure LLMs' one-hop factual medical knowledge. MKJ is constructed from the Unified Medical Language System (UMLS), a large-scale repository of standardized biomedical vocabularies and knowledge graphs. We frame knowledge assessment as a binary judgment task, requiring LLMs to verify the correctness of medical statements extracted from reliable and structured knowledge sources. Our experiments reveal that LLMs struggle with factual medical knowledge retention, exhibiting significant performance variance across different semantic categories, particularly for rare medical conditions. Furthermore, LLMs show poor calibration, often being overconfident in incorrect answers. To mitigate these issues, we explore retrieval-augmented generation, demonstrating its effectiveness in improving factual accuracy and reducing uncertainty in medical decision-making.
Mitigating Lost-in-Retrieval Problems in Retrieval Augmented Multi-Hop Question Answering
2502.14245v1 by Rongzhi Zhu, Xiangyu Liu, Zequn Sun, Yiwei Wang, Wei Hu
In this paper, we identify a critical problem, "lost-in-retrieval", in retrieval-augmented multi-hop question answering (QA): the key entities are missed in LLMs' sub-question decomposition. "Lost-in-retrieval" significantly degrades the retrieval performance, which disrupts the reasoning chain and leads to incorrect answers. To resolve this problem, we propose a progressive retrieval and rewriting method, namely ChainRAG, which sequentially handles each sub-question by completing missing key entities and retrieving relevant sentences from a sentence graph for answer generation. Each step in our retrieval and rewriting process builds upon the previous one, creating a seamless chain that leads to accurate retrieval and answers. Finally, all retrieved sentences and sub-question answers are integrated to generate a comprehensive answer to the original question. We evaluate ChainRAG on three multi-hop QA datasets (MuSiQue, 2Wiki, and HotpotQA) using three large language models: GPT4o-mini, Qwen2.5-72B, and GLM-4-Plus. Empirical results demonstrate that ChainRAG consistently outperforms baselines in both effectiveness and efficiency.
NLP-AKG: Few-Shot Construction of NLP Academic Knowledge Graph Based on LLM
2502.14192v1 by Jiayin Lan, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin
Large language models (LLMs) have been widely applied in question answering over scientific research papers. To enhance the professionalism and accuracy of responses, many studies employ external knowledge augmentation. However, existing structures of external knowledge in scientific literature often focus solely on either paper entities or domain concepts, neglecting the intrinsic connections between papers through shared domain concepts. This results in less comprehensive and specific answers when addressing questions that combine papers and concepts. To address this, we propose a novel knowledge graph framework that captures deep conceptual relations between academic papers, constructing a relational network via intra-paper semantic elements and inter-paper citation relations. Using a few-shot knowledge graph construction method based on LLM, we develop NLP-AKG, an academic knowledge graph for the NLP domain, by extracting 620,353 entities and 2,271,584 relations from 60,826 papers in ACL Anthology. Based on this, we propose a 'sub-graph community summary' method and validate its effectiveness on three NLP scientific literature question answering datasets.
摘要:大型语言模型 (LLM) 已广泛应用于科学研究论文的问答中。为了提高响应的专业性和准确性,许多研究采用外部知识增强。然而,科学文献中现有外部知识的结构通常仅关注论文实体或领域概念,而忽略了论文之间通过共享领域概念而形成的内在联系。这导致在解决结合论文和概念的问题时,答案不够全面和具体。为了解决这个问题,我们提出了一种新颖的知识图谱框架,该框架捕获了学术论文之间的深层概念关系,通过论文内部语义元素和论文之间的引用关系构建关系网络。我们使用基于 LLM 的少量知识图谱构建方法,从 ACL Anthology 中的 60,826 篇论文中提取了 620,353 个实体和 2,271,584 个关系,开发了 NLP 领域的学术知识图谱 NLP-AKG。在此基础上,我们提出了一种“子图社区摘要”方法,并在三个 NLP 科学文献问答数据集上验证了其有效性。
Object-centric Binding in Contrastive Language-Image Pretraining
2502.14113v1 by Rim Assouel, Pietro Astolfi, Florian Bordes, Michal Drozdzal, Adriana Romero-Soriano
Recent advances in vision language models (VLM) have been driven by contrastive models such as CLIP, which learn to associate visual information with their corresponding text descriptions. However, these models have limitations in understanding complex compositional scenes involving multiple objects and their spatial relationships. To address these challenges, we propose a novel approach that diverges from commonly used strategies, which rely on the design of hard-negative augmentations. Instead, our work focuses on integrating inductive biases into pre-trained CLIP-like models to improve their compositional understanding without using any additional hard-negatives. To that end, we introduce a binding module that connects a scene graph, derived from a text description, with a slot-structured image representation, facilitating a structured similarity assessment between the two modalities. We also leverage relationships as text-conditioned visual constraints, thereby capturing the intricate interactions between objects and their contextual relationships more effectively. Our resulting model not only enhances the performance of CLIP-based models in multi-object compositional understanding but also paves the way towards more accurate and sample-efficient image-text matching of complex scenes.
摘要:最近视觉语言模型 (VLM) 的进步是由对比模型(例如 CLIP)推动的,该模型学习将视觉信息与其对应的文本描述联系起来。然而,这些模型在理解涉及多个对象及其空间关系的复杂组合场景方面存在局限性。为了应对这些挑战,我们提出了一种新颖的方法,它偏离了常用的策略,即依赖于硬负增强设计。相反,我们的工作重点是将归纳偏差集成到预训练的类似 CLIP 的模型中,以提高其组合理解能力,而无需使用任何其他硬否定。为此,我们引入了一个绑定模块,它将从文本描述中派生的场景图与槽结构图像表示连接起来,从而促进了两种模式之间的结构化相似性评估。我们还利用关系作为文本条件的视觉约束,从而更有效地捕捉对象及其上下文关系之间的复杂交互。我们由此产生的模型不仅增强了基于 CLIP 的模型在多对象组合理解中的性能,而且还为复杂场景的更准确和样本高效的图像文本匹配铺平了道路。
Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning
2502.14086v1 by Cole Gawin, Yidan Sun, Mayank Kejriwal
Large language models (LLMs) have achieved remarkable performance in generating human-like text and solving reasoning tasks of moderate complexity, such as question-answering and mathematical problem-solving. However, their capabilities in tasks requiring deeper cognitive skills, such as common-sense understanding and abstract reasoning, remain under-explored. In this paper, we systematically evaluate abstract common-sense reasoning in LLMs using the ConceptNet knowledge graph. We propose two prompting approaches: instruct prompting, where models predict plausible semantic relationships based on provided definitions, and few-shot prompting, where models identify relations using examples as guidance. Our experiments with the gpt-4o-mini model show that in instruct prompting, consistent performance is obtained when ranking multiple relations but with substantial decline when the model is restricted to predicting only one relation. In few-shot prompting, the model's accuracy improves significantly when selecting from five relations rather than the full set, although with notable bias toward certain relations. These results suggest significant gaps still, even in commercially used LLMs' abstract common-sense reasoning abilities, compared to human-level understanding. However, the findings also highlight the promise of careful prompt engineering, based on selective retrieval, for obtaining better performance.
摘要:大型語言模型 (LLM) 在生成類人文本和解決中等複雜度推理任務方面取得了顯著的成果,例如問答和數學問題解決。然而,它們在需要更深層認知技能的任務中的能力,例如常識理解和抽象推理,仍然處於探索不足的階段。在本文中,我們使用 ConceptNet 知識圖系統地評估了 LLM 中的抽象常識推理。我們提出了兩種提示方法:指導提示,其中模型根據提供的定義預測合理的語義關係,以及少次提示,其中模型使用示例作為指導來識別關係。我們使用 gpt-4o-mini 模型進行的實驗表明,在指導提示中,在對多個關係進行排名時獲得了一致的性能,但在模型僅限於預測一個關係時大幅下降。在少次提示中,模型在從五個關係中選擇而不是從完整集合中選擇時,其準確性顯著提高,儘管對某些關係存在顯著偏差。這些結果表明,與人類層面的理解相比,即使在商業使用的 LLM 中,抽象常識推理能力仍然存在顯著差距。然而,這些發現也強調了基於選擇性檢索的仔細提示工程的希望,以獲得更好的性能。
Neurosymbolic artificial intelligence via large language models and coherence-driven inference
2502.13953v1 by Steve Huntsman, Jewell Thomas
We devise an algorithm to generate sets of propositions that objectively instantiate graphs that support coherence-driven inference. We then benchmark the ability of large language models (LLMs) to reconstruct coherence graphs from (a straightforward transformation of) propositions expressed in natural language, with promising results from a single prompt to models optimized for reasoning. Combining coherence-driven inference with consistency evaluations by neural models may advance the state of the art in machine cognition.
摘要:我們設計一種演算法,用來產生命題集合,以客觀地實例化支援連貫性驅動推論的圖形。接著,我們基準化大型語言模型 (LLM) 從以自然語言表達的命題(經過直接轉換)重建連貫性圖形的能力,結果顯示,單一提示就能從最佳化用於推理的模型中獲得有希望的結果。將連貫性驅動推論與神經模型的一致性評估結合起來,可能會提升機器認知的現有技術。
Complex Ontology Matching with Large Language Model Embeddings
2502.13619v1 by Guilherme Sousa, Rinaldo Lima, Cassia Trojahn
Ontology, and more broadly, Knowledge Graph Matching is a challenging task in which expressiveness has not been fully addressed. Despite the increasing use of embeddings and language models for this task, approaches for generating expressive correspondences still do not take full advantage of these models, in particular, large language models (LLMs). This paper proposes to integrate LLMs into an approach for generating expressive correspondences based on alignment need and ABox-based relation discovery. The generation of correspondences is performed by matching similar surroundings of instance sub-graphs. The integration of LLMs results in different architectural modifications, including label similarity, sub-graph matching, and entity matching. The performance of word embeddings, sentence embeddings, and LLM-based embeddings was compared. The results demonstrate that integrating LLMs surpasses all other models, enhancing the baseline version of the approach with a 45% increase in F-measure.
摘要:本体论,更广泛地说,知识图谱匹配是一项具有挑战性的任务,其中表达力尚未得到充分解决。尽管越来越多地使用嵌入和语言模型来完成此任务,但生成表达性对应关系的方法仍然没有充分利用这些模型,特别是大型语言模型 (LLM)。本文提出将 LLM 集成到一种基于对齐需求和基于 ABox 的关系发现来生成表达性对应关系的方法中。对应关系的生成是通过匹配实例子图的相似周围环境来执行的。LLM 的集成导致了不同的架构修改,包括标签相似性、子图匹配和实体匹配。比较了单词嵌入、句子嵌入和基于 LLM 的嵌入的性能。结果表明,集成 LLM 超越了所有其他模型,通过 F-measure 提高了 45% 的基准版本的方法。
Are Large Language Models In-Context Graph Learners?
2502.13562v1 by Jintang Li, Ruofan Wu, Yuchang Zhu, Huizhe Zhang, Liang Chen, Zibin Zheng
Large language models (LLMs) have demonstrated remarkable in-context reasoning capabilities across a wide range of tasks, particularly with unstructured inputs such as language or images. However, LLMs struggle to handle structured data, such as graphs, due to their lack of understanding of non-Euclidean structures. As a result, without additional fine-tuning, their performance significantly lags behind that of graph neural networks (GNNs) in graph learning tasks. In this paper, we show that learning on graph data can be conceptualized as a retrieval-augmented generation (RAG) process, where specific instances (e.g., nodes or edges) act as queries, and the graph itself serves as the retrieved context. Building on this insight, we propose a series of RAG frameworks to enhance the in-context learning capabilities of LLMs for graph learning tasks. Comprehensive evaluations demonstrate that our proposed RAG frameworks significantly improve LLM performance on graph-based tasks, particularly in scenarios where a pretrained LLM must be used without modification or accessed via an API.
摘要:大型語言模型 (LLM) 在廣泛的任務中展示了非凡的語境推理能力,特別是對於語言或影像等非結構化輸入。然而,LLM 難以處理結構化資料,例如圖形,因為它們無法理解非歐幾何結構。因此,在沒有額外微調的情況下,它們在圖形學習任務中的表現遠遠落後於圖形神經網路 (GNN)。在本文中,我們展示了在圖形資料上學習可以被概念化為檢索增強生成 (RAG) 過程,其中特定實例(例如,節點或邊)充當查詢,而圖形本身則作為檢索的語境。基於這個見解,我們提出了一系列 RAG 架構,以增強 LLM 在圖形學習任務中的語境學習能力。全面的評估表明,我們提出的 RAG 架構顯著提升了 LLM 在基於圖形的任務上的表現,特別是在預訓練的 LLM 必須在不修改或透過 API 存取的情況下使用的場景中。
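The framing of graph learning as retrieval-augmented generation can be illustrated with a small prompt builder: the target node is the query and its neighbors are the retrieved context. This is a minimal sketch, assuming an adjacency dict and per-node text. The function name, prompt wording, and `max_neighbors` cutoff are illustrative and not taken from the paper.

```python
def build_node_prompt(graph, texts, labels, node, max_neighbors=3):
    """Build an in-context prompt for node classification from a node's neighbors."""
    # Retrieval step: the graph itself supplies the context for the query node.
    neighbors = sorted(graph.get(node, []))[:max_neighbors]
    lines = ["Classify the target node using its retrieved neighbors."]
    for neighbor in neighbors:
        label = labels.get(neighbor, "unknown")
        lines.append(f"Neighbor: {texts[neighbor]} | label: {label}")
    lines.append(f"Target: {texts[node]} | label:")
    return "\n".join(lines)
```

Because the prompt is plain text, it can be sent to a pretrained LLM without modification, including one that is only reachable through an API.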
Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs
2502.13555v1 by Yushi Feng, Tsai Hor Chan, Guosheng Yin, Lequan Yu
Data augmentation is necessary for graph representation learning due to the scarcity and noise present in graph data. Most of the existing augmentation methods overlook the context information inherited from the dataset as they rely solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, they are mostly white-box methods that require access to the weights or latent features of open-access LLMs, which makes them hard to democratize because most existing LLMs are closed-source for commercial reasons. To overcome these limitations, we propose DemoGraph, a black-box, context-driven graph data augmentation approach guided by LLMs. Leveraging the text prompt as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture the structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generates text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), which validates its maximal utilization of contextual knowledge, leading to enhanced predictive performance and interpretability.
摘要:由於圖表資料的稀少性和雜訊,資料擴充對於圖表表示學習來說是必要的。現有的擴充方法大多忽略了從資料集中繼承的背景資訊,因為它們僅依賴於圖表的結構進行擴充。儘管一些大型語言模型 (LLM) 基於圖表學習方法獲得成功,但它們大多是白盒,需要存取開放式 LLM 的權重或潛在特徵,由於現有的 LLM 主要基於商業考量而封閉原始碼,因此難以讓所有人都能使用。為了克服這些限制,我們提出了一個黑盒背景驅動圖表資料擴充方法,在 LLM 的指導下——DemoGraph。利用文字提示作為與背景相關的資訊,我們讓 LLM 產生知識圖譜 (KG),這讓我們能夠從文字輸出中擷取結構化互動。然後,我們設計了一個動態合併模式,在訓練期間將 LLM 產生的 KG 隨機整合到原始圖表中。為了控制擴充圖表的稀疏性,我們進一步設計了一個粒度感知提示策略和一個指令微調模組,它可以根據資料集的不同粒度層級無縫產生文字提示。在各種圖表學習任務上的大量實驗驗證了我們的方法比現有的圖表資料擴充方法更有效。值得注意的是,我們的做法在涉及電子健康記錄 (EHR) 的場景中表現出色,這驗證了它對上下文知識的最大利用,從而提高了預測效能和可解釋性。
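The stochastic merging of LLM-generated KG edges into the training graph might look like the following. This is a hedged sketch, assuming edges are hashable tuples and that each generated edge is kept independently with probability `keep_prob`. The actual merging schema in DemoGraph may differ.

```python
import random


def merge_edges(original_edges, kg_edges, keep_prob, rng):
    """Return the original edges plus a random subset of LLM-generated KG edges."""
    merged = set(original_edges)
    for edge in kg_edges:
        # Each generated edge is sampled independently, so every training
        # step can see a different augmented graph.
        if rng.random() < keep_prob:
            merged.add(edge)
    return merged


# Example: resample the augmentation at every training step.
rng = random.Random(0)
step_graph = merge_edges({("a", "b")}, {("b", "c"), ("a", "c")}, keep_prob=0.5, rng=rng)
```

Lowering `keep_prob` keeps the augmented graph sparser, which is the quantity the granularity-aware prompting strategy is said to control.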
PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference
2502.13502v1 by Burc Gokden
We show that the Large Language Model from Power Law Decoder Representations (PLDR-LLM) is a foundational model whose deductive outputs are invariant tensors up to a small perturbation. PLDR-LLM learns a singularity condition for the deductive outputs that enables the once-inferred energy-curvature tensor $\mathbf{G}_{LM}$ to replace the deep neural network of power law graph attention (PLGA) generating the deductive outputs at inference. We demonstrate that a cache for $\mathbf{G}$ (G-cache) and KV-cache can be implemented in a straightforward manner to improve the inference time. The invariance and generalizable nature of deductive outputs is at a very high fidelity where deductive outputs have the same RMSE and determinant values up to 15 decimal places after caching, and zero-shot benchmark scores remain unchanged. Ablation studies show that learned deductive outputs have distinct loss and accuracy characteristics from models pretrained with transferred, randomly initialized or identity tensors as a constant tensor operator, and that an LLM with scaled dot-product attention (SDPA) is a special case of PLDR-LLM where $\mathbf{G}_{LM}$ is predefined as identity. The observed invariance characteristic introduces a novel asymmetry between training and inference phases with caching. We outline observed common characteristics of the deductive outputs for the learned singularity condition. We provide an implementation of a training and inference framework for PLDR-LLM with KV-cache and G-cache.
Explore-Construct-Filter: An Automated Framework for Rich and Reliable API Knowledge Graph Construction
2502.13412v1 by Yanbang Sun, Qing Huang, Xiaoxue Ren, Zhenchang Xing, Xiaohong Li, Junjie Wang
The API Knowledge Graph (API KG) is a structured network that models API entities and their relations, providing essential semantic insights for tasks such as API recommendation, code generation, and API misuse detection. However, constructing a knowledge-rich and reliable API KG presents several challenges. Existing schema-based methods rely heavily on manual annotations to design KG schemas, leading to excessive manual overhead. On the other hand, schema-free methods, due to the lack of schema guidance, are prone to introducing noise, reducing the KG's reliability. To address these issues, we propose the Explore-Construct-Filter framework, an automated approach for API KG construction based on large language models (LLMs). This framework consists of three key modules: 1) KG exploration: LLMs simulate the workflow of annotators to automatically design a schema with comprehensive type triples, minimizing human intervention; 2) KG construction: Guided by the schema, LLMs extract instance triples to construct a rich yet unreliable API KG; 3) KG filtering: Removing invalid type triples and suspicious instance triples to construct a rich and reliable API KG. Experimental results demonstrate that our method surpasses the state-of-the-art method, achieving a 25.2% improvement in F1 score. Moreover, the Explore-Construct-Filter framework proves effective, with the KG exploration module increasing KG richness by 133.6% and the KG filtering module improving reliability by 26.6%. Finally, cross-model experiments confirm the generalizability of our framework.
摘要:API 知識圖譜 (API KG) 是一個結構化網路,用於建模 API 實體及其關係,提供基本語義見解,以執行 API 建議、程式碼產生和 API 誤用偵測等任務。然而,建構一個知識豐富且可靠的 API KG 會產生若干挑戰。現有的基於架構的方法嚴重依賴手動註解來設計 KG 架構,導致過度的手動開銷。另一方面,由於缺乏架構指導,無架構的方法容易引入雜訊,降低 KG 的可靠性。為了解決這些問題,我們提出了探索建構過濾架構,這是一種基於大型語言模型 (LLM) 的自動化 API KG 建構方法。此架構包含三個關鍵模組:1) KG 探索:LLM 模擬註解者的工作流程,自動設計具有完整類型三元組的架構,將人為干預降至最低;2) KG 建構:在架構的指導下,LLM 提取實例三元組來建構豐富但不可靠的 API KG;3) KG 過濾:移除無效的類型三元組和可疑的實例三元組,以建構豐富且可靠的 API KG。實驗結果表明,我們的方法優於最先進的方法,在 F1 分數上提高了 25.2%。此外,探索建構過濾架構被證明是有效的,其中 KG 探索模組將 KG 豐富度提高了 133.6%,而 KG 過濾模組將可靠性提高了 26.6%。最後,跨模型實驗證實了我們架構的泛化性。
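The filtering stage can be approximated by checking every instance triple against the set of valid type triples from the schema. This is a simplified sketch, assuming each entity has a single known type. The data layout and function name are hypothetical, and the paper's filtering also removes invalid type triples first.

```python
def filter_instance_triples(instance_triples, entity_types, valid_type_triples):
    """Keep only instance triples whose (head type, relation, tail type) is in the schema."""
    kept = []
    for head, relation, tail in instance_triples:
        signature = (entity_types.get(head), relation, entity_types.get(tail))
        # Triples with unknown entity types or unseen type signatures are treated as noise.
        if signature in valid_type_triples:
            kept.append((head, relation, tail))
    return kept
```

The schema acts as a guard: a triple extracted by the LLM survives only if its types fit a pattern that the exploration stage produced and the filtering stage did not reject.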
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
2502.13369v1 by Aditya Sharma, Luis Lara, Amal Zouaq, Christopher J. Pal
The ability to generate SPARQL queries from natural language questions is crucial for ensuring efficient and accurate retrieval of structured data from knowledge graphs (KG). While large language models (LLMs) have been widely adopted for SPARQL query generation, they are often susceptible to hallucinations and out-of-distribution errors when producing KG elements like Uniform Resource Identifiers (URIs) based on internal parametric knowledge. This often results in content that appears plausible but is factually incorrect, posing significant challenges for their use in real-world information retrieval (IR) applications. This has led to increased research aimed at detecting and mitigating such errors. In this paper, we introduce PGMR (Post-Generation Memory Retrieval), a modular framework that incorporates a non-parametric memory module to retrieve KG elements and enhance LLM-based SPARQL query generation. Our experimental results indicate that PGMR consistently delivers strong performance across diverse datasets, data distributions, and LLMs. Notably, PGMR significantly mitigates URI hallucinations, nearly eliminating the problem in several scenarios.
摘要:從自然語言問題中產生 SPARQL 查詢的能力對於確保從知識圖譜 (KG) 中有效率且準確地擷取結構化資料至關重要。儘管大型語言模型 (LLM) 已廣泛用於 SPARQL 查詢產生,但它們在根據內部參數化知識產生像統一資源識別碼 (URI) 等 KG 元素時,通常容易出現幻覺和分布外錯誤。這通常會導致內容看似合理,但事實上並不正確,對其在真實世界資訊檢索 (IR) 應用中的使用構成重大挑戰。這導致針對偵測和減輕此類錯誤的研究增加。在本文中,我們介紹 PGMR(後產生記憶體檢索),這是一個模組化架構,它結合了一個非參數記憶體模組來檢索 KG 元素並增強基於 LLM 的 SPARQL 查詢產生。我們的實驗結果表明,PGMR 在不同的資料集、資料分佈和 LLM 中始終提供強大的效能。值得注意的是,PGMR 大幅減輕了 URI 幻覺,在許多情況下幾乎消除了問題。
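One way to picture post-generation memory retrieval: the LLM drafts a query with natural-language labels instead of URIs, and a non-parametric memory then maps each label to a real KG identifier. The sketch below makes several assumptions: the `[[label]]` placeholder syntax is invented for illustration, and fuzzy string matching from the standard library replaces whatever retriever PGMR actually uses.

```python
import difflib
import re


def ground_query(draft, memory, cutoff=0.6):
    """Replace [[label]] placeholders in a draft query with identifiers from memory."""
    keys = list(memory)

    def resolve(match):
        label = match.group(1).strip().lower()
        hits = difflib.get_close_matches(label, keys, n=1, cutoff=cutoff)
        if not hits:
            # Refuse to guess: an unresolved label must not turn into a made-up URI.
            raise KeyError(f"no KG element found for label: {label!r}")
        return memory[hits[0]]

    return re.sub(r"\[\[(.+?)\]\]", resolve, draft)


memory = {"albert einstein": "wd:Q937", "place of birth": "wdt:P19"}
draft = "SELECT ?x WHERE { [[Albert Einstien]] [[place of birth]] ?x }"
query = ground_query(draft, memory)
```

Since every identifier in the final query comes from the memory, the model cannot emit a URI that does not exist in the KG, which is the hallucination the abstract targets.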
Craw4LLM: Efficient Web Crawling for LLM Pretraining
2502.13347v1 by Shi Yu, Zhiyuan Liu, Chenyan Xiong
Web crawl is a main source of large language models' (LLMs) pretraining data, but the majority of crawled web pages are discarded in pretraining due to low data quality. This paper presents Crawl4LLM, an efficient web crawling method that explores the web graph based on the preference of LLM pretraining. Specifically, it leverages the influence of a webpage in LLM pretraining as the priority score of the web crawler's scheduler, replacing the standard graph connectivity based priority. Our experiments on a web graph containing 900 million webpages from a commercial search engine's index demonstrate the efficiency of Crawl4LLM in obtaining high-quality pretraining data. With just 21% URLs crawled, LLMs pretrained on Crawl4LLM data reach the same downstream performances of previous crawls, significantly reducing the crawling waste and alleviating the burdens on websites. Our code is publicly available at https://github.com/cxcscmu/Crawl4LLM.
摘要:網路爬蟲是大型語言模型 (LLM) 預訓練資料的主要來源, 但大多數已爬取的網頁在預訓練中會因為資料品質低落而被捨棄。 本文提出 Crawl4LLM,這是一種有效率的網路爬取方法, 它會根據 LLM 預訓練的偏好來探索網路圖。 具體來說,它利用網頁在 LLM 預訓練中的影響力作為網路爬蟲排程器的優先分數, 取代標準的圖形連線優先順序。 我們在一個包含來自商業搜尋引擎索引的 9 億個網頁的網路圖上進行的實驗, 證明了 Crawl4LLM 在取得高品質預訓練資料方面的效率。 只爬取了 21% 的網址,以 Crawl4LLM 資料預訓練的 LLM 就達到了先前爬取的相同下游效能, 大幅減少了爬取浪費,並減輕了對網站的負擔。 我們的程式碼已公開於 https://github.com/cxcscmu/Crawl4LLM。
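The scheduling change described above amounts to swapping the priority function of the crawl frontier. A minimal sketch, assuming an in-memory link graph and a caller-supplied `score` function (hypothetical here; the paper derives it from a page's influence on LLM pretraining):

```python
import heapq


def crawl(seed_urls, out_links, score, budget):
    """Visit up to `budget` pages, always expanding the highest-scoring known URL."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(url), url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    crawled = []
    while frontier and len(crawled) < budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
        for linked in out_links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                heapq.heappush(frontier, (-score(linked), linked))
    return crawled
```

Passing an in-degree count as `score` would give a connectivity-based baseline of the kind the abstract says is being replaced. Passing a pretraining-quality estimate gives the behavior the paper argues for.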
K-Paths: Reasoning over Graph Paths for Drug Repurposing and Drug Interaction Prediction
2502.13344v1 by Tassallah Abdullahi, Ioanna Gemou, Nihal V. Nayak, Ghulam Murtaza, Stephen H. Bach, Carsten Eickhoff, Ritambhara Singh
Drug discovery is a complex and time-intensive process that requires identifying and validating new therapeutic candidates. Computational approaches using large-scale biomedical knowledge graphs (KGs) offer a promising solution to accelerate this process. However, extracting meaningful insights from large-scale KGs remains challenging due to the complexity of graph traversal. Existing subgraph-based methods are tailored to graph neural networks (GNNs), making them incompatible with other models, such as large language models (LLMs). We introduce K-Paths, a retrieval framework that extracts structured, diverse, and biologically meaningful paths from KGs. Integrating these paths enables LLMs and GNNs to effectively predict unobserved drug-drug and drug-disease interactions. Unlike traditional path-ranking approaches, K-Paths retrieves and transforms paths into a structured format that LLMs can directly process, facilitating explainable reasoning. K-Paths employs a diversity-aware adaptation of Yen's algorithm to retrieve the K shortest loopless paths between entities in an interaction query, prioritizing biologically relevant and diverse relationships. Our experiments on benchmark datasets show that K-Paths improves the zero-shot performance of Llama 8.1B's F1-score by 12.45 points on drug repurposing and 13.42 points on interaction severity prediction. We also show that Llama 70B achieves F1-score gains of 6.18 and 8.46 points, respectively. K-Paths also improves the supervised training efficiency of EmerGNN, a state-of-the-art GNN, by reducing KG size by 90% while maintaining strong predictive performance. Beyond its scalability and efficiency, K-Paths uniquely bridges the gap between KGs and LLMs, providing explainable rationales for predicted interactions. These capabilities show that K-Paths is a valuable tool for efficient data-driven drug discovery.
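To make the path-retrieval step concrete, the sketch below enumerates the K shortest loopless paths between two entities in an unweighted graph. It is a breadth-first stand-in, not Yen's algorithm and not the diversity-aware variant the paper describes, and it can be exponential on dense graphs. The graph contents are invented for illustration.

```python
from collections import deque


def k_shortest_simple_paths(graph, source, target, k):
    """Return up to k loopless paths from source to target, shortest (fewest hops) first."""
    paths = []
    queue = deque([[source]])
    while queue and len(paths) < k:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for neighbor in sorted(graph.get(node, [])):
            if neighbor not in path:  # loopless: never revisit a node on the same path
                queue.append(path + [neighbor])
    return paths


kg = {
    "drugA": ["gene1", "gene2"],
    "gene1": ["diseaseX"],
    "gene2": ["pathway"],
    "pathway": ["diseaseX"],
}
paths = k_shortest_simple_paths(kg, "drugA", "diseaseX", k=2)
```

Each returned path is a plain list of entity names, so it can be serialized into text and placed in an LLM prompt as an explanation candidate.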
Grounding LLM Reasoning with Knowledge Graphs
2502.13247v1 by Alfonso Amayuelas, Joy Sain, Simerjot Kaur, Charese Smiley
Knowledge Graphs (KGs) are valuable tools for representing relationships between entities in a structured format. Traditionally, these knowledge bases are queried to extract specific information. However, question-answering (QA) over such KGs poses a challenge due to the intrinsic complexity of natural language compared to the structured format and the size of these graphs. Despite these challenges, the structured nature of KGs can provide a solid foundation for grounding the outputs of Large Language Models (LLMs), offering organizations increased reliability and control. Recent advancements in LLMs have introduced reasoning methods at inference time to improve their performance and maximize their capabilities. In this work, we propose integrating these reasoning strategies with KGs to anchor every step or "thought" of the reasoning chains in KG data. Specifically, we evaluate both agentic and automated search methods across several reasoning strategies, including Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT), using GRBench, a benchmark dataset for graph reasoning with domain-specific graphs. Our experiments demonstrate that this approach consistently outperforms baseline models, highlighting the benefits of grounding LLM reasoning processes in structured KG data.
摘要:知識圖譜 (KG) 是以結構化格式表示實體之間關係的寶貴工具。傳統上,這些知識庫會被查詢以萃取特定資訊。然而,由於自然語言與結構化格式之間的內在複雜性,以及這些圖譜的規模,在這些 KG 上進行問答 (QA) 會構成挑戰。儘管有這些挑戰,KG 的結構化特性可以為大型語言模型 (LLM) 的輸出提供穩固的基礎,為組織提供更高的可靠性和控制力。 LLM 的最新進展在推論時間引入了推理方法,以提升其效能並最大化其能力。在這項工作中,我們建議將這些推理策略與 KG 整合,以將推理鏈的每一步或「思考」錨定在 KG 資料中。具體來說,我們在多種推理策略中評估代理和自動化搜尋方法,包括思考鏈 (CoT)、思考樹 (ToT) 和思考圖 (GoT),使用 GRBench,這是一個針對圖形推理的基準資料集,其中包含特定領域的圖形。我們的實驗證明,這種方法始終優於基準模型,突顯了將 LLM 推理過程建立在結構化 KG 資料中的好處。
Learning to Defer for Causal Discovery with Imperfect Experts
2502.13132v1 by Oscar Clivio, Divyat Mahajan, Perouz Taslakian, Sara Magliacane, Ioannis Mitliagkas, Valentina Zantedeschi, Alexandre Drouin
Integrating expert knowledge, e.g. from large language models, into causal discovery algorithms can be challenging when the knowledge is not guaranteed to be correct. Expert recommendations may contradict data-driven results, and their reliability can vary significantly depending on the domain or specific query. Existing methods based on soft constraints or inconsistencies in predicted causal relationships fail to account for these variations in expertise. To remedy this, we propose L2D-CD, a method for gauging the correctness of expert recommendations and optimally combining them with data-driven causal discovery results. By adapting learning-to-defer (L2D) algorithms for pairwise causal discovery (CD), we learn a deferral function that selects whether to rely on classical causal discovery methods using numerical data or expert recommendations based on textual meta-data. We evaluate L2D-CD on the canonical Tübingen pairs dataset and demonstrate its superior performance compared to both the causal discovery method and the expert used in isolation. Moreover, our approach identifies domains where the expert's performance is strong or weak. Finally, we outline a strategy for generalizing this approach to causal discovery on graphs with more than two variables, paving the way for further research in this area.
摘要:整合專家知識,例如從大型語言模型中整合到因果發現演算法中,當知識無法保證正確時會很有挑戰性。專家建議可能會與資料驅動的結果相矛盾,而且他們的可靠性可能會根據領域或特定查詢而有顯著差異。現有的基於軟約束或預測因果關係中不一致的方法無法說明專業知識中的這些變化。為了補救這一點,我們提出了 L2D-CD,一種用於評估專家建議的正確性並將其與資料驅動的因果發現結果最佳結合的方法。透過調整學習延遲 (L2D) 演算法以進行成對因果發現 (CD),我們學習了一個延遲函數,用於選擇依賴使用數值資料的傳統因果發現方法或基於文字元資料的專家建議。我們在經典的 Tübingen 對資料集上評估 L2D-CD,並證明其與單獨使用的因果發現方法和專家相比具有優越的效能。此外,我們的做法識別出專家表現強或弱的領域。最後,我們概述了一種將此方法推廣到具有兩個以上變數的圖表上進行因果發現的策略,為此領域的進一步研究鋪平了道路。
Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks
2502.13025v1 by Markus J. Buehler
We present an agentic, autonomous graph expansion framework that iteratively structures and refines knowledge in situ. Unlike conventional knowledge graph construction methods relying on static extraction or single-pass learning, our approach couples a reasoning-native large language model with a continually updated graph representation. At each step, the system actively generates new concepts and relationships, merges them into a global graph, and formulates subsequent prompts based on its evolving structure. Through this feedback-driven loop, the model organizes information into a scale-free network characterized by hub formation, stable modularity, and bridging nodes that link disparate knowledge clusters. Over hundreds of iterations, new nodes and edges continue to appear without saturating, while centrality measures and shortest path distributions evolve to yield increasingly distributed connectivity. Our analysis reveals emergent patterns, such as the rise of highly connected 'hub' concepts and the shifting influence of 'bridge' nodes, indicating that agentic, self-reinforcing graph construction can yield open-ended, coherent knowledge structures. Applied to materials design problems, we present compositional reasoning experiments by extracting node-specific and synergy-level principles to foster genuinely novel knowledge synthesis, yielding cross-domain ideas that transcend rote summarization and strengthen the framework's potential for open-ended scientific discovery. We discuss other applications in scientific discovery and outline future directions for enhancing scalability and interpretability.
Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge
2502.13010v1 by Mohammad Reza Rezaei, Reza Saadati Fard, Jayson Parker, Rahul G. Krishnan, Milad Lankarany
Large Language Models (LLMs) have significantly advanced medical question-answering by leveraging extensive clinical data and medical literature. However, the rapid evolution of medical knowledge and the labor-intensive process of manually updating domain-specific resources pose challenges to the reliability of these systems. To address this, we introduce Adaptive Medical Graph-RAG (AMG-RAG), a comprehensive framework that automates the construction and continuous updating of medical knowledge graphs, integrates reasoning, and retrieves current external evidence, such as PubMed and WikiSearch. By dynamically linking new findings and complex medical concepts, AMG-RAG not only improves accuracy but also enhances interpretability in medical queries. Evaluations on the MEDQA and MEDMCQA benchmarks demonstrate the effectiveness of AMG-RAG, achieving an F1 score of 74.1 percent on MEDQA and an accuracy of 66.34 percent on MEDMCQA, outperforming both comparable models and those 10 to 100 times larger. Notably, these improvements are achieved without increasing computational overhead, highlighting the critical role of automated knowledge graph generation and external evidence retrieval in delivering up-to-date, trustworthy medical insights.
摘要:大型語言模型 (LLM) 透過利用廣泛的臨床資料和醫學文獻,大幅提升了醫療問題解答的進步。然而,醫療知識的快速演進和手動更新特定領域資源的繁複程序,對這些系統的可靠性構成挑戰。為了解決這個問題,我們引入了適應性醫療圖表 RAG (AMG-RAG),這是一個自動化建構和持續更新醫療知識圖表的綜合架構,整合推理並擷取 PubMed 和 WikiSearch 等最新的外部證據。透過動態連結新的發現和複雜的醫療概念,AMG-RAG 不僅提升了準確性,也增強了醫療查詢的可解釋性。在 MEDQA 和 MEDMCQA 基準上的評量證明了 AMG-RAG 的有效性,在 MEDQA 上達到了 74.1% 的 F1 分數,在 MEDMCQA 上達到了 66.34% 的準確度,優於其他同類模型以及那些大 10 到 100 倍的模型。值得注意的是,這些改進是在不增加運算負擔的情況下實現的,突顯了自動化知識圖表生成和外部證據擷取在提供最新、可信賴的醫療見解中扮演的重要角色。
R2-KG: General-Purpose Dual-Agent Framework for Reliable Reasoning on Knowledge Graphs
2502.12767v1 by Sumin Jo, Junseong Choi, Jiho Kim, Edward Choi
Recent studies have combined Large Language Models (LLMs) with Knowledge Graphs (KGs) to enhance reasoning, improving inference accuracy without additional training while mitigating hallucination. However, existing frameworks are often rigid, struggling to adapt to KG or task changes. They also rely heavily on powerful LLMs for reliable (i.e., trustworthy) reasoning. To address this, we introduce R2-KG, a plug-and-play, dual-agent framework that separates reasoning into two roles: an Operator (a low-capacity LLM) that gathers evidence and a Supervisor (a high-capacity LLM) that makes final judgments. This design is cost-efficient for LLM inference while still maintaining strong reasoning accuracy. Additionally, R2-KG employs an Abstention mechanism, generating answers only when sufficient evidence is collected from KG, which significantly enhances reliability. Experiments across multiple KG-based reasoning tasks show that R2-KG consistently outperforms baselines in both accuracy and reliability, regardless of the inherent capability of LLMs used as the Operator. Further experiments reveal that the single-agent version of R2-KG, equipped with a strict self-consistency strategy, achieves significantly higher-than-baseline reliability while reducing inference cost. However, it also leads to a higher abstention rate in complex KGs. Our findings establish R2-KG as a flexible and cost-effective solution for KG-based reasoning. It reduces reliance on high-capacity LLMs while ensuring trustworthy inference.
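The abstention idea combined with strict self-consistency can be sketched as a vote over several sampled answers, where the system answers only if agreement is high enough. This is an illustrative reading rather than the R2-KG implementation: the `min_agreement` parameter and the use of `None` for abstention are assumptions.

```python
from collections import Counter


def answer_or_abstain(samples, min_agreement=1.0):
    """Return the majority answer if enough samples agree, otherwise None (abstain)."""
    if not samples:
        return None
    answer, votes = Counter(samples).most_common(1)[0]
    # With min_agreement=1.0 (strict), a single dissenting sample triggers abstention.
    if votes / len(samples) >= min_agreement:
        return answer
    return None
```

The trade-off the abstract mentions follows directly from this rule: a stricter threshold raises reliability on answered questions but also raises the abstention rate, especially when reasoning paths in a complex KG disagree.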
PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths
2502.14902v1 by Boyu Chen, Zirui Guo, Zidan Yang, Yuluo Chen, Junze Chen, Zhenghao Liu, Chuan Shi, Cheng Yang
Retrieval-augmented generation (RAG) improves the response quality of large language models (LLMs) by retrieving knowledge from external databases. Typical RAG approaches split the text database into chunks, organizing them in a flat structure for efficient searches. To better capture the inherent dependencies and structured relationships across the text database, researchers propose to organize textual information into an indexing graph, known as graph-based RAG. However, we argue that the limitation of current graph-based RAG methods lies in the redundancy of the retrieved information, rather than its insufficiency. Moreover, previous methods use a flat structure to organize retrieved information within the prompts, leading to suboptimal performance. To overcome these limitations, we propose PathRAG, which retrieves key relational paths from the indexing graph, and converts these paths into textual form for prompting LLMs. Specifically, PathRAG effectively reduces redundant information with flow-based pruning, while guiding LLMs to generate more logical and coherent responses with path-based prompting. Experimental results show that PathRAG consistently outperforms state-of-the-art baselines across six datasets and five evaluation dimensions. The code is available at the following link: https://github.com/BUPT-GAMMA/PathRAG
摘要:檢索增強生成(RAG)透過從外部資料庫中檢索知識來提升大型語言模型(LLM)的回應品質。典型的 RAG 方法會將文字資料庫分割成塊,並以扁平結構組織起來以利於有效率的搜尋。為了更有效地擷取文字資料庫中的內在相依關係和結構化關係,研究人員建議將文字資訊組織成索引圖,稱為基於圖形的 RAG。然而,我們認為目前基於圖形的 RAG 方法的限制在於檢索資訊的冗餘性,而非其不足。而且,先前的這些方法使用扁平結構來組織提示中的檢索資訊,導致次佳的效能。為了克服這些限制,我們提出 PathRAG,它會從索引圖中檢索關鍵的關係路徑,並將這些路徑轉換成文字形式以提示 LLM。具體來說,PathRAG 有效地減少了基於流的修剪中的冗餘資訊,同時引導 LLM 使用基於路徑的提示產生更具邏輯性和條理性的回應。實驗結果顯示,PathRAG 在六個資料集和五個評量面向中始終優於現有的基準。程式碼可在以下連結取得:https://github.com/BUPT-GAMMA/PathRAG
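Flow-based pruning can be illustrated with a toy scoring rule: a unit of "resource" starts at the first node of a path, is split evenly across each node's outgoing edges, and decays at every hop. Paths whose remaining flow falls below a threshold are dropped. The decay factor, the even split, and the threshold are assumptions made for this sketch. The abstract does not specify the exact rule PathRAG uses.

```python
def path_flow(graph, path, decay=0.8):
    """Flow reaching the end of `path` when each node splits its resource evenly."""
    flow = 1.0
    for node in path[:-1]:
        flow *= decay / len(graph[node])
    return flow


def prune_paths(graph, paths, decay=0.8, threshold=0.1):
    """Keep paths with flow >= threshold, ordered from highest to lowest flow."""
    scored = [(path_flow(graph, path, decay), path) for path in paths]
    kept = [(flow, path) for flow, path in scored if flow >= threshold]
    kept.sort(key=lambda item: -item[0])
    return [path for _, path in kept]
```

Long paths and paths through high-degree hubs receive less flow, so they are pruned first, which matches the stated goal of reducing redundant retrieved information before prompting.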
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research
2502.12669v1 by Xiang Liu, Penglei Sun, Shuyan Chen, Longhan Zhang, Peijie Dong, Huajie You, Yongqi Zhang, Chang Yan, Xiaowen Chu, Tong-yi Zhang
The rapid advancement of perovskite solar cells (PSCs) has led to an exponential growth in research publications, creating an urgent need for efficient knowledge management and reasoning systems in this domain. We present a comprehensive knowledge-enhanced system for PSCs that integrates three key components. First, we develop Perovskite-KG, a domain-specific knowledge graph constructed from 1,517 research papers, containing 23,789 entities and 22,272 relationships. Second, we create two complementary datasets: Perovskite-Chat, comprising 55,101 high-quality question-answer pairs generated through a novel multi-agent framework, and Perovskite-Reasoning, containing 2,217 carefully curated materials science problems. Third, we introduce two specialized large language models: Perovskite-Chat-LLM for domain-specific knowledge assistance and Perovskite-Reasoning-LLM for scientific reasoning tasks. Experimental results demonstrate that our system significantly outperforms existing models in both domain-specific knowledge retrieval and scientific reasoning tasks, providing researchers with effective tools for literature review, experimental design, and complex problem-solving in PSC research.
G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation
2502.12586v1 by Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, Jia Li
Explainable recommendation has demonstrated significant advantages in informing users about the logic behind recommendations, thereby increasing system transparency, effectiveness, and trustworthiness. To provide personalized and interpretable explanations, existing works often combine the generation capabilities of large language models (LLMs) with collaborative filtering (CF) information. CF information extracted from the user-item interaction graph captures the user behaviors and preferences, which is crucial for providing informative explanations. However, due to the complexity of graph structure, effectively extracting the CF information from graphs still remains a challenge. Moreover, existing methods often struggle with the integration of extracted CF information with LLMs due to its implicit representation and the modality gap between graph structures and natural language explanations. To address these challenges, we propose G-Refer, a framework using graph retrieval-augmented large language models (LLMs) for explainable recommendation. Specifically, we first employ a hybrid graph retrieval mechanism to retrieve explicit CF signals from both structural and semantic perspectives. The retrieved CF information is explicitly formulated as human-understandable text by the proposed graph translation and accounts for the explanations generated by LLMs. To bridge the modality gap, we introduce knowledge pruning and retrieval-augmented fine-tuning to enhance the ability of LLMs to process and utilize the retrieved CF information to generate explanations. Extensive experiments show that G-Refer achieves superior performance compared with existing methods in both explainability and stability. Codes and data are available at https://github.com/Yuhan1i/G-Refer.
A-MEM: Agentic Memory for LLM Agents
2502.12110v1 by Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code is available at https://github.com/WujiangXu/AgenticMemory.
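The note-linking and memory-evolution steps in the A-MEM abstract can be illustrated with a small sketch. It is not the released AgenticMemory code: in the paper an LLM agent decides what to link and how to update old notes, whereas this version assumes a keyword Jaccard similarity with a fixed `link_threshold` and a simple tag merge. `Note`, `MemoryNetwork`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    note_id: int
    content: str
    keywords: set[str]
    tags: set[str]
    links: set[int] = field(default_factory=set)


class MemoryNetwork:
    def __init__(self, link_threshold: float = 0.25) -> None:
        self.notes: dict[int, Note] = {}
        self.link_threshold = link_threshold

    @staticmethod
    def _similarity(a: Note, b: Note) -> float:
        """Jaccard overlap of keywords (an assumed stand-in for an LLM judgment)."""
        union = a.keywords | b.keywords
        return len(a.keywords & b.keywords) / len(union) if union else 0.0

    def add(self, content: str, keywords: set[str], tags: set[str]) -> Note:
        note = Note(len(self.notes), content, set(keywords), set(tags))
        for other in self.notes.values():
            if self._similarity(note, other) >= self.link_threshold:
                # Link the two notes in both directions.
                note.links.add(other.note_id)
                other.links.add(note.note_id)
                # "Evolution": the older note absorbs the new note's tags.
                other.tags |= note.tags
        self.notes[note.note_id] = note
        return note


memory = MemoryNetwork()
bug = memory.add("Fixed login bug", {"login", "bug", "auth"}, {"debugging"})
oauth = memory.add("Added OAuth to login", {"login", "auth", "oauth"}, {"feature"})
lunch = memory.add("Ordered lunch", {"lunch"}, {"personal"})
print(bug.links, sorted(bug.tags), lunch.links)  # {1} ['debugging', 'feature'] set()
```

The older note is mutated when a related note arrives, which is the part that distinguishes this from an append-only store.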
KnowPath: Knowledge-enhanced Reasoning via LLM-generated Inference Paths over Knowledge Graphs
2502.12029v1 by Qi Zhao, Hongyu Yang, Qi Song, Xinwei Yao, Xiangyang Li
Large language models (LLMs) have demonstrated remarkable capabilities in various complex tasks, yet they still suffer from hallucinations. Introducing external knowledge, such as knowledge graphs, can enhance LLMs' ability to provide factual answers, and LLMs are able to explore knowledge graphs interactively. However, most approaches suffer from insufficient excavation of the LLM's internal knowledge, limited generation of trustworthy knowledge reasoning paths, and a vague integration of internal and external knowledge. Therefore, we propose KnowPath, a knowledge-enhanced large model framework driven by the collaboration of internal and external knowledge. It relies on the internal knowledge of the LLM to guide the exploration of interpretable directed subgraphs in external knowledge graphs, better integrating the two knowledge sources for more accurate reasoning. Extensive experiments on multiple real-world datasets confirm the superiority of KnowPath.
Atom of Thoughts for Markov LLM Test-Time Scaling
2502.12018v1 by Fengwei Teng, Zhaoyang Yu, Quan Shi, Jiayi Zhang, Chenglin Wu, Yuyu Luo
Large Language Models (LLMs) achieve superior performance through training-time scaling, and test-time scaling further enhances their capabilities by conducting effective reasoning during inference. However, as the scale of reasoning increases, existing test-time scaling methods suffer from accumulated historical information, which not only wastes computational resources but also interferes with effective reasoning. To address this issue, we observe that complex reasoning progress is often achieved by solving a sequence of independent subquestions, each being self-contained and verifiable. These subquestions are essentially atomic questions, relying primarily on their current state rather than accumulated history, similar to the memoryless transitions in a Markov process. Based on this observation, we propose Atom of Thoughts (AoT), where each state transition in the reasoning process consists of decomposing the current question into a dependency-based directed acyclic graph and contracting its subquestions, forming a new atomic question state. This iterative decomposition-contraction process continues until reaching directly solvable atomic questions, naturally realizing Markov transitions between question states. Furthermore, these atomic questions can be seamlessly integrated into existing test-time scaling methods, enabling AoT to serve as a plug-in enhancement for improving reasoning capabilities. Experiments across six benchmarks demonstrate the effectiveness of AoT both as a standalone framework and a plug-in enhancement. Notably, on HotpotQA, when applied to gpt-4o-mini, AoT achieves an 80.6% F1 score, surpassing o3-mini by 3.4% and DeepSeek-R1 by 10.6%. The code will be available at https://github.com/qixucen/atom.
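The decomposition-contraction loop in the Atom of Thoughts abstract can be sketched on a toy arithmetic problem. This is an assumption-laden illustration, not the authors' code: in the paper an LLM builds the dependency DAG and rewrites the question, whereas here the DAG is given up front and `SubQuestion`, `contract`, and `run` are hypothetical names. Each call to `contract` solves the independent subquestions and folds their answers into the remaining ones, so the next state depends only on the current state.

```python
from dataclasses import dataclass
from typing import Callable

Answers = dict[str, float]


@dataclass(frozen=True)
class SubQuestion:
    deps: tuple[str, ...]              # names of subquestions this one depends on
    solve: Callable[[Answers], float]  # computes the answer from dependency answers


def contract(state: dict[str, SubQuestion]) -> tuple[dict[str, SubQuestion], Answers]:
    """One transition: solve independent nodes, fold their answers into the rest."""
    solved = {name: q.solve({}) for name, q in state.items() if not q.deps}
    next_state: dict[str, SubQuestion] = {}
    for name, q in state.items():
        if not q.deps:
            continue
        remaining = tuple(d for d in q.deps if d not in solved)
        # The contracted subquestion carries the solved answers with it.
        next_state[name] = SubQuestion(
            remaining, lambda known, q=q: q.solve({**solved, **known})
        )
    return next_state, solved


def run(state: dict[str, SubQuestion], target: str) -> float:
    """Iterate contraction until the target subquestion becomes directly solvable."""
    while state:
        state, solved = contract(state)
        if target in solved:
            return solved[target]
        if not solved:
            raise ValueError("dependency cycle: no independent subquestion left")
    raise KeyError(target)


question = {
    "speed": SubQuestion((), lambda known: 120 / 2),
    "hours": SubQuestion((), lambda known: 3.0),
    "distance": SubQuestion(("speed", "hours"),
                            lambda known: known["speed"] * known["hours"]),
}
print(run(question, "distance"))  # 180.0
```

After the first contraction, `distance` no longer has dependencies, which mirrors the abstract's notion of reaching a directly solvable atomic question.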
Generating Text from Uniform Meaning Representation
2502.11973v1 by Emma Markle, Reihaneh Iranmanesh, Shira Wein
Uniform Meaning Representation (UMR) is a recently developed graph-based semantic representation, which expands on Abstract Meaning Representation (AMR) in a number of ways, in particular through the inclusion of document-level information and multilingual flexibility. In order to effectively adopt and leverage UMR for downstream tasks, efforts must be placed toward developing a UMR technological ecosystem. Although only limited amounts of UMR annotations have been produced to date, in this work we investigate the first approaches to producing text from multilingual UMR graphs: (1) a pipeline conversion of UMR to AMR, then using AMR-to-text generation models, (2) fine-tuning large language models with UMR data, and (3) fine-tuning existing AMR-to-text generation models with UMR data. Our best performing model achieves a multilingual BERTScore of 0.825 for English and 0.882 for Chinese when compared to the reference, which is a promising indication of the effectiveness of fine-tuning approaches for UMR-to-text generation with even limited amounts of UMR data.
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
2502.11925v1 by Yi Fang, Bowen Jin, Jiacheng Shen, Sirui Ding, Qiaoyu Tan, Jiawei Han
The rapid development of Multimodal Large Language Models (MLLMs) has enabled the integration of multiple modalities, including texts and images, within the large language model (LLM) framework. However, texts and images are usually interconnected, forming a multimodal attributed graph (MMAG). It is underexplored how MLLMs can incorporate the relational information (i.e., graph structure) and semantic information (i.e., texts and images) on such graphs for multimodal comprehension and generation. In this paper, we propose GraphGPT-o, which supports omni-multimodal understanding and creation on MMAGs. We first comprehensively study linearization variants to transform semantic and structural information as input for MLLMs. Then, we propose a hierarchical aligner that enables deep graph encoding, bridging the gap between MMAGs and MLLMs. Finally, we explore the inference choices, adapting MLLM to interleaved text and image generation in graph scenarios. Extensive experiments on three datasets from different domains demonstrate the effectiveness of our proposed method. Datasets and codes will be open-sourced upon acceptance.
Exploring LLM-based Student Simulation for Metacognitive Cultivation
2502.11678v1 by Haoxuan Li, Jifan Yu, Xin Cong, Yang Dang, Yisi Zhan, Huiqin Liu, Zhiyuan Liu
Metacognitive education plays a crucial role in cultivating students' self-regulation and reflective thinking, providing essential support for those with learning difficulties through academic advising. Simulating students with insufficient learning capabilities using large language models offers a promising approach to refining pedagogical methods without ethical concerns. However, existing simulations often fail to authentically represent students' learning struggles and face challenges in evaluation due to the lack of reliable metrics and ethical constraints in data collection. To address these issues, we propose a pipeline for automatically generating and filtering high-quality simulated student agents. Our approach leverages a two-round automated scoring system validated by human experts and employs a score propagation module to obtain more consistent scores across the student graph. Experimental results demonstrate that our pipeline efficiently identifies high-quality student agents, and we discuss the traits that influence the simulation's effectiveness. By simulating students with varying degrees of learning difficulties, our work paves the way for broader applications in personalized learning and educational assessment.
Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
2502.11491v1 by Runxuan Liu, Bei Luo, Jiaqi Li, Baoxin Wang, Ming Liu, Dayong Wu, Shijin Wang, Bing Qin
Large language models (LLMs) have shown remarkable capabilities in natural language processing. However, in knowledge graph question answering (KGQA) tasks, answering questions that require multi-hop reasoning remains an open problem. Existing methods rely on entity vector matching, but the purpose of the question is abstract and difficult to match with specific entities. As a result, it is difficult to establish reasoning paths to the purpose, which leads to information loss and redundancy. To address this issue, inspired by human reverse thinking, we propose Ontology-Guided Reverse Thinking (ORT), a novel framework that constructs reasoning paths from purposes back to conditions. ORT operates in three key phases: (1) using an LLM to extract purpose labels and condition labels, (2) constructing label reasoning paths based on the KG ontology, and (3) using the label reasoning paths to guide knowledge retrieval. Experiments on the WebQSP and CWQ datasets show that ORT achieves state-of-the-art performance and significantly enhances the capability of LLMs for KGQA.
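Phase (2) of ORT, building a label reasoning path from the purpose back to the condition over the KG ontology, can be sketched as a breadth-first search along reversed ontology edges. This is only an illustration of the idea under assumptions: the purpose and condition labels are taken as given (the paper extracts them with an LLM), and the function name `label_reasoning_path` and the toy ontology are hypothetical.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str, str]  # (head label, relation, tail label)


def label_reasoning_path(ontology: list[Edge], purpose: str,
                         condition: str) -> Optional[list[Edge]]:
    """Search backwards from the purpose label until the condition label is reached.

    Returns the path in forward order (condition ... purpose), or None if the
    ontology does not connect the two labels.
    """
    incoming: dict[str, list[tuple[str, str]]] = {}
    for head, relation, tail in ontology:
        incoming.setdefault(tail, []).append((head, relation))

    queue: deque[tuple[str, list[Edge]]] = deque([(purpose, [])])
    seen = {purpose}
    while queue:
        label, path = queue.popleft()
        if label == condition:
            return path
        for head, relation in incoming.get(label, []):
            if head not in seen:
                seen.add(head)
                # Prepend so the finished path reads from condition to purpose.
                queue.append((head, [(head, relation, label)] + path))
    return None


toy_ontology = [
    ("Film", "directed_by", "Person"),
    ("Person", "born_in", "City"),
    ("City", "located_in", "Country"),
]
# "In which country was the director of <film> born?"
# purpose label: Country, condition label: Film
for edge in label_reasoning_path(toy_ontology, "Country", "Film"):
    print(edge)
```

The resulting label path could then constrain which relations are followed during retrieval, which is the role the abstract assigns to phase (3).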
GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion
2502.11471v1 by Kangyang Luo, Yuzhuo Bai, Cheng Gao, Shuzheng Si, Yingli Shen, Zhu Liu, Zhitong Wang, Cunliang Kong, Wenhao Li, Yufei Huang, Ye Tian, Xuantang Xiong, Lei Han, Maosong Sun
Knowledge Graph Completion (KGC), which aims to infer missing or incomplete facts, is a crucial task for KGs. However, integrating the vital structural information of KGs into Large Language Models (LLMs) and outputting predictions deterministically remains challenging. To address this, we propose a new method called GLTW, which encodes the structural information of KGs and merges it with LLMs to enhance KGC performance. Specifically, we introduce an improved Graph Transformer (iGT) that effectively encodes subgraphs with both local and global structural information and inherits the characteristics of language models, bypassing training from scratch. Also, we develop a subgraph-based multi-classification training objective, using all entities within the KG as classification objects, to boost learning efficiency. Importantly, we combine iGT with an LLM that takes KG language prompts as input. Our extensive experiments on various KG datasets show that GLTW achieves significant performance gains compared to SOTA baselines.
Large Language-Geometry Model: When LLM meets Equivariance
2502.11149v2 by Zongzhao Li, Jiacheng Cen, Bing Su, Wenbing Huang, Tingyang Xu, Yu Rong, Deli Zhao
Accurately predicting 3D structures and dynamics of physical systems is crucial in scientific applications. Existing approaches that rely on geometric Graph Neural Networks (GNNs) effectively enforce E(3)-equivariance, but they often fall short in leveraging broader external information. While direct application of Large Language Models (LLMs) can incorporate external knowledge, they lack the capability for spatial reasoning with guaranteed equivariance. In this paper, we propose EquiLLM, a novel framework for representing 3D physical systems that seamlessly integrates E(3)-equivariance with LLM capabilities. Specifically, EquiLLM comprises four key components: geometry-aware prompting, an equivariant encoder, an LLM, and an equivariant adaptor. Essentially, the LLM guided by the instructive prompt serves as a sophisticated invariant feature processor, while 3D directional information is exclusively handled by the equivariant encoder and adaptor modules. Experimental results demonstrate that EquiLLM delivers significant improvements over previous methods across molecular dynamics simulation, human motion simulation, and antibody design, highlighting its promising generalizability.
Beyond Pairwise: Global Zero-shot Temporal Graph Generation
2502.11114v1 by Alon Eirew, Kfir Bar, Ido Dagan
Temporal relation extraction (TRE) is a fundamental task in natural language processing (NLP) that involves identifying the temporal relationships between events in a document. Despite the advances in large language models (LLMs), their application to TRE remains limited. Most existing approaches rely on pairwise classification, in which event pairs are considered individually, leading to computational inefficiency and a lack of global consistency in the resulting temporal graph. In this work, we propose a novel zero-shot method for TRE that generates a document's complete temporal graph at once, then applies transitive constraints optimization to refine predictions and enforce temporal consistency across relations. Additionally, we introduce OmniTemp, a new dataset with complete annotations for all pairs of targeted events within a document. Through experiments and analyses, we demonstrate that our method significantly outperforms existing zero-shot approaches while achieving competitive performance with supervised models.
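The consistency step in the temporal graph abstract can be illustrated with a small sketch. The paper describes a transitive constraints optimization whose exact formulation is not given in the abstract; the code below swaps in a simpler greedy procedure as a stand-in: "before" predictions are accepted in order of confidence, any prediction that contradicts what is already implied is dropped, and the accepted set is kept transitively closed. The function name, the confidence scores, and the toy events are all hypothetical.

```python
Prediction = tuple[str, str, float]  # (earlier event, later event, confidence)


def consistent_before_graph(predictions: list[Prediction]) -> set[tuple[str, str]]:
    """Greedily build a transitively closed, contradiction-free 'before' relation."""
    closure: set[tuple[str, str]] = set()
    for earlier, later, _ in sorted(predictions, key=lambda p: p[2], reverse=True):
        if earlier == later or (later, earlier) in closure:
            continue  # would contradict a higher-confidence ordering
        # Everything known to precede `earlier` now also precedes
        # everything known to follow `later`.
        before = {x for (x, y) in closure if y == earlier} | {earlier}
        after = {y for (x, y) in closure if x == later} | {later}
        closure |= {(x, y) for x in before for y in after}
    return closure


predicted = [
    ("wake", "eat", 0.9),
    ("eat", "work", 0.8),
    ("work", "wake", 0.4),  # inconsistent with the two above
]
print(sorted(consistent_before_graph(predicted)))
# [('eat', 'work'), ('wake', 'eat'), ('wake', 'work')]
```

A global optimizer could instead trade several weaker predictions against one strong one; the greedy version cannot, which is one reason to treat it only as an illustration of the constraint.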
Knowledge Graph-Driven Retrieval-Augmented Generation: Integrating Deepseek-R1 with Weaviate for Advanced Chatbot Applications
2502.11108v1 by Alexandru Lecu, Adrian Groza, Lezan Hawizy
Large language models (LLMs) have significantly advanced the field of natural language generation. However, they frequently generate unverified outputs, which compromises their reliability in critical applications. In this study, we propose an innovative framework that combines structured biomedical knowledge with LLMs through a retrieval-augmented generation technique. Our system develops a thorough knowledge graph by identifying and refining causal relationships and named entities from medical abstracts related to age-related macular degeneration (AMD). Using a vector-based retrieval process and a locally deployed language model, our framework produces responses that are both contextually relevant and verifiable, with direct references to clinical evidence. Experimental results show that this method notably decreases hallucinations, enhances factual precision, and improves the clarity of generated responses, providing a robust solution for advanced biomedical chatbot applications.
Beyond Similarity: A Gradient-based Graph Method for Instruction Tuning Data Selection
2502.11062v1 by Yang Zhao, Li Du, Xiao Ding, Yangou Ouyang, Hepeng Wang, Kai Xiong, Jinglong Gao, Zhouhao Sun, Dongliang Xu, Yang Qing, Dongchen Li, Bing Qin, Ting Liu
Large language models (LLMs) have shown great potential across various industries due to their remarkable ability to generalize through instruction tuning. However, the limited availability of domain-specific data significantly hampers their performance on specialized tasks. While existing methods primarily focus on selecting training data from general datasets that are similar to the target domain, they often fail to consider the joint distribution of instructions, resulting in inefficient learning and suboptimal knowledge transfer. To address these challenges, we introduce G2IS (Gradient-based Graph Instruction Selection), a novel method that constructs a mixed gradient-based instruction graph to capture the joint distribution and interdependencies between instructions. By accounting for the relationships between instructions, G2IS improves domain adaptation efficiency. Additionally, we propose a gradient walk algorithm to refine the data selection process, enhancing both training effectiveness and efficiency. Our experiments demonstrate that G2IS outperforms traditional methods across various domain adaptation tasks, yielding significant performance gains, particularly in complex, data-scarce scenarios. These results underscore the potential of G2IS in advancing the development of large, domain-specific models.
CounterBench: A Benchmark for Counterfactuals Reasoning in Large Language Models
2502.11008v1 by Yuefei Chen, Vivek K. Singh, Jing Ma, Ruxiang Tang
Counterfactual reasoning is widely recognized as one of the most challenging and intricate aspects of causality in artificial intelligence. In this paper, we evaluate the performance of large language models (LLMs) in counterfactual reasoning. In contrast to previous studies that primarily focus on commonsense causal reasoning, where LLMs often rely on prior knowledge for inference, we specifically assess their ability to perform counterfactual inference using a set of formal rules. To support this evaluation, we introduce a new benchmark dataset, CounterBench, comprising 1K counterfactual reasoning questions. The dataset is designed with varying levels of difficulty, diverse causal graph structures, distinct types of counterfactual questions, and multiple nonsensical name variants. Our experiments demonstrate that counterfactual reasoning poses a significant challenge for LLMs, with most models performing at levels comparable to random guessing. To enhance LLMs' counterfactual reasoning ability, we propose a novel reasoning paradigm, CoIn, which guides LLMs through iterative reasoning and backtracking to systematically explore counterfactual solutions. Experimental results show that our method significantly improves LLM performance on counterfactual reasoning tasks and consistently enhances performance across different LLMs. Our dataset is available at https://huggingface.co/datasets/CounterBench/CounterBench.
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation
2502.10996v1 by Pengcheng Jiang, Lang Cao, Ruike Zhu, Minhao Jiang, Yunyi Zhang, Jimeng Sun, Jiawei Han
Retrieval-augmented language models often struggle with knowledge-intensive tasks due to inefficient retrieval, unstructured knowledge integration, and single-pass architectures. We present Retrieval-And-Structuring (RAS), a novel framework that dynamically constructs and reasons over query-specific knowledge graphs through iterative retrieval and structuring. RAS introduces four key technical innovations: (1) a theme-scoped retrieval mechanism that efficiently narrows the search space while maintaining retrieval quality, (2) an action planning module that determines knowledge needs and generates focused sub-queries, (3) a dynamic knowledge structuring approach that converts retrieved text into an evolving knowledge graph, and (4) a graph-augmented answering component that leverages the accumulated structured information. Our framework achieves state-of-the-art performance, surpassing leading baselines by 6.4% with open-source language models and 7.0% with proprietary models on seven knowledge-intensive generation datasets across all evaluation metrics. Detailed ablation studies verify the contribution of each technical component to the overall system performance.
Developing Conversational Speech Systems for Robots to Detect Speech Biomarkers of Cognition in People Living with Dementia
2502.10896v1 by Rohith Perumandla, Young-Ho Bae, Diego Izaguirre, Esther Hwang, Andrew Murphy, Long-Jing Hsu, Selma Sabanovic, Casey C. Bennett
This study presents the development and testing of a conversational speech system designed for robots to detect speech biomarkers indicative of cognitive impairments in people living with dementia (PLwD). The system integrates a backend Python WebSocket server and a central core module with a large language model (LLM) fine-tuned for dementia to process user input and generate robotic conversation responses in real-time in less than 1.5 seconds. The frontend user interface, a Progressive Web App (PWA), displays information and biomarker score graphs on a smartphone in real-time to human users (PLwD, caregivers, clinicians). Six speech biomarkers based on the existing literature - Altered Grammar, Pragmatic Impairments, Anomia, Disrupted Turn-Taking, Slurred Pronunciation, and Prosody Changes - were developed for the robot conversation system using two datasets, one that included conversations of PLwD with a human clinician (DementiaBank dataset) and one that included conversations of PLwD with a robot (Indiana dataset). We also created a composite speech biomarker that combined all six individual biomarkers into a single score. The speech system's performance was first evaluated on the DementiaBank dataset showing moderate correlation with MMSE scores, with the composite biomarker score outperforming individual biomarkers. Analysis of the Indiana dataset revealed higher and more variable biomarker scores, suggesting potential differences due to study populations (e.g. severity of dementia) and the conversational scenario (human-robot conversations are different from human-human). The findings underscore the need for further research on the impact of conversational scenarios on speech biomarkers and the potential clinical applications of robotic speech systems.
Evaluating improvements on using Large Language Models (LLMs) for property extraction in the Open Research Knowledge Graph (ORKG)
2502.10768v1 by Sandra Schaftner
Current research highlights the great potential of Large Language Models (LLMs) for constructing Scholarly Knowledge Graphs (SKGs). One particularly complex step in this process is relation extraction, aimed at identifying suitable properties to describe the content of research. This study builds directly on previous research by three Open Research Knowledge Graph (ORKG) team members who assessed the readiness of LLMs such as GPT-3.5, Llama 2, and Mistral for property extraction in scientific literature. Given the moderate performance observed, the previous work concluded that fine-tuning is needed to improve these models' alignment with scientific tasks and their emulation of human expertise. Expanding on this prior experiment, this study evaluates the impact of advanced prompt engineering techniques and demonstrates that these techniques can significantly enhance the results. Additionally, this study extends the property extraction process to include property matching to existing ORKG properties, which are retrieved via the API. The evaluation reveals that results generated through advanced prompt engineering achieve a higher proportion of matches with ORKG properties, further emphasizing the enhanced alignment achieved. Moreover, this lays the groundwork for addressing challenges such as the inconsistency of ORKG properties, an issue highlighted in prior studies. By assigning unique URIs and using standardized terminology, this work increases the consistency of the properties, fulfilling a crucial aspect of Linked Data and FAIR principles - core commitments of ORKG. This, in turn, significantly enhances the applicability of ORKG content for subsequent tasks such as comparisons of research publications. Finally, the study concludes with recommendations for future improvements in the overall property extraction process.
K-Edit: Language Model Editing with Contextual Knowledge Awareness
2502.10626v1 by Elan Markowitz, Anil Ramakrishna, Ninareh Mehrabi, Charith Peris, Rahul Gupta, Kai-Wei Chang, Aram Galstyan
As the world changes, we need to be able to update our models and correct false information without costly retraining. Knowledge-based model editing enables precise modifications to the weights of large language models in order to modify the information encoded within. Recent approaches have seen success in enabling recall of edited information for thousands of edits at once. However, these approaches fail to produce edits that account for associated contextual information. We present K-Edit, an effective approach to generating contextually consistent knowledge edits. By using knowledge graphs, which maintain contextual consistency when an edge is edited, we are able to generate additional "contextual edits" that ensure consistency of related information in the language model. Our experiments demonstrate significant improvements in multi-hop question answering while maintaining the general effectiveness and scalability of model edits.
摘要:隨著世界變化,我們需要能夠更新我們的模型,並在不進行昂貴的重新訓練的情況下更正錯誤資訊。基於知識的模型編輯能夠對大型語言模型的權重進行精確修改,以便修改其中編碼的資訊。最近的方法在一次啟用數千次編輯的編輯資訊的召回方面取得了成功。然而,這些方法無法產生考慮相關上下文資訊的編輯。我們提出 K-Edit,這是一種產生上下文一致的知識編輯的有效方法。通過使用知識圖,在編輯邊緣時保持上下文一致性,我們能夠產生額外的「上下文編輯」,以確保語言模型中相關資訊的一致性。我們的實驗證明了多跳問題回答的顯著改進,同時保持了模型編輯的一般有效性和可擴充性。
ProMRVL-CAD: Proactive Dialogue System with Multi-Round Vision-Language Interactions for Computer-Aided Diagnosis
2502.10620v1 by Xueshen Li, Xinlong Hou, Ziyi Huang, Yu Gan
Recent advancements in large language models (LLMs) have demonstrated extraordinary comprehension capabilities with remarkable breakthroughs on various vision-language tasks. However, the application of LLMs in generating reliable medical diagnostic reports remains in the early stages. Currently, medical LLMs typically feature a passive interaction model where doctors respond to patient queries with little or no involvement in analyzing medical images. In contrast, some ChatBots simply respond to predefined queries based on visual inputs, lacking interactive dialogue or consideration of medical history. As such, there is a gap between LLM-generated patient-ChatBot interactions and those occurring in actual patient-doctor consultations. To bridge this gap, we develop an LLM-based dialogue system, namely proactive multi-round vision-language interactions for computer-aided diagnosis (ProMRVL-CAD), to generate patient-friendly disease diagnostic reports. The proposed ProMRVL-CAD system allows proactive dialogue to provide patients with constant and reliable medical access by integrating a knowledge graph into a recommendation system. Specifically, we devise two generators: a Proactive Question Generator (Pro-Q Gen) to generate proactive questions that guide the diagnostic procedure and a Multi-Vision Patient-Text Diagnostic Report Generator (MVP-DR Gen) to produce high-quality diagnostic reports. Evaluated on two real-world publicly available datasets, MIMIC-CXR and IU-Xray, our model generates higher-quality medical reports. We further demonstrate that ProMRVL-CAD remains robust in scenarios with low image quality. Moreover, we have created a synthetic medical dialogue dataset that simulates proactive diagnostic interactions between patients and doctors, serving as a valuable resource for training LLMs.
GraphiT: Efficient Node Classification on Text-Attributed Graphs with Prompt Optimized LLMs
2502.10522v1 by Shima Khoshraftar, Niaz Abedini, Amir Hajian
The application of large language models (LLMs) to graph data has attracted a lot of attention recently. LLMs allow us to use deep contextual embeddings from pretrained models in text-attributed graphs, where shallow embeddings are often used for the text attributes of nodes. However, it is still challenging to efficiently encode the graph structure and features into a sequential form for use by LLMs. In addition, the performance of an LLM alone is highly dependent on the structure of the input prompt, which limits its effectiveness as a reliable approach and often requires iterative manual adjustments that can be slow, tedious and difficult to replicate programmatically. In this paper, we propose GraphiT (Graphs in Text), a framework for encoding graphs into a textual format and optimizing LLM prompts for graph prediction tasks. Here we focus on node classification for text-attributed graphs. We encode the graph data for every node and its neighborhood into a concise text to enable LLMs to better utilize the information in the graph. We then further programmatically optimize the LLM prompts using the DSPy framework to automate this step and make it more efficient and reproducible. GraphiT outperforms our LLM-based baselines on three datasets and we show how the optimization step in GraphiT leads to measurably better results without manual prompt tweaking. We also demonstrate that our graph encoding approach is competitive with other graph encoding methods while being less expensive because it uses significantly fewer tokens for the same task.
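The idea of encoding a node and its neighborhood into concise text can be illustrated with a minimal sketch. This is not GraphiT's actual encoding: the function name `encode_node`, the prompt wording, the neighbor limit, and the toy data are all assumptions made for illustration, and the DSPy prompt-optimization step is not shown.

```python
def encode_node(node, texts, edges, max_neighbors=3, max_chars=60):
    """Render one node and a few of its neighbors as a short text block.

    Hypothetical helper for illustration only; not the encoding used in the paper.
    """
    # Collect neighbors from an undirected edge list.
    neighbors = sorted({b if a == node else a for a, b in edges if node in (a, b)})

    def clip(text):
        # Truncate long node texts to keep the token count low.
        return text if len(text) <= max_chars else text[: max_chars - 3] + "..."

    lines = [f"Target node {node}: {clip(texts[node])}"]
    for n in neighbors[:max_neighbors]:
        lines.append(f"Neighbor {n}: {clip(texts[n])}")
    lines.append("Question: which category does the target node belong to?")
    return "\n".join(lines)


# Toy text-attributed graph (made-up data).
texts = {
    0: "Paper on graph attention networks",
    1: "Paper on transformer architectures",
    2: "Paper on graph convolutional networks",
    3: "Paper on reinforcement learning",
}
edges = [(0, 1), (0, 2), (3, 1)]

prompt = encode_node(0, texts, edges)
print(prompt)
```

Capping the number of neighbors and the length of each text is one simple way to trade context for token cost, which is the trade-off the abstract emphasizes.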
Do Large Language Models Reason Causally Like Us? Even Better?
2502.10215v1 by Hanna M. Dettki, Brenden M. Lake, Charley M. Wu, Bob Rehder
Causal reasoning is a core component of intelligence. Large language models (LLMs) have shown impressive capabilities in generating human-like text, raising questions about whether their responses reflect true understanding or statistical patterns. We compared causal reasoning in humans and four LLMs using tasks based on collider graphs, rating the likelihood of a query variable occurring given evidence from other variables. We find that LLMs reason causally along a spectrum from human-like to normative inference, with alignment shifting based on model, context, and task. Overall, GPT-4o and Claude showed the most normative behavior, including "explaining away", whereas Gemini-Pro and GPT-3.5 did not. Although all agents deviated from the expected independence of causes - Claude the least - they exhibited strong associative reasoning and predictive inference when assessing the likelihood of the effect given its causes. These findings underscore the need to assess AI biases as they increasingly assist human decision-making.
摘要:因果推理是智能的核心組成部分。大型語言模型 (LLM) 在生成類人文本方面展現了令人印象深刻的能力,引發了關於它們的回應是否反映真實理解或統計模式的疑問。我們使用基於碰撞圖的任務比較了人類和四個 LLM 中的因果推理,根據其他變數的證據評估查詢變數發生的可能性。我們發現 LLM 沿著從類人到規範推論的光譜進行因果推理,對齊會根據模型、上下文和任務而改變。總體而言,GPT-4o 和 Claude 表現出最規範的行為,包括「解釋」,而 Gemini-Pro 和 GPT-3.5 則沒有。儘管所有代理都偏離了預期的原因獨立性 - Claude 最不偏離 - 但它們在評估給定原因的效果可能性時表現出強烈的關聯推理和預測推論。這些發現強調了評估 AI 偏差的必要性,因為它們越來越協助人類決策。
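The normative pattern called "explaining away" can be worked through numerically on a collider graph C1 -> E <- C2. The sketch below is not the study's task or parameters: the prior, causal strength, and leak values are assumptions, and the noisy-OR likelihood is one common choice for such graphs.

```python
from itertools import product

P_CAUSE = 0.3   # assumed prior probability of each cause
STRENGTH = 0.8  # assumed strength with which a present cause produces the effect
LEAK = 0.1      # assumed background rate of the effect


def p_effect(c1, c2):
    """Noisy-OR: the effect is absent only if the leak and every present cause fail."""
    return 1 - (1 - LEAK) * (1 - STRENGTH) ** (c1 + c2)


def joint(c1, c2, e):
    """Joint probability of one full assignment in the collider C1 -> E <- C2."""
    p_causes = (P_CAUSE if c1 else 1 - P_CAUSE) * (P_CAUSE if c2 else 1 - P_CAUSE)
    p_e = p_effect(c1, c2)
    return p_causes * (p_e if e else 1 - p_e)


def posterior_c1(**evidence):
    """P(C1 = 1 | evidence), computed by brute-force enumeration."""
    numerator = denominator = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        world = {"c1": c1, "c2": c2, "e": e}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(c1, c2, e)
        denominator += p
        numerator += p * c1
    return numerator / denominator


print(posterior_c1())            # prior belief in C1
print(posterior_c1(c2=1))        # unchanged: causes are independent without E
print(posterior_c1(e=1))         # raised: the effect is evidence for C1
print(posterior_c1(e=1, c2=1))   # lowered again: C2 explains the effect away
```

Under these assumed numbers, observing the other cause alone leaves the belief in C1 at its prior (independence of causes), while observing it together with the effect lowers the belief relative to observing the effect alone.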
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages
2502.10140v1 by Daniil Gurgurov, Ivan Vykopal, Josef van Genabith, Simon Ostermann
Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data. While current state-of-the-art large language models (LLMs) still struggle with LRLs, smaller multilingual models (mLMs) such as mBERT and XLM-R offer greater promise due to a better fit of their capacity to low training data sizes. This study systematically investigates parameter-efficient adapter-based methods for adapting mLMs to LRLs, evaluating three architectures: Sequential Bottleneck, Invertible Bottleneck, and Low-Rank Adaptation. Using unstructured text from GlotCC and structured knowledge from ConceptNet, we show that small adaptation datasets (e.g., up to 1 GB of free-text or a few MB of knowledge graph data) yield gains in intrinsic (masked language modeling) and extrinsic tasks (topic classification, sentiment analysis, and named entity recognition). We find that Sequential Bottleneck adapters excel in language modeling, while Invertible Bottleneck adapters slightly outperform other methods on downstream tasks due to better embedding alignment and larger parameter counts. Adapter-based methods match or outperform full fine-tuning while using far fewer parameters, and smaller mLMs prove more effective for LRLs than massive LLMs like LLaMA-3, GPT-4, and DeepSeek-R1-based distilled models. While adaptation improves performance, pre-training data size remains the dominant factor, especially for languages with extensive pre-training coverage.
摘要:低資源語言 (LRL) 由於資料有限,在自然語言處理 (NLP) 中面臨重大挑戰。雖然當前最先進的大型語言模型 (LLM) 仍難以處理 LRL,但較小的多語言模型 (mLMS),例如 mBERT 和 XLM-R,由於其容量更適合低訓練資料大小,因此提供了更大的希望。本研究系統性地探討了基於參數效率適配器的適配方法,以將 mLMS 適配到 LRL,評估了三種架構:順序瓶頸、可逆瓶頸和低秩適配。使用來自 GlotCC 的非結構化文本和來自 ConceptNet 的結構化知識,我們表明小型適配資料集(例如,高達 1 GB 的自由文本或幾 MB 的知識圖譜資料)在內在(遮蔽語言模型)和外在任務(主題分類、情緒分析和命名實體識別)中產生增益。我們發現順序瓶頸適配器在語言模型中表現出色,而可逆瓶頸適配器由於更好的嵌入對齊和更大的參數數量,在下游任務上略勝於其他方法。基於適配器的方法在使用更少參數的同時,可以匹配或優於完全微調,而較小的 mLM 被證明比 LLaMA-3、GPT-4 和基於 DeepSeek-R1 的蒸餾模型等大型 LLM 更適合 LRL。雖然適配可以提高效能,但預訓練資料大小仍然是主要因素,特別是對於預訓練覆蓋範圍廣泛的語言。
Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models
2502.10090v1 by Chenrui Tie, Shengxiang Sun, Jinxuan Zhu, Yiwei Liu, Jingxiang Guo, Yue Hu, Haonan Chen, Junting Chen, Ruihai Wu, Lin Shao
Humans possess an extraordinary ability to understand and execute complex manipulation tasks by interpreting abstract instruction manuals. For robots, however, this capability remains a substantial challenge, as they cannot interpret abstract instructions and translate them into executable actions. In this paper, we present Manual2Skill, a novel framework that enables robots to perform complex assembly tasks guided by high-level manual instructions. Our approach leverages a Vision-Language Model (VLM) to extract structured information from instructional images and then uses this information to construct hierarchical assembly graphs. These graphs represent parts, subassemblies, and the relationships between them. To facilitate task execution, a pose estimation model predicts the relative 6D poses of components at each assembly step. At the same time, a motion planning module generates actionable sequences for real-world robotic implementation. We demonstrate the effectiveness of Manual2Skill by successfully assembling several real-world IKEA furniture items. This application highlights its ability to manage long-horizon manipulation tasks with both efficiency and precision, significantly enhancing the practicality of robot learning from instruction manuals. This work marks a step forward in advancing robotic systems capable of understanding and executing complex manipulation tasks in a manner akin to human capabilities.
摘要:人類擁有理解並執行複雜操作任務的非凡能力,方法是詮釋抽象的說明手冊。然而,對機器人來說,這項能力仍然是一項重大的挑戰,因為它們無法詮釋抽象的指令並將其轉換為可執行的動作。在本文中,我們提出了 Manual2Skill,這是一個新穎的框架,使機器人能夠在高階手冊說明的指導下執行複雜的組裝任務。我們的做法利用視覺語言模型 (VLM) 從教學圖片中提取結構化資訊,然後使用此資訊來建構階層式組裝圖。這些圖表示零件、子組件以及它們之間的關係。為了促進任務執行,姿勢估計模型會預測每個組裝步驟中組件的相對 6D 姿勢。同時,動作規劃模組會產生適用於實際機器人實作的可操作順序。我們透過成功組裝幾個真實世界的 IKEA 家具來展示 Manual2Skill 的有效性。此應用程式突顯了它以高效率和高精準度管理長時程操作任務的能力,大幅提升機器人從說明手冊中學習的實用性。這項工作標誌著機器人系統在理解和執行複雜操作任務方面向前邁進了一步,其方式類似於人類的能力。
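A hierarchical assembly graph of parts and subassemblies can be sketched as a small tree, with a post-order walk producing an order in which every subassembly is built before the step that needs it. The class `AssemblyNode`, the function `assembly_steps`, and the stool example are hypothetical; the paper's graph format, pose estimation, and motion planning are not represented here.

```python
from dataclasses import dataclass, field


@dataclass
class AssemblyNode:
    """A part (leaf) or a subassembly (node with children). Hypothetical structure."""
    name: str
    children: list = field(default_factory=list)

    @property
    def is_part(self):
        return not self.children


def assembly_steps(node):
    """Post-order walk: emit each subassembly after all of its components."""
    steps = []
    for child in node.children:
        steps.extend(assembly_steps(child))
    if not node.is_part:
        steps.append((node.name, [child.name for child in node.children]))
    return steps


# Made-up furniture item for illustration.
stool = AssemblyNode("stool", [
    AssemblyNode("frame", [
        AssemblyNode("leg_a"),
        AssemblyNode("leg_b"),
        AssemblyNode("crossbar"),
    ]),
    AssemblyNode("seat"),
])

steps = assembly_steps(stool)
for target, components in steps:
    print(f"join {components} -> {target}")
```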
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
2502.09994v1 by Yansen Zhang, Qingcan Kang, Wing Yin Yu, Hailei Gong, Xiaojin Fu, Xiongwei Han, Tao Zhong, Chen Ma
Operations Research (OR) is vital for decision-making in many industries. While recent OR methods have seen significant improvements in automation and efficiency through integrating Large Language Models (LLMs), they still struggle to produce meaningful explanations. This lack of clarity raises concerns about transparency and trustworthiness in OR applications. To address these challenges, we propose a comprehensive framework, Explainable Operations Research (EOR), emphasizing actionable and understandable explanations accompanying optimization. The core of EOR is the concept of Decision Information, which emerges from what-if analysis and focuses on evaluating the impact of changes in complex constraints (or parameters) on decision-making. Specifically, we utilize bipartite graphs to quantify the changes in the OR model and adopt LLMs to improve the explanation capabilities. Additionally, we introduce the first industrial benchmark to rigorously evaluate the effectiveness of explanations and analyses in OR, establishing a new standard for transparency and clarity in the field.
摘要:作業研究 (OR) 對許多產業的決策制定至關重要。雖然近期的 OR 方法已透過整合大型語言模型 (LLM) 在自動化和效率方面取得顯著的進步,但它們在產生有意義的解釋方面仍面臨挑戰。這種缺乏明確性的情況會對 OR 應用中的透明度和可信度造成疑慮。為了應對這些挑戰,我們提出一個全面的架構,即可解釋作業研究 (EOR),強調在最佳化過程中提供可操作且易於理解的解釋。EOR 的核心是決策資訊的概念,它源自假設分析,並專注於評估複雜約束條件 (或參數) 變更對決策制定的影響。具體來說,我們利用二部圖量化 OR 模型的變化,並採用 LLM 來改善解釋能力。此外,我們引入了第一個產業基準,以嚴格評估 OR 中解釋和分析的有效性,為該領域的透明度和清晰度建立新的標準。
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models
2502.09956v1 by Belinda Mo, Kyssen Yu, Joshua Kazdan, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos Kanatsoulis, Sanmi Koyejo
Recent interest in building foundation models for KGs has highlighted a fundamental challenge: knowledge-graph data is relatively scarce. The best-known KGs are primarily human-labeled, created by pattern-matching, or extracted using early NLP techniques. While human-generated KGs are in short supply, automatically extracted KGs are of questionable quality. We present a solution to this data scarcity problem in the form of a text-to-KG generator (KGGen), a package that uses language models to create high-quality graphs from plaintext. Unlike other KG extractors, KGGen clusters related entities to reduce sparsity in extracted KGs. KGGen is available as a Python library (pip install kg-gen), making it accessible to everyone. Along with KGGen, we release the first benchmark, Measure of Information in Nodes and Edges (MINE), that tests an extractor's ability to produce a useful KG from plain text. We benchmark our new tool against existing extractors and demonstrate far superior performance.
摘要:最近对于构建知识图谱基础模型的兴趣凸显了一个基本挑战:知识图谱数据相对稀缺。最知名的知识图谱主要为人标注,由模式匹配创建,或使用早期自然语言处理技术提取。虽然人生成的知识图谱供不应求,但自动提取的知识图谱质量堪忧。我们以文本到知识图谱生成器 (KGGen) 的形式为这一数据稀缺问题提供了一个解决方案,这是一个使用语言模型从纯文本创建高质量图表的包。与其他知识图谱提取器不同,KGGen 对相关实体进行聚类以减少提取的知识图谱中的稀疏性。KGGen 可用作 Python 库(\texttt{pip install kg-gen}),使其所有人都能访问。除了 KGGen,我们还发布了第一个基准测试,即节点和边信息度量 (MINE),它测试了提取器从纯文本生成有用知识图谱的能力。我们针对现有提取器对我们的新工具进行基准测试,并展示了远超其性能。
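Why clustering related entities reduces sparsity can be seen in a toy example. The sketch below does not use the kg-gen package or its API, and it swaps the language-model clustering described in the abstract for a crude rule-based normalization; `canonical_key` and `cluster_triples` are hypothetical names.

```python
import re


def canonical_key(entity):
    """Crude stand-in for clustering: lowercase, drop punctuation, trim a plural 's'."""
    tokens = re.findall(r"[a-z0-9]+", entity.lower())
    return " ".join(t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens)


def cluster_triples(triples):
    """Map surface variants of an entity to the first form seen, then deduplicate."""
    representative = {}
    merged = []
    for subj, rel, obj in triples:
        s = representative.setdefault(canonical_key(subj), subj)
        o = representative.setdefault(canonical_key(obj), obj)
        if (s, rel, o) not in merged:
            merged.append((s, rel, o))
    return merged


# Made-up extracted triples with inconsistent entity spellings.
triples = [
    ("Knowledge Graphs", "store", "facts"),
    ("knowledge graph", "support", "question answering"),
    ("Knowledge-Graph", "store", "Facts"),
]

clustered = cluster_triples(triples)
print(clustered)
```

Before merging, the three triples mention five distinct entity strings and form two disconnected fragments; after merging, both remaining edges attach to a single "Knowledge Graphs" node.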
ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
2502.09891v1 by Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, Yuchi Ma
Retrieval-Augmented Generation (RAG) has proven effective in integrating external knowledge into large language models (LLMs) for question-answer (QA) tasks. The state-of-the-art RAG approaches often use the graph data as the external data since they capture the rich semantic information and link relationships between entities. However, existing graph-based RAG approaches cannot accurately identify the relevant information from the graph and also consume large numbers of tokens in the online retrieval process. To address these issues, we introduce a novel graph-based RAG approach, called Attributed Community-based Hierarchical RAG (ArchRAG), by augmenting the question using attributed communities, and also introducing a novel LLM-based hierarchical clustering method. To retrieve the most relevant information from the graph for the question, we build a novel hierarchical index structure for the attributed communities and develop an effective online retrieval method. Experimental results demonstrate that ArchRAG outperforms existing methods in terms of both accuracy and token cost.
摘要:檢索增強生成 (RAG) 已證明可將外部知識整合到大型語言模型 (LLM),用於問答 (QA) 任務。最先進的 RAG 方法通常使用圖形資料作為外部資料,因為它們擷取了豐富的語意資訊和實體之間的連結關係。然而,現有的基於圖形的 RAG 方法無法準確識別圖形中的相關資訊,而且在線上檢索過程中也會消耗大量的符號。為了解決這些問題,我們提出了一種新穎的基於圖形的 RAG 方法,稱為基於屬性社群的分層 RAG (ArchRAG),透過使用屬性社群來擴充問題,並引入一種新穎的基於 LLM 的分層聚類方法。為了從圖形中檢索與問題最相關的資訊,我們為屬性社群建立了一個新穎的分層索引結構,並開發了一種有效的線上檢索方法。實驗結果證明,ArchRAG 在準確性和符號成本方面都優於現有方法。
Visual Graph Question Answering with ASP and LLMs for Language Parsing
2502.09211v1 by Jakob Johannes Bauer, Thomas Eiter, Nelson Higuera Ruiz, Johannes Oetsch
Visual Question Answering (VQA) is a challenging problem that requires processing multimodal input. Answer-Set Programming (ASP) has shown great potential in this regard to add interpretability and explainability to modular VQA architectures. In this work, we address the problem of how to integrate ASP with modules for vision and natural language processing to solve a new and demanding VQA variant that is concerned with images of graphs (not graphs in symbolic form). Images containing graph-based structures are a ubiquitous and popular form of visualisation. Here, we deal with the particular problem of graphs inspired by transit networks, and we introduce a novel dataset that amends an existing one by adding images of graphs that resemble metro lines. Our modular neuro-symbolic approach combines optical graph recognition for graph parsing, a pretrained optical character recognition neural network for parsing labels, Large Language Models (LLMs) for language processing, and ASP for reasoning. This method serves as a first baseline and achieves an overall average accuracy of 73% on the dataset. Our evaluation provides further evidence of the potential of modular neuro-symbolic systems, in particular with pretrained models that do not involve any further training and logic programming for reasoning, to solve complex VQA tasks.
摘要:視覺問答(VQA)是一項具有挑戰性的問題,需要處理多模態輸入。答案集程式設計(ASP)在這方面顯示出巨大的潛力,可以為模組化 VQA 架構增加可解釋性和說明性。在這項工作中,我們探討如何將 ASP 與視覺和自然語言處理模組整合,以解決一個新的且要求嚴格的 VQA 變體,該變體與圖形影像(而非符號形式的圖形)有關。包含圖形結構的影像是一種普遍且流行的可視化形式。在這裡,我們處理受交通網路啟發的圖形特定問題,並引入一個新的資料集,透過新增類似地鐵路線的圖形影像來修正現有資料集。我們的模組化神經符號方法結合光學圖形辨識進行圖形解析、預先訓練的光學字元辨識神經網路進行標籤解析、大型語言模型(LLM)進行語言處理,以及 ASP 進行推理。此方法作為第一個基準,在資料集上達到 73% 的整體平均準確度。我們的評估進一步證明了模組化神經符號系統的潛力,特別是預先訓練的模型,這些模型不涉及任何進一步的訓練和邏輯程式設計進行推理,以解決複雜的 VQA 任務。
Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data
2502.08547v1 by Doudou Zhou, Han Tong, Linshanshan Wang, Suqi Liu, Xin Xiong, Ziming Gan, Romain Griffier, Boris Hejblum, Yun-Chung Liu, Chuan Hong, Clara-Lea Bonzel, Tianrun Cai, Kevin Pan, Yuk-Lam Ho, Lauren Costa, Vidul A. Panickan, J. Michael Gaziano, Kenneth Mandl, Vianney Jouhet, Rodolphe Thiebaut, Zongqi Xia, Kelly Cho, Katherine Liao, Tianxi Cai
The adoption of EHRs has expanded opportunities to leverage data-driven algorithms in clinical care and research. A major bottleneck in effectively conducting multi-institutional EHR studies is the data heterogeneity across systems with numerous codes that either do not exist or represent different clinical concepts across institutions. The need for data privacy further limits the feasibility of including multi-institutional patient-level data required to study similarities and differences across patient subgroups. To address these challenges, we developed the GAME algorithm. Tested and validated across 7 institutions and 2 languages, GAME integrates data at several levels: (1) at the institutional level with knowledge graphs to establish relationships between codes and existing knowledge sources, providing the medical context for standard codes and their relationship to each other; (2) between institutions, leveraging language models to determine the relationships between institution-specific codes and established standard codes; and (3) quantifying the strength of the relationships between codes using a graph attention network. Jointly trained embeddings are created using transfer and federated learning to preserve data privacy. In this study, we demonstrate the applicability of GAME in selecting relevant features as inputs for AI-driven algorithms in a range of conditions, e.g., heart failure, rheumatoid arthritis. We then highlight the application of GAME-harmonized multi-institutional EHR data in a study of Alzheimer's disease outcomes and suicide risk among patients with mental health disorders, without sharing patient-level data outside individual institutions.
摘要:電子健康紀錄的採用擴大了在臨床照護和研究中利用資料驅動演算法的機會。在有效進行多機構電子健康紀錄研究時,一個主要的瓶頸是系統間資料異質性,其中有許多代碼在機構間不存在或表示不同的臨床概念。資料隱私的需求進一步限制了納入多機構患者層級資料的可行性,而這些資料對於研究患者亞群之間的相似性和差異性是必要的。為了應對這些挑戰,我們開發了 GAME 演算法。GAME 已在 7 個機構和 2 種語言中進行測試和驗證,它整合了多個層級的資料:(1) 在機構層級,使用知識圖表來建立代碼和現有知識來源之間的關係,為標準代碼及其彼此之間的關係提供醫療背景;(2) 在機構之間,利用語言模型來確定機構特定代碼與已建立的標準代碼之間的關係;(3) 使用圖形注意網路量化代碼之間關係的強度。使用遷移和聯合學習建立聯合訓練的嵌入,以保護資料隱私。在本研究中,我們展示了 GAME 在選擇相關特徵作為 AI 驅動演算法輸入時的適用性,適用於各種情況,例如心臟衰竭、類風濕性關節炎。然後,我們重點介紹了 GAME 和諧化多機構電子健康紀錄資料在阿茲海默症疾病結果和精神疾病患者自殺風險研究中的應用,而無需在個別機構之外共享患者層級資料。
Trustworthy GNNs with LLMs: A Systematic Review and Taxonomy
2502.08353v1 by Ruizhan Xue, Huimin Deng, Fang He, Maojun Wang, Zeyu Zhang
With the extensive application of Graph Neural Networks (GNNs) across various domains, their trustworthiness has emerged as a focal point of research. Some existing studies have shown that the integration of large language models (LLMs) can improve the semantic understanding and generation capabilities of GNNs, which in turn improves the trustworthiness of GNNs from various aspects. Our review introduces a taxonomy that offers researchers a clear framework for comprehending the principles and applications of different methods and helps clarify the connections and differences among various approaches. Then we systematically survey representative approaches along the four categories of our taxonomy. Through our taxonomy, researchers can understand the applicable scenarios, potential advantages, and limitations of each approach for the trusted integration of GNNs with LLMs. Finally, we present some promising directions of work and future trends for the integration of LLMs and GNNs to improve model trustworthiness.
摘要:隨著圖神經網路 (GNN) 在各種領域的廣泛應用,其可信度已成為研究的焦點。一些現有研究表明,整合大型語言模型 (LLM) 可以提升 GNN 的語意理解和生成能力,進而從各方面提升 GNN 的可信度。我們的評論介紹了一種分類法,為研究人員提供了一個清晰的架構,用於理解不同方法的原理和應用,並有助於釐清各種方法之間的關聯和差異。然後,我們系統性地針對分類法的四個類別進行代表性方法的調查。研究人員透過我們的分類法,可以了解每種方法在 GNN 與 LLM 的可信整合中適用的場景、潛在優點和限制。最後,我們提出 LLM 與 GNN 整合的一些有前景的工作方向和未來趨勢,以提升模型的可信度。
Graph Foundation Models for Recommendation: A Comprehensive Survey
2502.08346v3 by Bin Wu, Yihang Wang, Yuanhao Zeng, Jiawei Liu, Jiashu Zhao, Cheng Yang, Yawen Li, Long Xia, Dawei Yin, Chuan Shi
Recommender systems (RS) serve as a fundamental tool for navigating the vast expanse of online information, with deep learning advancements playing an increasingly important role in improving ranking accuracy. Among these, graph neural networks (GNNs) excel at extracting higher-order structural information, while large language models (LLMs) are designed to process and comprehend natural language, making both approaches highly effective and widely adopted. Recent research has focused on graph foundation models (GFMs), which integrate the strengths of GNNs and LLMs to model complex RS problems more efficiently by leveraging the graph-based structure of user-item relationships alongside textual understanding. In this survey, we provide a comprehensive overview of GFM-based RS technologies by introducing a clear taxonomy of current approaches, diving into methodological details, and highlighting key challenges and future directions. By synthesizing recent advancements, we aim to offer valuable insights into the evolving landscape of GFM-based recommender systems.
摘要:推薦系統 (RS) 是用於導航廣闊的線上資訊的基本工具,深度學習的進步在提升排名準確度方面扮演著日益重要的角色。其中,圖形神經網路 (GNN) 擅長萃取高階結構資訊,而大型語言模型 (LLM) 則設計用於處理和理解自然語言,這使得這兩種方法都非常有效且廣泛採用。最近的研究專注於圖形基礎模型 (GFM),它整合了 GNN 和 LLM 的優點,透過利用使用者與項目關係的圖形化結構以及文字理解,更有效率地建構複雜的 RS 問題模型。在這項調查中,我們透過介紹當前方法的明確分類、深入探討方法論細節,以及強調關鍵挑戰和未來方向,提供了 GFM 為基礎的 RS 技術的全面概觀。透過綜合最近的進展,我們旨在提供對 GFM 為基礎的推薦系統不斷演變的版圖的寶貴見解。
Self-Evaluation for Job-Shop Scheduling
2502.08684v1 by Imanol Echeverria, Maialen Murua, Roberto Santana
Combinatorial optimization problems, such as scheduling and route planning, are crucial in various industries but are computationally intractable due to their NP-hard nature. Neural Combinatorial Optimization methods leverage machine learning to address these challenges but often depend on sequential decision-making, which is prone to error accumulation as small mistakes propagate throughout the process. Inspired by self-evaluation techniques in Large Language Models, we propose a novel framework that generates and evaluates subsets of assignments, moving beyond traditional stepwise approaches. Applied to the Job-Shop Scheduling Problem, our method integrates a heterogeneous graph neural network with a Transformer to build a policy model and a self-evaluation function. Experimental validation on challenging, well-known benchmarks demonstrates the effectiveness of our approach, surpassing state-of-the-art methods.
摘要:組合優化問題,例如排程和路線規劃,在各行各業中至關重要,但由於它們的 NP 難度,在計算上難以處理。神經組合優化方法利用機器學習來解決這些挑戰,但通常依賴於序貫決策制定,而序貫決策制定容易發生錯誤累積,因為小錯誤會在整個過程中傳播。受大型語言模型中的自我評估技術啟發,我們提出了一個新的框架,可生成和評估作業子集,超越傳統的分步方法。應用於工作車間排程問題,我們的方法將異質圖神經網路與 Transformer 整合在一起,以建立策略模型和自我評估函數。在具有挑戰性的著名基準上的實驗驗證證明了我們方法的有效性,超越了最先進的方法。
Improving Existing Optimization Algorithms with LLMs
2502.08298v1 by Camilo Chacón Sartori, Christian Blum
The integration of Large Language Models (LLMs) into optimization has created a powerful synergy, opening exciting research opportunities. This paper investigates how LLMs can enhance existing optimization algorithms. Using their pre-trained knowledge, we demonstrate their ability to propose innovative heuristic variations and implementation strategies. To evaluate this, we applied a non-trivial optimization algorithm, Construct, Merge, Solve and Adapt (CMSA) -- a hybrid metaheuristic for combinatorial optimization problems that incorporates a heuristic in the solution construction phase. Our results show that an alternative heuristic proposed by GPT-4o outperforms the expert-designed heuristic of CMSA, with the performance gap widening on larger and denser graphs. Project URL: https://imp-opt-algo-llms.surge.sh/
摘要:大型语言模型 (LLM) 与优化相结合,创造了一种强大的协同作用,开启了令人兴奋的研究机会。本文探讨了 LLM 如何增强现有的优化算法。利用其预先训练的知识,我们展示了它们提出创新启发式变体和实施策略的能力。为了评估这一点,我们应用了一种非平凡的优化算法,构建、合并、求解和适应 (CMSA)——一种用于组合优化问题的混合元启发式算法,它在求解构建阶段纳入了启发式算法。我们的结果表明,GPT-4o 提出的替代启发式算法优于 CMSA 的专家设计的启发式算法,并且随着图形变得更大、更密集,性能差距也在扩大。项目网址:https://imp-opt-algo-llms.surge.sh/
LLM4GNAS: A Large Language Model Based Toolkit for Graph Neural Architecture Search
2502.10459v1 by Yang Gao, Hong Yang, Yizhi Chen, Junxian Wu, Peng Zhang, Haishuai Wang
Graph Neural Architecture Search (GNAS) facilitates the automatic design of Graph Neural Networks (GNNs) tailored to specific downstream graph learning tasks. However, existing GNAS approaches often require manual adaptation to new graph search spaces, necessitating substantial code optimization and domain-specific knowledge. To address this challenge, we present LLM4GNAS, a toolkit for GNAS that leverages the generative capabilities of Large Language Models (LLMs). LLM4GNAS includes an algorithm library for graph neural architecture search algorithms based on LLMs, enabling the adaptation of GNAS methods to new search spaces through the modification of LLM prompts. This approach reduces the need for manual intervention in algorithm adaptation and code modification. The LLM4GNAS toolkit is extensible and robust, incorporating LLM-enhanced graph feature engineering, LLM-enhanced graph neural architecture search, and LLM-enhanced hyperparameter optimization. Experimental results indicate that LLM4GNAS outperforms existing GNAS methods on tasks involving both homogeneous and heterogeneous graphs.
摘要:圖形神經架構搜尋 (GNAS) 促進圖形神經網路 (GNN) 的自動設計,以符合特定下游圖形學習任務。然而,現有的 GNAS 方法通常需要手動調整至新的圖形搜尋空間,這需要大量的程式碼最佳化和領域特定知識。為了應對這項挑戰,我們提出 LLM4GNAS,一個利用大型語言模型 (LLM) 的生成能力的 GNAS 工具包。LLM4GNAS 包含一個基於 LLM 的圖形神經架構搜尋演算法函式庫,讓 GNAS 方法能夠透過修改 LLM 提示來適應新的搜尋空間。這種方法減少了演算法適應和程式碼修改中手動介入的需要。LLM4GNAS 工具包具有可擴充性和穩健性,整合了 LLM 增強的圖形特徵工程、LLM 增強的圖形神經架構搜尋和 LLM 增強的超參數最佳化。實驗結果表明,LLM4GNAS 在涉及同質和異質圖形的任務上優於現有的 GNAS 方法。
ACCESS : A Benchmark for Abstract Causal Event Discovery and Reasoning
2502.08148v1 by Vy Vo, Lizhen Qu, Tao Feng, Yuncheng Hua, Xiaoxi Kang, Songhai Fan, Tim Dwyer, Lay-Ki Soon, Gholamreza Haffari
Identifying cause-and-effect relationships is critical to understanding real-world dynamics and ultimately causal reasoning. Existing methods for identifying event causality in NLP, including those based on Large Language Models (LLMs), exhibit difficulties in out-of-distribution settings due to the limited scale and heavy reliance on lexical cues within available benchmarks. Modern benchmarks, inspired by probabilistic causal inference, have attempted to construct causal graphs of events as a robust representation of causal knowledge, where CRAB (Romanou et al., 2023) is one such recent benchmark along this line. In this paper, we introduce ACCESS, a benchmark designed for discovery and reasoning over abstract causal events. Unlike existing resources, ACCESS focuses on causality of everyday life events on the abstraction level. We propose a pipeline for identifying abstractions for event generalizations from GLUCOSE (Mostafazadeh et al., 2020), a large-scale dataset of implicit commonsense causal knowledge, from which we subsequently extract 1.4K causal pairs. Our experiments highlight the ongoing challenges of using statistical methods and/or LLMs for automatic abstraction identification and causal discovery in NLP. Nonetheless, we demonstrate that the abstract causal knowledge provided in ACCESS can be leveraged for enhancing QA reasoning performance in LLMs.
Neuro-Conceptual Artificial Intelligence: Integrating OPM with Deep Learning to Enhance Question Answering Quality
2502.09658v1 by Xin Kang, Veronika Shteingardt, Yuhan Wang, Dov Dori
Knowledge representation and reasoning are critical challenges in Artificial Intelligence (AI), particularly in integrating neural and symbolic approaches to achieve explainable and transparent AI systems. Traditional knowledge representation methods often fall short of capturing complex processes and state changes. We introduce Neuro-Conceptual Artificial Intelligence (NCAI), a specialization of the neuro-symbolic AI approach that integrates conceptual modeling using Object-Process Methodology (OPM) ISO 19450:2024 with deep learning to enhance question-answering (QA) quality. By converting natural language text into OPM models using in-context learning, NCAI leverages the expressive power of OPM to represent complex OPM elements (processes, objects, and states) beyond what traditional triplet-based knowledge graphs can easily capture. This rich structured knowledge representation improves reasoning transparency and answer accuracy in an OPM-QA system. We further propose transparency evaluation metrics to quantitatively measure how faithfully the predicted reasoning aligns with OPM-based conceptual logic. Our experiments demonstrate that NCAI outperforms traditional methods, highlighting its potential for advancing neuro-symbolic AI by providing rich knowledge representations, measurable transparency, and improved reasoning.
摘要:知識表徵與推理是人工智慧 (AI) 中的重大挑戰,特別是在整合神經與符號方法以實現可解釋且透明的人工智慧系統時。傳統的知識表徵方法通常無法捕捉複雜的流程和狀態變化。我們引入了神經概念人工智慧 (NCAI),一種神經符號 AI 方法的專門化,它將使用物件流程方法 (OPM) ISO 19450:2024 的概念建模與深度學習整合在一起,以提升問答 (QA) 的品質。透過使用情境學習將自然語言文字轉換為 OPM 模型,NCAI 充分利用 OPM 的表達能力來表徵複雜的 OPM 元素(流程、物件和狀態),超越傳統的三元組知識圖表容易捕捉的範圍。這種豐富的結構化知識表徵改善了 OPM-QA 系統中的推理透明度和答案準確度。我們進一步提出了透明度評估指標,以量化測量預測推理與基於 OPM 的概念邏輯的吻合程度。我們的實驗證明,NCAI 優於傳統方法,突顯了它在透過提供豐富的知識表徵、可測量的透明度和改善的推理來推進神經符號 AI 的潛力。
GCoT: Chain-of-Thought Prompt Learning for Graphs
2502.08092v1 by Xingtong Yu, Chang Zhou, Zhongwei Kuai, Xinming Zhang, Yuan Fang
Chain-of-thought (CoT) prompting has achieved remarkable success in natural
language processing (NLP). However, its vast potential remains largely
unexplored for graphs. This raises an interesting question: How can we design
CoT prompting for graphs to guide graph models to learn step by step? On one
hand, unlike natural languages, graphs are non-linear and characterized by
complex topological structures. On the other hand, many graphs lack textual
data, making it difficult to formulate language-based CoT prompting. In this
work, we propose the first CoT prompt learning framework for text-free graphs,
GCoT. Specifically, we decompose the adaptation process for each downstream
task into a series of inference steps, with each step consisting of
prompt-based inference, "thought" generation, and thought-conditioned prompt
learning. While the steps mimic CoT prompting in NLP, the exact mechanism
differs significantly. Specifically, at each step, an input graph, along with a
prompt, is first fed into a pre-trained graph encoder for prompt-based
inference. We then aggregate the hidden layers of the encoder to construct a
"thought", which captures the working state of each node in the current step.
Conditioned on this thought, we learn a prompt specific to each node based on
the current state. These prompts are fed into the next inference step,
repeating the cycle. To evaluate and analyze the effectiveness of GCoT, we
conduct comprehensive experiments on eight public datasets, which demonstrate
the advantage of our approach.
Linking Cryptoasset Attribution Tags to Knowledge Graph Entities: An LLM-based Approach
2502.10453v1 by Régnier Avice, Bernhard Haslhofer, Zhidong Li, Jianlong Zhou
Attribution tags form the foundation of modern cryptoasset forensics. However, inconsistent or incorrect tags can mislead investigations and even result in false accusations. To address this issue, we propose a novel computational method based on Large Language Models (LLMs) to link attribution tags with well-defined knowledge graph concepts. We implemented this method in an end-to-end pipeline and conducted experiments showing that our approach outperforms baseline methods by up to 37.4% in F1-score across three publicly available attribution tag datasets. By integrating concept filtering and blocking procedures, we generate candidate sets containing five knowledge graph entities, achieving a recall of 93% without the need for labeled data. Additionally, we demonstrate that local LLM models can achieve F1-scores of 90%, comparable to remote models which achieve 94%. We also analyze the cost-performance trade-offs of various LLMs and prompt templates, showing that selecting the most cost-effective configuration can reduce costs by 90%, with only a 1% decrease in performance. Our method not only enhances attribution tag quality but also serves as a blueprint for fostering more reliable forensic evidence.
Deep Semantic Graph Learning via LLM based Node Enhancement
2502.07982v1 by Chuanqi Shi, Yiyi Tao, Hang Zhang, Lun Wang, Shaoshuai Du, Yixian Shen, Yanxin Shen
Graph learning has attracted significant attention due to its widespread real-world applications. Current mainstream approaches rely on text node features and obtain initial node embeddings through shallow embedding learning using GNNs, which shows limitations in capturing deep textual semantics. Recent advances in Large Language Models (LLMs) have demonstrated superior capabilities in understanding text semantics, transforming traditional text feature processing. This paper proposes a novel framework that combines Graph Transformer architecture with LLM-enhanced node features. Specifically, we leverage LLMs to generate rich semantic representations of text nodes, which are then processed by a multi-head self-attention mechanism in the Graph Transformer to capture both local and global graph structural information. Our model utilizes the Transformer's attention mechanism to dynamically aggregate neighborhood information while preserving the semantic richness provided by LLM embeddings. Experimental results demonstrate that the LLM-enhanced node features significantly improve the performance of graph learning models on node classification tasks. This approach shows promising results across multiple graph learning tasks, offering a practical direction for combining graph networks with language models.
Cardiverse: Harnessing LLMs for Novel Card Game Prototyping
2502.07128v1 by Danrui Li, Sen Zhang, Sam S. Sohn, Kaidong Hu, Muhammad Usman, Mubbasir Kapadia
The prototyping of computer games, particularly card games, requires extensive human effort in creative ideation and gameplay evaluation. Recent advances in Large Language Models (LLMs) offer opportunities to automate and streamline these processes. However, it remains challenging for LLMs to design novel game mechanics beyond existing databases, generate consistent gameplay environments, and develop scalable gameplay AI for large-scale evaluations. This paper addresses these challenges by introducing a comprehensive automated card game prototyping framework. The approach highlights a graph-based indexing method for generating novel game designs, an LLM-driven system for consistent game code generation validated by gameplay records, and a gameplay AI constructing method that uses an ensemble of LLM-generated action-value functions optimized through self-play. These contributions aim to accelerate card game prototyping, reduce human labor, and lower barriers to entry for game developers.
GraNNite: Enabling High-Performance Execution of Graph Neural Networks on Resource-Constrained Neural Processing Units
2502.06921v2 by Arghadip Das, Shamik Kundu, Arnab Raha, Soumendu Ghosh, Deepak Mathaikutty, Vijay Raghunathan
Graph Neural Networks (GNNs) are vital for learning from graph-structured data, enabling applications in network analysis, recommendation systems, and speech analytics. Deploying them on edge devices like client PCs and laptops enhances real-time processing, privacy, and cloud independence. GNNs aid Retrieval-Augmented Generation (RAG) for Large Language Models (LLMs) and enable event-based vision tasks. However, irregular memory access, sparsity, and dynamic structures cause high latency and energy overhead on resource-constrained devices. While modern edge processors integrate CPUs, GPUs, and NPUs, NPUs designed for data-parallel tasks struggle with irregular GNN computations. We introduce GraNNite, the first hardware-aware framework optimizing GNN execution on commercial-off-the-shelf (COTS) SOTA DNN accelerators via a structured three-step methodology: (1) enabling NPU execution, (2) optimizing performance, and (3) trading accuracy for efficiency gains. Step 1 employs GraphSplit for workload distribution and StaGr for static aggregation, while GrAd and NodePad handle dynamic graphs. Step 2 boosts performance using EffOp for control-heavy tasks and GraSp for sparsity exploitation. Graph Convolution optimizations PreG, SymG, and CacheG reduce redundancy and memory transfers. Step 3 balances quality versus efficiency, where QuantGr applies INT8 quantization, and GrAx1, GrAx2, and GrAx3 accelerate attention, broadcast-add, and SAGE-max aggregation. On Intel Core Ultra AI PCs, GraNNite achieves 2.6X to 7.6X speedups over default NPU mappings and up to 8.6X energy gains over CPUs and GPUs, delivering 10.8X and 6.7X higher performance than CPUs and GPUs, respectively, across GNN models.
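The abstract names QuantGr as the component that applies INT8 quantization but gives no details. As a rough illustration only, the sketch below assumes symmetric per-tensor quantization, which is a common scheme and not necessarily the one GraNNite uses; the function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (assumed scheme, not GraNNite's API).

    Returns the integer codes and the scale needed to map them back to floats.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric INT8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical node-feature values, used only to exercise the functions.
features = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(features)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the accuracy given up in exchange for 8-bit storage and arithmetic.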
Automatic Annotation Augmentation Boosts Translation between Molecules and Natural Language
2502.06634v1 by Zhiqiang Zhong, Simon Sataa-Yu Larsen, Haoyu Guo, Tao Tang, Kuangyu Zhou, Davide Mottin
Recent advancements in AI for biological research focus on integrating molecular data with natural language to accelerate drug discovery. However, the scarcity of high-quality annotations limits progress in this area. This paper introduces LA$^3$, a Language-based Automatic Annotation Augmentation framework that leverages large language models to augment existing datasets, thereby improving AI training. We demonstrate the effectiveness of LA$^3$ by creating an enhanced dataset, LaChEBI-20, where we systematically rewrite the annotations of molecules from an established dataset. These rewritten annotations preserve essential molecular information while providing more varied sentence structures and vocabulary. Using LaChEBI-20, we train LaMolT5 based on a benchmark architecture to learn the mapping between molecular representations and augmented annotations. Experimental results on text-based de novo molecule generation and molecule captioning demonstrate that LaMolT5 outperforms state-of-the-art models. Notably, incorporating LA$^3$ leads to improvements of up to 301% over the benchmark architecture. Furthermore, we validate the effectiveness of LA$^3$ in notable applications across image, text, and graph tasks, affirming its versatility and utility.
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
2502.06472v1 by Yuxing Lu, Jinzhuo Wang
Maintaining comprehensive and up-to-date knowledge graphs (KGs) is critical for modern AI systems, but manual curation struggles to scale with the rapid growth of scientific literature. This paper presents KARMA, a novel framework employing multi-agent large language models (LLMs) to automate KG enrichment through structured analysis of unstructured text. Our approach employs nine collaborative agents, spanning entity discovery, relation extraction, schema alignment, and conflict resolution, that iteratively parse documents, verify extracted knowledge, and integrate it into existing graph structures while adhering to domain-specific schema. Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment, with the identification of up to 38,230 new entities while achieving 83.1% LLM-verified correctness and reducing conflict edges by 18.6% through multi-layer assessments.
RoToR: Towards More Reliable Responses for Order-Invariant Inputs
2502.08662v1 by Soyoung Yoon, Dongha Ahn, Youngwon Lee, Minkyu Jung, HyungJoo Jang, Seung-won Hwang
Mitigating positional bias of language models (LMs) for listwise inputs is a well-known and important problem (e.g., lost-in-the-middle). While zero-shot order-invariant LMs have been proposed to solve this issue, their success on practical listwise problems has been limited. In this work, as a first contribution, we identify and overcome two limitations to make zero-shot invariant LMs more practical: (1) training and inference distribution mismatch arising from modifying positional ID assignments to enforce invariance, and (2) failure to adapt to a mixture of order-invariant and order-sensitive inputs in practical listwise problems. To overcome these limitations, we propose (1) RoToR, a zero-shot invariant LM for genuinely order-invariant inputs with minimal modifications of positional IDs, and (2) Selective Routing, an adaptive framework that handles both order-invariant and order-sensitive inputs in listwise tasks. On the Lost in the Middle (LitM), Knowledge Graph Question Answering (KGQA), and MMLU benchmarks, we show that RoToR with Selective Routing can effectively handle practical listwise input tasks in a zero-shot manner.
K-ON: Stacking Knowledge On the Head Layer of Large Language Model
2502.06257v1 by Lingbing Guo, Yichi Zhang, Zhongpu Bo, Zhuo Chen, Mengshu Sun, Zhiqiang Zhang, Wen Zhang, Huajun Chen
Recent advancements in large language models (LLMs) have significantly improved various natural language processing (NLP) tasks. Typically, LLMs are trained to predict the next token, aligning well with many NLP tasks. However, in knowledge graph (KG) scenarios, entities are the fundamental units and identifying an entity requires at least several tokens. This leads to a granularity mismatch between KGs and natural languages. To address this issue, we propose K-ON, which integrates KG knowledge into the LLM by employing multiple head layers for next k-step prediction. K-ON can not only generate entity-level results in one step, but also enables contrastive loss against entities, which is the most powerful tool in KG representation learning. Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even the other modalities.
LegalViz: Legal Text Visualization by Text To Diagram Generation
2502.06147v2 by Eri Onami, Taiki Miyanishi, Koki Maeda, Shuhei Kurita
Legal documents including judgments and court orders require highly sophisticated legal knowledge for understanding. To disclose expert knowledge for non-experts, we explore the problem of visualizing legal texts with easy-to-understand diagrams and propose a novel dataset of LegalViz with 23 languages and 7,010 cases of legal document and visualization pairs, using the DOT graph description language of Graphviz. LegalViz provides a simple diagram from a complicated legal corpus identifying legal entities, transactions, legal sources, and statements at a glance, that are essential in each judgment. In addition, we provide new evaluation metrics for the legal diagram visualization by considering graph structures, textual similarities, and legal contents. We conducted empirical studies on few-shot and finetuning large language models for generating legal diagrams and evaluated them with these metrics, including legal content-based evaluation within 23 languages. Models trained with LegalViz outperform existing models including GPTs, confirming the effectiveness of our dataset.
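To make the target format concrete, here is a minimal sketch of emitting a Graphviz DOT digraph from (source, target, label) triples. The `to_dot` helper and the example parties are hypothetical and are not taken from the LegalViz dataset; only the DOT syntax itself (`digraph`, `->`, `label=`) is standard Graphviz.

```python
def quote(text):
    """Wrap text in double quotes, escaping characters DOT treats specially."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_dot(edges, name="case"):
    """Render (source, target, label) triples as a DOT directed graph."""
    lines = [f"digraph {name} {{"]
    for src, dst, label in edges:
        lines.append(f"  {quote(src)} -> {quote(dst)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relations, used only to show the output shape.
example = to_dot([
    ("Plaintiff", "Defendant", "sues"),
    ("Court", "Defendant", "orders payment"),
])
print(example)
```

The resulting text can be rendered with the Graphviz `dot` command-line tool, for example `dot -Tpng case.dot -o case.png`.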
Deconstructing Depression Stigma: Integrating AI-driven Data Collection and Analysis with Causal Knowledge Graphs
2502.06075v1 by Han Meng, Renwen Zhang, Ganyi Wang, Yitian Yang, Peinuan Qin, Jungup Lee, Yi-Chieh Lee
Mental-illness stigma is a persistent social problem, hampering both treatment-seeking and recovery. Accordingly, there is a pressing need to understand it more clearly, but analyzing the relevant data is highly labor-intensive. Therefore, we designed a chatbot to engage participants in conversations; coded those conversations qualitatively with AI assistance; and, based on those coding results, built causal knowledge graphs to decode stigma. The results we obtained from 1,002 participants demonstrate that conversation with our chatbot can elicit rich information about people's attitudes toward depression, while our AI-assisted coding was strongly consistent with human-expert coding. Our novel approach combining large language models (LLMs) and causal knowledge graphs uncovered patterns in individual responses and illustrated the interrelationships of psychological constructs in the dataset as a whole. The paper also discusses these findings' implications for HCI researchers in developing digital interventions, decomposing human psychological constructs, and fostering inclusive attitudes.
LegalSeg: Unlocking the Structure of Indian Legal Judgments Through Rhetorical Role Classification
2502.05836v1 by Shubham Kumar Nigam, Tanmay Dubey, Govind Sharma, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya
In this paper, we address the task of semantic segmentation of legal documents through rhetorical role classification, with a focus on Indian legal judgments. We introduce LegalSeg, the largest annotated dataset for this task, comprising over 7,000 documents and 1.4 million sentences, labeled with 7 rhetorical roles. To benchmark performance, we evaluate multiple state-of-the-art models, including Hierarchical BiLSTM-CRF, TransformerOverInLegalBERT (ToInLegalBERT), Graph Neural Networks (GNNs), and Role-Aware Transformers, alongside an exploratory RhetoricLLaMA, an instruction-tuned large language model. Our results demonstrate that models incorporating broader context, structural relationships, and sequential sentence information outperform those relying solely on sentence-level features. Additionally, we conducted experiments using surrounding context and predicted or actual labels of neighboring sentences to assess their impact on classification accuracy. Despite these advancements, challenges persist in distinguishing between closely related roles and addressing class imbalance. Our work underscores the potential of advanced techniques for improving legal document understanding and sets a strong foundation for future research in legal NLP.
LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning
2502.05453v1 by Hanqing Yang, Jingdi Chen, Marie Siew, Tania Lorido-Botran, Carlee Joe-Wong
Developing intelligent agents for long-term cooperation in dynamic open-world scenarios is a major challenge in multi-agent systems. Traditional Multi-agent Reinforcement Learning (MARL) frameworks like centralized training decentralized execution (CTDE) struggle with scalability and flexibility. They require centralized long-term planning, which is difficult without custom reward functions, and face challenges in processing multi-modal data. CTDE approaches also assume fixed cooperation strategies, making them impractical in dynamic environments where agents need to adapt and plan independently. To address decentralized multi-agent cooperation, we propose Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) in a novel Multi-agent Crafter environment. Our generative agents, powered by Large Language Models (LLMs), are more scalable than traditional MARL agents by leveraging external knowledge and language for long-term planning and reasoning. Instead of fully sharing information from all past experiences, DAMCS introduces a multi-modal memory system organized as a hierarchical knowledge graph and a structured communication protocol to optimize agent cooperation. This allows agents to reason from past interactions and share relevant information efficiently. Experiments on novel multi-agent open-world tasks show that DAMCS outperforms both MARL and LLM baselines in task efficiency and collaboration. Compared to single-agent scenarios, the two-agent scenario achieves the same goal with 63% fewer steps, and the six-agent scenario with 74% fewer steps, highlighting the importance of adaptive memory and structured communication in achieving long-term goals. We publicly release our project at: https://happyeureka.github.io/damcs.
SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation
2502.05424v1 by Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, Hui Zhang
Graphs are able to model interconnected entities in many online services, supporting a wide range of applications on the Web. This raises an important question: How can we train a graph foundational model on multiple source domains and adapt to an unseen target domain? A major obstacle is that graphs from different domains often exhibit divergent characteristics. Some studies leverage large language models to align multiple domains based on textual descriptions associated with the graphs, limiting their applicability to text-attributed graphs. For text-free graphs, a few recent works attempt to align different feature distributions across domains, while generally neglecting structural differences. In this work, we propose a novel Structure Alignment framework for text-free Multi-domain Graph Pre-Training and cross-domain adaptation (SAMGPT). It is designed to learn multi-domain knowledge from graphs originating in multiple source domains, which can then be adapted to address applications in an unseen target domain. Specifically, we introduce a set of structure tokens to harmonize structure-based aggregation across source domains during the pre-training phase. Next, for cross-domain adaptation, we design dual prompts, namely, holistic prompts and specific prompts, which adapt unified multi-domain structural knowledge and fine-grained, domain-specific information, respectively, to a target domain. Finally, we conduct comprehensive experiments on seven public datasets to evaluate and analyze the effectiveness of SAMGPT.
Graph-based Molecular In-context Learning Grounded on Morgan Fingerprints
2502.05414v1 by Ali Al-Lawati, Jason Lucas, Zhiwei Zhang, Prasenjit Mitra, Suhang Wang
In-context learning (ICL) effectively conditions large language models (LLMs) for molecular tasks, such as property prediction and molecule captioning, by embedding carefully selected demonstration examples into the input prompt. This approach avoids the computational overhead of extensive pre-training and fine-tuning. However, current prompt retrieval methods for molecular tasks have relied on molecule feature similarity, such as Morgan fingerprints, which do not adequately capture the global molecular and atom-binding relationships. As a result, these methods fail to represent the full complexity of molecular structures during inference. Moreover, small-to-medium-sized LLMs, which offer simpler deployment requirements in specialized systems, have remained largely unexplored in the molecular ICL literature. To address these gaps, we propose a self-supervised learning technique, GAMIC (Graph-Aligned Molecular In-Context learning), which aligns global molecular structures, represented by graph neural networks (GNNs), with textual captions (descriptions) while leveraging local feature similarity through Morgan fingerprints. In addition, we introduce a Maximum Marginal Relevance (MMR) based diversity heuristic during retrieval to optimize input prompt demonstration samples. Our experimental findings using diverse benchmark datasets show GAMIC outperforms simple Morgan-based ICL retrieval methods across all tasks by up to 45%.
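The MMR heuristic mentioned in the abstract can be sketched as follows. This is the textbook greedy form of Maximum Marginal Relevance over plain embedding vectors, with cosine similarity and a trade-off weight `lam` chosen here as assumptions; how GAMIC computes its embeddings and sets the weight is not stated in the abstract.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k, lam=0.5):
    """Greedily pick k candidate indices, balancing relevance and diversity.

    Each step maximizes lam * sim(query, c) - (1 - lam) * max sim(c, selected).
    """
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-D embeddings: candidates 0 and 1 are near-duplicates, 2 is different.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.5, -0.5]]
diverse = mmr_select(query, candidates, k=2, lam=0.5)
relevance_only = mmr_select(query, candidates, k=2, lam=1.0)
```

With `lam=1.0` the selection reduces to plain top-k by similarity and returns the two near-duplicates; lowering `lam` swaps the second near-duplicate for the less similar but more informative candidate.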
Knowledge Graph-Guided Retrieval Augmented Generation
2502.06864v1 by Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, Wei Hu
Retrieval-augmented generation (RAG) has emerged as a promising technology for addressing hallucination issues in the responses generated by large language models (LLMs). Existing studies on RAG primarily focus on applying semantic-based approaches to retrieve isolated relevant chunks, which ignore their intrinsic relationships. In this paper, we propose a novel Knowledge Graph-Guided Retrieval Augmented Generation (KG$^2$RAG) framework that utilizes knowledge graphs (KGs) to provide fact-level relationships between chunks, improving the diversity and coherence of the retrieved results. Specifically, after performing a semantic-based retrieval to provide seed chunks, KG$^2$RAG employs a KG-guided chunk expansion process and a KG-based chunk organization process to deliver relevant and important knowledge in well-organized paragraphs. Extensive experiments conducted on the HotpotQA dataset and its variants demonstrate the advantages of KG$^2$RAG compared to existing RAG-based approaches, in terms of both response quality and retrieval quality.
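One way to picture KG-guided chunk expansion is as a bounded graph traversal starting from the entities mentioned in the seed chunks. The sketch below is an assumption-laden illustration, not the KG$^2$RAG implementation: the chunk-to-entity mapping, the undirected treatment of KG edges, the `hops` parameter, and the function name are all hypothetical.

```python
from collections import deque


def expand_chunks(seed_chunks, chunk_entities, kg_edges, hops=1):
    """Return seed chunks plus chunks mentioning entities within `hops` KG edges."""
    # Undirected adjacency between entities (an assumption of this sketch).
    neighbors = {}
    for head, tail in kg_edges:
        neighbors.setdefault(head, set()).add(tail)
        neighbors.setdefault(tail, set()).add(head)

    # Inverted index: entity -> chunks that mention it.
    entity_chunks = {}
    for chunk, entities in chunk_entities.items():
        for entity in entities:
            entity_chunks.setdefault(entity, set()).add(chunk)

    # Breadth-first search outward from the entities in the seed chunks.
    seen = set()
    frontier = deque()
    for chunk in seed_chunks:
        for entity in chunk_entities.get(chunk, ()):
            if entity not in seen:
                seen.add(entity)
                frontier.append((entity, 0))
    while frontier:
        entity, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in neighbors.get(entity, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))

    expanded = set(seed_chunks)
    for entity in seen:
        expanded |= entity_chunks.get(entity, set())
    return sorted(expanded)


# Toy data: chunk c2 is reachable from seed c1 via the Paris-France edge.
chunk_entities = {"c1": {"Paris"}, "c2": {"France"}, "c3": {"Tokyo"}}
kg_edges = [("Paris", "France"), ("Tokyo", "Japan")]
result = expand_chunks(["c1"], chunk_entities, kg_edges, hops=1)
```

The chunk organization step described in the abstract would then order and group the expanded set; that step is not sketched here.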
Can Large Language Models Understand Intermediate Representations?
2502.06854v1 by Hailong Jiang, Jianfeng Zhu, Yao Wan, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan
Intermediate Representations (IRs) are essential in compiler design and program analysis, yet their comprehension by Large Language Models (LLMs) remains underexplored. This paper presents a pioneering empirical study to investigate the capabilities of LLMs, including GPT-4, GPT-3, Gemma 2, LLaMA 3.1, and Code Llama, in understanding IRs. We analyze their performance across four tasks: Control Flow Graph (CFG) reconstruction, decompilation, code summarization, and execution reasoning. Our results indicate that while LLMs demonstrate competence in parsing IR syntax and recognizing high-level structures, they struggle with control flow reasoning, execution semantics, and loop handling. Specifically, they often misinterpret branching instructions, omit critical IR operations, and rely on heuristic-based reasoning, leading to errors in CFG reconstruction, IR decompilation, and execution reasoning. The study underscores the necessity for IR-specific enhancements in LLMs, recommending fine-tuning on structured IR datasets and integration of explicit control flow models to augment their comprehension and handling of IR-related tasks.
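For readers unfamiliar with the CFG reconstruction task, the sketch below shows what a correct answer looks like on a toy IR. The block format (a label plus a terminator tuple) is invented for illustration and is much simpler than a real compiler IR such as LLVM IR; it is not the paper's evaluation harness.

```python
def build_cfg(blocks):
    """Map each basic-block label to its successor labels.

    `blocks` is a list of (label, terminator) pairs, where the terminator is
    ("br", target), ("cond_br", then_label, else_label), or ("ret",).
    """
    cfg = {}
    for label, terminator in blocks:
        kind = terminator[0]
        if kind == "br":
            cfg[label] = [terminator[1]]
        elif kind == "cond_br":
            cfg[label] = [terminator[1], terminator[2]]
        elif kind == "ret":
            cfg[label] = []
        else:
            raise ValueError(f"unknown terminator: {kind}")
    return cfg


# A single loop: the header either enters the body or leaves to the exit block.
loop_blocks = [
    ("entry", ("br", "header")),
    ("header", ("cond_br", "body", "exit")),
    ("body", ("br", "header")),
    ("exit", ("ret",)),
]
loop_cfg = build_cfg(loop_blocks)
```

The back edge from `body` to `header` is the kind of loop structure the abstract reports LLMs often get wrong.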
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
2502.05252v1 by Yang Zhou, Hongyi Liu, Zhuoming Chen, Yuandong Tian, Beidi Chen
Long-context large language models (LLMs) have recently shown strong performance in information retrieval and long-document QA. However, to tackle the most challenging intellectual problems, LLMs must reason effectively in long and complex contexts (e.g., frontier mathematical research). Studying how LLMs handle increasing reasoning complexity and context length is essential, yet existing benchmarks lack a solid basis for quantitative evaluation. Inspired by the abstraction of GSM-8K problems as computational graphs, and the ability to introduce noise by adding unnecessary nodes and edges, we develop a grade school math problem generator capable of producing arithmetic problems with infinite difficulty and context length under fine-grained control. Using our newly synthesized GSM-Infinite benchmark, we comprehensively evaluate existing LLMs. We find a consistent sigmoid decline in reasoning performance as complexity increases, along with a systematic inference scaling trend: exponentially increasing inference computation yields only linear performance gains. These findings underscore the fundamental limitations of current long-context LLMs and the key challenges in scaling reasoning capabilities. Our GSM-Infinite benchmark provides a scalable and controllable testbed for systematically studying and advancing LLM reasoning in long and complex contexts.
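The idea of treating a word problem as a computational graph, with distractors as extra nodes, can be sketched in a few lines. The graph encoding and the example quantities below are hypothetical and are not drawn from the GSM-Infinite generator.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(graph, target, cache=None):
    """Evaluate `target` in a graph whose nodes are numbers or (op, left, right)."""
    cache = {} if cache is None else cache
    if target in cache:
        return cache[target]
    node = graph[target]
    if isinstance(node, (int, float)):
        value = node
    else:
        op, left, right = node
        value = OPS[op](evaluate(graph, left, cache), evaluate(graph, right, cache))
    cache[target] = value
    return value


# "3 apples and 4 pears per box, 2 boxes: how much fruit in total?"
problem = {
    "apples": 3,
    "pears": 4,
    "fruit": ("+", "apples", "pears"),
    "boxes": 2,
    "total": ("*", "fruit", "boxes"),
}

# Noise: extra nodes and edges that the queried quantity does not depend on.
noisy = dict(problem, cats=7, pets=("+", "cats", "boxes"))

answer = evaluate(problem, "total")
noisy_answer = evaluate(noisy, "total")
```

Because evaluation only follows the ancestors of the queried node, the distractor nodes lengthen the problem text without changing the answer, which is what lets context length and reasoning depth be varied independently.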
Causality can systematically address the monsters under the bench(marks)
2502.05085v1 by Felix Leeb, Zhijing Jin, Bernhard Schölkopf
Effective and reliable evaluation is essential for advancing empirical machine learning. However, the increasing accessibility of generalist models and the progress towards ever more complex, high-level tasks make systematic evaluation more challenging. Benchmarks are plagued by various biases, artifacts, or leakage, while models may behave unreliably due to poorly explored failure modes. Haphazard treatments and inconsistent formulations of such "monsters" can contribute to a duplication of efforts, a lack of trust in results, and unsupported inferences. In this position paper, we argue causality offers an ideal framework to systematically address these challenges. By making causal assumptions in an approach explicit, we can faithfully model phenomena, formulate testable hypotheses with explanatory power, and leverage principled tools for analysis. To make causal model design more accessible, we identify several useful Common Abstract Topologies (CATs) in causal graphs which help gain insight into the reasoning abilities in large language models. Through a series of case studies, we demonstrate how the precise yet pragmatic language of causality clarifies the strengths and limitations of a method and inspires new approaches for systematic progress.
Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures
2502.05078v1 by Tushar Pandey, Ara Ghukasyan, Oktay Goktas, Santosh Kumar Radha
Large Language Models (LLMs) have demonstrated impressive reasoning capabilities, yet their performance is highly dependent on the prompting strategy and model scale. While reinforcement learning and fine-tuning have been deployed to boost reasoning, these approaches incur substantial computational and data overhead. In this work, we introduce Adaptive Graph of Thoughts (AGoT), a dynamic, graph-based inference framework that enhances LLM reasoning solely at test time. Rather than relying on fixed-step methods like Chain of Thought (CoT) or Tree of Thoughts (ToT), AGoT recursively decomposes complex queries into structured subproblems, forming a dynamic directed acyclic graph (DAG) of interdependent reasoning steps. By selectively expanding only those subproblems that require further analysis, AGoT unifies the strengths of chain, tree, and graph paradigms into a cohesive framework that allocates computation where it is most needed. We validate our approach on diverse benchmarks spanning multi-hop retrieval, scientific reasoning, and mathematical problem-solving, achieving up to 46.2% improvement on scientific reasoning tasks (GPQA) - comparable to gains achieved through computationally intensive reinforcement learning approaches and outperforming state-of-the-art iterative approaches. These results suggest that dynamic decomposition and structured recursion offer a scalable, cost-effective alternative to post-training modifications, paving the way for more robust, general-purpose reasoning in LLMs.
摘要:大型語言模型 (LLM) 已展現令人印象深刻的推理能力,但其效能高度依賴於提示策略和模型規模。雖然強化學習和微調已被用於提升推理,但這些方法會造成大量的運算和資料開銷。在這項工作中,我們引入了「適應性思考圖」(AGoT),一個動態的、基於圖形的推論架構,它僅在測試時就能增強 LLM 推理。AGoT 並非依賴於鏈式思考 (CoT) 或樹狀思考 (ToT) 等固定步驟方法,而是遞迴地將複雜的查詢分解成結構化的子問題,形成一個由相互依賴的推理步驟所組成的動態有向無環圖 (DAG)。透過選擇性地僅擴充那些需要進一步分析的子問題,AGoT 將鏈式、樹狀和圖形範例的優勢統一到一個緊密的架構中,將運算分配到最需要的地方。我們在跨越多重跳躍檢索、科學推理和數學問題解決等多樣基準上驗證了我們的做法,在科學推理任務 (GPQA) 上達到了高達 46.2% 的改進,這與透過運算密集的強化學習方法所獲得的增益相當,並且優於最先進的迭代方法。這些結果表明,動態分解和結構化遞迴提供了一個可擴充、具成本效益的替代方案,用於訓練後修改,為 LLM 中更強健、更通用的推理鋪平了道路。
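The selective expansion described in the AGoT abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `is_complex` and `decompose` are hypothetical stand-ins for LLM calls, the depth cap is an assumption, and the result is a tree (a special case of a DAG) because cross-dependencies between subproblems are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    query: str
    children: List["Node"] = field(default_factory=list)


def expand(query: str,
           is_complex: Callable[[str], bool],
           decompose: Callable[[str], List[str]],
           depth: int = 0,
           max_depth: int = 3) -> Node:
    """Recursively decompose a query, expanding only sub-queries judged complex."""
    node = Node(query)
    if depth < max_depth and is_complex(query):
        for sub in decompose(query):
            node.children.append(
                expand(sub, is_complex, decompose, depth + 1, max_depth)
            )
    return node


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(child) for child in node.children)


# Toy stand-ins for the LLM calls (hypothetical, for illustration only).
def toy_is_complex(q: str) -> bool:
    return " and " in q


def toy_decompose(q: str) -> List[str]:
    return q.split(" and ", 1)


root = expand("find A and find B and find C", toy_is_complex, toy_decompose)
```

In this toy run only the sub-query that still contains a conjunction is expanded further, so computation is spent where the complexity check says it is needed.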
Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics
2502.05239v1 by Hussam Ghanem, Christophe Cruz
Recent advancements in large language models have demonstrated significant potential in the automated construction of knowledge graphs from unstructured text. This paper builds upon our previous work [16], which evaluated various models using metrics like precision, recall, F1 score, triple matching, and graph matching, and introduces a refined approach to address the critical issues of hallucination and omission. We propose an enhanced evaluation framework incorporating BERTScore for graph similarity, setting a practical threshold of 95% for graph matching. Our experiments focus on the Mistral model, comparing its original and fine-tuned versions in zero-shot and few-shot settings. We further extend our experiments using examples from the KELM-sub training dataset, illustrating that the fine-tuned model significantly improves knowledge graph construction accuracy while reducing the exact hallucination and omission. However, our findings also reveal that the fine-tuned models perform worse in generalization tasks on the KELM-sub dataset. This study underscores the importance of comprehensive evaluation metrics in advancing the state-of-the-art in knowledge graph construction from textual data.
摘要:大型語言模型的最新進展已證明在從非結構化文字自動建構知識圖譜方面具有顯著的潛力。本文建立在我們先前的研究 [16] 之上,該研究使用準確度、召回率、F1 分數、三元組匹配和圖形匹配等指標評估各種模型,並引入了一種改進的方法來解決幻覺和遺漏的關鍵問題。我們提出一個增強的評估框架,結合 BERTScore 來進行圖形相似性,並將圖形匹配的實際閾值設定為 95%。我們的實驗重點在 Mistral 模型上,比較其原始版本和微調版本在零次學習和少量學習的設定中。我們進一步使用 KELM-sub 訓練資料集中的範例來擴展我們的實驗,說明微調後的模型顯著提高了知識圖譜建構的準確度,同時減少了精確的幻覺和遺漏。然而,我們的研究結果也顯示,微調後的模型在 KELM-sub 資料集上的泛化任務表現較差。這項研究強調了全面評估指標在推進從文字資料建構知識圖譜的最新技術方面的重要性。
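As a rough illustration of the hallucination and omission notions above, the sketch below compares generated and reference triples by exact match. This definition is an assumption made for illustration; the paper's framework also uses BERTScore-based graph similarity, which is not reproduced here, and the triples are made-up examples.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def hallucination_and_omission(generated: Iterable[Triple],
                               reference: Iterable[Triple]):
    """Return (hallucinated, omitted) triples under exact matching.

    hallucinated: triples the model produced that are not in the reference
    omitted:      reference triples the model failed to produce
    """
    gen: Set[Triple] = set(generated)
    ref: Set[Triple] = set(reference)
    return gen - ref, ref - gen


reference = [("Paris", "capital_of", "France"), ("France", "part_of", "Europe")]
generated = [("Paris", "capital_of", "France"), ("Paris", "located_in", "Spain")]

hallucinated, omitted = hallucination_and_omission(generated, reference)
```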
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
2502.04644v1 by Junde Wu, Jiayuan Zhu, Yuyuan Liu
We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Unlike conventional LLM-based reasoning approaches, which rely solely on internal inference, Agentic Reasoning dynamically engages web search, code execution, and structured reasoning-context memory to solve complex problems requiring deep research and multi-step logical deduction. Our framework introduces the Mind Map agent, which constructs a structured knowledge graph to track logical relationships, improving deductive reasoning. Additionally, the integration of web-search and coding agents enables real-time retrieval and computational analysis, enhancing reasoning accuracy and decision-making. Evaluations on PhD-level scientific reasoning (GPQA) and domain-specific deep research tasks demonstrate that our approach significantly outperforms existing models, including leading retrieval-augmented generation (RAG) systems and closed-source LLMs. Moreover, our results indicate that agentic reasoning improves expert-level knowledge synthesis, test-time scalability, and structured problem-solving. The code is at: https://github.com/theworldofagents/Agentic-Reasoning.
摘要:我們引入了代理推理,一個透過整合外部工具使用代理來增強大型語言模型 (LLM) 推理的框架。與僅依賴於內部推論的傳統基於 LLM 的推理方法不同,代理推理動態地運用網路搜尋、程式碼執行和結構化推理情境記憶來解決需要深入研究和多步驟邏輯推論的複雜問題。我們的框架引入了心智圖代理,它建立一個結構化的知識圖譜來追蹤邏輯關係,改善演繹推理。此外,整合網路搜尋和編碼代理能進行即時擷取和運算分析,增強推理準確度和決策制定。在博士等級科學推理 (GPQA) 和特定領域的深入研究任務上的評估顯示,我們的做法明顯優於現有模型,包括領先的檢索增強生成 (RAG) 系統和封閉原始碼 LLM。此外,我們的結果顯示,代理推理改進了專家級知識綜合、測試時間可擴充性和結構化問題解決。程式碼在:https://github.com/theworldofagents/Agentic-Reasoning。
Position-aware Automatic Circuit Discovery
2502.04577v1 by Tal Haklay, Hadas Orgad, David Bau, Aaron Mueller, Yonatan Belinkov
A widely used strategy to discover and understand language model mechanisms is circuit analysis. A circuit is a minimal subgraph of a model's computation graph that executes a specific task. We identify a gap in existing circuit discovery methods: they assume circuits are position-invariant, treating model components as equally relevant across input positions. This limits their ability to capture cross-positional interactions or mechanisms that vary across positions. To address this gap, we propose two improvements to incorporate positionality into circuits, even on tasks containing variable-length examples. First, we extend edge attribution patching, a gradient-based method for circuit discovery, to differentiate between token positions. Second, we introduce the concept of a dataset schema, which defines token spans with similar semantics across examples, enabling position-aware circuit discovery in datasets with variable length examples. We additionally develop an automated pipeline for schema generation and application using large language models. Our approach enables fully automated discovery of position-sensitive circuits, yielding better trade-offs between circuit size and faithfulness compared to prior work.
摘要:廣泛用於發現和了解語言模型機制的策略是電路分析。電路是模型計算圖的最小子圖,可執行特定任務。我們找出電路發現方法中的一個缺口:它們假設電路與位置無關,將模型組件視為在輸入位置中同樣相關。這限制了它們捕捉跨位置互動或在不同位置中變化的機制的能力。為了解決這個缺口,我們提出兩項改進,將位置性納入電路中,即使在包含變長範例的任務中也是如此。首先,我們擴充邊緣屬性修補,一種基於梯度的電路發現方法,以區分符號位置。其次,我們引入了資料集架構的概念,它定義了在範例中具有類似語義的符號跨距,使我們可以在具有變長範例的資料集中進行與位置相關的電路發現。此外,我們開發了一個自動化管線,用於使用大型語言模型進行架構生成和應用。我們的做法能讓位置敏感電路的發現完全自動化,與先前的研究相比,在電路大小和忠實度之間產生了更好的權衡。
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
2502.04510v1 by Shangbin Feng, Zifeng Wang, Palash Goyal, Yike Wang, Weijia Shi, Huang Xia, Hamid Palangi, Luke Zettlemoyer, Yulia Tsvetkov, Chen-Yu Lee, Tomas Pfister
We propose Heterogeneous Swarms, an algorithm to design multi-LLM systems by jointly optimizing model roles and weights. We represent multi-LLM systems as directed acyclic graphs (DAGs) of LLMs with topological message passing for collaborative generation. Given a pool of LLM experts and a utility function, Heterogeneous Swarms employs two iterative steps: role-step and weight-step. For role-step, we interpret model roles as learning a DAG that specifies the flow of inputs and outputs between LLMs. Starting from a swarm of random continuous adjacency matrices, we decode them into discrete DAGs, call the LLMs in topological order, evaluate on the utility function (e.g. accuracy on a task), and optimize the adjacency matrices with particle swarm optimization based on the utility score. For weight-step, we assess the contribution of individual LLMs in the multi-LLM systems and optimize model weights with swarm intelligence. We propose JFK-score to quantify the individual contribution of each LLM in the best-found DAG of the role-step, then optimize model weights with particle swarm optimization based on the JFK-score. Experiments demonstrate that Heterogeneous Swarms outperforms 15 role- and/or weight-based baselines by 18.5% on average across 12 tasks. Further analysis reveals that Heterogeneous Swarms discovers multi-LLM systems with heterogeneous model roles and substantial collaborative gains, and benefits from the diversity of language models.
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot
2502.04413v1 by Xuejiao Zhao, Siyan Liu, Su-Yin Yang, Chunyan Miao
Retrieval-augmented generation (RAG) is a well-suited technique for retrieving privacy-sensitive Electronic Health Records (EHR). It can serve as a key module of the healthcare copilot, helping reduce misdiagnosis for healthcare practitioners and patients. However, the diagnostic accuracy and specificity of existing heuristic-based RAG models used in the medical domain are inadequate, particularly for diseases with similar manifestations. This paper proposes MedRAG, a RAG model enhanced by knowledge graph (KG)-elicited reasoning for the medical domain that retrieves diagnosis and treatment recommendations based on manifestations. MedRAG systematically constructs a comprehensive four-tier hierarchical diagnostic KG encompassing critical diagnostic differences of various diseases. These differences are dynamically integrated with similar EHRs retrieved from an EHR database, and reasoned within a large language model. This process enables more accurate and specific decision support, while also proactively providing follow-up questions to enhance personalized medical decision-making. MedRAG is evaluated on both a public dataset DDXPlus and a private chronic pain diagnostic dataset (CPDD) collected from Tan Tock Seng Hospital, and its performance is compared against various existing RAG methods. Experimental results show that, leveraging the information integration and relational abilities of the KG, our MedRAG provides more specific diagnostic insights and outperforms state-of-the-art models in reducing misdiagnosis rates. Our code will be available at https://github.com/SNOWTEAM2023/MedRAG
摘要:檢索增強生成 (RAG) 是一種適用於檢索隱私敏感的電子健康記錄 (EHR) 的技術。它可以作為醫療保健副駕駛的一個關鍵模組,協助減少醫療保健從業人員和患者的誤診。然而,在醫療領域中使用的現有基於啟發法的 RAG 模型的診斷準確性和特異性不足,特別是對於具有類似表現的疾病。本文提出 MedRAG,一種由知識圖譜 (KG) 引發的推理增強的 RAG 模型,用於醫療領域,它根據表現檢索診斷和治療建議。MedRAG 系統性地構建了一個全面的四層階層式診斷 KG,涵蓋各種疾病的關鍵診斷差異。這些差異與從 EHR 資料庫中檢索到的類似 EHR 動態整合,並在大型語言模型中進行推理。這個過程可以實現更準確和具體的決策支援,同時主動提供後續問題,以增強個人化醫療決策制定。MedRAG 在公共資料集 DDXPlus 和從陳篤生醫院收集的私人慢性疼痛診斷資料集 (CPDD) 上進行評估,並將其效能與各種現有 RAG 方法進行比較。實驗結果顯示,利用 KG 的資訊整合和關係能力,我們的 MedRAG 提供了更具體的診斷見解,並在降低誤診率方面優於最先進的模型。我們的程式碼將在 https://github.com/SNOWTEAM2023/MedRAG 提供
Ontology-Guided, Hybrid Prompt Learning for Generalization in Knowledge Graph Question Answering
2502.03992v1 by Longquan Jiang, Junbo Huang, Cedric Möller, Ricardo Usbeck
Most existing Knowledge Graph Question Answering (KGQA) approaches are designed for a specific KG, such as Wikidata, DBpedia or Freebase. Due to the heterogeneity of the underlying graph schema, topology and assertions, most KGQA systems cannot be transferred to unseen Knowledge Graphs (KGs) without resource-intensive training data. We present OntoSCPrompt, a novel Large Language Model (LLM)-based KGQA approach with a two-stage architecture that separates semantic parsing from KG-dependent interactions. OntoSCPrompt first generates a SPARQL query structure (including SPARQL keywords such as SELECT, ASK, WHERE and placeholders for missing tokens) and then fills them with KG-specific information. To enhance the understanding of the underlying KG, we present an ontology-guided, hybrid prompt learning strategy that integrates KG ontology into the learning process of hybrid prompts (e.g., discrete and continuous vectors). We also present several task-specific decoding strategies to ensure the correctness and executability of generated SPARQL queries in both stages. Experimental results demonstrate that OntoSCPrompt performs as well as SOTA approaches without retraining on a number of KGQA datasets such as CWQ, WebQSP and LC-QuAD 1.0 in a resource-efficient manner and can generalize well to unseen domain-specific KGs like DBLP-QuAD and CoyPu KG. Code: https://github.com/LongquanJiang/OntoSCPrompt
摘要:現有的知識圖譜問答(KGQA)方法大多是為特定 KG 而設計的,例如 Wikidata、DBpedia 或 Freebase。由於底層圖形模式、拓撲和斷言的異質性,大多數 KGQA 系統無法在沒有資源密集型訓練資料的情況下轉移到未見過的知識圖譜(KG)。我們提出 OntoSCPrompt,這是一種基於大型語言模型(LLM)的新型 KGQA 方法,採用兩階段架構,將語義解析與依賴 KG 的互動分開。OntoSCPrompt 首先生成 SPARQL 查詢結構(包括 SPARQL 關鍵字,例如 SELECT、ASK、WHERE 和缺失令牌的佔位符),然後用 KG 特定的資訊填寫它們。為了增強對底層 KG 的理解,我們提出了一種由本体指導的混合提示學習策略,將 KG 本体整合到混合提示(例如,離散和連續向量)的學習過程中。我們還提出了多種特定任務的解碼策略,以確保在兩個階段中生成的 SPARQL 查詢的正確性和可執行性。實驗結果表明,OntoSCPrompt 在 CWQ、WebQSP 和 LC-QuAD 1.0 等多個 KGQA 資料集上執行時,效能與 SOTA 方法一樣好,且資源使用效率高,並且可以很好地概括到未見過的特定領域 KG,例如 DBLP-QuAD 和 CoyPu KG Code: \href{https://github.com/LongquanJiang/OntoSCPrompt}{https://github.com/LongquanJiang/OntoSCPrompt}
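The two-stage idea (first a KG-independent query skeleton, then KG-specific filling) can be sketched as follows. The `[ent]` and `[rel]` placeholder syntax and the `fill_skeleton` helper are hypothetical and are not taken from the OntoSCPrompt code. The Wikidata identifiers only serve as an example of KG-specific content.

```python
def fill_skeleton(skeleton: str, bindings: dict) -> str:
    """Stage 2: substitute KG-specific identifiers into a stage-1 skeleton."""
    query = skeleton
    for placeholder, value in bindings.items():
        query = query.replace(placeholder, value)
    # Refuse to return a query that still contains an unfilled placeholder.
    if "[" in query or "]" in query:
        raise ValueError(f"unfilled placeholder in: {query}")
    return query


# Stage 1 output (assumed): SPARQL keywords plus placeholders.
skeleton = "SELECT ?x WHERE { [ent] [rel] ?x . }"

# Stage 2 input (assumed): identifiers resolved against a specific KG.
bindings = {"[ent]": "wd:Q42", "[rel]": "wdt:P19"}

query = fill_skeleton(skeleton, bindings)
```

Keeping the skeleton free of KG identifiers is what would let the first stage transfer across graphs, while only the bindings change per KG.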
Multimodal Medical Code Tokenizer
2502.04397v2 by Xiaorui Su, Shvat Messica, Yepeng Huang, Ruth Johnson, Lukas Fesser, Shanghua Gao, Faryad Sahneh, Marinka Zitnik
Foundation models trained on patient electronic health records (EHRs) require tokenizing medical data into sequences of discrete vocabulary items. Existing tokenizers treat medical codes from EHRs as isolated textual tokens. However, each medical code is defined by its textual description, its position in ontological hierarchies, and its relationships to other codes, such as disease co-occurrences and drug-treatment associations. Medical vocabularies contain more than 600,000 codes with critical information for clinical reasoning. We introduce MedTok, a multimodal medical code tokenizer that uses the text descriptions and relational context of codes. MedTok processes text using a language model encoder and encodes the relational structure with a graph encoder. It then quantizes both modalities into a unified token space, preserving modality-specific and cross-modality information. We integrate MedTok into five EHR models and evaluate it on operational and clinical tasks across in-patient and out-patient datasets, including outcome prediction, diagnosis classification, drug recommendation, and risk stratification. Swapping standard EHR tokenizers with MedTok improves AUPRC across all EHR models, by 4.10% on MIMIC-III, 4.78% on MIMIC-IV, and 11.30% on EHRShot, with the largest gains in drug recommendation. Beyond EHR modeling, we demonstrate using MedTok tokenizer with medical QA systems. Our results demonstrate the potential of MedTok as a unified tokenizer for medical codes, improving tokenization for medical foundation models.
Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents
2502.04392v1 by Chenyang Shao, Xinyuan Hu, Yutang Lin, Fengli Xu
The rapid expansion of web content has made on-device AI assistants indispensable for helping users manage the increasing complexity of online tasks. The emergent reasoning ability in large language models offers a promising path for next-generation on-device AI agents. However, deploying full-scale Large Language Models (LLMs) on resource-limited local devices is challenging. In this paper, we propose Division-of-Thoughts (DoT), a collaborative reasoning framework leveraging the synergy between locally deployed Smaller-scale Language Models (SLMs) and cloud-based LLMs. DoT leverages a Task Decomposer to elicit the inherent planning abilities in language models to decompose user queries into smaller sub-tasks, which allows hybrid language models to fully exploit their respective strengths. Besides, DoT employs a Task Scheduler to analyze the pair-wise dependency of sub-tasks and create a dependency graph, facilitating parallel reasoning of sub-tasks and the identification of key steps. To allocate the appropriate model based on the difficulty of sub-tasks, DoT leverages a Plug-and-Play Adapter, which is an additional task head attached to the SLM that does not alter the SLM's parameters. To boost the adapter's task allocation capability, we propose a self-reinforced training method that relies solely on task execution feedback. Extensive experiments on various benchmarks demonstrate that our DoT significantly reduces LLM costs while maintaining competitive reasoning accuracy. Specifically, DoT reduces the average reasoning time and API costs by 66.12% and 83.57%, while achieving comparable reasoning accuracy with the best baseline methods.
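To make the dependency-graph scheduling in the DoT abstract concrete, here is a minimal sketch that groups sub-tasks into batches that could run in parallel. It uses Python's standard `graphlib.TopologicalSorter`. The sub-task names and the `parallel_batches` helper are invented for illustration and do not come from the paper.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group sub-tasks into batches whose members have no unmet dependencies.

    `deps` maps each sub-task to the set of sub-tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        batches.append(ready)
        sorter.done(*ready)                 # unlock their dependents
    return batches


# Hypothetical decomposition of a user query into sub-tasks.
deps = {
    "search_flights": set(),
    "search_hotels": set(),
    "compare_prices": {"search_flights", "search_hotels"},
    "write_summary": {"compare_prices"},
}

batches = parallel_batches(deps)
```

Each batch contains sub-tasks that are independent of one another, so they could be dispatched concurrently to whichever model the allocator selects.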
Boosting Knowledge Graph-based Recommendations through Confidence-Aware Augmentation with Large Language Models
2502.03715v1 by Rui Cai, Chao Wang, Qianyi Cai, Dazhong Shen, Hui Xiong
Knowledge Graph-based recommendations have gained significant attention due to their ability to leverage rich semantic relationships. However, constructing and maintaining Knowledge Graphs (KGs) is resource-intensive, and the accuracy of KGs can suffer from noisy, outdated, or irrelevant triplets. Recent advancements in Large Language Models (LLMs) offer a promising way to improve the quality and relevance of KGs for recommendation tasks. Despite this, integrating LLMs into KG-based systems presents challenges, such as efficiently augmenting KGs, addressing hallucinations, and developing effective joint learning methods. In this paper, we propose the Confidence-aware KG-based Recommendation Framework with LLM Augmentation (CKG-LLMA), a novel framework that combines KGs and LLMs for recommendation tasks. The framework includes: (1) an LLM-based subgraph augmenter for enriching KGs with high-quality information, (2) a confidence-aware message propagation mechanism to filter noisy triplets, and (3) a dual-view contrastive learning method to integrate user-item interactions and KG data. Additionally, we employ a confidence-aware explanation generation process to guide LLMs in producing realistic explanations for recommendations. Finally, extensive experiments demonstrate the effectiveness of CKG-LLMA across multiple public datasets.
摘要:基於知識圖譜的推薦因其利用豐富語義關係的能力而備受關注。然而,構建和維護知識圖譜 (KG) 是一項資源密集型任務,而 KG 的準確性可能會受到雜訊、過時或無關的三元組的影響。大型語言模型 (LLM) 的最新進展為提高 KG 在推薦任務中的品質和相關性提供了一種有前途的方法。儘管如此,將 LLM 整合到基於 KG 的系統中會帶來挑戰,例如有效擴充 KG、處理幻覺,以及開發有效的聯合學習方法。在本文中,我們提出具有 LLM 擴充的信心感知型基於 KG 的推薦框架 (CKG-LLMA),這是一個結合 KG 和 LLM 進行推薦任務的新穎框架。該框架包括:(1) 一個基於 LLM 的子圖擴充器,用於使用高品質資訊豐富 KG,(2) 一個信心感知型訊息傳播機制,用於過濾雜訊三元組,以及 (3) 一個雙視圖對比學習方法,用於整合使用者-項目互動和 KG 資料。此外,我們採用一個信心感知型解釋產生程序,以引導 LLM 為推薦產生逼真的解釋。最後,大量的實驗證明了 CKG-LLMA 在多個公開資料集中的有效性。
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs)
2502.03450v1 by Yiye Chen, Harpreet Sawhney, Nicholas Gydé, Yanan Jian, Jack Saunders, Patricio Vela, Ben Lundell
Scene graphs have emerged as a structured and serializable environment representation for grounded spatial reasoning with Large Language Models (LLMs). In this work, we propose SG-RwR, a Schema-Guided Retrieve-while-Reason framework for reasoning and planning with scene graphs. Our approach employs two cooperative, code-writing LLM agents: (1) a Reasoner for task planning and information query generation, and (2) a Retriever for extracting the corresponding graph information following the queries. The two agents collaborate iteratively, enabling sequential reasoning and adaptive attention to graph information. Unlike prior works, both agents are prompted only with the scene graph schema rather than the full graph data, which reduces hallucination by limiting input tokens and drives the Reasoner to generate its reasoning trace abstractly. Following the trace, the Retriever programmatically queries the scene graph data based on its understanding of the schema, allowing dynamic and global attention on the graph that enhances alignment between reasoning and retrieval. Through experiments in multiple simulation environments, we show that our framework surpasses existing LLM-based approaches in numerical Q&A and planning tasks, and can benefit from task-level few-shot examples, even in the absence of agent-level demonstrations. Project code will be released.
摘要:場景圖表已成為大型語言模型 (LLM) 以基礎空間推理為基礎的結構化且可序列化的環境表徵。在這項工作中,我們提出 SG-RwR,一個以綱要為導向的檢索與推理框架,用於場景圖表的推理和規劃。我們的做法採用了兩個協作的、編寫程式碼的 LLM 代理:一個 (1) 推論器,用於任務規劃和資訊查詢產生,以及一個 (2) 檢索器,用於根據查詢提取對應的圖形資訊。兩個代理反覆合作,實現對圖形資訊的順序推理和適應性關注。與先前的作品不同,兩個代理僅提示場景圖表綱要,而不是完整的圖形資料,這透過限制輸入代碼減少了幻覺,並驅使推論器抽象地產生推理軌跡。根據軌跡,檢索器根據綱要理解以程式化方式查詢場景圖形資料,允許對圖形進行動態和整體關注,增強推理和檢索之間的一致性。透過在多個模擬環境中的實驗,我們表明我們的框架在數值問答和規劃任務中超越了現有的基於 LLM 的方法,並且可以受益於任務級別的少次範例,即使在沒有代理級別示範的情況下也是如此。專案程式碼將會釋出。
SymAgent: A Neural-Symbolic Self-Learning Agent Framework for Complex Reasoning over Knowledge Graphs
2502.03283v2 by Ben Liu, Jihai Zhang, Fangquan Lin, Cheng Yang, Min Peng, Wotao Yin
Recent advancements have highlighted that Large Language Models (LLMs) are prone to hallucinations when solving complex reasoning problems, leading to erroneous results. To tackle this issue, researchers incorporate Knowledge Graphs (KGs) to improve the reasoning ability of LLMs. However, existing methods face two limitations: 1) they typically assume that all answers to the questions are contained in KGs, neglecting the incompleteness issue of KGs, and 2) they treat the KG as a static repository and overlook the implicit logical reasoning structures inherent in KGs. In this paper, we introduce SymAgent, an innovative neural-symbolic agent framework that achieves collaborative augmentation between KGs and LLMs. We conceptualize KGs as dynamic environments and transform complex reasoning tasks into a multi-step interactive process, enabling KGs to participate deeply in the reasoning process. SymAgent consists of two modules: Agent-Planner and Agent-Executor. The Agent-Planner leverages LLM's inductive reasoning capability to extract symbolic rules from KGs, guiding efficient question decomposition. The Agent-Executor autonomously invokes predefined action tools to integrate information from KGs and external documents, addressing the issues of KG incompleteness. Furthermore, we design a self-learning framework comprising online exploration and offline iterative policy updating phases, enabling the agent to automatically synthesize reasoning trajectories and improve performance. Experimental results demonstrate that SymAgent with weak LLM backbones (i.e., 7B series) yields better or comparable performance compared to various strong baselines. Further analysis reveals that our agent can identify missing triples, facilitating automatic KG updates.
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
2502.03032v2 by Daniil Laptev, Nikita Balagansky, Yaroslav Aksenov, Daniil Gavrilov
We introduce a new approach to systematically map features discovered by sparse autoencoders across consecutive layers of large language models, extending earlier work that examined inter-layer feature links. By using a data-free cosine similarity technique, we trace how specific features persist, transform, or first appear at each stage. This method yields granular flow graphs of feature evolution, enabling fine-grained interpretability and mechanistic insights into model computations. Crucially, we demonstrate how these cross-layer feature maps facilitate direct steering of model behavior by amplifying or suppressing chosen features, achieving targeted thematic control in text generation. Together, our findings highlight the utility of a causal, cross-layer interpretability framework that not only clarifies how features develop through forward passes but also provides new means for transparent manipulation of large language models.
摘要:我們提出了一種新方法,用於系統性地繪製大型語言模型連續層中稀疏自動編碼器發現的功能,擴展了先前研究層間特徵連結的工作。透過使用無資料餘弦相似性技術,我們追蹤特定特徵在每個階段如何持續、轉換或首次出現。此方法產生了特徵演化的細粒度流程圖,實現了細粒度的可解釋性和對模型運算的機制見解。至關重要的是,我們展示了這些跨層特徵圖如何透過放大或抑制所選特徵來促進模型行為的直接引導,在文字生成中實現目標主題控制。我們的研究結果共同突出了因果、跨層可解釋性框架的效用,不僅闡明了特徵如何透過前向傳遞發展,還提供了新的方法來透明地操作大型語言模型。
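The data-free matching step can be pictured with a small sketch: each feature in one layer is linked to the most similar feature in the next layer by cosine similarity of their direction vectors. Treating the compared vectors as sparse-autoencoder feature directions is an assumption here, and the two-dimensional vectors are toy values rather than real model weights.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(feature: Sequence[float],
               next_layer: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, similarity) of the closest feature in the next layer."""
    scores = [cosine(feature, g) for g in next_layer]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Toy feature directions for two consecutive layers (illustrative values).
layer_a = [[1.0, 0.0], [0.0, 1.0]]
layer_b = [[0.0, 2.0], [3.0, 0.1]]

# One edge per layer_a feature; chaining such edges over layers gives a flow graph.
links = [best_match(f, layer_b) for f in layer_a]
```

No activations or input data are needed for this comparison, which is what makes the approach data-free. A low best similarity could be read as a feature that first appears in the later layer.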
A Benchmark for the Detection of Metalinguistic Disagreements between LLMs and Knowledge Graphs
2502.02896v1 by Bradley P. Allen, Paul T. Groth
Evaluating large language models (LLMs) for tasks like fact extraction in support of knowledge graph construction frequently involves computing accuracy metrics using a ground truth benchmark based on a knowledge graph (KG). These evaluations assume that errors represent factual disagreements. However, human discourse frequently features metalinguistic disagreement, where agents differ not on facts but on the meaning of the language used to express them. Given the complexity of natural language processing and generation using LLMs, we ask: do metalinguistic disagreements occur between LLMs and KGs? Based on an investigation using the T-REx knowledge alignment dataset, we hypothesize that metalinguistic disagreement does in fact occur between LLMs and KGs, with potential relevance for the practice of knowledge graph engineering. We propose a benchmark for evaluating the detection of factual and metalinguistic disagreements between LLMs and KGs. An initial proof of concept of such a benchmark is available on GitHub.
摘要:評估大型語言模型 (LLM) 執行知識圖譜建構支援事實萃取等任務時,通常會使用基於知識圖譜 (KG) 的基準事實計算準確度指標。這些評估假設錯誤代表事實上的分歧。然而,人類話語經常出現元語言分歧,其中代理人之間的差異不在於事實,而在於用於表達事實的語言的含義。鑑於使用 LLM 處理和產生自然語言的複雜性,我們提出疑問:LLM 和 KG 之間是否會發生元語言分歧?根據使用 T-REx 知識比對資料集進行的調查,我們假設元語言分歧確實會發生在 LLM 和 KG 之間,並可能與知識圖譜工程實務有關。我們提出一個基準,用於評估 LLM 和 KG 之間的事實和元語言分歧的偵測。此基準的初步概念驗證可在 Github 上取得。
Mol-LLM: Generalist Molecular LLM with Improved Graph Utilization
2502.02810v1 by Chanhui Lee, Yuheon Song, YongJun Jeong, Hanbum Ko, Rodrigo Hormazabal, Sehui Han, Kyunghoon Bae, Sungbin Lim, Sungwoong Kim
Recent advances in Large Language Models (LLMs) have motivated the development of general LLMs for molecular tasks. While several studies have demonstrated that fine-tuned LLMs can achieve impressive benchmark performances, they are far from genuine generalist molecular LLMs due to a lack of fundamental understanding of molecular structure. Specifically, when given molecular task instructions, LLMs trained with naive next-token prediction assign similar likelihood scores to both original and negatively corrupted molecules, revealing a lack of the molecular structure understanding that is crucial for reliable and general molecular LLMs. To overcome this limitation and obtain a true generalist molecular LLM, we introduce a novel multi-modal training method based on thorough multi-modal instruction tuning as well as molecular structure preference optimization between chosen and rejected graphs. On various molecular benchmarks, the proposed generalist molecular LLM, called Mol-LLM, achieves state-of-the-art performance among generalist LLMs on most tasks, while matching or surpassing state-of-the-art specialist LLMs. Moreover, Mol-LLM also shows superior generalization performance in reaction prediction tasks, demonstrating the effect of molecular structure understanding on generalization.
摘要:大型語言模型 (LLM) 的近期進展激勵了針對分子任務開發通用 LLM。雖然多項研究已證明微調 LLM 可實現令人印象深刻的基準效能,但由於缺乏對分子結構的基本理解,它們遠非真正的通才分子 LLM。具體來說,當給予分子任務說明時,使用天真的下一個符號預測訓練訓練的 LLM 會將類似的可能性評分分配給原始分子和負面損壞分子,這顯示出它們缺乏對分子結構的理解,而這對於可靠且通用的分子 LLM 至關重要。為了克服這個限制並獲得真正的通才分子 LLM,我們引入了一種新穎的多模態訓練方法,該方法基於徹底的多模態說明調整以及在所選和拒絕圖形之間的分子結構偏好最佳化。在各種分子基準測試中,所提出的通才分子 LLM(稱為 Mol-LLM)在多數任務中實現了通才 LLM 中的最新效能,同時超越或與最新的專家 LLM 相當。此外,Mol-LLM 在反應預測任務中也展現出優異的泛化效能,證明了分子結構理解對泛化觀點的影響。
Leveraging the true depth of LLMs
2502.02790v1 by Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret
Large Language Models demonstrate remarkable capabilities at the cost of high compute requirements. While recent research has shown that intermediate layers can be removed or have their order shuffled without impacting performance significantly, these findings have not been employed to reduce the computational cost of inference. We investigate several potential ways to reduce the depth of pre-trained LLMs without significantly affecting performance. Leveraging our insights, we present a novel approach that exploits this decoupling between layers by grouping some of them into pairs that can be evaluated in parallel. This modification of the computational graph -- through better parallelism -- results in an average improvement of around 1.20x on the number of tokens generated per second, without retraining or fine-tuning, while retaining 95%-99% of the original accuracy. Empirical evaluation demonstrates that this approach significantly improves serving efficiency while maintaining model performance, offering a practical improvement for large-scale LLM deployment.
摘要:大型语言模型展示了其强大的功能,但代价是较高的计算需求。虽然最近的研究表明,中间层可以被移除或重新排列其顺序,而不会显著影响性能,但这些发现尚未被用来降低推理的计算成本。我们研究了几种潜在的方法来减少预训练 LLM 的深度,而不会显著影响性能。利用我们的见解,我们提出了一种新颖的方法,该方法通过将其中一些分组为可以并行评估的成对来利用层之间的这种解耦。 通过更好的并行性对计算图进行修改,平均而言,每秒生成的令牌数量提高了约 1.20 倍,而无需重新训练或微调,同时保留了 95%-99% 的原始准确性。经验评估表明,这种方法显著提高了服务效率,同时保持了模型性能,为大规模 LLM 部署提供了实际改进。
Modular Training of Neural Networks aids Interpretability
2502.02470v2 by Satvik Golechha, Maheep Chaudhary, Joan Velja, Alessandro Abate, Nandi Schoots
An approach to improve neural network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure for clusterability and show that pre-trained models form highly enmeshed clusters via spectral graph clustering. We thus train models to be more modular using a "clusterability loss" function that encourages the formation of non-interacting clusters. Using automated interpretability techniques, we show that our method can help train models that are more modular and learn different, disjoint, and smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and language models. Our approach provides a promising direction for training neural networks that learn simpler functions and are easier to interpret.
摘要:一種改善神經網路可解釋性的方法是透過群集性, 也就是將模型分割成可獨立研究的不相交群集。我們定義一個群集性的度量,並顯示預訓練的 模型透過光譜圖形群集形成高度糾纏的群集。因此,我們使用「群集性損失」函數訓練模型,使其更具模組化, 這鼓勵形成非交互群集。使用自動化可解釋性技術,我們顯示我們的模型可以幫助訓練更具模組化的模型,並學習不同、不相交且較小的電路。我們 研究了在 MNIST 和 CIFAR 上訓練的 CNN,在模組化加法上訓練的小型Transformer,以及語言模型。我們的做法為訓練學習更簡單函數且更容易解釋的神經網路提供了有希望的方向。