📊 ArXiv 研究报告 (2026-05-08)

生成时间: 2026-05-08 08:50:13 数据源: ArXiv

📌 配置信息

关键词列表（共 27 个，总权重 27.0）

关键词	权重	类型
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	主要
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	主要
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	主要
“Scaling Laws” AND “Data Quality”	1.0	主要
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	主要
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	主要
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	主要
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	主要
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	主要
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	主要
“Context Window Extension” OR “Long Context LLMs”	1.0	主要
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	主要
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	主要
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	主要
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	主要
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	主要
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	主要
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	主要
“Multi-agent Systems” OR “Agent Coordination”	1.0	主要
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	主要
“Speculative Decoding” OR “Inference Acceleration”	1.0	主要
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	主要
“Mechanistic Interpretability” OR “Explainable AI”	1.0	主要
“World Models” AND “General World Models”	1.0	主要
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	主要
“In-context Learning” OR “Many-shot Learning”	1.0	主要
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	主要

评分设置

每个关键词最大分: 15
及格分公式: 5.0 + 0.8 × 总权重
当前及格分: 26.6

📈 论文统计

总抓取: 306 篇
及格论文: 0 篇 (0.0%)

📋 所有论文列表

1. ❌ Taming Outlier Tokens in Diffusion Transformers

作者: Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu, Liang-Chieh Chen, Zhe Gan, Chen Wei 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05206v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier representations, and DiTs themselves can develop internal outlier tokens, especially in intermediate layers. Moreover, simply masking high-norm tokens does not improve performance, indicating that the problem is not only caused by a few extreme values, but is more closely related to corrupted local patch semantics. To address this issue, we introduce Dual-Stage Registers (DSR), a register-based intervention for both components: trained registers when available, recursive test-time registers otherwise, and diffusion registers for the denoiser. Across ImageNet and large-scale text-to-image generation, these interventions consistently reduce outlier artifacts and improve generation quality. Our results highlight outlier-token control as an important ingredient in building stronger DiTs.

2. ❌ Almost-Orthogonality in Lp Spaces: A Case Study with Grok

作者: Ziang Chen, Jaume de Dios Pont, Paata Ivanisvili, Jose Madrid, Haozhu Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05192v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Carbery proposed the following sharpened form of triangle inequality for many functions: for any $p\ge 2$ and any finite sequence $(f_j)j\subset L^p$ we have [ \Big|\sum_j f_j\Big|p \ \le\ \left(\sup{j} \sum{k} α_{jk}^{,c}\right)^{1/p’} \Big(\sum_j |f_j|p^p\Big)^{1/p}, ] where $c=2$, $1/p+1/p’=1$, and $α{jk}=\sqrt{\frac{|f_{j}f_{k}|{p/2}}{|f{j}|{p}|f{k}|{p}}}$. In the first part of this paper we construct a counterexample showing that this inequality fails for every $p>2$. We then prove that if an estimate of the above form holds, the exponent must satisfy $c\le p’$. Finally, at the critical exponent $c=p’$, we establish the inequality for all integer values $p\ge 2$. In the second part of the paper we obtain a sharp three-function bound [ \Big|\sum{j=1}^{3} f_j\Big|p \ \le\ \left(1+2Γ^{c(p)}\right)^{1/p’} \Big(\sum{j=1}^{3} |f_j|_p^p\Big)^{1/p}, ] where $p \geq 3$, $c(p) = \frac{2\ln(2)}{(p-2)\ln(3)+2\ln(2)}$ and $Γ=Γ(f_1,f_2,f_3)\in[0,1]$ quantifies the degree of orthogonality among $f_1,f_2,f_3$. The exponent $c(p)$ is optimal, and improves upon the power $r(p) = \frac{6}{5p-4}$ obtained previously by Carlen, Frank, and Lieb. Some intermediate lemmas and inequalities appearing in this work were explored with the assistance of the large language model Grok.

3. ❌ Grokability in five inequalities

作者: Paata Ivanisvili, Xinyuan Xie 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05193v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In this note, we report five mathematical discoveries made in collaboration with Grok, all of which have been subsequently verified by the authors. These include an improved lower bound on the maximal Gaussian perimeter of convex sets in $\mathbb{R}^n$, sharper $L_2$-$L_1$ moment comparison inequalities on the Hamming cube ${-1,1}^n$, a strengthened autoconvolution inequality, improved asymptotic bounds on the size of the largest $g$-Sidon sets in ${1,\dots,n}$, and an optimal balanced Szarek’s inequality.

4. ❌ LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

作者: Yijun Lu, Rui Ye, Yuwen Du, Jiajun Wang, Songhua Liu, Siheng Chen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05191v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context management should be adaptive: parts of the agent’s trajectory are maintained at different levels of detail depending on their current relevance to the task. To operationalize this principle, we introduce Context-ReAct, a general agentic paradigm for elastic context orchestration that integrates reasoning, context management, and tool use in a unified loop. Context-ReAct provides five atomic operations: Skip, Compress, Rollback, Snippet and Delete, which allow the agent to dynamically reshape its working context, preserving important evidence, summarizing resolved information, discarding unhelpful branches, and controlling context size. We prove that the Compress operator is expressively complete, while the other specialized operators provide efficiency and fidelity guarantees that reduce generation cost and hallucination risk. Building on this paradigm, we develop LongSeeker, a long-horizon search agent fine-tuned from Qwen3-30B-A3B on 10k synthesized trajectories. Across four representative search benchmarks, LongSeeker achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH, substantially outperforming Tongyi DeepResearch (43.2% and 46.7%) and AgentFold (36.2% and 47.3%). These results highlight the potential of adaptive context management, showing that agents can achieve more reliable and efficient long-horizon reasoning by actively shaping their working memory.

5. ❌ When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

作者: Lakshita Dodeja, Ondrej Biza, Shivam Vats, Stephen Hart, Stefanie Tellex, Robin Walters, Karl Schmeckpeper, Thomas Weng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05172v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to replace previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high precision manipulation tasks such as pipe assembly and kitting, in 1-2 hours of online interaction, achieving success rates of up to 100% and up to 3.75x improvement against the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/

6. ❌ Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours

作者: The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05170v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced “Design Conductor” (or just “Conductor”), a system capable of building a 5-stage Linux-capable RISC-V CPU in 12 hours. In this work, we introduce an updated multi-agent harness powered by frontier models released in April 2026, which is able to handle 80x larger tasks, at higher quality, fully autonomously. Following a brief introduction, we examine 4 designs that the system produced autonomously, including “VerTQ”, an LLM inference accelerator which hard-wires support for TurboQuant in a 240-cycle pipeline, starting from the TurboQuant arXiv paper. VerTQ includes heavy compute processing, with 5129 FP16/32 units; the design was mapped to an FPGA at 125 MHz and consumes 5.7 mm^2 in TSMC 16FF (8 attention pipes). We review the key new characteristics that enabled these results. Finally, we analyze Design Conductor’s token usage and other empirical characteristics, including its limitations.

7. ❌ The First Token Knows: Single-Decode Confidence for Hallucination Detection

作者: Mina Gabriel 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05166v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model’s initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.

8. ❌ Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation

作者: Enhui Chai, Sicheng Chen, Tianyi Zhang, Chad Wong, Kecheng Huang, Zeyu Liu, Fei Xia 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05164v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch representations in homogeneous Euclidean spaces, overlooking the hierarchical organization and regional heterogeneity of pathological tissues. This limits current models’ ability to capture global tissue architecture and fine-grained cellular morphology. To address this limitation, we introduce a hybrid hyperbolic-Euclidean representation that embeds WSI features in dual geometric spaces, enabling complementary modeling of hierarchical tissue structures and local morphological details. Building on this formulation, we develop BatMIL, a WSI classification framework that leverages both geometric spaces. To model long-range dependencies among thousands of patches, we employ a structured state space sequence model (S4) backbone that encodes patch sequences with linear computational complexity. Furthermore, to account for regional heterogeneity, we introduce a chunk-level mixture-of-experts (MoE) module that groups patches into regions and dynamically routes them to specialized subnetworks, improving representational capacity while reducing redundant computation. Extensive experiments on seven WSI datasets spanning six cancer types demonstrate that BatMIL consistently outperforms state-of-the-art MIL approaches in slide-level classification tasks. These results indicate that geometry-aware representation learning offers a promising direction for next-generation computational pathology.

9. ❌ PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation

作者: Srikar Kashyap Pulipaka 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05159v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma~3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini, with a multi-stage quality filtering pipeline including embedding-based deduplication. We find that per-language threshold tuning on the development set yields 2 to 4% F1 improvements without retraining. We also use weighted ensembles of 12B and 27B model predictions with per-language strategy selection. Our final system achieves a mean macro-F1 of 0.811 across all 22 languages, ranking 2nd overall of the participating teams, with 1st place finishes in 3 languages and top-3 in 8 languages. We also find that alternative architectures (XLM-RoBERTa, Qwen3) that showed strong development set performance suffered 30 to 50% F1 drops on the test set, highlighting the importance of generalization.

10. ❌ Aes3D: Aesthetic Assessment in 3D Gaussian Splatting

作者: Chuanzhi Xu, Boyu Wei, Haoxian Zhou, Xuanhua Yin, Zihan Deng, Haodong Chen, Qiang Qu, Weidong Cai 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05155v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence of general 3DGS datasets with aesthetic annotations, and (2) the intrinsic nature of 3DGS as a low-level primitive representation, which makes it difficult to capture high-level aesthetic features. To address these challenges, we propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores. Experimental results demonstrate that our approach achieves strong performance while maintaining a lightweight design, establishing a new benchmark for 3D scene aesthetic assessment. Code and datasets will be made available in a future version.

11. ❌ Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting

作者: Alper Yıldırım 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05151v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (average 0.214%), with large portions of overcomplete dictionaries remaining inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting they may not demand the rich compositional representations that drive transformer success in language modeling, and helping explain the persistent competitiveness of simple linear models

12. ❌ What Matters in Practical Learned Image Compression

作者: Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05148v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec is yet to be proposed. In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime – including within the ablations several novel techniques. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics. We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images as fast as 230ms, and decodes them in 150ms – faster than most top ML-based codecs run on a V100 GPU.

13. ❌ Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation

作者: Olivia Jullian Parra, Sara Zoccheddu, David Catalan Cerezo, Tom Forzy, Franziska Ulrich, William Sutcliffe, Jakob Martin Burgstaller, Oliver Senn, Patrick Owen, Nicola Serra 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05125v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Target trial emulation (TTE) enables causal questions to be studied with observational data when randomized controlled trials (RCTs) are infeasible. Yet treatment-effect methods often address causal estimation, missingness, and temporal structure separately, limiting their robustness in electronic health records (EHRs), where time-varying confounding and missing-not-at-random (MNAR) biomarkers can reach 50%–80%. We propose a two-stage pipeline for treatment effect estimation from incomplete longitudinal EHRs. First, CausalFlow-T, a directed acyclic graph (DAG)-constrained normalizing flow with long short-term memory (LSTM)-encoded patient history, performs exact invertible counterfactual inference, avoiding approximation errors from variational inference and separating confounding through explicit causal structure. Ablations on four synthetic and one semi-synthetic benchmark with known counterfactuals show that DAG constraints and exact inference address distinct failure modes: neither compensates for the other. Second, because CausalFlow-T requires completed inputs, we introduce an LLM-driven evolutionary imputer that proposes executable imputation operators rather than individual entries, and evaluate it with three large language model (LLM) backends, including two open-source models. Across 30%–80% MNAR missingness, this imputer achieves the best pooled rank over biomarker and causal metrics, leading in point-wise accuracy and temporal extrapolation while preserving average treatment effect (ATE) recovery as statistical baselines degrade. On Swiss primary-care EHRs from adults with type 2 diabetes initiating a GLP-1 receptor agonist or SGLT-2 inhibitor, the pipeline estimates a per-protocol weight-loss difference of -0.98 kg [95% CI -1.01, -0.96] favoring GLP-1 receptor agonists, consistent with randomized evidence and obtained from realistically incomplete real-world EHRs.

14. ❌ Executable World Models for ARC-AGI-3 in the Era of Coding Agents

作者: Sergey Rodionov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05138v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency greater than 75%, on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.

15. ❌ Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning

作者: Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang, Miroslav Pajic, Yuichi Motai 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05123v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on those estimates, whereas OE may identify a viable policy with substantial online interaction, which could have been used for fine-tuning. Second–and more importantly–it is also often not possible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures committing to a single deployed policy are impractical in many real-world settings. Moreover, a naive remedy that exhaustively fine-tunes all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters; we then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines with various benchmarks.

16. ❌ On the Wasserstein Gradient Flow Interpretation of Drifting Models

作者: Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05118v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF flow. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, the same same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.

17. ❌ LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts

作者: Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha, Gabriel Nelson 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05110v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.

18. ❌ Building informative materials datasets beyond targeted objectives

作者: Rafael Espinosa Castañeda, Ashley Dale, Hongchen Wang, Yonatan Kurniawan, Hao Wan, Runze Zhang, Adji Bousso Dieng, Kangming Li, Jason Hattrick-Simpers 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05104v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Materials science data collection can be expensive, making the reuse and long-term utility of datasets critical important for future discovery campaigns. In practice, researchers prioritize a subset of properties due to research interests. However, ignoring a subset of outcomes in data collection campaigns potentially generate datasets poorly suited for future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest while preserving performance on untargeted ones. Our approach uses diversity-aware selection to ensure broad coverage of the materials space. In noisy experimental dataset construction, we find that without our diversity-aware framework, prediction performance on untargeted properties can degrade by up to 40% relative to random sampling, whereas applying our framework yields improvements of up to 10% . For targeted properties, performance can degrade with respect to random sampling by up to 12.5% without diversity, while our framework achieves gains of up to 25%. Incorporating diversity into dataset construction not only preserves informativeness for the targeted properties, but also improves materials coverage for potential future objectives. As a result, the constructed datasets remain broadly informative across considered and unconsidered outcomes, ensuring unbiased quality entries and mitigating cold-start limitations in subsequent modeling and discovery campaigns.

19. ❌ Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics

作者: Andreas Pattichis, Constantine Dovrolis 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05097v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen what repetition confirms, and let the rest fade. We argue that external memory should follow a similar principle. In Memini, this view takes the form of an associative memory that organizes knowledge as a directed graph. Each edge carries two coupled internal variables, one fast and one slow, following the Benna-Fusi model of synaptic consolidation. From this coupling, episodic sensitivity, gradual consolidation, and selective forgetting emerge as facets of a single mechanism, reframing external memory as a learning substrate that reorganizes through its own dynamics.

20. ❌ Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement

作者: Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05103v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce the Concept Field of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $ζ$, the mean absolute z-distance between the observed delta and the field’s local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direct probabilistic reading. We support the computation with the introduction of a Vector Sequence Database (VSDB) that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach on two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal Regulations, and novelty detection over Project Gutenberg. Using controlled LLM-generated rewrites, Concept Fields achieve strong selective classification performance under a grounded / ungrounded / unsure triage policy, which unlike retrieval-centric baselines have similar coverage-risk behavior across both domains, supporting a probability-based interpretation that transfers across domains. We also sketch how divergence and curl of the Concept Field, computed on dense clusters, surface qualitatively meaningful semantic patterns (logic sources, sinks, and implicit topics), which we offer as hypothesis-generating rather than as a quantitative result. Concept Fields provide a fast, lightweight, and interpretable signal for groundedness and novelty, complementary to LLM-as-judge and white-box detectors.

21. ❌ Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

作者: Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li, Haoruo Zhang, Chen Lv 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05092v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.

22. ❌ Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior

作者: Hanbo Xie, Akshay K. Jagadish, Lan Pan, Robert C. Wilson 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05091v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well-known that models produced from the behavioral trajectory alone are typically under-determined. In this work, we explore the use of Think Aloud traces as an additional form of data constraint during automated model discovery. When applied to the domain of risky decision-making, we find that the models discovered with think-aloud achieve significantly improved predictive performance on held-out data. Additionally, we find that the discovered models belong to different structural classes than those discovered from behavior alone for the majority of participants (69.4%), specifically, it shifts from Explicit comparator towards Integrated utility. These results suggest that process-level language data not only improve model fit, but also systematically reshape the structure of the discovered cognitive models, enabling the identification of mechanisms that are not recoverable from behavior alone.

23. ❌ Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models

作者: Quintin Pope, Ajay Hayagreeve Balaji, Jacques Thibodeau, Xiaoli Fern 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05090v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present an automated, contrastive evaluation pipeline for auditing the behavioral impact of interventions on large language models. Given a base model $M_1$ and an intervention model $M_2$, our method compares their free-form, multi-token generations across aligned prompt contexts and produces human-readable, statistically validated natural-language hypotheses describing how the models differ, along with recurring themes that summarize patterns across validated hypotheses. We evaluate the approach in synthetic setting by injecting known behavioral changes and showing that the pipeline reliably recovers them. We then apply it to three real-world interventions, reasoning distillation, knowledge editing and unlearning, demonstrating that the method surfaces both intended and unexpected behavioral shifts, distinguishes large from subtle interventions, and does not hallucinate differences when effects are absent or misaligned with the prompt bank. Overall, the pipeline provides a statistically grounded and interpretable tool for post-hoc auditing of intervention-induced changes in model behavior.

24. ❌ Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity

作者: Avhishek Biswas, Apala Pramanik, Eylem Ekici, Mehmet C. Vuran 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05071v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming(VIBE), a hybrid model-based, closed-loop, learning architecture for real-time double-directional mmWave beam management primed by camera sensing. VIBE fuses machine learning, model-based reasoning, and closed-loop RF feedback to balance beam-pair establishment latency with link quality. VIBE bypasses exhaustive training overhead and accelerates link establishment by leveraging camera observations to reduce the beam-search space. Lightweight beam refinement and offset tracking mechanisms adaptively refine beams in response to dynamic application requirements. VIBE is implemented and evaluated across online indoor/outdoor testbeds, public datasets, and real-time vehicular experiments, demonstrating strong generalization capabilities, making it suitable for real-time V2X communication. Comparisons with 5G NR hierarchical beamforming show that VIBE consistently maintains lower outage rates. Furthermore, VIBE outperforms state-of-the-art end-to-end ML models for beam selection when evaluated on public datasets and achieves outage rates as low as 1.1-1.4 %. The results show that a hybrid model-based, closed-loop learning architecture is better suited for real-world mmWave vehicular connectivity than end-to-end trained ML models. For reproducibility, we publish our code to https://github.com/UNL-CPN-Lab/Look-Once-Beam-Twice.

25. ❌ The Impossibility Triangle of Long-Context Modeling

作者: Yan Zhou 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05066v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We identify and prove a fundamental trade-off governing long-sequence models: no model can simultaneously achieve (i) per-step computation independent of sequence length (Efficiency), (ii) state size independent of sequence length (Compactness), and (iii) the ability to recall a number of historical facts proportional to sequence length (Recall). We formalize this trade-off within an Online Sequence Processor abstraction that unifies Transformers, state space models, linear recurrent networks, and their hybrids. Using the Data Processing Inequality and Fano’s Inequality, we prove that any model satisfying Efficiency and Compactness can recall at most O(poly(d)/log V) key-value pairs from a sequence of arbitrary length, where d is the model dimension and V is the vocabulary size. We classify 52 architectures published before March 2026 into the triangle, showing that each achieves at most two of the three properties and that hybrid architectures trace continuous trajectories in the interior. Experiments on synthetic associative recall tasks with five representative architectures validate the theoretical bound: empirical recall capacity lies strictly below the information-theoretic limit, and no architecture escapes the triangle.

26. ❌ SoK: Robustness in Large Language Models against Jailbreak Attacks

作者: Feiyue Xu, Hongsheng Hu, Chaoxiang He, Sheng Hang, Hanqing Hu, Xiuming Liu, Yubo Zhao, Zhengyan Zhou, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05058v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce models into generating harmful, unethical, or policy-violating outputs. Such attacks pose real-world risks, eroding safety, trust, and regulatory compliance in high-stakes applications. Although a variety of attack and defense methods have been proposed, existing evaluation practices are inadequate, often relying on narrow metrics like attack success rate that fail to capture the multidimensional nature of LLM security. In this paper, we present a systematic taxonomy of jailbreak attacks and defenses and introduce Security Cube, a unified, multi-dimensional framework for comprehensive evaluation of these techniques. We provide detailed comparison tables of existing attacks and defenses, highlighting key insights and open challenges across the literature. Leveraging Security Cube, we conduct benchmark studies on 13 representative attacks and 5 defenses, establishing a clear view of the current landscape encompassing jailbreak attacks, defenses, automated judges, and LLM vulnerabilities. Based on these evaluations, we distill critical findings, identify unresolved problems, and outline promising research directions for enhancing LLM robustness against jailbreak attacks. Our analysis aims to pave the way towards more robust, interpretable, and trustworthy LLM systems. Our code is available at Code.

27. ❌ Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework

作者: Bac Trinh-Nguyen, Sara Berri, Sin G. Teo, Tram Truong-Huu, Arsenia Chorti 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05055v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Localization in 5G and 6G networks is essential for important use cases such as intelligent transportation, smart factories, and smart cities. Although deep learning has enabled improving localization accuracy, depending on the deployment scenario and the effort required for dataset collection campaigns on a given infrastructure, the training process for localization models can vary significantly. Furthermore, with respect to feature selection, recent works have demonstrated the robustness of angle-of-arrival (AoA) based localization. In view of these two points, we propose an adaptive framework for AoA-based localization that consists of two alternative learning strategies, each suited either for large or small training datasets. The proposed framework is evaluated on a real, massive multiple input multiple output (mMIMO) orthogonal frequency division multiplexing (OFDM) outdoor channel state information (CSI) dataset. First, we investigate offline learning when large training datasets are available; we propose a hierarchical framework that first distinguishes between line of sight (LoS) and non line of sight (NLoS) regions and then moves to more fine grained localization in the respective region. This approach provides high-performance localization through accumulated batch retraining and an integrated hyperparameter optimization mechanism. Second, when only a small training dataset is available, an online learning framework is proposed, using incremental tree-based and ensemble-based models for handling streaming data and continuously updating mode, as well as an online few-shot learning model for rapidly initializing new classes from a limited labeled support set. These results showcase that highly accurate robust localization can be achieved incrementally during network operation by exploiting online learning, alleviating the need for large dataset collection campaigns.

28. ❌ Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation

作者: Hongxu Chen, Yanghao Wang, Bowei Zhu, Hongxiang Li, Zhen Wang, Ziqi Jiang, Lin Li, Rui Liu, Long Chen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05054v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models, by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we identify three overlooked limitations in them: 1) Angular dynamics distortion: The radial-angular coupling induces non-uniform speed on the angular sub-manifold, leading to regression training difficulty and extra truncation errors. 2) Radial dynamics neglect: Feature normalization discards modality confidence, failing to distinguish out-of-distribution and in-distribution data, and abandoning crucial radial dynamics. 3) Context-agnostic unconditional flow: Dataset-specific information loss during pre-trained cross-modal feature extraction remains unrecovered. To resolve these issues, we propose warped product flow matching (WP-FM), a unified Riemannian framework that reformulates alignment on a warped product manifold. Within this framework, we derive direct product flow matching (DP-FM) by introducing a constant-warping metric, which yields a decoupled cylindrical manifold (i.e., direct product manifold). DP-FM enables independent radial evolution and constant-speed angular geodesic transport, effectively eliminating angular dynamics distortion while preserving radial consistency. Meanwhile, we incorporate classifier-free guidance by conditioning the flow on the pre-trained VLMs’ hidden states to inject missing dataset-specific information. Extensive results across 11 benchmarks have demonstrated that DP-FM achieves a new state-of-the-art for multi-step few-shot adaptation.

29. ❌ Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

作者: Sajal Dash, Feiyi Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05049v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Frontier models increasingly adopt Mixture-of-Experts (MoE) architectures to achieve large-model performance at reduced cost. However, training MoE models on HPC platforms is hindered by large memory footprints, frequent large-scale communication across heterogeneous networks, and severe workload imbalance. To characterize these challenges, we develop a mathematical model that quantifies memory, compute, and communication requirements for MoE configurations under various parallelization schemes, verified through micro-benchmarking, code instrumentation, and hardware profiling. Our analysis identifies performance bottlenecks: all-to-all latency at scale from expert parallelism, insufficient compute-communication overlap, low GPU utilization from imbalanced skinny GEMMs, and the absence of platform-aware hybrid parallelization strategies. To address these, we introduce Piper, a framework that leverages resource modeling to identify efficient training strategies for MoE models on target HPC platforms, applying pipeline parallelism with optimized schedules. Piper achieves 2-3.5X higher MFU than state-of-the-art frameworks such as X-MoE, and a novel all-to-all algorithm delivers 1.2-9X bandwidth over vendor implementation.

30. ❌ Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization

作者: Xin Yu, Liuchen Liao, Yiwen Zhang, Yingchen Yu, Lingzhou Xue, Qinzhen Guo 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05040v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and can degrade reasoning performance over time. Moreover, self-distillation from the same model with prompt augmentation lacks the exploratory diversity provided by a genuine external teacher. To address these limitations, we move beyond fixed-teacher KL matching and propose \textbf{P}reference-\textbf{B}ased \textbf{S}elf-\textbf{D}istillation (\textbf{PBSD}), which revisits on-policy self-distillation through a reward-regularized perspective. Instead of directly matching the teacher distribution, we derive a reward-regularized objective whose analytic optimum is a reward-reweighted teacher distribution, yielding a target policy provably superior to the original teacher under this objective. Practically, PBSD optimizes preference gaps between teacher and student samples while maintaining on-policy student sampling. We support this framework with a statistical analysis of the induced preference-learning problem, formally establishing when on policy self-distillation is preferable to learning from an external teacher in our setting. Experiments on mathematical reasoning and tool-use benchmarks across multiple model scales demonstrate that PBSD consistently achieves the strongest average performance among comparable baselines, showing improved training stability over prior self-distillation baselines while preserving token efficiency.

31. ❌ Position: Embodied AI Requires a Privacy-Utility Trade-off

作者: Xiaoliang Fan, Jiarui Chen, Zhuodong Liu, Ziqi Yang, Peixuan Xu, Ruimin Shen, Junhui Liu, Jianzhong Qi, Cheng Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05017v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemic privacy crisis when deployed in sensitive settings, thereby advancing the position that privacy in EAI is a life cycle-level architectural constraint rather than a stage-local feature. To address these challenges, we propose Secure Privacy Integration in Next-generation Embodied AI (SPINE), a unified privacy-aware framework that treats privacy as a dynamic control signal governing cross-stage coupling throughout the entire EAI life cycle. SPINE decomposes the EAI pipeline into various stages and establishes a multi-criterion privacy classification matrix to orchestrate contextual sensitivity across stage boundaries. We conduct preliminary simulation and real-world case studies to conceptually validate how privacy constraints propagate downstream to reshape system behavior, illustrating the insufficiency of fragmented privacy patches and motivating future research directions into secure yet functional embodied AI systems. We detail the SPINE framework and case studies at https://github.com/rminshen03/EAI_Privacy_Position.

32. ❌ Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models

作者: Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski, Michał Jan Włodarczyk, Pengfei Jin, Przemyslaw Biecek, Quanzheng Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05026v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Diffusion models are prone to generating structural hallucinations - samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research studied this failure mode from several viewpoints, offering partial explanations to their occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced manifold. We begin by showing that a hallucination filter based on such instabilities matches or exceeds the performance of the recently proposed temporal one. By tracing the source of these instabilities, we identify local intrinsic dimension (LID) as their primary driver and propose Intrinsic Quenching (IQ), a direct corrective mechanism that deflates it to alleviate hallucinations. IQ consistently outperforms standard hallucination reduction baselines across a wide array of benchmarks and offers a highly promising solution for enforcing anatomical consistency in downstream medical imaging tasks.

33. ❌ Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation

作者: Zhiqing Cui, Haotong Xie, Jiahao Yuan, Cheng Yang, Hanqing Wang, Yuxin Wu, Yifan Wu, Siru Zhong, Tao Yu, Yifu Guo, Siyu Zhang, Xinlei Yu, Qibing Ren, Usman Naseem 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05007v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.

34. ❌ Misaligned by Reward: Socially Undesirable Preferences in LLMs

作者: Gayane Ghazaryan, Esra Dönmez 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05003v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden. We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoning. We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise. This enables us to test whether reward models prefer socially undesirable responses, and whether their preferences produce systematically biased distributions over selected outputs. Across five publicly available reward models and two instruction-tuned models used as reward proxies, we find substantial variation across domains, with no single model performing best overall. The models fall well short of strong social intelligence: they often prefer socially undesirable options, and their preferences produce systematically biased distributions. Moreover, stronger bias avoidance can reduce sensitivity to context, revealing a key alignment trade-off between avoiding biased outcomes and preserving contextual faithfulness. These findings show that standard reward benchmarks are insufficient for assessing social alignment and highlight the need for evaluations that directly measure the social preferences encoded in reward models.

35. ❌ Federated Learning for Early Prediction of EV Charging Demand

作者: Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis, Panagiotis Thomakos, Andreas Sendros 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04993v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Accurate forecasting of electric vehicle (EV) charging demand is critical for grid stability, infrastructure planning, and real-time charging optimization. In this work, we study the problem of early prediction of charging demand, where the total energy of a session is estimated using only information available at plug-in time and during the first minutes of charging. This enables actionable decisions while the session is still in progress, which is of direct importance for EV network operators. We construct a session-level dataset from the Adaptive Charging Network (ACN), combining session metadata with early-window charging measurements, and derive tabular features capturing user intent, temporal patterns, and initial charging behavior. We focus on a single operational depot, Caltech, and model intra-depot heterogeneity through station-level client partitions while evaluating multiple model families in a federated learning (FL) setting. Our results show that federated models can approach centralized predictive performance while keeping data in-depot, enabling privacy-enhanced training across distributed charging infrastructures. Overall, we demonstrate that reliable demand estimates can be obtained early in the session with minimal data, and that FL provides a practical pathway toward scalable and privacy-aware analytics for EV charging networks. Code is available at https://github.com/Indigma-Innovations/federated-learning-ev-charging-demand.

36. ❌ On-line Learning in Tree MDPs by Treating Policies as Bandit Arms

作者: Anvay Shah, Ramsundar Anandanarayanan, Sharayu Moharir, Shivaram Kalyanakrishnan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04979v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

A Tree Markov Decision Problem (T-MDP) is a finite-horizon MDP with a starting state $s_{1}$, in which every state is reachable from $s_{1}$ through exactly one state-action trajectory. T-MDPs arise naturally as abstractions of decision making in sequential games with perfect recall, against stationary opponents. We consider the problem of on-line learning in T-MDPs, both in the PAC and the regret-minimisation regimes. We show that well-known bandit algorithms – \textsc{Lucb} and \textsc{Ucb} – can be applied on T-MDPs by treating each policy as an arm. The apparent technical challenge in this approach is that the number of policies is exponential in the number of states. Our main innovation is in the design of confidence bounds based on data shared by the policies, so that the bandit algorithms can yet be implemented with polynomial memory and per-step computation. We obtain instance-dependent upper bounds on sample complexity and regret that sum a ``gap term’’ from every terminal state, rather than every policy. Empirically, our algorithms consistently outperform available alternatives on a suite of hidden-information games.

37. ❌ Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking

作者: Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04971v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Weight matrices in deep networks exhibit geometric continuity – principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventing the rotation drift that would otherwise destabilize weight structure. Crucially, a nonlinear but rotation-preserving activation fails to retain continuity, isolating symmetry breaking – not nonlinearity itself – as the active ingredient. Activation and normalization play distinct roles: activation concentrates continuity in the leading singular direction, while normalization distributes it across multiple directions. In transformers, continuity is projection-specific: Q, K, Gate, and Up (which read from the residual stream) develop input-space ($\mathbf{v}_1$) continuity; O and Down (which write to it) develop output-space ($\mathbf{u}_1$) continuity; V alone, lacking an adjacent nonlinearity, develops only low continuity.

38. ❌ Architectural Constraints Alignment in AI-assisted, Platform-based Service Development

作者: Julius Irion, Moritz Leugers, Paul Hartwig, Simon Kling, Tachmyrat Annayev, Alexander Schwind, Maria C. Borges, Sebastian Werner 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04973v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

AI-assisted development tools enable rapid prototyping of services but often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments. Consequently, generated artifacts may exhibit brittle behavior and limited deployability. We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities. By combining template retrieval with structured interaction, the method embeds production-relevant considerations during service scaffolding. Evaluation indicates improved architectural consistency and deployability compared to general-purpose AI code generation workflows, suggesting that constraint-aware retrieval is essential for aligning AI-assisted service development with production software engineering practices.

39. ❌ Skill Neologisms: Towards Skill-based Continual Learning

作者: Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04970v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open-problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model’s effective context. We explore skill neologisms–i.e., soft tokens integrated in the model’s vocabulary and optimized to improve capabilities over a specific skill–as a way to selectively extend model capabilities to new skills without weight updates. We first observe that off-the-shelf pre-trained LLMs already demonstrate tokens associated with procedural knowledge. We then show that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.

40. ❌ Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport

作者: Philip Naumann, Jacob Kauffmann, Klaus-Robert Müller, Grégoire Montavon 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04965v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes the ground metric by integrating observed sample displacements as an additional source of knowledge. Technically, ReshapeOT replaces the Euclidean metric with a Mahalanobis distance estimated from displacement second moments. This effectively carves expressways through the input space, inviting transport solutions that better align with observed displacements. Our method is computationally lightweight, integrates seamlessly into any OT solver that operates on a cost matrix, and can be kernelized for further flexibility. Experiments on synthetic and real-world data show that ReshapeOT achieves substantial gains in transport reliability. We further demonstrate our method’s usefulness in two practical use cases.

41. ❌ EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance

作者: Song Yu, Li Li, Wenwen Zhao, Zhisheng Yang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04960v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit assignment failures: uniform token-level granularity that ignores heterogeneous informational value, uniform polarity that penalizes correct steps and rewards incorrect ones, and zero-variance collapse that erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token informativeness, widespread step-level polarity misalignment, and substantial training waste. To address these limitations, we propose Entropy-Progress Aligned GRPO (EP-GRPO), a framework that mines the model’s intrinsic information flow for dense, self-supervised guidance. EP-GRPO integrates entropy-gated modulation to prioritize high entropy decision pivots, implicit process signals from policy divergence anchored to outcome advantages for directional token-level feedback without external reward models, and cumulative entropy mapping that enables progress-aligned advantage normalization, naturally maintaining gradient flow under zero reward variance. Extensive experiments on mathematical reasoning benchmarks demonstrate that EP-GRPO achieves superior accuracy and efficiency compared to GRPO and its variants. The code will be available.

42. ❌ DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

作者: Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04943v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architecture. DART extends the Joint-Embedding Predictive Architecture (JEPA) to the cross-modal domain by coupling a Vision Transformer (ViT-H/14) with Llama-3.2-3B-Instruct via a Severity-Conditioned Cross-Modal Fusion (SC-CMF) module. Three architectural innovations drive the model’s versatility: (1) HD-MASK, a saliency-guided masking strategy that focuses self-supervised reconstruction on damage-dense patches; (2) per-class learnable severity gates that adaptively weight language grounding by damage category; and (3) a Contrastive Damage Disentanglement (CDD) loss that shapes the embedding space to simultaneously encode damage type, severity ordering, and cross-modal semantics. Trained once on 4,270 images spanning 14 fine-grained rope damage classes, the frozen DART backbone supports downstream tasks without any task-specific fine-tuning: damage classification (93.22 % accuracy, 91.04 % macro-F1, +38.5 pp over a vision-only baseline), continuous severity regression (Spearman rho = 0.94, within-1-ordinal accuracy 99.6 %), few-shot recognition (89.2 % macro-F1 at 20 shots). These results demonstrate that DART functions as a general-purpose CM backbone that goes well beyond classification, providing actionable inspection intelligence from a single shared representation.

43. ❌ Modular Reinforcement Learning For Cooperative Swarms

作者: Erel Shtossel, Gal A. Kaminka 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04939v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

A cooperative robot swarm is a collective of computationally-limited robots that share a common goal. Each robot can only interact with a small subset of its peers, without knowing how this affects the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that it is possible for robots to learn how to interact effectively with others, in a manner that is aligned with the common goal, despite each robot learning independently of others. However, this requires each robot to represent a potentially combinatorial number of interaction states, challenging the memory capabilities of the robots. This paper proposes an alternative approach for representing spatial interaction states for multi-robot reinforcement learning in swarms. A modular (decomposed) representation is used, where each feature of the state is handled by a separate learning procedure, and the results aggregated. We demonstrate the efficacy of the approach in numerous experiments with simulated robot swarms carrying out foraging.

44. ❌ When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data

作者: Miguel Fernandez-de-Retana, Ruben Sanchez-Corcuera, Unai Zulaika, Aritz Bilbao-Jayo, Aitor Almeida 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04930v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Despite theoretical advantages, causal methods for Gene Regulatory Network (GRN) inference from single-cell RNA-seq data consistently fail to match or outperform correlation-based baselines in many realistic benchmarks, a persistent puzzle which casts doubt on the value of causality for this task. We argue that existing benchmarks are insufficiently controlled to answer this question because they evaluate on real or semi-real data where multiple pathologies co-occur, confounding failure modes, and obscuring the specific conditions under which different inference methods excel or fail. To address this gap, we introduce a controlled diagnostic framework that isolates seven biologically motivated pathologies (dropout, latent confounders, cell-type mixing, feedback loops, network density, sample size, and pseudotime drift) and measure how six representative methods spanning three inference paradigms degrade as each pathology intensifies. Across 6,120 controlled experiments, we find that causal methods genuinely dominate in clean and structurally favorable regimes, but specific pathologies (notably dropout and latent confounders) selectively neutralize their advantages. We further introduce an error-type decomposition that reveals methods with similar aggregate accuracy commit qualitatively different errors. To probe whether single-pathology effects persist when multiple stressors co-occur, we perform an interaction sweep over the three most impactful pathologies and find that their joint effects are sub-additive, while also exposing density-conditional cross-overs invisible to single-dial analysis. Our findings offer a nuanced understanding of when and why different methods succeed or fail for GRN inference, providing actionable insights for method development and practical guidance for practitioners.

45. ❌ Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation

作者: Jiangwen Dong, Bo Li, Wanyu Lin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04922v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through temporary texts, such as drafts or chat logs; it is difficult to pinpoint the weaknesses in the generated ideas and how the agents refine them. To this end, we introduce \textbf{Evolving Idea Graphs} (EIG), a graph-based multi-agent scientific ideation framework that can generate high-performance research ideas across various benchmark-native metrics, such as novelty, feasibility, and clarity. Instead of coordinating solely through texts, EIG represents a partially formed proposal as an evolving idea graph, where nodes capture scientific claims and edges encode relations (e.g., support and conflict), enabling unresolved weaknesses to remain identifiable throughout the idea evolving process. Specifically, a learned two-head controller operates over the evolving graph to guide the ideation: one head selects graph edits for agents to execute, while the other decides when the graph is ready for commit as final proposal synthesis. On AI Idea Bench 2025 and LiveIdeaBench, EIG outperforms all compared systems on both automatic benchmark scores and blind expert ratings. Ablations further show that explicit graph state provides the main performance gains, and learned edit-and-commit control adds consistent improvements.

46. ❌ A Foundation Model for Zero-Shot Logical Rule Induction

作者: Yin Jun Phua 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04916v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across variable identities and counts without retraining. The model consists of a statistical encoder and a parallel slot-based decoder. Parallel decoding preserves the permutation invariance of logical disjunction; an autoregressive decoder would instead impose an arbitrary clause order. Product T-norm relaxation makes rule execution differentiable, allowing end-to-end training on prediction accuracy alone. We evaluate NRI on rule recovery, robustness to label noise and spurious correlations, and zero-shot transfer to real-world benchmarks, and we believe this work opens up the possibility of foundation models for symbolic reasoning. Code and the reference checkpoint are available at https://github.com/phuayj/neural-rule-inducer.

47. ❌ Curated AI beats frontier LLMs at pharma asset discovery

作者: Łukasz Kidziński, Kevin Thomas 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04908v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset – an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations – against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.

48. ❌ Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

作者: Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang, Yi Cai, Mengchen Zhao 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04906v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents brings significant challenges on the evaluation of the reasoning process and the credit assignment over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges as they do not incorporate other agents in the reasoning process. In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs’ strategic reasoning ability in multi-agent games. We introduce a novel recursive reasoning paradigm where an agent’s reasoning also integrates other agents’ reasoning processes. To provide effective reward signals for the intermediate reasoning sequences, we employ a centralized Chain-of-Thought (CoT) comparison module to evaluate the reasoning quality. Finally, we compute an accurate hybrid advantage and develop a group-relative RL approach to optimize the LLM policy. Experimental results show that Strat-Reasoner substantially improves strategic abilities of underlying LLMs, achieving 22.1% average performance improvements across various multi-agent games.

49. ❌ Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

作者: Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04903v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch – computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs – DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B – across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy; Qwen2.5-Coder 72.1%/64.6%; Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), 80.6% (Qwen) – well above 63.98% full generation and 71.5% for the concurrent approach of Gu et al. Output lengths are 30-50 lines versus 200+ for full generation (75-85% reduction). A 50-epoch study confirms the 1-epoch proxy preserves rankings (Mistral: Spearman $ρ$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.

50. ❌ On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference

作者: Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu, Jiaping Gui, Yu Feng, Ning Liu, Minyi Guo, Jingwen Leng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04901v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client’s input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach significantly improves efficiency, exposing activations enables adversaries to extract model weights. To mitigate this risk, existing works employ a shuffling defense that reveals only randomly permuted activations to the client. In this work, we show that the shuffling defense is not as robust as previously claimed. We propose an attack that aligns differently shuffled activations to a common permutation and subsequently exploits them to extract model weights. Experiments on Pythia-70m and GPT-2 demonstrate that the proposed attack can align shuffled activations with mean squared errors ranging from $10^{-9}$ to $10^{-6}$. With a query cost of approximately $1, the adversary can recover model weights with L1-norm differences ranging from $10^{-4}$ to $10^{-2}$ compared to the oracle weights.

51. ❌ Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall

作者: Joshua Adler, Guy Zehavi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04897v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retrieval pipeline operating over events preserved verbatim. The full system runs as a single SQLite file on commodity CPU with no external database, vector index, graph store, or GPU. On LoCoMo (1,540 questions across 10 multi-session conversations), True Memory Pro reaches 93.0% accuracy (3-run mean) against 61.4% for Mem0, 65.4% for Supermemory, approximately 71% for Zep, and 94.5% for EverMemOS under a matched gpt-4.1-mini answer model. On LongMemEval (500 questions), True Memory Pro reaches 87.8% (3-run mean). On BEAM-1M (700 questions at the 1-million-token scale), True Memory Pro reaches 76.6% (3-run mean), above the prior published result of 73.9% for Hindsight. A 56-configuration ablation shows a 1.3-percentage-point spread within the top-performing configuration family.

52. ❌ FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection

作者: Mohamed Elhabebe, Ayman El-Baz, Qing Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04882v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Automated glaucoma detection is critical for preventing irreversible vision loss and reducing the burden on healthcare systems. However, ensuring fairness across diverse patient populations remains a significant challenge. In this paper, we propose FairEnc, a fair pretraining method for vision-language models (VLMs) that enables simultaneous debiasing across multiple sensitive attributes. FairEnc jointly mitigates biases in both textual and visual modalities with respect to multiple sensitive attributes, including race, gender, ethnicity, and language. Specifically, for the textual encoder, we leverage a large language model to generate synthetic clinical descriptions with varied sensitive attributes while preserving disease semantics, and employ a contrastive alignment objective to encourage demographic-invariant representations. For the visual encoder, we propose a dual-level fairness strategy that combines mutual information regularization to reduce statistical dependence between learned features and demographic groups, with multi-discriminator adversarial debiasing. Comprehensive experiments on the publicly available Harvard-FairVLMed dataset demonstrate that FairEnc effectively reduces demographic disparity as measured by DPD and DEOdds while achieving strong diagnostic performance under both zero-shot and linear probing evaluations. Additional experiments on the private FairFundus dataset show that FairEnc consistently preserves fairness advantages under cross-domain and cross-modality settings and maintains diagnostic performance within a competitive range. These results highlight FairEnc’s ability to generalize fairness under distribution shifts, supporting its potential for more equitable deployment in real-world clinical settings. Our codebase and synthetic clinical notes are available at https://github.com/Mohamed-Elhabebe/FairEnc

53. ❌ A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

作者: Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04880v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary (in the infinite horizon), this can be incorrect. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms that can work with SMDPs, while maintaining robustness to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator, and empirically demonstrate its efficacy in comparison to existing algorithms.

54. ❌ Anticipating Innovation Using Large Language Models

作者: Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04875v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Forecasting innovation, intended as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We show that signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To this end, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts first technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.

55. ❌ Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset

作者: Eduardo Santos, Juliana Carvalho, César Rennó-Costa 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04857v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.

56. ❌ Quantile-Free Uncertainty Quantification in Graph Neural Networks

作者: Soyoung park, Hwanjun Song, Sungsu Lim 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04847v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice. Moreover, achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show QpiGNN achieves average 22% higher coverage and 50% narrower intervals than baselines, while ensuring efficiency and robustness to noise and structural shifts.

57. ❌ StoryAlign: Evaluating and Training Reward Models for Story Generation

作者: Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04831v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains $1,133$ high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only $66.3%$ accuracy. To address this limitation, we construct roughly $100,000$ high-quality story preference pairs across diverse domains and develop StoryReward, an advanced reward model for story preference trained on this dataset. StoryReward achieves state-of-the-art (SoTA) performance on StoryRMB, outperforming much larger models. We also adopt StoryReward in downstream test-time scaling applications for best-of-n (BoN) story selection and find that it generally chooses stories better aligned with human preferences. We will release our dataset, model, and code to facilitate future research. Related code and data are available at https://github.com/THU-KEG/StoryReward.

58. ❌ DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

作者: Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang, Mintong Kang, Chejian Xu, Qichang Liu, Xiaogeng Liu, Tianneng Shi, Chaowei Xiao, Sanmi Koyejo, Percy Liang, Wenbo Guo, Dawn Song, Bo Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04808v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is inherently challenging, as agents operate in dynamic, untrusted environments involving external tools, heterogeneous data sources, and frequent user interactions. However, realistic, controllable, and reproducible environments for large-scale risk assessment remain largely underexplored. To address this gap, we introduce the DecodingTrust-Agent Platform (DTap), the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and over 50 simulation environments that replicate widely used systems such as Google Workspace, Paypal, and Slack. To scale the risk assessment of agents in DTap, we further propose DTap-Red, the first autonomous red-teaming agent that systematically explores diverse injection vectors (e.g., prompt, tool, skill, environment, combinations) and autonomously discovers effective attack strategies tailored to varying malicious goals. Using DTap-Red, we curate DTap-Bench, a large-scale red-teaming dataset comprising high-quality instances across domains, each paired with a verifiable judge to automatically validate attack outcomes. Through DTap, we conduct large-scale evaluations of popular AI agents built on various backbone models, spanning security policies, risk categories, and attack strategies, revealing systematic vulnerability patterns and providing valuable insights for developing secure next-generation agents.

59. ❌ Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes

作者: Michael Soprano, Andrea Cioci, Stefano Mizzaro 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04797v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing studies on Prolific using AV-Deepfake1M and the Trusted Media Challenge (TMC) dataset. We sample 48 videos per dataset (96 total) and collect 960 judgments (10 per video). Results show that crowd workers rarely misclassify authentic videos as manipulated, but they miss many manipulations, and agreement remains limited across videos. Aggregating multiple judgments per video stabilizes the authenticity signal, but it cannot recover manipulations that most workers consistently miss. Manipulation type identification is substantially noisier than authenticity detection even when workers detect a manipulation, with joint audio-video cases being particularly hard to recognize. Overall, these findings suggest that crowdsourcing can provide a scalable screening signal for audiovisual authenticity, while reliable modality attribution remains an open challenge.

60. ❌ AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use

作者: Chenglin Yang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04785v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding what an action means. We present AgentTrust, a runtime safety layer that intercepts agent tool calls before execution and returns a structured verdict: allow, warn, block, or review. AgentTrust combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attack chains, and a cache-aware LLM-as-Judge for ambiguous inputs. We release a 300-scenario benchmark across six risk categories and an additional 630 independently constructed real-world adversarial scenarios. On the internal benchmark, the production-only ruleset achieves 95.0% verdict accuracy and 73.7% risk-level accuracy at low-millisecond end-to-end latency. On the 630-scenario benchmark, evaluated under a patched ruleset and not claimed as zero-shot, AgentTrust achieves 96.7% verdict accuracy, including about 93% on shell-obfuscated payloads. AgentTrust is released under the AGPL-3.0 license and provides a Model Context Protocol server for MCP-compatible agents.

61. ❌ Gyan: An Explainable Neuro-Symbolic Language Model

作者: Venkat Srinivasan, Vishaal Jatav, Anushka Chandrababu, Geetika Sharma 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04759v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Transformer based pre-trained large language models have become ubiquitous. There is increasing evidence to suggest that even with large scale pre-training, these models do not capture complete compositional context and certainly not, the full human analogous context. Besides, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable and require enormous compute resources for training and inference. Here, we describe Gyan, an explainable language model based on a novel non-transformer architecture, without any of these limitations. Gyan achieves SOTA performance on 3 widely cited data sets and superior performance on two proprietary data sets. The novel architecture decouples the language model from knowledge acquisition and representation. The model draws on rhetorical structure theory, semantic role theory and knowledge-based computational linguistics. Gyan’s meaning representation structure captures the complete compositional context and attempts to mimic humans by expanding the context to a ‘world model’. AI model adoption critically depends on trust and transparency especially in mission critical use cases. Collectively, our results demonstrate that it is possible to create models which are trustable and reliable for mission critical tasks. We believe our work has tremendous potential for guiding the development of transparent and trusted architectures for language models.

62. ❌ Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop

作者: Wu-Yuin Hwang, Nur Alif Ilyasa, Muhammad Irfan Luthfi, Yuniar Indrihapsari 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04761v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano’s New Taxonomy of Educational Objectives and tries to clone learner’s thinking model and build cognitive twin. It was constructed using a pipeline that combines large language model inference (Gemini 2.5 Pro), sentence embeddings, dimensionality reduction, and consensus clustering. This paper evaluates PTM fidelity through three methods applied to 40 participants in a seven-week study. First, automatic evaluation using atomic information point matching yielded an overall F1 score of 74.57% before human-in-the-loop (HITL) refinement and 75.48% after refinement. Second, user evaluation using a Likert scale produced mean ratings of 4.26 and 4.30 on a five-point scale for pre and post-HITL conditions respectively. Third, semantic alignment verification showed that topic coherence increased from 0.436 at the behavioral layer to 0.626 at the core value layer, while lexical overlap with journal vocabulary decreased from 0.114 to 0.007 across those same layers. These results suggest that the PTM produces outputs with acceptable fidelity, was generally perceived by users as reflecting their thinking, and showed a pattern consistent with semantic abstraction across layers.

63. ❌ Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition

作者: Eugene Kofi Okrah Denteh, Blessing Agyei Kyem, Joshua Kofi Asamoah, Armstrong Aboah 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04752v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations motivate a unified framework that links motion evidence to spatial feature selection while preserving data-adaptive temporal characterization. This study therefore proposes FLO-EMD, a hybrid approach that couples motion-guided attention with empirical, data-driven temporal decomposition. Dense optical flow guides channel and spatial attention so that RGB features are refined toward motion-relevant regions. In parallel, aggregated flow statistics form compact motion traces that are decomposed using Empirical Mode Decomposition (EMD) to extract intrinsic temporal components. The resulting EMD embedding is fused with learned spatiotemporal representations to classify light, medium, and heavy congestion. Experiments on 1,050 five-second clips from four surveillance networks show that FLO-EMD achieves 97.5% overall test accuracy (weighted F1 = 0.9742), outperforming established baselines and remaining robust across diverse environmental conditions; ablation and sensitivity analyses further quantify the contributions of EMD, the number of intrinsic mode functions, and the selected motion descriptors.

64. ❌ Knowledge-Free Correlated Agreement for Incentivizing Federated Learning

作者: Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04747v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce Knowledge-Free Correlated Agreement (KFCA) to reward client contributions in federated learning (FL) without relying on ground truth, a public test set, or distribution knowledge. Under categorical reports and an honest majority, KFCA is strictly truthful, addressing the label-flipping vulnerability of Correlated Agreement (CA). We evaluate KFCA on federated LLM adapter tuning and a real-world PCB inspection task, showing efficient real-time reward computation suitable for decentralized and blockchain-based incentive designs.

65. ❌ Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing

作者: Miao Wang, Yuling Shi, Yijiang Li, Yeheng Chen, Xiaodong Gu, Bin Li, Bo Gao, Yaduan Ruan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04733v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Text-based role-playing models can imitate character styles, yet they often fail to reflect a scene’s atmosphere and evolving tension, both essential for immersive applications such as Virtual Reality (VR) games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that explicitly separates observation ([perception]), reasoning ([think]), and utterance ([answer]). This structure promotes human-like sensory grounding by compelling the model to first attend to visual cues, then form internal interpretations, and finally generate context-appropriate dialogue. EBM-RL integrates four complementary rewards: (i) CLIP-based scene-text alignment to improve ambiance and emotion; (ii) a Perceptual-Cognitive reward that encourages [perception] and [think] processes that increase the likelihood of the reference response; (iii) answer accuracy to ensure faithfulness; and (iv) a dense format reward to enforce the desired structured output. Extensive experiments demonstrate that EBM-RL substantially outperforms text-only role-playing baselines and larger-scale vision-language models on our immersive role-playing benchmark, delivering simultaneous gains in visual-atmosphere consistency and character authenticity. Beyond the role-playing domain, EBM-RL also exhibits strong zero-shot generalization: without any additional fine-tuning, it consistently improves performance on out-of-domain VideoQA benchmarks. We additionally release an open-source dataset for video-grounded role-playing dialogue.

66. ❌ AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education

作者: Alvaro Becerra, Alejandra Palma, Ruth Cobos 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04740v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. Key to the system is a “teacher-in-the-loop” mediation workflow, where educators use specialized Learning Analytics dashboards to curate and refine AI-generated drafts before delivery. Furthermore, we detail the underlying data infrastructure, which employs a hybrid SQL and MongoDB strategy to ensure traceability and manage semi-structured feedback versions.

67. ❌ AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations

作者: Alvaro Becerra, Diego Gomez, Ruth Cobos 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04729v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload their slide decks prior to an oral presentation and automatically receive quantitative scores and qualitative feedback based on teacher-defined evaluation rubrics. The system analyzes both slide-level features and slide content, generates structured feedback through an LLM (ChatGPT 5.2), and presents the results through interactive dashboards for students and teachers. We tested AISSA on a pilot deployment with 46 undergraduate students in a real academic setting. The results indicate that AISSA is technically reliable, economically feasible, and perceived by students as useful for iterative slide improvement. These findings suggest that combining LLM-based analysis with Learning Analytics dashboards is a promising approach for supporting formative feedback on presentation slides at scale.

68. ❌ From Beats to Breaches:How Offensive AI Infers Sensitive User Information from Playlists

作者: Stefano Cecconello, Mauro Conti, Luca Pajola, Luca Pasa, Pier Paolo Tricomi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04724v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The pervasive integration of AI has enabled Offensive AI: the exploitation of AI for malicious ends across the cyber-kill chain. A critical manifestation is the user attribute inference attack, where AI infers sensitive Personally Identifiable Information (PII) from innocuous public data. We explore how music streaming ecosystems, where users routinely release public playlists, can be exploited for Offensive AI. To quantify this threat, we developed musicPIIrate. This novel tool leverages deep learning architectures that utilize both standalone data representations and the structural information embedded in a user’s playlist collection. Our design explores set-based approaches (e.g., Deep Sets) and methodologies modeling relationships between playlists (e.g., Graph Neural Networks), which we also combine to leverage both perspectives. Our approach addresses feature extraction from unordered, variable-length set data, enabling accurate PII prediction. Empirical evaluation demonstrates that musicPIIrate achieves state-of-the-art inference accuracy. The tool successfully infers a wide array of attributes, including: Demographics (Age, Country, Gender), Habits (Alcohol, Smoke, Sport), and Personality Traits (OCEAN scores). musicPIIrate outperforms existing methods, beating baselines in 9 out of 15 attribute inference tasks. To counter this vulnerability, we propose JamShield, a lightweight defensive framework. JamShield strategically injects dummy playlists into an account to dilute the PII-carrying signal. Our analysis indicates that JamShield represents a promising defense, lowering inference F1-scores by an average of 10%. This work provides an initial Offensive-AI benchmark for playlist-based PII inference using architectures that leverage set- and graph-structured data and introduces a defense showing encouraging mitigation effects.

69. ❌ Exact Dual Geometry of SOC-ICNN Value Functions

作者: Kang Liu, Jianchen Hu, Wei Peng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04722v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Input Convex Neural Networks (ICNNs) are commonly used in a two-stage manner: one first trains a convex network and then minimizes it over its input in a downstream inference problem. Recent second-order-cone ICNNs (SOC-ICNNs) enrich ReLU-based ICNNs with quadratic and conic modules and admit an exact representation as value functions of second-order cone programs (SOCPs). This value-function structure enables an explicit convex-analytic treatment of SOC-ICNN inference. In this paper, we study the exact first-order and local second-order geometry of SOC-ICNNs from the dual viewpoint. We show that supporting slopes, subdifferentials, directional derivatives, and local Hessians can be recovered directly from optimal dual variables. These results provide the geometric primitives for white-box SOC-ICNN inference, going beyond black-box automatic differentiation. Numerical experiments validate the exact multiplier readout, the local Hessian formula, and the set-valued behavior at structurally degenerate inputs. We also provide a step-by-step tutorial showing how the readout mechanism instantiates a complete white-box inference loop. The code is available at https://anonymous.4open.science/r/SOC-ICNN-Theory-BEFC/.

70. ❌ Budget-aware Auto Optimizer Configurator

作者: Kang Liu, Wei Peng, Jianchen Hu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04711v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Optimizer states occupy massive GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and using a global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given budgets. Specifically, BAOC samples gradient streams to derive statistical metrics that quantify the potential performance risk of applying cheaper configurations (e.g., low precision or removing momentum). It then solves a constrained allocation problem to minimize total risk under memory and time budgets, selecting a budget-feasible configuration for each block. Experiments across vision, language, and diffusion workloads demonstrate that BAOC maintains training quality while significantly reducing the memory usage of optimizer states. The code is available at https://anonymous.4open.science/r/BAOC-45C6.

71. ❌ FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation

作者: Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu, Sen Liang, Wenyue Li, Tianxiang Zheng, Qinglin Lu, Zhen Cui 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04702v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Identity-preserving text-to-video generation (IPT2V) empowers users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose \textit{FaithfulFaces}, a pose-faithful facial identity preservation learning framework to improve IPT2V in complex dynamic scenes. The key of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundations toward robust identity-preserving generation. In particular, we develop a specialized pipeline to curate a high-quality video dataset featuring substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even as pose changes and occlusions occur.

72. ❌ Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

作者: Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04700v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization by retaining only waveform gradients aligned with audio tokens that have high gradient energy, while masking the remaining gradients at each iteration. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g. on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research should further leverage this heterogeneous token-level gradient structure.

73. ❌ Average Attention Transformers and Arithmetic Circuits

作者: Lena Ehrmuth, Laura Strieker 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04683v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention the functions they compute are also computed by the same class of circuit families. Our results hold for transformers over the reals, rationals and any ring in between the two.

74. ❌ Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding

作者: Jingtao Liu, Peiliang Gong, Chuhang Zheng, Yiheng Liu, Qi Zhu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04680v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

EEG-based visual neural decoding aims to align neural responses with visual stimuli for tasks such as image retrieval. However, limited paired data and a fundamental mismatch between high-fidelity digital images and biological visual perception - distorted by retinotopic mapping and subject-specific neuroanatomy - severely impede cross-modal alignment. To address this, we propose MB2L, a Multi-Level Bidirectional Biomimetic Learning framework that incorporates structured physiological inductive biases into representation learning. Specifically, we propose Adaptive Blur with Visual Priors to mitigate perceptual-structural mismatch by reweighting visual inputs according to retinotopic priors. We further propose Biomimetic Visual Feature Extraction to learn multi-level visual representations consistent with hierarchical cortical processing, enhancing subject-invariant encoding. These modules are jointly optimized via Multi-level Bidirectional Contrastive Learning, which aligns EEG and visual features in a shared semantic space through bidirectional contrastive objectives. Experiments show MB2L achieves 80.5% Top-1 and 97.6% Top-5 accuracy on zero-shot EEG-to-image retrieval, significantly outperforming prior methods and demonstrating strong generalization across subjects and experimental settings.

75. ❌ CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

作者: Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan, Tharun Gali, Madhav Rathi, Doyen Sahoo, Manpreet Singh, Mayuresh Verma, Laksh Venka, Shuchita Singh 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04677v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing reliance on manual bottleneck identification. For each target, CodeEvolve generates candidate edits, evaluates them through build validation, unit tests, performance checks, static analysis, and LLM-based review, and retains only variants that preserve functional correctness. Across real-world optimization tasks, CodeEvolve improves performance and code metrics while maintaining correctness. On a large enterprise Java codebase, it achieves an average speedup of 15.22$\times$ across seven hotspot functions and outperforms single-pass LLM optimization on five of them. An ablation study on Apex optimization shows that the full MCTS-augmented configuration produces 19.5 valid programs out of 20 on average, indicating that search, filtering, and refinement each contribute to more reliable optimization.

76. ❌ Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

作者: Mingda Li, Rundong Lv, Xinyu Li, Weinan Zhang, Ting Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04638v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient. Unlike prior gradient-based methods developed for classification tasks that operates in parameter space, we propose to consider gradients in semantic space. Our method builds on the key intuition that a confident LLM should maintain stable output distributions under semantically equivalent input perturbations. We interpret the stability as the gradients in semantic space and introduce a Semantic Preservation Score (SPS) to identify embeddings that best capture semantics, with respect to which gradients are computed. We further propose HybridGrad, which combines the strengths of SemGrad and parameter gradients. Experiments demonstrate that both of our methods provide efficient and effective uncertainty estimates, achieving superior performance than state-of-the-art methods, particularly in settings with multiple valid responses.

77. ❌ AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair

作者: Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04624v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Agent-repair leaderboards reorder under evaluator reconfiguration, and a measurable share of the reordering is produced by methods that consult evaluator-derived signal during internal selection of candidate repairs. We document this failure mode on a public leaderboard and release AuditRepairBench, a paired-execution trace corpus of 576,000 registered cells (96,000 executed) that operationalizes evaluator-channel-blocking ranking instability within a declared observability boundary. A modular screening architecture decides pathway-blocking through four interchangeable implementations, a learned influence proxy, a rule-based channel-exposure ratio that uses no trained model, a counterfactual sensitivity proxy, and a sparse human-audit proxy, combined into a screening posterior that feeds a cell-level flip functional, a set-valued label, a stratified system score, and a set-valued leaderboard. The resource is supported by mechanism-anchored validation on an 80-case source-level channel-surgery subset, an independent-discovery protocol under which two annotator groups separated from the pipeline developers discover coupling patterns blinded to the screening design and the frozen ensemble attains pooled AUROC 0.83 on their 79 cases, implementation robustness, uncertainty propagation that raises 95% coverage from 0.81 to 0.95, and forward transfer with pooled community-evaluator Spearman \r{ho} = 0.65. Screening-guided blinding patches reduce rank displacement by 55–74% (mean 62%) at fewer than 50 lines of code, whereas random channel blinding produces at most 7% reduction and generic retraining at most 13%. AuditRepairBench-Lite, a rule-only configuration on a 12,000-cell subset, preserves the leaderboard at Kendall τ = 0.88 under twenty-four GPU-hours and is the primary release artifact at 42 GB.

78. ❌ Library learning with e-graphs on jazz harmony

作者: Zeng Ren, Maddy Bowers, Xinyi Guan, Martin Rohrmeier 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04622v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Humans can acquire a highly structured intuitive understanding of musical patterns, yet these patterns often require multiple iterations of reflection and re-listening to internalize fully. To capture such an internalization process, we present a computational model for the learning of jazz harmonic patterns based on library learning. Given a corpus of harmonic progressions, our model searches over a space of programs composed of primitive harmonic relations in order to discover concise generative explanations of the corpus. The model first enumerates possible programs for each piece, and then jointly learns a library of harmonic patterns and refactored programs. To efficiently navigate the vast joint space of programs and libraries, we integrate deductive parsing with library learning on e-graphs. We explore how well our model captures aspects of human musical pattern learning by evaluating the intuitiveness of both programs and libraries, as well as similarities to human-written harmonic derivations.

79. ❌ Guidelines for Designing AI Technologies to Support Adult Learning

作者: Jennifer M. Reddig, Glen R. Smith, Sanaz Ahmadzadeh Siyahrood, Wesley G. Morris, Yoojin Bae, Kaitlyn Crutcher, John Kos, Rahul K. Dass, Jinho Kim, Momin Naushad Siddiqui, Daniel Weitekamp, Ploy Thajchayapong, Sandeep Kakar, Alex Endert, Scott Crossley, Min Kyu Kim, Chris Dede, Ashok Goel, Christopher J. MacLellan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04616v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the United States focused on adult learning and online education. Drawing on longitudinal deployment data, we conducted a reflexive thematic analysis to identify recurring challenges and design considerations across systems. These insights were synthesized into a set of 19 design guidelines intended to inform future AI-supported adult learning technologies. We demonstrate the utility of these guidelines through a heuristic evaluation of the deployed systems. Lastly, we present a guideline exploration tool that aids in the ideation of technologies by connecting the guidelines to stakeholder statements surfaced in the analysis process.

80. ❌ VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models

作者: Yukun Chen, Tianrui Wang, Zhaoxi Mu, Xinyu Yang, EngSiong Chng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04613v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

High-quality singing annotations are fundamental to modern Singing Voice Synthesis (SVS) systems. However, obtaining these annotations at scale through manual labeling is unrealistic due to the substantial labor and musical expertise required, making automatic annotation highly necessary. Despite their utility, current automatic transcription systems face significant challenges: they often rely on complex multi-stage pipelines, struggle to recover text-note alignments, and exhibit poor generalization to out-of-distribution (OOD) singing data. To alleviate these issues, we present VocalParse, a unified singing voice transcription (SVT) model built upon a Large Audio Language Model (LALM). Specifically, our novel contribution is to introduce an interleaved prompting formulation that jointly models lyrics, melody, and word-note correspondence, yielding a generated sequence that directly maps to a structured musical score. Furthermore, we propose a Chain-of-Thought (CoT) style prompting strategy, which decodes lyrics first as a semantic scaffold, significantly mitigating the context disruption problem while preserving the structural benefits of interleaved generation. Experiments demonstrate that VocalParse achieves state-of-the-art SVT performance on multiple singing datasets. The source code and checkpoint are available at https://github.com/pymaster17/VocalParse.

81. ❌ Beyond Retrieval: A Multitask Benchmark and Model for Code Search

作者: Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu, Fan Zhou, Hang Yu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04615v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce \textsc{CoREB}, a contamination-limited, multitask \underline{co}de \underline{r}etrieval and r\underline{e}ranking \underline{b}enchmark, together with a fine-tuned code reranker, that goes beyond retrieval to cover the full code search pipeline. \textsc{CoREB} is built from counterfactually rewritten LiveCodeBench problems in five programming languages and delivered as timed releases with graded relevance judgments. We benchmark eleven embedding models and five rerankers across three tasks: text-to-code, code-to-text, and code-to-code. Our experiments reveal that: \circone code-specialised embeddings dominate code-to-code retrieval (${\sim}2{\times}$ over general encoders), yet no single model wins all three tasks; \circtwo short keyword queries, the format closest to real developer search, collapse every model to near-zero nDCG@10; \circthree off-the-shelf rerankers are task-asymmetric, with a 12-point swing on code-to-code and no baseline net-positive across all tasks; \circfour our fine-tuned \textsc{CoREB-Reranker} is the first to achieve consistent gains across all three tasks. The data and model are released.

82. ❌ SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition

作者: Naiyu Zheng, Tianlong Yu, Haochen Yin, Xiaoyi Fan, Xiping Hu, Zhimeng Yin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04608v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction. However, current deep learning-based HAR models often struggle with heavy reliance on labeled data, position-specific ambiguity, and a lack of transparent reasoning. Inspired by the advanced agents framework, which emulates a collaborative agent using Large Language Models (LLMs), we propose SensingAgents, a novel multi-agent system for robust IMU activity recognition. SensingAgents organizes LLM-powered agents into specialized roles: a group of Analyst Agents for position-specific sensor analysis (arm, wrist, belt, pocket), a pair of Advocate Agents that resolves sensor conflicts through dynamic and static dialectical debates, and a Decision Agent that ensures reliability under sensor drift or failure. Evaluation on the Shoaib dataset demonstrates that SensingAgents significantly outperforms state-of-the-art single-agent and multi-agent LLM models, achieving an accuracy of 79.5% in a zero setting–29% higher than existing agent models and 9.4% higher than deep learning baselines–particularly in complex scenarios where multi-sensor data is conflicting or noisy. Our work highlights the potential of multi-agent collaborative reasoning for advancing the robustness and interpretability of ubiquitous sensing systems.

83. ❌ Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness

作者: Yichen Li, Qiankun Liu, Ying Fu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04606v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels, thus failing to achieve category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware\footnotemark[1] detection without any manually annotated labels. It leverages feature similarity between predicted objects and unlabeled reference images. Unlike previous unsupervised methods that lack category guidance and one-shot methods which require labeled data, RefCD introduces a carefully designed feature similarity loss to explicitly guide the learning of potential category-specific features. Additionally, RefCD supports category-agnostic detection without reference images, serving as a unified framework. Comprehensive quantitative and qualitative analysis of category-aware and category-agnostic detection results demonstrates its effectiveness, and RefCD can learn category information in an unsupervised paradigm even without category labels.

84. ❌ A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints

作者: Chengyi Nie, Nian Si, Zijie Zhou 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04595v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and the memory overhead of key-value (KV) caching, which accelerates decoding but quickly exhausts GPU memory. In this paper, we introduce the first queueing-theoretic framework that explicitly incorporates both computation and GPU memory constraints into the analysis of LLM inference. Based on this framework, we derive rigorous stability and instability conditions that determine whether an LLM inference service can sustain incoming demand without unbounded queue growth. This result offers a powerful tool for system deployment, potentially addressing the core challenge of GPU provisioning. By combining an estimated request arrival rate with our derived stable service rate, operators can calculate the necessary cluster size to avoid both costly over-purchasing and performance-violating under-provisioning. We further validate our theoretical predictions through extensive experiments in real GPU production environments. Our results show that the predicted stability conditions are highly accurate, with deviations typically within 10%.

85. ❌ HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily

作者: Xinyi Li, Ming Li, Lu Bai, Lixin Cui, Feilong Cao, Ke Lv, Yunliang Jiang, Pietro Liò 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04594v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decoupling framework for heterogeneous graph learning under heterophily. HeterSEED decouples representation learning into a heterogeneous semantic channel that captures type- and relation-aware local semantics and a structure-aware heterophily channel that separates homophilic and heterophilic neighborhoods via pseudo-label-guided partitioning and aggregates them using metapath-based structural weights. A node-level adaptive fusion mechanism then combines the two channels to produce context-dependent node representations. Theoretically, we establish that, on heterogeneous graphs under heterophily, HeterSEED is strictly more expressive than standard heterogeneous graph neural networks that rely primarily on feature similarity and provably reduces the prediction bias introduced by heterophilic neighbors. Experiments on five real-world heterogeneous graphs, including two large-scale networks at the million-node and hundred-million-edge scale, demonstrate that HeterSEED consistently outperforms representative heterogeneous graph neural networks and recent heterophily-aware baselines, especially in strongly heterophilic regimes.

86. ❌ From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation

作者: Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04590v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope compared to traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, leading to studies of using diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative natures of diffusion models that are harmful to discriminative segmentation tasks. In response, we propose RLFSeg, a novel framework that leverages Rectified Flow to learn direct mapping from the image to the segmentation mask within the latent space. The model is thus freed from the noise-denoise process and the need to optimize the time step of diffusion models, resulting in substantially better performance than previous diffusion-based methods, especially on zero-shot scenarios. By introducing label refinement and an Adaptive One-Step Sampling strategy, the model achieves higher accuracy even on a single inference step. The framework redirects a pretrained generative model to the discriminative segmentation task with zero modification to model structure, thus reveals promising application potential and significant research value.

87. ❌ From Parameter Dynamics to Risk Scoring : Quantifying Sample-Level Safety Degradation in LLM Fine-tuning

作者: Xiao Wang, Yifei Zhang, YongKang Liu, Xiaocui Yang, Zihan Wang, Shi Feng, Daling Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04572v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics, where benign fine-tuning causes parameters to cumulatively drift toward danger-aligned directions, progressively undermining the model’s safety. This finding suggests that samples contributing more to this drift has greater fine-tuning risks. Based on this insight, we propose a method of Sample-Level Quantification of Safety Degradation (SQSD), which quantifies the influence of each training sample on safety degradation. Specifically, SQSD computes continuous risk scores to samples by measuring their induced parameter updates’ projection difference between danger and safety directions. Extensive experiments across multiple models and datasets demonstrate that SQSD effectively quantifies sample-level fine-tuning risks and exhibits strong transferability across model architectures, parameter scales, and parameter-efficient methods.

88. ❌ Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination

作者: Jonathan Spieler, Sven Behnke 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04568v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

State-of-the-art model-based Reinforcement Learning (RL) approaches either use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expensive for high-dimensional control tasks. While gradient-based methods are a promising alternative, recent works have empirically shown that gradient-based methods often perform worse than their gradient-free counterparts. We propose Dream-MPC, a novel approach that generates few candidate trajectories from a rolled-out policy and optimizes each trajectory by gradient ascent using a learned world model, uncertainty regularization and amortization of optimization iterations over time by reusing previously optimized actions. Our results on 24 continuous control tasks show that Dream-MPC can significantly improve the performance of the underlying policy and can outperform gradient-free MPC and state-of-the-art baselines. We will open source our code and more at https://dream-mpc.github.io.

89. ❌ Stage-adaptive audio diffusion modeling

作者: Xuanhao Zhang, Chang Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04547v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent progress in diffusion-based audio generation and restoration has substantially improved performance across heterogeneous conditioning regimes, including text-conditioned audio generation and audio-conditioned super-resolution. However, training audio diffusion models remains computationally expensive, and most existing pipelines still rely on static optimization recipes that treat the relative importance of training signals as fixed throughout learning. In this work, we argue that a major source of inefficiency lies in the evolving balance between semantic acquisition and generation-oriented refinement. Early training places stronger emphasis on acquiring condition-aligned semantic structure and coarse global organization, whereas later training increasingly emphasizes temporal consistency, perceptual fidelity, and fine-detail refinement. To characterize this evolving balance, we introduce a progress-based regime variable derived from the training-time slope of an SSL-space discrepancy, which measures semantic progress during training. Based on this signal, we develop three complementary stage-aware mechanisms: decayed SSL guidance for early semantic bootstrapping, self-adaptive timestep sampling driven by the regime variable, and structure-aware regularization activated from convergent grouped organization in parameter space. We evaluate these mechanisms on text-conditioned audio generation and audio-conditioned super-resolution. Across both settings, the proposed stage-aware strategies improve convergence behavior and yield gains on the primary generation and spectral reconstruction metrics over standard static baselines. These results support the view that efficient audio diffusion training can benefit from treating external guidance, internal organization, and optimization emphasis as stage-dependent components rather than fixed ingredients.

90. ❌ Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis

作者: Vlad Vasilescu, Daniela Faur, Teodor Costachioiu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04557v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events. This shortage hampers the development and testing of machine learning models for land-cover classification, change detection, and disaster monitoring. In this paper, we tackle the problem of geometry-controlled high-resolution satellite image synthesis by adding control over existing pre-trained diffusion models. We propose a simple yet efficient method for controlling the synthesis process by leveraging only skip connection features using windowed cross-attention modules. Several previously established control techniques are compared, indicating that our method achieves comparable performance while leading to a better alignment with the geometry control map. We also discuss the limitations in current evaluation approaches, amplifying the necessity of a consistent alignment assessment.

91. ❌ RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization

作者: Qiming Bao, Juho Leinonen, Paul Denny, Michael J. Witbrock 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04539v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Direct Preference Optimization (DPO), the efficient alternative to PPO-based RLHF, falls short on knowledge-intensive generation: standard preference signals from human annotators or LLM judges exhibit a systematic verbosity bias that rewards fluency over logical correctness. This blindspot leaves a logical alignment gap – SFT models reach NLI entailment of only 0.05-0.22 despite producing fluent text. We propose RLearner-LLM with Hybrid-DPO: an automated preference pipeline that fuses a DeBERTa-v3 NLI signal with a verifier LLM score, removing human annotation while overcoming the “alignment tax” of single-signal optimization. Evaluated across five academic domains (Biology, Medicine, Law) with three base architectures (LLaMA-2-13B, Qwen3-8B, Gemma 4 E4B-it), RLearner-LLM yields up to 6x NLI improvement over SFT, with NLI gains in 11 of 15 cells and consistent answer-coverage gains. On Gemma 4 E4B-it (4.5B effective params), Hybrid-DPO lifts NLI in four of five domains (+11.9% to +2.4x) with faster inference across all five, scaling down to compact base models without losing the alignment-tax mitigation. Our Qwen3-8B RLearner-LLM wins 95% of pairwise comparisons against its own SFT baseline; GPT-4o-mini in turn wins 95% against our concise output – alongside the 69% win the same judge gives a verbose SFT over our DPO model, this replicates verbosity bias on a frontier comparator and motivates logic-aware metrics (NLI, ACR) over LLM-as-a-judge for knowledge-intensive generation.

92. ❌ Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap

作者: Christoph Treude 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04532v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and impacts of productivity of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern the use of AI-powered development tools. In this vision paper, we present a comparative analysis of the Terms of Service for widely used AI coding assistants and agent-enabled development tools. We examine how these documents allocate ownership, responsibility, liability, and disclosure obligations between tool providers and software developers, and we identify common patterns and divergences between providers. Our analysis reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, as well as substantial variation in how providers address issues such as indemnification, data reuse, and acceptable use. Based on these findings, we argue that existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows. We outline a research roadmap for accountable agents in software engineering, identifying challenges and opportunities for modeling responsibility, designing governance artifacts, developing tooling that supports accountability, and conducting empirical studies of developers’ perceptions and practices.

93. ❌ SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting

作者: Kuan-Hao Tseng, Niruth Bogahawatta, Yasod Ginige, Kosta Dekic, Arunan Sivanathan, Suranga Seneviratne 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04530v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that encodes the classical Cisco troubleshooting methodology as an explicit policy. SADE pairs a phase-gated diagnostic workflow, which separates evidence acquisition from hypothesis commitment, with a routed library of fault-family skills and high-yield diagnostic helpers. On a held-out 523 incident set of the public NIKA benchmark covering eleven unseen scenarios, SADE improves root-cause F1 by 37 percentage points over a ReAct + GPT-5 baseline; a model-controlled comparison against the same Claude Sonnet backend without the SADE policy attributes 22 of those points to the diagnostic policy alone, showing that the gain is not a side-effect of the model upgrade.

94. ❌ RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation

作者: Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04523v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present our winning system for Task~~B (generation with reference passages) in SemEval-2026 Task~~8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential, with the ensemble consistently beating any single model. We also introduce Meno-Lite-0.1, a 7B domain-adapted model with a strong cost–performance trade-off, and analyse MTRAGEval, highlighting annotation limitations and directions for improvement. Our code is publicly available: https://github.com/RaguTeam/ragu_mtrag_semeval

95. ❌ DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration

作者: Mark C. Ballandies, Florian Spychiger, Uwe Serdült, Claudio J. Tessone 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04522v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots; (4) show how these elements specify workflows that couple machine execution with human oversight, enabling enhanced self-organization of techno-socio-economic systems, which we call DePAI; and (5) analyze risks, including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, and argue for value-sensitive design and continuously adaptive governance. DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.

96. ❌ Predictive and Prescriptive AI toward Optimizing Wildfire Suppression

作者: Leonard Boussioux, Alexandre Jacquillat, Ryne Reger, Jacob Wachspress 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04510v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dynamics on a time-state network, and linking constraints between them. We develop a two-sided branch-and-price-and-cut algorithm based on: (i) a two-sided column generation scheme that generates fire suppression plans and crew routes iteratively; (ii) a new family of cuts exploiting the knapsack structure of the linking constraints; and (iii) novel branching rules to accommodate non-linear wildfire dynamics. We also propose a data-driven double machine learning approach to estimate wildfire spread as a function of covariate information and suppression efforts, mitigating observed confounding between historical crew assignments and wildfire growth. Extensive computational experiments show that the optimization algorithm scales to otherwise intractable real-world instances; and that the methodology can enhance suppression effectiveness in practice, resulting in significant reductions in area burned over a wildfire season and guiding resource sharing across wildfire jurisdictions.

97. ❌ Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

作者: Binh Long Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes, Peyman Moghadam 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04506v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This enables our framework to identify arbitrary objects in 3D scenes based on natural language descriptions, without requiring category supervision or manual annotations. Experiments on standard benchmarks demonstrate that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, offering a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.

98. ❌ JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions

作者: Leying Zhang, Bowen Shi, Haibin Wu, Bach Viet Do, Yanmin Qian 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04505v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The rapid advancement of generative audio models has outpaced the development of robust evaluation methodologies. Existing objective metrics and general multimodal large language models (MLLMs) often struggle with domain generalization, zero-shot capabilities, and instructional flexibility. To address these bottlenecks, we propose JASTIN, a generalizable, instruction-driven audio evaluation framework that formulates audio assessment as a self-instructed reasoning task. JASTIN bridges a frozen high-performance audio encoder with a fine-tuned LLM backbone via a trainable audio adapter. To ensure robust zero-shot generalization, we introduce a comprehensive instruction following data preparation pipeline, incorporating Multi-Source, Multi-Task, Multi-Calibration, and Multi-Description data. Experimental results demonstrate that JASTIN achieves state-of-the-art Pearson and Spearman correlations with human subjective ratings. It consistently outperforms general MLLMs across speech, sound, music, and out-of-domain evaluation tasks without the need for task-specific retraining.

99. ❌ SpecPL: Disentangling Spectral Granularity for Prompt Learning

作者: Jingtao Zhou, Xirui Kang, Feiyang Huang, Lai-Man Po 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04504v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Existing prompt learning for VLMs exhibits a modality asymmetry, predominantly optimizing text tokens while still relying on frozen visual encoder as holistic extractor and neglecting the spectral granularity essential for fine-grained discrimination. To bridge this, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decompose visual signals into semantic low-frequency bands and granular high-frequency details. A frozen Visual Semantic Bank anchors text representations to universal low-frequency invariants, mitigating overfitting. Crucially, fine-grained discrimination is driven by counterfactual granule training: by permuting high-frequency signals, we compel the model to explicitly distinguish visual granularity from semantic invariance. Uniquely, SpecPL serves as a universal plug-and-play booster, revitalizing text-oriented baselines like CoOp and MaPLe via visual-side guidance. Experiments on 11 benchmarks demonstrate competitive state-of-the-art performance, achieving a new performance ceiling of 81.51% harmonic-mean accuracy. These results validate that spectral disentanglement with counterfactual supervision effectively bridges the gap in the stability-generalization trade-off. Code is released at https://github.com/Mlrac1e/SpecPL-Prompt-Learning.

100. ❌ DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

作者: Yuancheng Wei, Haojie Zhang, Linli Yao, Lei Li, Jiali Chen, Tao Huang, Yiting Lu, Duojun Huang, Xin Li, Zhao Zhong 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04503v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Image Difference Captioning (IDC) generates natural language descriptions that precisely identify differences between two images, serving as a key benchmark for fine-grained change perception, cross-modal reasoning, and image editing data construction. However, existing benchmarks lack diversity and compositional complexity, and standard lexical-overlap metrics (e.g., BLEU, METEOR) fail to capture semantic consistency or penalize hallucinations, which together prevent a comprehensive and robust evaluation of multimodal large language models (MLLMs) on IDC. To address these gaps, we introduce DiffCap-Bench, a comprehensive IDC benchmark covering ten distinct difference categories to ensure diversity and compositional complexity. Furthermore, we propose an LLM-as-a-Judge evaluation protocol grounded in human-validated Difference Lists, enabling a robust assessment of models’ ability to both capture and describe visual changes. Through extensive evaluation of state-of-the-art MLLMs, we reveal significant performance gaps between proprietary and open-source models, highlight the critical importance of reasoning capability, and identify clear limitations in model scaling. Our framework also demonstrates strong alignment with human expert judgments and strong correlation with downstream image editing data construction quality. These findings establish DiffCap-Bench as both a reliable IDC evaluation framework and a practical predictor of downstream utility. The benchmark and code will be made publicly available to support further research.

101. ❌ Implicit Representations of Grammaticality in Language Models

作者: Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05197v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality distinction distinct from string probability? We explore this question through studying internal representations of LMs, by training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. When applied to semantic plausibility benchmarks, in which both members of a minimal pair are grammatical and differ in only plausibility, the probe however performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. These results collectively suggest that LMs acquire to some extent an implicit grammaticality distinction within their hidden layers.

102. ❌ MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge

作者: Perry E. Radau 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05175v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated vendor-specific scanner operational knowledge central to research MRI practice. Purpose: We developed MRI-Eval, a tiered benchmark for relative model comparison on MRI physics and GE scanner operations knowledge using primary multiple-choice questions (MCQ), with stem-only and primed diagnostic conditions as complementary analyses. Methods: MRI-Eval includes 1365 scored items across nine categories and three difficulty tiers from textbooks, GE scanner manuals, programming course materials, and expert-generated questions. Five model families were evaluated (GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6, Gemini 2.5 Pro, Llama 3.3 70B). MCQ was primary; stem-only removed options and used an independent LLM judge; primed stem-only tested responses to incorrect user claims. Results: Overall MCQ accuracy was 93.2% to 97.1%. GE scanner operations was the lowest category for every model (88.2% to 94.6%). In stem-only, frontier-model accuracy fell to 58.4% to 61.1%, and Llama 3.3 70B fell to 37.1%; GE scanner operations stem-only accuracy was 13.8% to 29.8%. Conclusion: High MCQ performance can mask weak free-text recall, especially for vendor-specific operational knowledge. MRI-Eval is most informative as a relative comparison benchmark rather than an absolute competency measure and supports caution in using raw LLM outputs for GE-specific protocol guidance.

103. ❌ Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction

作者: Yucheng Ruan, Ling Huang, Qika Lin, Kai He, Mengling Feng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05121v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Automated mental health prediction using textual data has shown promising results with deep learning and large language models. However, deploying these models in high-stakes real-world settings remains challenging, as existing approaches largely rely on semantic representations and often produce overconfident predictions under ambiguous, noisy, or shifted data. Moreover, most methods lack reliable uncertainty estimation, undermining trust in risk-sensitive mental health applications. To address these limitations, we formulate the task as a multi-view learning problem that integrates semantic information from encoder-only models with higher-level reasoning information from decoder-only models, where reasoning-aware representations and uncertainty modeling are obtained in a trustworthy manner. To ensure reliable fusion, we adopt an evidential learning framework based on Subjective Logic to explicitly model uncertainty and introduce an evidential fusion strategy that balances complementary views while discounting unreliable evidence. Benchmarking on three real-world datasets, Dreaddit, SDCNL, and DepSeverity, reports accuracies of 0.835, 0.731, and 0.751, respectively, demonstrating its potential for reliable mental health prediction. Additional experiments on robustness to noise and case studies for interpretability confirm that our proposed framework not only improves predictive performance but also provides trustworthy uncertainty estimates and human-understandable reasoning signals, making it suitable for risk-sensitive applications in mental health assessment.

104. ❌ The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences

作者: Hubert Plisiecki, Sabina Siudaj, Kacper Dudzic, Anna Sterna, Maciej Gorski, Karolina Drozdz, Marcin Moskalewicz 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05080v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p<.0001$). To test this hypothesis at the item level, we introduce the Pinocchio score ($π_i$), the ratio of inter-model response variance under neutral prompting to that under a human-simulation prompt, as an annotation-free measure of each item’s experiential demand. $π_i$ predicts condition-induced shifts in primary factor loading magnitudes ($ρ=-.215$, $p<.0001$, $n=1292$–$1310$ items), confirming that between-model divergence on experiential items is structured rather than noisy. Applying PCA to per-model EFA scores across all questionnaires reveals one dominant dimension, the Pinocchio Axis ($Π$): the degree to which a model presents itself as a locus of phenomenal experience rather than a system of behavioral responses. This axis captures 47.1% of cross-questionnaire between-model variance in primary factor scores and converges with item-level Pinocchio scores ($r=.864$). Marked within-provider divergence across closely related model variants is consistent with post-training fine-tuning as a key contributor, supporting the interpretation that $Π$ reflects a training-shaped self-representational tendency governing how a model treats experiential language as self-applicable. The dominant axis of between-model psychometric variation is therefore not a conventional personality trait but a self-representational stance toward one’s own nature as an experiencer.

105. ❌ When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise

作者: Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05045v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, which requires accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.

106. ❌ Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals

作者: Gijs van Dijk 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05025v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head’s distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, attention divergence is highly predictive of answer correctness and performs competitively with existing uncertainty estimation methods. We find that this signal is concentrated in middle layers and on factual tokens such as named entities and numbers, suggesting that attention dynamics provides an efficient and interpretable white-box signal of model uncertainty.

107. ❌ Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

作者: Senkang Hu, Yong Dai, Xudong Han, Zhengru Fang, Yuzhi Zhao, Sam Tak Wu Kwong, Yuguang Fang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04984v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory level and therefore cannot assign credit to intermediate turns. We propose Self-Induced Outcome Potential (SIOP), which treats semantic clusters of final answers as latent future outcome states for potential-based turn-level credit assignment. For each query, SIOP samples multiple rollouts, clusters final answers into semantic outcome modes, and builds a reliability-aware target distribution over these states. It then rewards turns for increasing posterior support for reliable future states using a tractable cluster-level approximation. The objective generalizes information-potential shaping from gold-answer supervision to settings without task-specific gold verifiers while avoiding the broadcasted rollout-level advantages used by standard GRPO. We formalize the framework, characterize its supervised gold-answer limit, and show that SIOP improves average performance over verifier-free outcome-level baselines on seven search-augmented agentic reasoning benchmarks while approaching a gold-supervised outcome baseline. Code is available at https://github.com/dl-m9/SIOP.git.

108. ❌ Conceptors for Semantic Steering

作者: Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai, Lyle Ungar, Sharath Chandra Guntuku, Neville Ryant, João Sedoc 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04980v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept’s full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.

109. ❌ Why Expert Alignment Is Hard: Evidence from Subjective Evaluation

作者: Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04972v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. First, alignment difficulty varies substantially across experts, suggesting that expert evaluation styles differ widely in their distance from a model’s prior behavior. Second, explicit criteria and reasoning do not always improve alignment, indicating that expert judgment is not fully captured by verbalized rules. Third, editing is sensitive to both the number and the identity of examples, with small numbers of edits providing useful but unstable gains. Fourth, alignment difficulty differs across evaluation dimensions: dimensions grounded more directly in proposal content are easier to align, while dimensions requiring external knowledge or value-based judgment remain harder. Taken together, these results suggest that expert alignment is difficult not only because of model limitations, but also because subjective evaluation is inherently heterogeneous, partly tacit, dimension-dependent, and temporally unstable.

110. ❌ TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

作者: Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng, Weiqiang Wang, Zhongqing Wang, Ningtao Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04962v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack retrieval-compatible vector outputs, whereas text embedding models often fail to capture tabular structure and numerical semantics. To bridge this gap, we first introduce the Tabular Embedding Benchmark (TabBench), a comprehensive suite designed to evaluate the tabular understanding capability of embedding models. We then propose TabEmbed, the first generalist embedding model that unifies tabular classification and retrieval within a shared embedding space. By reformulating diverse tabular tasks as semantic matching problems, TabEmbed leverages large-scale contrastive learning with positive-aware hard negative mining to discern fine-grained structural and numerical nuances. Experimental results on TabBench demonstrate that TabEmbed significantly outperforms state-of-the-art text embedding models, establishing a new baseline for universal tabular representation learning. Code and datasets are publicly available at https://github.com/qiangminjie27/TabEmbed and https://huggingface.co/datasets/qiangminjie27/TabBench.

111. ❌ Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir

作者: Mullosharaf K. Arabov, Svetlana S. Khaybullina 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04948v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, applied to the task of adapting large language models to the Bashkir language, a low-resource agglutinative language of the Turkic family. Experimental evaluation is conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of results, each configuration was trained with three different random seeds. The lowest perplexity on the test set was obtained for GPT-2 medium with full fine-tuning (3.34). Meanwhile, QLoRA applied to Mistral-7B (3.79) and Phi-2 (3.81) achieved comparable quality with over 40 times fewer trainable parameters. However, we also observed cases of significant quality degradation when using PEFT for certain architectures (e.g., DeepSeek-7B with rank 8, perplexity = 129.55), indicating that the outcome depends critically on the choice of the base model and its tokenizer. Additionally, a qualitative analysis of generated texts based on Bashkir prompts revealed that models with the best perplexity do not necessarily produce the most coherent outputs: QLoRA-tuned models generated monolingual Bashkir continuations, whereas the fully fine-tuned model with the lowest perplexity frequently switched to English. The results suggest that QLoRA on 7B-scale models offers an effective compromise between quality and computational cost for Bashkir. To ensure reproducibility, open data, code, and trained adapters will be released upon acceptance.

112. ❌ UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning

作者: Ivan Kartáč, Kristýna Onderková, Jan Bronec, Zdeněk Kasner, Mateusz Lango, Ondřej Dušek 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04941v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper describes our system submitted to SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. We present an efficient modular neuro-symbolic approach, combining a symbolic prover with small reasoning LLMs (4B parameters). The system consists of an LLM-based parser that translates natural language syllogisms to a first-order logic (FOL) representation, an automated theorem prover, and two optional modules: machine translation for multilingual inputs and a symbolic retrieval component for the identification of relevant premises. The system achieves competitive accuracy and relatively low content effect on most subtasks. Our ablations show that this approach outperforms LLM-based zero-shot baselines in this parameter size range, but also reveal limited multilingual capabilities of small LLMs. Finally, we include a discussion of the task’s main ranking metric and analyze its limitations.

113. ❌ Unintended Negative Impacts of Promotional Language in Patent Evaluation

作者: Bingkun Zhao, Chenwei Zhang, Hao Peng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04926v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Promotional language has been increasingly used to aid the communication of innovative ideas in science. Yet, less is known about its role in the context of technological innovation. Here, we use a validated and domain-diagnosed lexicon of 135 promotional words to study the association between promotional language and patent evaluation outcomes among 2.7 million USPTO patent applications. Our large-scale study reveals three unexpected findings. First, in contrast to scientific evaluation, we find that a higher frequency of promotional words is negatively associated with the probability of an application being (i) granted a patent, (ii) transferred ownership, and (iii) successfully appealed. This promotional penalty holds even after accounting for a range of confounding factors and is largely robust across different technological areas. Among matched samples, the difference in the success rate between the lowest and highest promotional density quintile is 5.5, 5.9, and 5.3 percentage points for patentability, transferability, and rejection reversal. Second, contrary to institutional skepticism, we show that promotional language is not a mask of weak technology, but objectively reflects the degree of combinatorial novelty and future citation impact. Third, digging into the mechanisms, we find that the tolerance to promotional framing is strongly moderated by human factors, with men and experienced examiners showing a higher acceptance of promotional narratives than women and novice examiners. By revealing an emerging paradox in the patent system, our study offers theoretical and practical implications for improving patent evaluation through more objective scrutiny of linguistic patterns in patent filings.

114. ❌ Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization

作者: Xiyan Fu, Wei Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04920v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Compositional generalization refers to correctly interpret novel combinations of known primitives, which remains a major challenge. Existing approaches often rely on supervised fine-tuning, which encourages models to imitate target outputs. This token-level training paradigm fails to capture the global compositional structure required for generalizing to unseen combinations. In this work, we investigate whether compositional generalization can instead be improved through outcome-level reinforcement learning. We adopt Group Relative Policy Optimization to optimize models based on feedback on their final outputs. Within this framework, we explore both a simple binary outcome reward and a composite reward that provides additional composition feedback. Experiments on multiple compositional benchmarks show that reinforcement learning improves compositional generalization compared to supervised fine-tuning. Further analysis reveals that supervised models tend to overfit frequent training compositions, whereas reinforcement learning improves compositional generalization by reshaping the output distribution, particularly for more complex composition types.

115. ❌ Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training

作者: Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang, Junhao Su 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04913v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose \textbf{LoPT}: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT

116. ❌ Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics

作者: Dominik Dahlem, Diego Maniloff, Mac Misiura 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04893v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $φ\ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. The resulting two-axis diagnostic ($φ$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.

117. ❌ A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset

作者: Vita Anggraini, Cintya Bella, Bastian, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04888v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep learning model exhibited mild overfitting. These findings suggest that for medium-scale informal text data, classical machine learning with robust feature extraction can outperform more complex deep learning approaches. Finally, the trained models were integrated into an interactive web application using Streamlit and deployed on Hugging Face Spaces for public access.

118. ❌ Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm

作者: Ridho Benedictus Togi Manik, Muhammad Aqil Ramadhan, Ihsan Maulana Yusuf, Luluk Muthoharoh, Ardika Satria, Martin Clinton Tosima Manullang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04887v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study investigates and constructs a predictive model for customer satisfaction leveraging the Extreme Gradient Boosting (XGBoost) architecture coupled with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. By utilizing a secondary dataset of YouTube comments retrieved from e-commerce review videos, the raw text underwent rigorous preprocessing to generate normalized numerical features. The experimental results demonstrate that the PyCaret-optimized machine learning framework delivers superior classification resilience. Beyond standard performance metrics, lexical evaluations and feature-importance mapping uncover a notable phenomenon: e-commerce discourse is heavily infiltrated by socio-political terminologies, which ultimately influence the polarity of audience satisfaction.

作者: Arnault Chatelain, Étienne Ollion, Qianwen Guan, Diandra Fabre, Lorraine Goeuriot, Emile Chapuis, Abdelkrim Beloued, Marie Candito, Nicolas Hervé, Didier Schwab 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04886v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks – standardized tools for assessing computational systems – are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress – they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely yield better performance on classic and contemporary tasks in disciplines as diverse as history, sociology, political science or economics. This is all the more pressing as these disciplines are quickly turning to LLMs for assistance. To address this gap, we introduce BenCSSmark, a benchmark composed of datasets annotated by computational social scientists. By integrating social scientific perspectives into benchmarking, BenCSSmark seeks to promote more robust, transparent, and socially relevant AI systems and to foster efficient collaboration.

120. ❌ A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter

作者: Tanty Widiyastuti, Mayada, Adisty Syawalda Ariyanto, Luluk Muthoharoh, Ardika Satria, Martin Clinton Tosima Manullang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04885v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase patterns and bidirectional context. The benchmark is built from the released 13,130-row annotation table, whose HS label yields a 58:42 class ratio. On the held-out split, CNN-BiLSTM achieves the best result with 83.8% accuracy, 79.8% precision, 82.7% recall, and 81.2% F1-score. Within the PyCaret branch, Random Forest is the strongest conventional model with 77.2% accuracy and 77.0% F1-score. The neural branch therefore improves accuracy by 6.6 points and F1-score by 4.2 points. Exploratory corpus analysis, learning curves, and confusion matrices show that the dataset is short-text, moderately imbalanced, and still difficult because many decisions depend on local lexical cues plus short contextual composition. The study concludes that PyCaret AutoML is an effective conventional benchmarking framework, whereas CNN-BiLSTM is the stronger end model for the reported benchmark setting.

121. ❌ Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models

作者: Huatian Zhang, Zhendong Mao, Lei Zhang, Yongdong Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04874v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. One of its key challenges lies in how to transfer the sequence-level preference into fine-grained supervision on visual fidelity. To safeguard vision-related tokens that are prone to hallucination, existing methods typically allocate training emphasis according to the model’s self-assessed visual sensitivity signals. However, such sensitivity, estimated by a model still under training, introduces self-referential bias: reinforcing already well-learned visual cues while neglecting hard-to-perceive but critical details, thereby limiting deeper alignment. In this work, we propose an Uncertainty-aware Exploratory Direct Preference Optimization (UE-DPO) method for MLLMs, which enables the model to uncover its cognitive deficiencies and actively explore for self-correction, guided by token-level epistemic uncertainty. Specifically, we first quantify the uncertainty from the model’s failure to ground token predictions in the given image. Then, based on an uncertainty-aware exploration intensity, we encourage more learning pressure on visually deficient tokens in preferred samples, and alleviate the over-penalization of beneficial knowledge in dispreferred samples. Further, we provide a theoretical justification for our method, and extensive experiments demonstrate its effectiveness and robustness.

122. ❌ Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment

作者: Maria Luongo, Davide Marocco, Nicola Milano 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04873v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent advances in natural language processing have enabled increasingly accurate estimation of psychological traits from language. However, most existing approaches rely on supervised models trained to predict questionnaire scores, limiting interpretability and generalizability across contexts. The present study introduces a theory-driven and fully unsupervised framework for measuring psychological states directly from natural language using semantic projection. Psychological constructs were operationalized as interpretable semantic axes derived from lexical anchors and items from validated clinical scales assessing depression, anxiety, and worry. Participants textual responses were embedded using Sentence-BERT and projected onto these axes to generate continuous psychological scores across multiple response formats, including selected words, generated words, phrases, and free-text responses. Projection scores were evaluated through correlations with standardized clinical measures , split-half reliability analyses, attenuation corrections, distributional similarity using Wasserstein distance, and comparisons with lexicon-based sentiment analysis (VADER). Results showed strong associations between projection scores and clinical measures, particularly for structured formats such as selected words, written words, and phrases. Free-text responses produced weaker results when analyzed as whole texts, but performance improved substantially when sentence-level aggregation strategies were applied. These findings support semantic projection as an interpretable and scalable alternative to supervised language models for psychological assessment and highlight the importance of response format and text-processing strategies in language-based mental health measurement.

123. ❌ Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations

作者: Ge Lei, Samuel J. Cooper 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04764v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language models are increasingly used as surrogate models for low-data optimization, but their optimizer-facing prediction and its uncertainty remain poorly understood. We study the surrogate belief elicited from an LLM under sparse observations, showing that it depends strongly on prompt text and query protocol. We introduce an uncertainty-alignment criterion that measures whether model uncertainty tracks residual ambiguity among sample-consistent functions. Across controlled inference tasks and Bayesian optimization studies, we find that structural prompts act as effective priors, POINTWISE and JOINT querying induce different beliefs, and sequential evidence leads to non-monotonic, order-sensitive confidence updates. These effects change downstream acquisition decisions and regret, showing that elicitation protocol is part of the LLM surrogate specification, not a formatting detail.

124. ❌ Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL

作者: Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao, Xuemei Dong, Mengdie Chu, Xiang Qi, Pingfu Chao 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04719v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.

125. ❌ Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs

作者: Aofan Liu, Jingxiang Meng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04665v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

When the substantive content of a request is rewritten, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational prose, the requested format dissolves, and exact-match evaluation pipelines silently misjudge the result. To make this measurable, we release PARACONSIST, a 900-prompt benchmark of 150 base queries with five lexical, syntactic, and semantic-expansion prompt variants each, and a Semantic Consistency Score that decomposes prompt-variant robustness into answer consistency, sentence-BERT semantic similarity, and length stability. Under a whole-word answer-set match, only ~22% of closed-form variant responses preserve the ground-truth label inside their output, while ~78% drift away from the answer space entirely. In our pool, the dominant predictor of collapse is task structure rather than model identity, with model differentiation jointly carried by answer consistency and length stability. Robustness audits should therefore track response-mode preservation as a first-class reliability target alongside answer accuracy.

126. ❌ CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning

作者: Shuai-long Lei, Xiaobin Zhu, Jiarui Liang, Guoxi Sun, Zhiyu Fang, Xu-Cheng Yin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04652v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Temporal knowledge graph (TKG) reasoning aims to predict future events from historical facts. A key challenge lies in jointly capturing two sources of predictive information in TKGs: historical evidence and evolutionary dynamics. However, existing methods typically focus on only one of these sources, which limits the ability to fully exploit the complementary predictive signals in TKGs. To address this, we propose CHE-TKG, a novel collaborative dual-view learning framework for TKG reasoning. CHE-TKG explicitly separates and jointly models historical evidence and evolutionary dynamics, aiming to learn and exploit their complementary predictive signals. Specifically, CHE-TKG constructs a historical evidence graph to capture long-term structural regularities and stable relational constraints, alongside an evolutionary dynamics graph to model temporal transitions and recent changes, with dedicated encoders for each view. We further employ relation decomposition and a contrastive alignment objective to better capture the predictive signals across the two views. Extensive experiments demonstrate that CHE-TKG achieves state-of-the-art performance on multiple benchmarks.

127. ❌ FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation

作者: Guangsheng Bao, Hongbo Zhang, Han Cui, Yanbin Zhao, Yue Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04651v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification and language modeling benchmarks, FAAST matches or exceeds backprop-based adaptation while reducing adaptation time by over 90% and is competitive to memory/context-based adaptation while saving memory usage by up to 95%. These results demonstrate FAAST as a highly efficient, scalable solution for supervised task adaptation, particularly for resource-constrained models. We release the code and models at https://github.com/baoguangsheng/faast.

128. ❌ Graph-Augmented LLMs for Swiss MP Ideology Prediction

作者: Yifei Yuan, Luis Salamanca, Sophia Schlosser, Laurence Brandenberger 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04643v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences. While Large Language Models (LLMs) have shown promising results in estimating MPs’ ideological stances, there are more actors and elements in the parliamentary system, and relations between them, that could provide a wider and more informative picture. However, due to the complexity of integrating them in the prediction task, these additional elements are generally ignored. In this work, we propose an LLM framework, PG-RAG, that implements a retrieval-augmented generation pipeline: it first queries a political knowledge graph (KG) and then integrates the resulting graph-structured information into the context. This allows for capturing both textual semantics and inter-MP relationships, another relevant information source in any parliamentary system. We evaluate the approach on the task of ideology prediction, using data from a Swiss parliamentary dataset. When comparing graph-augmented models against several state-of-the-art baselines, the results demonstrate that incorporating this enriched information, which encodes information about different entities and relations, improves prediction performance. These results help to highlight the value of domain-specific relational information in modeling political behavior.

129. ❌ TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)

作者: Mullosharaf K. Arabov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04583v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library that provides the first comprehensive pipeline for processing authentic Tajik text while preserving the original Cyrillic orthography. The library implements a modular architecture centered around a unified Doc object, enabling sequential application of components for cleaning, normalization, tokenization (including subword BPE), morphemic segmentation, part-of-speech tagging, stemming, lemmatization, and sentence splitting. A novel unified morphology engine is introduced, offering controlled and deep analysis modes that significantly improve handling of Tajik’s agglutinative nominal and verbal inflections. The release further incorporates a lexicon-based sentiment analyser and pre-trained Word2Vec/FastText embeddings loaded directly from the Hugging Face Hub. To ensure reproducibility and facilitate future research, four accompanying linguistic datasets – a POS-tagged corpus (52.5k entries), a sentiment lexicon (3.5k entries), a toponym gazetteer (5.6k entries), and a personal names dataset (3.8k entries) – have been openly published under permissive licenses. The library’s reliability is validated by an extensive test suite of 616 automated tests achieving 93% source code coverage. TajikNLP thus establishes a foundational technological infrastructure for Tajik language processing, lowering the barrier to entry for both academic and industrial applications in low-resource Cyrillic-script environments.

130. ❌ Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus

作者: Mullosharaf K. Arabov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04576v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world’s languages, their capacity for grammatical analysis of Tajik has remained unexplored until now. The aim of this study is to fill this gap through a systematic comparison of classical neural network architectures and modern multilingual transformers. Experiments were conducted on the TajPersParallel corpus, a parallel lexical resource comprising approximately 44,000 dictionary entries. Due to the absence of full-fledged example sentences in the current version of the corpus, the task was performed at the level of isolated lexical units, representing a challenging case of context-independent classification. The study compares the following architectures: a recurrent BiLSTM-CRF model, as well as multilingual models XLM-RoBERTa (large), mBERT, ParsBERT (Persian), and ruBERT (Russian), adapted using the parameter-efficient fine-tuning method LoRA. The testing results showed that the best performance is achieved by the mBERT + LoRA model (macro F1-score = 0.11, weighted F1-score = 0.62). It was established that in the absence of syntactic context, all models experience significant difficulty in resolving morphological ambiguity, successfully classifying primarily high-frequency classes (“noun,” “adjective”) while demonstrating zero effectiveness for rare function words. Zero-shot evaluation revealed the greatest typological proximity of Tajik to Persian (ParsBERT) and Russian (ruBERT). The obtained results form a foundation for further research and development in the field of automatic processing of the Tajik language.

131. ❌ Open-Source Image Editing Models Are Zero-Shot Vision Learners

作者: Wei Liu, Jiaxin Lin, Rui Chen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04566v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models~(Veo~3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box. We conduct a systematic evaluation of three open-source image-editing models – Qwen-Image-Edit, FireRed-Image-Edit, and LongCat-Image-Edit – on dense visual prediction tasks \emph{without any fine-tuning}. We benchmark monocular depth estimation on NYUv2 and DIODE, surface normal estimation on NYUv2, and semantic segmentation on Cityscapes, covering both geometric and semantic scene understanding. Results show that open-source image-editing models exhibit non-trivial zero-shot visual understanding. On NYUv2 surface normals, FireRed-Image-Edit achieves a mean angular error of $17.69^\circ$, surpassing the fine-tuned Marigold ($20.86^\circ$) and matching the instruction-tuned Vision Banana ($17.78^\circ$) without any task-specific training. On NYUv2 depth estimation, LongCat-Image-Edit obtains $δ_1{=}0.822$ with affine alignment, and Qwen-Image-Edit leads on DIODE Indoor ($δ_1{=}0.868$). On Cityscapes semantic segmentation, Qwen-Image-Edit reaches 25.7 mIoU at the 19-class level and 49.5 mIoU at a coarser 7-category level. By comparing three independently trained editors, we test whether zero-shot vision ability is an emergent property of image-editing pretraining rather than a model-specific artifact. Code, evaluation scripts, and all results are publicly released to serve as a reproducible baseline for future work.

132. ❌ The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil

作者: Brielen Madureira, Andreas Niekler, Marc Keuschnigg, Mariana Madruga de Brito 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04552v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Media coverage influences disaster response, yet the drivers of international media attention to local events remain unevenly understood. Brazil offers a compelling case: some of its natural and technological disasters occasionally hit the international headlines. However, systematic analyses of what makes these events be discussed abroad are still missing. Addressing this gap requires representative, validated and country-specific news datasets. This paper presents a peak analysis of 2k news about Brazilian fires and landslides in German newspapers from 2000 to 2024. Using time series segmentation to detect news event peaks, we examine the extent to which they can be temporally aligned with observations in national and global disaster databases.

133. ❌ UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding

作者: Yepeng Weng, Qiao Hu, Takehisa Yairi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04543v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exploit the coupling between horizontal and vertical dimensions of candidate trees. In this paper, we propose a unified perspective that casts tree-based verification as a conditional OT problem. Our key insight is that vertical dependencies can be abstracted through prefix acceptance probabilities, which act as dynamic scaling factors to actively guide horizontal draft selection. Based on this principle, we introduce UniVer, a verification algorithm that jointly optimizes across tree levels by composing local optimal transport plans under prefix constraints. We prove that UniVer remains lossless and achieves the optimal acceptance rate under the proposed conditional framework. Extensive experiments across different tasks and models demonstrate that UniVer improves acceptance length by 4.2% to 8.5% over standard recursive rejection sampling without replacement, while maintaining exact distributional alignment with the target model.

134. ❌ Distilling Bayesian Belief States into Language Models for Auditable Negotiation

作者: Zongqi Cui, Baihan Lin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04507v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Negotiation agents must infer what their counterpart values, update those beliefs over dialogue turns, and choose actions under uncertainty. End-to-end large language models (LLMs) can imitate negotiation dialogue, but their opponent beliefs are usually implicit and difficult to inspect. We propose BOND (Bayesian Opponent-belief Negotiation Distillation), a framework for auditable negotiation. BOND consists of an LLM-based Bayesian teacher that scores dialogue contexts against the six possible opponent priority orderings, updates a posterior over those orderings, and uses the posterior for menu-based decision making, as well as a smaller 8B student language model that emits both negotiation actions and normalized posterior beliefs as tagged text. In the CaSiNo negotiation dataset, BOND outperforms the state-of-the-art and achieves mean Brier score 0.085 over opponent-priority posteriors. The distilled student preserves much of this belief signal, achieving Brier 0.114, below the uniform six-ordering reference of 5/36, approximately 0.139. Compared with a 70B structured-CoT baseline, the significantly smaller 8B student model yields substantially better elicited posterior calibration. We further showcase auditability through posterior trajectories, belief-versus-policy error decomposition, and posterior-prefix interventions. These diagnostics reveal that distillation preserves a scoreable belief report more strongly than causal belief-conditioned control, making weak belief-action coupling visible, not hidden.

135. ❌ Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties

作者: Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04500v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. A great deal of cross-lingual research focuses on inter-lingual language transfer which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue allowing generalization to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that focuses on capturing variety-specific cues while also exploiting rich overlap offered by high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we suggest a lightweight VACAI-Bowl architecture that learns variety-specific attributes with one branch while a parallel branch captures variety-invariant attributes using adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as proxy for performance on other downstream tasks. Using VACAI-Bowl with TOPPing yields an average 54.62% improvement in the dependency parsing task, which serves as a proxy for performance on other downstream tasks across 10 low-resource varieties.

136. ❌ SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

作者: Zhenliang Zhang, Wenqing Wang, Yong Hu, Yaming Yang, Jiaheng Gao, Chen Shen, Xiaojun Wan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04496v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Long-Text Understanding (LTU) at million-token scale requires balancing reasoning fidelity with computational efficiency. Frontier long-context LLMs can process millions of token contexts end-to-end, but they suffer from high token consumption and attention dilution. In parallel, specialized LTU agents often sacrifice fidelity through task-agnostic abstractions like graph construction or indexing. We identify a key insight for LTU: query-relevant information is typically sparse relative to the full document, so effective reasoning should rely on a query-sufficient subset rather than the entire context. To address this, we propose SCOUT, a new paradigm for LTU that shifts from passive processing to active information foraging. It treats the document as an explorable environment and answers from a compact, provenance-grounded epistemic state. Guided by state-level gap diagnosis, SCOUT adaptively alternates between coarse-to-fine exploration and anchored state updates that progressively contract its epistemic state toward query sufficiency. Experiments show that SCOUT matches state-of-the-art proprietary models while reducing token consumption by up to 8x. Moreover, SCOUT remains stable as context length scales, substantially alleviating the practical cost-performance trade-off.

137. ❌ CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation

作者: Zhipeng Song, Yizhi Zhou, Xiangyu Kong, Jiulong Jiao, Xuezhou Ye, Chunqi Gao, Xueqing Shi, Yuhang Zhou, Heng Qi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04495v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query-document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator’s uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a document usefulness signal. CAR estimates confidence through the semantic consistency of multiple sampled answers under query-only and query-document conditions. Documents that significantly increase confidence are promoted, those that decrease confidence are demoted, and uncertain cases preserve the baseline order, while a query-level gate avoids unnecessary intervention on already confident queries. Experiments on four BEIR datasets show that CAR consistently improves NDCG@5 across sparse and dense retrievers, LLM-based and supervised rerankers, and four LLM backbones. Notably, CAR improves the YesNo reranker by 25.4 percent on average under Contriever retrieval, and its ranking gains strongly correlate with downstream generation F1 improvements, achieving Spearman rho = 0.964.

138. ❌ A Hybrid Method for Low-Resource Named Entity Recognition

作者: Do Minh Duc, Quan Xuan Truong, Viet Tran Hong, Le Hoang Anh, Mac Thi Minh Tra, Nguyen Van Thuy, Le Hai Ha, Vinh Nguyen Van 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04489v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, NER in specific domains for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea involves a two-stage pipeline: first, a rule-based component reduces label complexity by grouping relational and special categories; second, pre-trained language models are fine-tuned for high-precision extraction. A post-processing module is then utilized to restore fine-grained labels, preserving expressiveness for application-level usability. To mitigate data scarcity, a scalable data augmentation strategy leveraging Large Language Models (LLMs) is introduced to expand the label set without full re-annotation, which is a significant novelty of this work. The effectiveness of this method was evaluated across five specific-domain datasets, including logistics, wildlife, and healthcare. Experimental results demonstrate substantial improvements over strong RoBERTa-based baselines. Specifically, the proposed system achieved F1 scores of 90 percent in Customer Service, up from 83 percent; 84 percent in GAM, up from 73 percent; 83 percent in AI Fluent, up from 80 percent; 94 percent in PhoNER_Covid19, up from 91 percent; and 60 percent in Rare Wildlife, up from 36 percent. These findings confirm that the hybrid approach effectively captures the linguistic complexity of Vietnamese and contextual nuances in specialized domains, offering a robust contribution to low-resource NER research.

139. ❌ Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control

作者: Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04468v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of matching a fixed reference distribution, the anchor interpolates between the current model and a frozen reference to construct an intermediate target that the model distills toward, transforming global fine-tuning into a sequence of local trust-region updates in distribution space. Theoretically, we prove this anchor-based update admits a linear KL-divergence upper bound per iteration, ensuring a stable transition between model distributions. Extensive experiments on iGSM, MedCalc, and IFEval show that Anchored Learning consistently lies on the Pareto frontier of gain-stability trade-offs, achieving near-optimal performance improvements while substantially reducing degradation compared to strong baselines. For example, while standard SFT suffers from over 53% performance degradation on iGSM and MedCalc, Anchored Learning slashes this drop to under 5% while maintaining near-optimal gains (e.g., 75.2% on iGSM).

140. ❌ DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation

作者: Bryan Li, William Walden, Yu Hou, Gabrielle Kaili-May Liu, Dawn Lawrie, Jame Mayfield, Eugene Yang, Chris Callison-Burch, Laura Dietz 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04458v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Evaluation of long-form, citation-backed reports has lately received significant attention due to the wide-scale adoption of retrieval-augmented generation (RAG) systems. Core to many evaluation frameworks is the use of atomic facts, or nuggets, to assess a report’s coverage of query-relevant information attested in the underlying collection. While nuggets have traditionally been represented as short statements, recent work has used question-answer (QA) representations, enabling fine-grained evaluations that decouple the information need (i.e. the question) from the potentially diverse content that satisfies it (i.e. its answers). A persistent challenge for nugget-based evaluation is the need to manually curate sets of nuggets for each topic in a test collection – a laborious process that scales poorly to novel information needs. This challenge is acute in cross-lingual settings, where information is found in multilingual source documents. Accordingly, we introduce DoGMaTiQ, a pipeline for generating high-quality QA-based nugget sets in three stages: (1) document-grounded nugget generation, (2) paraphrase clustering, and (3) nugget subselection based on principled quality criteria. We integrate DoGMaTiQ nuggets with AutoArgue – a recent nugget-based evaluation framework – to enable fully automatic evaluation of generated reports. We conduct extensive experiments on two cross-lingual TREC shared tasks, NeuCLIR and RAGTIME, showing strong rank correlations with both human-in-the-loop and fully manual judgments. Finally, detailed analysis of our pipeline reveals that a strong LLM nugget generator is key, and that the system rankings induced by DoGMaTiQ are robust to outlier systems. We facilitate future research in report evaluation by publicly releasing our code and artifacts at https://github.com/manestay/dogmatiq.

141. ❌ GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking

作者: Ziqi Zhu, Adithya Suresh, Tomal Deb, Iman Abbasnejad 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04449v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning for superior DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Network that captures dialogue structure and turn-level dependencies, and a finetuned T5-Small encoder-decoder for sequence modeling, coordinated by an intelligent router. For complex value generation tasks, we integrate ReAct agents that perform structured reasoning over dialogue context. On MultiWOZ 2.2, GEM achieves 65.19% Joint Goal Accuracy, substantially outperforming end-to-end LLM approaches (best: 38.43%) and surpassing state-of-the-art (SOTA) methods including TOATOD (63.79%), D3ST (58.70%), and Diable (56.48%). Our graph-enhanced mixture-of-experts architecture with ReAct integration demonstrates that combining structured dialogue representation with dynamic expert routing and agent-based reasoning provides a powerful paradigm for dialogue state tracking, achieving superior accuracy while maintaining computational efficiency through selective expert activation.

142. ❌ Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting

作者: Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04426v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce Telegraph English (TE), a prompt-compression protocol that rewrites natural language into a symbol-rich, formally-structured dialect. Where token-deletion methods such as LLMLingua-2 train a classifier to delete low-importance tokens at a fixed ratio, TE performs a full semantic rewrite: it decomposes the input into atomic fact lines, substitutes verbose phrases with $\sim$40 logical and relational symbols, and lets the compression ratio adapt to each document’s information density. A consequence of the line-structure rule is that compression and semantic chunking become the same operation – each output line is an independently addressable fact, so the compressed representation is simultaneously a semantic index. We evaluate TE on 4{,}081 question-answer pairs from LongBench-v2 across five OpenAI models and two difficulty levels. At roughly 50% token reduction, TE preserves 99.1% accuracy on key facts with GPT-4.1 and outperforms LLMLingua-2 at matched compression ratios on every model and task tested. The gap widens on smaller models – up to 11 percentage points on fine-detail tasks – suggesting that explicit relational structure compensates for limited model capacity. We release the grammar specification, compression prompt, benchmark data, and reference implementation.

143. ❌ Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs

作者: Yixuan Mei, Zikun Li, Zixuan Chen, Shiqi Pan, Mengdi Wu, Xupeng Miao, Zhihao Jia, K. V. Rashmi 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04357v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that enjoy better availability and deliver comparable performance per dollar to top-tier hardware. To efficiently harness these heterogeneous resources for serving multiple LLMs concurrently, we introduce Coral, an adaptive heterogeneity-aware multi-LLM serving system. The key idea behind Coral is to jointly optimize resource allocation and the serving strategy of each model replica across all models. To keep pace with shifting throughput demand and resource availability, Coral applies a lossless two-stage decomposition that preserves joint optimality while cutting online solve time from hours to tens of seconds. Our evaluation across 6 models and 20 GPU configurations shows that Coral reduces serving cost by up to 2.79$\times$ over the best baseline, and delivers up to 2.39$\times$ higher goodput under scarce resource availability.

144. ❌ Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference

作者: Mohammed Sabry, Anya Belz 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04341v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We study distillation for large language models under explicit compute constraints, with the goal of producing student models that are not only cheaper to train, but structurally efficient at inference time. While prior approaches to parameter-efficient distillation, such as LoRA, reduce adaptation cost, they leave the dense backbone unchanged and therefore fail to deliver meaningful inference savings. We propose Budgeted LoRA, a distillation framework that treats model compression as a structured compute allocation problem. Instead of using a fixed student architecture, we introduce a global compute budget that sets the final target fraction of dense computation retained. Under this constraint, the model redistributes capacity across dense and low-rank pathways via (i) module-level dense retention coefficients, (ii) adaptive low-rank allocation, and (iii) post-training compression that selectively removes, approximates, or preserves dense components. This formulation yields a family of students controlled by a single budget dial. Empirically, Budgeted LoRA matches standard LoRA perplexity at a moderate budget with a 1.74x compressed-module speedup; at an aggressive budget it achieves a 4.05x speedup with moderate perplexity degradation, and it preserves higher accuracy on function-style in-context learning probes. These results suggest that, under compute-constrained distillation, retaining behavior is less about matching perplexity or removing more parameters than it is about controlling how dense computation is transferred to low-rank pathways.

145. ❌ NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise

作者: Zhi Xu, Yun Fu 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04313v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities, they struggle to disentangle correlation from causation, particularly when observations are partially incorrect or irrelevant information is present. In this work, we introduce NoisyCausal, a new benchmark designed to evaluate causal reasoning under structured noise. Each instance is generated from a ground-truth causal graph and contextualized with a natural language scenario by injecting controllable forms of noise, such as irrelevant distractors, value perturbations, confounding, and partial observability. Moreover, we propose a modular reasoning framework that combines LLMs with explicit causal structure to address these challenges. Our method prompts the LLM to extract variables, construct a causal graph from context, and then reformulates the reasoning task as a structured prompt grounded in this graph. Rather than relying on statistical patterns alone, the LLM is guided by symbolic structure, enabling more interpretable and robust inference. Experimental results show that our method significantly outperforms standard prompting and reasoning baselines on NoisyCausal. Furthermore, it generalizes well to external benchmarks such as Cladder without task-specific tuning. Our findings highlight the importance of combining causal abstractions with language-driven reasoning to achieve faithful and robust causal understanding in LLMs.

146. ❌ SWAN: Semantic Watermarking with Abstract Meaning Representation

作者: Ziping Ye, Gourab Dey, Christos Christodoulopoulos, Charith Peris, Anil Ramakrishna, Weitong Ruan, Aram Galstyan, Kai-Wei Chang, Rahul Gupta, Ninareh Mehrabi 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04305v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce SWAN (Semantic Watermarking with Abstract Meaning Representation), a novel framework that embeds watermark signatures into the semantic structure of a sentence using Abstract Meaning Representation (AMR). In contrast to existing watermarking methods, which typically encode signatures by adjusting token selection preferences during text generation, SWAN embeds the signature directly in the sentence’s semantic representation. As the signature is encoded at the semantic structure level, any paraphrase that preserves meaning automatically preserves the signature. SWAN is training-free: watermark injection is achieved by prompting an LLM to generate sentences guided by a selected AMR template while maintaining contextual coherence, and detection uses an off-the-shelf AMR parser followed by a simple one-proportion z-test. Empirical evaluation on the RealNews benchmark shows SWAN matches state-of-the-art detection performance on unaltered watermarked text, while significantly improving robustness against paraphrasing, increasing detection AUC by up to 13.9 percentage points compared to prior methods. These results demonstrate that SWAN’s approach of anchoring watermarks in AMR semantic structures provides a simple, effective, and prompt-based method for robust text provenance verification under paraphrasing, opening new avenues for semantic-level watermarking research.

147. ❌ Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning

作者: Qihua Dong, Ruozhen He, Junwen Chen, Yizhou Wang, Xu Ma, Songyao Jiang, Yun Fu 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04304v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Advanced chart question answering requires both precise perception of small visual elements and multi-step reasoning across several subplots. While existing MLLMs are strong at understanding single plots, they often struggle with multi-step reasoning across multiple subplots. We propose HierVA, a hierarchical visual agent framework for chart reasoning that iteratively constructs and updates a working context in a joint image–text space. A high-level manager generates plans and maintains a compact context containing only key information, while specialized workers perform reasoning, gather evidence, and return results. In particular, the agent maintains separate visual and textual contexts, using a zoom-in tool to restrict the visual context. Experiments on the CharXiv reasoning subset demonstrate consistent improvements over strong multimodal baselines, and ablation studies verify that hierarchical architecture, scoped visual context, and distilled context contribute complementary gains.

148. ❌ Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs

作者: Stefano Bannò, Kate Knill, Mark Gales 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04298v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Automated essay scoring (AES) research often relies on rank-based correlation metrics to validate analytic assessment. However, such metrics obscure both intrinsic intercorrelations among analytic dimensions that arise from the structure of writing proficiency itself and halo effects, whereby holistic impressions bleed into fine-grained component scores. As a result, high correlations may mask a system’s true diagnostic behaviour. In this study, we propose a novel self-referential assessment evaluation framework that focuses on identifying intra-learner strengths and weaknesses rather than assessing inter-learner rankings. We conduct experiments on the publicly available ICNALE GRA, a uniquely dense second-language writing dataset annotated holistically and analytically by up to 80 trained raters. To obtain reliable reference scores, we apply two-facet Rasch modelling to calibrate rater severity and derive fair average scores across ten analytic aspects and holistic proficiency. We compare the analytic scoring performance of human operational raters and three large language models (LLMs) in a zero-shot setting. Our results show that LLMs tend to outperform single human raters in identifying relative weaknesses (negative feedback) across several proficiency aspects, while human raters remain stronger at identifying relative strengths (positive feedback). Overall, our findings highlight the limitations of rank-based evaluation for analytic assessment and demonstrate the value of intra-learner, profile-based methods for assessing and deploying LLMs in AES.

149. ❌ Material Database Agent: A Multimodal Agentic Framework for Scientific Literature Mining

作者: Achuth Chandrasekhar, Omid Barati Farimani, Radheesh Sharma Meda, Amir Barati Farimani 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04278v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Materials science workflows rely on structured and unstructured data from the vast body of available scientific literature. However, most of the experimental details remain buried in text, tables, graphs and figures. Thus, constructing databases that incorporate this data is a manual, time-consuming, and hard-to-scale process. Multimodal large language models have made it feasible to extract information from text and scientific figures with high speed and accuracy. This opens the possibility of an AI system that can create production-scale material databases. Material Database Agent (MDA) is a modular, multi-agent system architecture for converting research literature into structured databases. MDA accepts article PDFs as input, which are subsequently processed in parallel into markdown files and figures. Multiple sub-agents read these markdown files and figures in parallel to assemble sub-databases for each paper. These sub-databases are then compiled into a single tabular database by an agent. As opposed to using either a rule-based approach or a single-pass pipeline for extracting information, MDA is a specialized architecture for transforming the literature into a database in the field of materials science. More generally, this study provides a basis for positioning multimodal agentic information extraction as a viable means for constructing next-generation scientific databases from the primary literature.

150. ❌ Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

作者: Yao-Shun Chuang, Tushti Mody, Uday Pratap Singh, Shirindokht Shiraz, Chun-Teh Lee, Ryan Brandon, Muhammad F Walji, Xiaoqian Jiang, Bunmi Tokede 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04221v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Clinical named entity recognition from dental progress notes is challenging because documentation is highly unstructured, domain-specific, and often privacy-sensitive. We developed a locally deployable framework that enables small language models to self-generate, verify, refine, and evaluate entity-specific prompts for extracting multiple clinical entities from dental notes. Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization. Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks. Qwen2.5-14B-Instruct achieved the strongest baseline performance. After DPO, Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct achieved micro/macro F1 scores of 0.864/0.837 and 0.806/0.797, respectively. These findings suggest that automated prompt optimization combined with lightweight preference-based post-training can support scalable clinical information extraction using locally deployed small language models.

151. ❌ Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks

作者: Yaobo Zhang 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04217v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-γd}\cos(ωd)$, $e^{-γd}\sin(ωd)$, $d e^{-γd}\cos(ωd)$, and $d e^{-γd}\sin(ωd)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{iωd}$, rather than merely adding a separate distance channel to RoPE. We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.

152. ❌ Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages

作者: Stephen E. Moore, Mich-Seth Owusu, Akwasi Asare, Lawrence Adu Gyamfi, Paul Azunre, Joel Budu, Jonathan Asiamah, Elias Dzobo, Kelvin Newman, Edmund O. Benefo, Gerhardt Datsomor, Onesimus Addo Appiah, Ama Branoa Banful, Lucas Woedem Kpatah, Saani Mustapha Deishini, John Ayernor 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04208v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language models (LLMs) have demonstrated impressive multilingual capabilities for well-resourced languages, yet their performance on low-resource African languages remains poorly understood and largely unevaluated. This paper presents Nsanku, a systematic benchmark that evaluates the zero-shot machine translation performance of 19 open-weight and proprietary LLMs across 43 Ghanaian languages paired with English. Evaluation sentences were sourced from the YouVersion Bible platform, providing 300 sentence pairs per language. Two complementary automatic metrics are employed: Bilingual Evaluation Understudy (BLEU) and Character n-gram F-Score (chrF), alongside an average accuracy score and a cross-language consistency dimension. Nsanku represents the most comprehensive LLM translation evaluation for Ghanaian languages conducted to date. Results show that gemini-2.5-flash achieves the highest overall average score of 26.88 (BLEU: 24.60, chrF: 29.16), followed by claude-sonnet-4-5 at 24.87 (BLEU: 22.46, chrF: 27.28) and gpt-4.1 at 23.20 (BLEU: 21.15, chrF: 25.24). Among open-weight models, kimi-k2-instruct-0905 leads at an average score of 20.87. A critical finding from the consistency analysis is that no model and no language reached the Leaders quadrant of high performance and high consistency simultaneously, indicating that current LLMs are not yet reliably usable for Ghanaian language translation at scale. Siwu achieved the highest per-language average score at 25.73 while Nkonya scored lowest at 11.65. Nsanku establishes a publicly available, community-extensible evaluation infrastructure for African language NLP research.

153. ❌ The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation

作者: Oona Itkonen, Jörg Tiedemann 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04196v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Knowledge transfer, especially across related languages, has been found beneficial for multilingual neural machine translation (MNMT), but some aspects are still under-explored and deserve further investigation. A joint vocabulary is most often applied to form a uniform word embedding space, but since the impact of a disjoint vocabulary on model performance is far less studied, there is no consensus on how much knowledge transfer is mainly due to vocabulary overlap. In this paper, we present systematic experiments with joint and disjoint vocabularies, and auxiliary languages related and unrelated to the source language. We design this experiment in an out-of-domain setup in order to emphasize transfer and the impact of the auxiliary language. As expected, we yield better results with more extensive vocabulary overlaps typical for related languages, but our experiments also show that domain-match and language relatedness are more important than a joint vocabulary.

154. ❌ MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs

作者: Tung Sum Thomas Kwok, Qian Qian, Xiaofeng Lin, Dongxu Zhang, Jun Han, Zhichao Yang, Davin Hill, Tamer Soliman, Sanjit Singh Batra, Robert Tillman, Guang Cheng 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04180v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large Language Models exhibit strong reasoning and semantic understanding capabilities but often hallucinate in domains that require expert knowledge, among which fabrications, the generation of factually incorrect yet fluent statements, pose the greatest risk in medical contexts. Existing medical hallucination datasets inadequately capture fabrication phenomena due to limited fabrication coverage, stylistic disparities between human and LLM-authored texts, and distributional drift during hallucinated sample synthesis. To address this, we propose a data-centric pipeline to generate realistic and word-level fabrications that preserve syntactic and stylistic fidelity while introducing subtle factual deviations, resulting in MedFabric. Building upon this dataset, we introduce ETHER, a modular word-level fabrication detector integrating Text2Table Decomposition, Word Masking and Filling and Hybrid Sentence Pair Evaluation to enhance factual alignment. Empirical results demonstrate that MedFabric outperforms state-of-the-art detectors by over 15% on word-level fabrication benchmarks while maintaining consistent performance across structural similarities, offering a comprehensive framework for reliable and domain-specific factuality detection.

155. ❌ Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa

作者: Hoffmann Muki, Olukunle Owolabi 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04177v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B and two domain-adapted models, AfroConfliBERT and AfroConfliLLAMA, on Nigeria and Cameroon conflict-event classification against ACLED, a gold-standard dataset with multi-stage verification. We find a bifurcated divergence in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies to 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near-directional neutrality, with Legitimization Bias differences indistinguishable from zero. Yet domain adaptation does not eliminate actor-based selection bias. Both adapted models show statistically significant actor bias comparable to vanilla LLMs; in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases produce flip rates up to 66.7% in Cameroon and 34.2% in Nigeria, while perturbations salient in one context may not matter in another. Error trace profiling shows models mask normative bias through unfaithful rationale confabulations. In contrast, AfroConfliBERT and AfroConfliLLAMA are largely robust, with near-zero flip rates across perturbation categories. Overall, current models are not ready for unsupervised deployment in conflict monitoring. We call for fairness-aware fine-tuning to reduce actor-based selection bias, mandatory adversarial robustness evaluation against lexical manipulation, and context-specific human-in-the-loop oversight calibrated to regional difficulty.

156. ❌ Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

作者: Humam Khan, Md Tabrez Nafis, Shahab Saquib Sohail, Aqeel Khalique, Rehan Hasan Khan 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04171v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large Language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when we use them for generating Academic content. We have investigated four popular LLMs, ChatGPT, Grok, Gemini, and Copilot for hallucinations specifically for academic writing. We have designed 80 prompts across four categories, namely, reference generation, factual explanation, abstract generation, and writing improvement. We evaluated the model using a 0-5 rubric score, which checks factual accuracy, reference validity, coherence, style consistency, and academic tone. A novel weighted metric, Hallucination Index (HI), was introduced to measure hallucination in the responses generated by the models. Some of the most widely used evaluation metrics often fail to check errors which alter sentiment in machine-translated text. We found that Grok and Copilot perform better on reference generation tasks, but they often struggle with abstract or stylistic prompts, with HI values of 0.67 and 0.70, respectively. Whereas, Gemini and ChatGPT have done well with having stronger tone control, but they lack in writing factual tasks and higher hallucination risk with HI scores of 0.53 and 0.57, respectively. Our study found that hallucination behavior does not depend solely on model architecture but also on the type of task and the prompting conditions we are providing. We propose that our work opens new research dimensions for future researchers.

157. ❌ FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

作者: Elitsa Yotkova, Violeta Kastreva, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04157v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.

158. ❌ Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation

作者: David Gringras, Misha Salahshoor 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04135v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-4o-mini zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse configuration details and abstracted upward into claims about “AI” that propagate through citations, media, and policy. We measure the ‘publication elicitation gap’ (the gap between these answers) in a pre-registered audit of 112,303 LLM-keyword-matched candidate records (2022-01 to 2026-04; 18,574 admissible, 4,766 full-paper texts retrievable), comparing tested models to the contemporaneous frontier on the Epoch AI Capabilities Index (ECI), reproduced under Arena Elo and Artificial Analysis. The median paper evaluates a model +10.85 ECI (~1.4x the distance between Claude Sonnet 3.7 and Claude Opus 4.5) behind the contemporaneous frontier at evaluation time (H1); an exploratory rational-lag baseline (H8) decomposes this into ~25% peer-review latency, ~75% excess lag. The gap is widening at +5.53 ECI/year (H2; 95% CI [+5.03, +5.83]). Meanwhile, only 3.2% of abstracts (21.2% of full-texts) disclose reasoning-mode status on reasoning-capable models (H4) and 52.5% (95% CI [48.2, 56.9]) state conclusions at the level of “AI” rather than the evaluated model(s), rising at OR = 1.23/year. Proposed remedies include API-access subsidies and editorial enforcement of reporting frameworks mandating configuration-surface disclosure (model snapshot, reasoning mode/effort, tool access, scaffolding, prompting, etc.); VERSIO-AI is a 13-item checklist (Core 3 desk-reject) extending existing frameworks at the elicitation surface, with per-DOI analysis at frontierlag.org.

159. ❌ Transformers with Selective Access to Early Representations

作者: Skye Gunasekaran, Téa Wright, Rui-Jie Zhu, Jason Eshraghian 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.03953v2

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Several recent Transformer architectures expose later layers to representations computed in the earliest layers, motivated by the observation that low-level features can become harder to recover as the residual stream is repeatedly transformed through depth. The cheapest among these methods add static value residuals: learned mixing coefficients that expose the first-layer value projection V_1 uniformly across tokens and heads. More expressive dense or dynamic alternatives recover finer-grained access, but at higher memory cost and lower throughput. The usefulness of V_1 is unlikely to be constant across tokens, heads, and contexts; different positions plausibly require different amounts of access to early lexical or semantic information. We therefore treat early-representation reuse as a retrieval problem rather than a connectivity problem, and introduce Selective Access Transformer (SATFormer), which preserves the first-layer value pathway while controlling access with a context-dependent gate. Across models from 130M to 1.3B parameters, SATFormer consistently improves validation loss and zero-shot accuracy over the static value-residual and Transformer baselines. Its strongest gains appear on retrieval-intensive benchmarks, where it improves over static value residuals by approximately 1.5 average points, while maintaining throughput and memory usage close to the baseline Transformer. Gate analyses suggest sparse, depth-dependent, head-specific, and category-sensitive access patterns, supporting the interpretation that SATFormer learns selective reuse of early representations rather than uniform residual copying. Our code is available at https://github.com/SkyeGunasekaran/SATFormer.

160. ❌ Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

作者: Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, Guohui Zhang, Wenhu Zhang, Hang Xu, Nan Jiang, Xin Han, Haoze Sun, Maoquan Zhang, Haoyang Huang, Nan Duan 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04128v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

161. ❌ Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities

作者: Devon Jarvis, Richard Klein, Benjamin Rosman, Steven James, Stefano Sarao Mannelli 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04127v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.

162. ❌ Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

作者: Tianyang Han, Hengyu Shi, Junjie Hu, Xu Yang, Zhiling Wang, Junhao Su 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.03862v2

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states in multi-step systems. To this end, we propose TraceLift, a planner-executor training framework that treats reasoning as a consumable intermediate artifact. During planner training, the planner emits tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model (RM) score by measured uplift on the same frozen executor, crediting traces that are both high-quality and useful. To make reasoning quality directly learnable, we introduce TRACELIFT-GROUPS, a rubric-annotated reason-only dataset built from math and code seed problems. Each example is a same-problem group containing a high-quality reference trace and multiple plausible flawed traces with localized perturbations that reduce reasoning quality or solution support while preserving task relevance. Extensive experiments on code and math benchmarks show that this executor-grounded reasoning reward improves the two-stage planner-executor system over execution-only training, suggesting that reasoning supervision should evaluate not only whether a trace looks good, but also whether it helps the model that consumes it. Our code is available at: https://github.com/MasaiahHan/TraceLift

163. ❌ Syn4D: A Multiview Synthetic 4D Dataset

作者: Zeren Jiang, Yushi Lan, Yihang Luo, Yufan Deng, Zihang Lai, Edgar Sucar, Christian Rupprecht, Iro Laina, Diane Larlus, Chuanxia Zheng, Andrea Vedaldi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05207v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4D is the ability to unproject any pixel into 3D to any time and to any camera. We conduct extensive evaluations across multiple downstream tasks to demonstrate the utility and effectiveness of the proposed dataset, including 4D scene reconstruction, 3D point tracking, geometry-aware camera retargeting, and human pose estimation. The experimental results highlight Syn4D’s potential to facilitate research in dynamic scene understanding and spatiotemporal modeling.

164. ❌ D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models

作者: Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05204v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g, Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for directly continuous supervised fine-tuning. For example, applying the commonly used fine-tuning technique would compromises their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that the modern diffusion model where the LLM/VLM serves as the encoder can inherit its encoder’s in-context capabilities. This enables us to make the training as an on-policy self-distillation process. Specifically, during training, we make the model acts as both the teacher and the student with different contexts, where the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the two predicted distributions over the student’s own roll-outs. By optimized on the model’s own trajectory and under it’s own supervision, D-OPSD enables the model to learn new concept, style, etc. without sacrificing the original few-step capacity.

165. ❌ LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)

作者: Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan, Chen Gao, Xin Jin, Yong Li, Zhibo Chen, Sijing Wu, Kang Fu, Yunhao Li, Ziang Xiao, Huiyu Duan, Jing Liu, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Manxi Sun, Zixuan Guo, Yun Li, Ziyang Chen, Manabu Tsukada, Zhengyang Li, Zhenglin Du, Yi Wen, Licheng Jiao, Fang Liu, Lingling Li, Yiwen Ren, Zhilong Song, Dubing Chen, Yucheng Zhou, Tianyi Yan, Huan Zheng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05187v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with input conditions. Participants are required to build a metric that jointly predicts four dimensions, i.e., Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Depart from that, participants also need to localize physical anomaly timestamps for fine-grained diagnosis. The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, together with diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced through trained human annotation with an additional automated quality-control pass. Evaluation is based on both score prediction and anomaly localization, with a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from submitted solutions.

166. ❌ OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

作者: Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05185v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, or detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we curated a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. Besides, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with over 10-point average improvements across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.

167. ❌ Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging

作者: Bernhard Kainz, Johanna P Mueller, Matthew Baugh, Cosmin Bercea 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05161v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically-aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the Goldilocks effect through distributional divergence, and show that references with moderate similarity minimize a bias-variance trade-off in comparative visual reasoning. On the NOVA brain MRI benchmark, WALDO with Qwen2.5-VL-72B achieves $43.5_{\pm1.6}%$ mAP@30 (95% CI: [40.4, 46.7]), representing a 19% relative improvement over zero-shot baselines. Cross-model evaluation shows consistent gains: GPT-4o achieves $32.0_{\pm6.5}%$ and Qwen3-VL-32B achieves $32.0_{\pm6.6}%$ mAP@30. Paired McNemar tests confirm statistical significance ($p<0.01$). Source code is available at https://github.com/bkainz/WALDO_MICCAI26_demo .

168. ❌ PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

作者: Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen, Zehuan Huang, Yao Mu, Zhuo Chen, Chunchao Guo, Xihui Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05163v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a “physical architect” to plan a “Hierarchical Physical Blueprint” defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.

169. ❌ CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization

作者: Yu-Hsi Chen, Abd-Krim Seghouane 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05136v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), which unrolls the iterative Flury-Gautschi (FG) algorithm into fully differentiable neural layers. This approach integrates the statistical properties of CPCA into an end-to-end trainable framework, enforcing the discovery of a shared subspace across diverse domains while preserving interpretability. Experiments on four standard DG benchmarks demonstrate that CPCANet achieves state-of-the-art (SOTA) performance in zero-shot transfer. Moreover, CPCANet is architecture-agnostic and requires no dataset-specific tuning, providing a simple and efficient approach to learning robust representations under distribution shift. Code is available at https://github.com/wish44165/CPCANet.

170. ❌ A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry

作者: Jingsen Zhu, Silvia Sellán, Alexander Terenin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05095v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds, by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently-developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, then (c) using the posterior distribution to carefully reason about which view to scan next. This enables us to perform camera selection in a manner that is directly optimized for the intended use of the reconstructed data - meaning, we reduce uncertainty only in those regions that make a difference in the task at hand, as opposed to prior approaches that reduce it uniformly across space. We evaluate our method across three distinct downstream tasks: semantic classification, segmentation, and PDE-guided physics simulation. Experimental results demonstrate that our framework achieves superior task performance with fewer views compared to commonly used baselines and prior general uncertainty-reduction techniques.

171. ❌ External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images

作者: Yuxuan Chen, Arianna Bunnell, Yanqi Xu, Haoyan Yang, Thomas K. Wolfgruber, John A. Shepherd, Yiqiu Shen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05082v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-level AUROC across four density categories: A (fatty), B (scattered), C (heterogeneous), and D (extremely dense). As a downstream assessment, we also evaluated 10-year risk prediction by incorporating age and AI-derived density into the Tyrer-Cuzick model and comparing performance against a reference model using age and mammography-reported density. All three models performed best in extremely dense breasts (AUROC 0.868-0.899), with strong performance in fatty (0.814-0.838) and scattered density (0.764-0.799), and lower performance in heterogeneously dense breasts (0.699-0.729). DenseNet121 achieved the highest overall performance (micro-averaged AUROC 0.885), and performance across categories was comparable between internal and external testing. For risk modeling, age combined with AI-derived density yielded a lower AUROC than age combined with mammography-reported density (0.541 vs. 0.570; p = 0.23), with no statistically significant difference. These findings indicate that deep learning models generalize well to external data with different racial composition for breast density assessment. While performance is strongest in extremely dense breasts, heterogeneously dense remains more challenging, highlighting the need for targeted optimization.

172. ❌ A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping

作者: Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley, Fiona Hwang, Justin Hagen, Harvir Ghuman, Ethan Garcia-O’Donnell, Syed Noor Qadri, Freddie Santiago, Mun Wai Lee 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05079v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Video sequence capturing through refractive dynamic media, such as a turbulent air or water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmarks systematically evaluate restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering a range from turbulence-like mild warping to strong discontinuous refractive deformations. The benchmark includes both laboratory-captured real data and synthetic sequences generated for static scenes via physics-based light refraction modeling across four distortion levels and multiple surface wave types. We evaluate a spectrum of methods from simple baselines and classical registration algorithms to advanced learning-based approaches including DATUM and our proposed diffusion based V-cache for high and extreme distortions regimes. Evaluation uses both pixel-level (PSNR, SSIM), and perceptual (LPIPS, DINO, CLIP) metrics providing the first large scale analysis of geometric distortion removal. Our benchmark establishes a new foundation for developing and evaluating algorithms capable of reconstructing video from highly distorted optical environments. Our code and datasets are available at https://github.com/iafoss/refractive-mfir-benchmark.

173. ❌ FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching

作者: Andranik Sargsyan, Shant Navasardyan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05077v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image segmentation method built on the flow matching framework, which learns a time-dependent vector field to transport the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. Moreover, with our Position-Aware Instance Pairing (PAIP) training strategy, FlowDIS offers strong controllability through text prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared with the best prior DIS method, FlowDIS achieves a 5.5% higher $F_β^ω$ measure and 43% lower MAE ($\mathcal{M}$) on the DIS-TE test set. The code is available at: https://github.com/Picsart-AI-Research/FlowDIS

174. ❌ Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

作者: Yuan Wu, Zhiqiang Yan, Jiawei Lian, Zhengxue Wang, Jian Yang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05072v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

3D occupancy prediction aims to infer dense, voxel-wise scene semantics from sensor observations, where the 2D-to-3D view transformation serves as a crucial step in bridging image features and volumetric representations. Most previous methods rely on a fixed projection space, where 3D reference points are uniformly sampled along pillars. However, such sampling struggles to capture the sparsity and height variations of real-world scenes, leading to ambiguous correspondences and unreliable feature aggregation. To address these challenges, we propose HiPR, a camera-LiDAR occupancy framework with Height-Guided Projection Reparameterization. HiPR first encodes LiDAR into a BEV height map to capture the maximum height of the point cloud. HiPR then adjusts the sampling range of each pillar using the height prior, enabling adaptive reparameterization of the projection space. As a result, the projected points are redistributed into geometrically meaningful regions rather than fixed ranges. Meanwhile, we mask out the invalid parts of the height map to avoid misleading the feature aggregation. In addition, to alleviate the training instability caused by noisy LiDAR-derived heights, we introduce a training-time Progressive Height Conditioning strategy, which gradually transitions the conditioning signal from ground-truth heights to LiDAR heights. Extensive experiments demonstrate that HiPR consistently outperforms existing state-of-the-art methods while maintaining real-time inference. The code and pretrained models can be found at https://github.com/Rayn-Wu/HiPR.

175. ❌ ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection

作者: Minh Anh Nguyen, Quang Huy Tran, Bao Ngoc Le, SuiYang Guang, Tuan Kiet Pham, Linh Chi Vo 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05057v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Open-vocabulary human-object interaction (HOI) detection requires recognizing interaction phrases that may not appear as annotated categories during training. Recent vision-language HOI detectors improve semantic transfer by matching human-object features with text embeddings, but their predictions are often dominated by object affordance and phrase-level co-occurrence. As a result, a model may predict \textit{cut cake} from the presence of a knife and a cake without verifying whether the hand, tool, target, contact pattern, and object state jointly support the action. We propose \textbf{ScriptHOI}, a structured framework that represents each interaction phrase as a soft scripted state transition. Rather than treating a phrase as a single class token, ScriptHOI decomposes it into body-role, contact, geometry, affordance, motion, and object-state slots. A visual state tokenizer parses each detected human-object pair into corresponding state tokens, and a slot-wise matcher estimates both script coverage and script conflict. These two quantities calibrate HOI logits, expose missing visual evidence, and provide training constraints for incomplete annotations. To avoid suppressing valid but unannotated interactions, we further introduce interval partial-label learning, which constrains unannotated candidates with script-derived lower and upper probability bounds instead of assigning closed-world negatives. A counterfactual script contrast loss swaps individual script slots to discourage object-only shortcuts. Experiments on HICO-DET, V-COCO, and open-vocabulary HOI splits show that ScriptHOI improves rare and unseen interaction recognition while substantially reducing affordance-conflict false positives.

176. ❌ Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception

作者: Yuhu Guo, Zhikai Shen, Jiasheng Qu, Chenghao Qian, Yuming Huang, Bin Chen, Guoxing Fang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05053v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a {reduced-order neural simulation framework} that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent states. The framework learns a continuous deformation manifold from paired high- and low-resolution simulations, enabling physically consistent, differentiable inference. Compared to the TacIPC, our method achieves over 65% faster simulation and {40% lower memory usage}, while maintaining better geometric fidelity. In tactile rendering and 3D surface reconstruction, our methods further improve accuracy by 25% and produce realistic depth images and surface mesh within a faster inference speed. These results demonstrate that the proposed reduced-order neural model enables high-detail, physically grounded tactile simulation with substantial efficiency gains for robotic interaction and optimization.

177. ❌ Few-Shot Learning Pipeline for Monkeypox Skin Disease Classification Using CNN Feature Extractors

作者: Md. Safirur Rashid, Sabbir Ahmed, Muhammad Usama Islam, Sumona Hoque Mumu, Md. Hasanul Kabir 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05034v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Despite the strong performance of Convolutional Neural Networks (CNNs) in disease classification, their effectiveness often depends on access to large annotated datasets, which is an impractical requirement for emerging or rare conditions such as Monkeypox. To overcome this limitation, we propose a few-shot learning (FSL) framework that employs SimpleShot, a lightweight, non-parametric, inductive classifier, for Monkeypox and pox-like skin disease recognition from limited labeled examples. The proposed pipeline passes the skin lesion images through a frozen, pretrained CNN backbone to obtain feature embeddings, which are then classified via SimpleShot using nearest-centroid comparisons in a normalized embedding space. We systematically benchmark six widely used CNN backbones as feature extractors under consistent experimental settings, enabling fair comparison. Experiments on three publicly available datasets (MSLD v1.0, MSID, and MSLD v2.0) are conducted across 2-way, 4-way, and 6-way tasks with 1-shot, 5-shot, and 10-shot configurations. Among all models, MobileNetV2_100 consistently achieves the highest accuracy. In addition, we present a cross-dataset evaluation for Monkeypox classification, revealing that binary Mpox-vs-Others transfer remains comparatively stable while multi-class performance degrades significantly under domain shift. Together, these results demonstrate the practical utility of combining inductive FSL methods with lightweight CNN backbones and highlight the importance of domain robustness for reliable real-world clinical deployment.

178. ❌ Computer-Aided Design Generation by Cascaded Discrete Diffusion Model

作者: Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05031v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, often producing perturbed representations that map to semantically invalid symbols. To overcome this limitation, we propose a cascaded discrete diffusion framework for CAD generation, which consists of a command diffusion for generating CAD commands and a parameter diffusion conditioned on CAD commands. Unlike isotropic Gaussian perturbation, the forward process of our approach operates directly over categorical token distributions using delicate transition matrices. For commands, we adopt an absorbing-state transition matrix that progressively corrupts tokens to a designated symbol; for parameters, we introduce specific transition matrices tailored to heterogeneous attributes: a Gaussian kernel for coordinate continuity, a scale-invariant kernel for dimensional values, and a prior-preserving kernel for boolean attributes. The reverse process is achieved by two denoising networks: a Transformer-based encoder for command recovery, and a parameter network with extra local self-attention for command-level interaction and cross-attention for conditional injection. Experiments on the DeepCAD dataset show that the proposed approach surpasses existing autoregressive and continuous diffusion models on unconditional generation metrics, while qualitative results validate effective controllability in conditional generation tasks. Source codes will be released.

179. ❌ Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification

作者: Wen Wen, Hao Chen, Shiliang Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05027v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed semantic space, acting as a global semantic reference rather than a dominant learning signal. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones. Extensive experiments show that PAD substantially outperforms state-of-the-art methods across seen and unseen domains, achieving a strong balance between stability and plasticity. Project page is available at https://github.com/zu-zi/PAD.

作者: Gasser Elazab, Frank Neuhaus, Tilman Koß, Malte Splietker, Aditya Date, Michael Unterreiner, Maximilian Jansen, Olaf Hellwich 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05014v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. Our sensor suite includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Notably, our multi-LiDAR fusion yields ~500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans ~110 km and 4.7 hours across Germany and Italy. In addition, CARD provides 2D bounding boxes targeting road-topography irregularities, enabling accurate benchmarking for both geometry and perception tasks. Furthermore, we establish a standardized evaluation protocol for road surface irregularities on CARD and benchmark state-of-the-art depth estimation models to provide strong baselines. The CARD dataset is hosted on https://huggingface.co/CARD-Data.

181. ❌ Chaotic Contrastive Learning for Robust Texture Classification

作者: Joao B Florindo 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05012v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Texture classification is a pivotal task in computer vision, presenting unique challenges due to high inter-class similarity and the sensitivity of structural patterns to scale and illumination changes. While Convolutional Neural Networks (CNNs) and recent Vision Transformers have set performance benchmarks, they often require extensive labeled datasets or struggle to generalize across domains due to an over-reliance on color and shape features. This paper introduces a novel framework that synergizes Self-Supervised Learning (SSL) with deterministic chaotic dynamics. We propose a chaotic contrastive pre-training strategy, where pixel-wise chaotic maps, specifically Logistic, Tent, and Sine maps, act as non-linear data augmentation techniques. These chaotic perturbations, grounded in ergodic theory, force the network to learn topologically robust features by mimicking complex environmental noise and reflectance variations. Furthermore, we introduce an attention-based feature ensemble that fuses high-level semantic representations from a supervised large backbone with low-frequency structural features from a chaos-pretrained tiny encoder. Experimental results on six texture benchmarks (FMD, UMD, KTH-TIPS2-b, DTD, GTOS, and 1200Tex) demonstrate the superiority of the proposed method, outperforming state-of-the-art approaches and achieving promising accuracies on all the analyzed datasets.

182. ❌ Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data

作者: Ali Shibli, Andrea Nascetti, Yifang Ban 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04989v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.

183. ❌ Attention-Based Chaotic Self-Supervision for Medical Image Classification

作者: Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04985v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful alternative, yet common methods like masked autoencoders (MAEs) may inadvertently destroy fine-grained diagnostic features by using random masking. In this paper, we propose a novel SSL pre-training strategy, the Chaotic Denoising Autoencoder (CDAE). Instead of masking, we apply a chaotic transformation to the input image, tasking an autoencoder to reconstruct the original. We hypothesize this forces the encoder to learn robust, domain-specific features by “inverting the chaos”. Furthermore, we propose an attentive fusion mechanism that combines features from our CDAE-trained encoder with a standard encoder, leveraging the strengths of both general and domain-specific representations. Our method is evaluated on two public medical datasets: ISIC 2018 (skin lesions) and APTOS 2019 (diabetic retinopathy). The proposed model achieves high performance, with an accuracy of 0.9221 and an F1-macro of 0.8530 on ISIC 2018, and an accuracy of 0.8644 and F1-macro of 0.7433 on APTOS 2019, demonstrating the efficacy of our approach.

184. ❌ Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification

作者: Jens van Bijsterveld, Daniele Avitabile, Fons J. Verbeek, Rita Pucci 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04904v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of decline or growth of populations, or intra-species interactions within populations. Models trained for the task of individual identification often do not focus on the skin pattern of animals, but on background details or body shape details. These characteristics are not individually specific, or can change drastically through time. We focus on techniques that will make machine learning models more responsive to skin pattern structure when extracting individual visual embeddings from images. For this, we explore image inpainting of task-specific masks as an auxiliary task to enhance ML-based individual identification from animal skin patterns. We propose a comparative analysis among four models as an encoder backbone for the individual identification task. We focus on the case study of zebrafish, which is a widely recognized biological model organism, and which exhibits individually identifying skin patterns. To evaluate encoder backbone performance, we present standard metrics for classification accuracy, embedding clustering metrics, and GradCAM visualizations.

185. ❌ ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)

作者: Raphaël Delécluse, Hazem Wannous, Laurent Guimas 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04977v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes three tracks: RGB Re-ID, Depth Re-ID, and RGB$\leftrightarrow$Depth cross-modal retrieval. Submissions are ranked using mAP and CMC-1 under a unified server-side evaluation. The final results show a clear difficulty ordering (RGB $>$ Depth $>$ Cross-Modal), highlighting both the challenge of modality-constrained retrieval and the feasibility of strong performance with modality-invariant learning. By releasing the dataset at https://zenodo.org/records/17909410, the evaluation scripts at https://github.com/RaphaelDel/ICPR-TVRID, and the accompanying documentation, TVRID establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-id.

186. ❌ VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA

作者: Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04870v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Video text-based visual question answering (Video TextVQA) aims to answer questions by reasoning over visual textual content appearing in videos. Despite the strong multimodal video understanding capabilities of recent Video-LLMs, their performance on existing Video TextVQA benchmarks remains limited. To better understand this gap, we conduct an upper-bound analysis through frame-wise question answering, counting a sample as correct if any frame yields the right answer, which significantly outperforms direct video-based inference and reveals a substantial performance gap. The results suggest that the primary bottleneck lies in the localization of key question-relevant evidence, rather than in reasoning capacity itself. Building on this insight, we propose a question-guided agent framework that explicitly anchors the relevant keyframes before answering. The approach operates effectively in a training-free setting and consistently surpasses direct video inference. With additional supervised fine-tuning (SFT) and reinforcement learning (RL), it achieves an average improvement of +12.12 in accuracy and +11.15 in ANLS across benchmarks, establishing new state-of-the-art results. Our study underscores the critical role of explicit keyframe anchoring for advancing Video TextVQA. The code will be publicly released.

187. ❌ 3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance

作者: Sapna Sachan, Amulya Kumar Mahto 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04856v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generates CT-like anatomical reference volumes inferred from US, without aiming to reproduce physically accurate Hounsfield Units. Paired 3D kidney US and CT volumes from the TRUSTED dataset are first spatially aligned using a landmark-based multimodal registration pipeline, creating high-quality paired inputs for supervised training of an adversarial framework. The proposed Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) model employs a 3D residual encoder-decoder generator augmented with a transformer bottleneck, enabling effective modeling of fine-grained local anatomical structures as well as long-range volumetric dependencies, while a 3D Conditional PatchGAN discriminator enforces local structural realism in the synthesized pseudo-CT volumes. Quantitative evaluation using PSNR and SSIM demonstrates that the proposed method outperforms established baselines in structural fidelity and perceptual image quality. The UD-pCT volumes provide real-time anatomical reference for operator guidance, potentially reducing acquisition variability and unnecessary CT use. A limitation of this study is the relatively small paired dataset, which may limit the generalizability of the proposed model.

188. ❌ QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes

作者: Xinze Li, Bohan Yang, Pengxu Chen, Yiyuan Wang, Hongcheng Luo, Wentao Cheng, Weifeng Su 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04844v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware stretching factor that enables the construction of a tile-aligned QuadBox, which covers the elliptical projection and largely excludes irrelevant tiles. Second, we introduce QPass, a single-pass tile traversal algorithm that exhaustively exploits the discrete nature of QuadBox, ensuring that the tile intersection check is performed with simple interval tests. Experiments on public datasets show that our method accelerates the rendering speed of 3DGS by 1.85$\times$. Code is available at \href{https://github.com/Powertony102/QuadBox}{https://github.com/Powertony102/QuadBox}.

189. ❌ MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

作者: Miguel Diaz Benito, Cecilia Diana Albelda, Alvaro Garcia Martin, Jesus Bescos Cano, Marcos Escudero-Vinolo, Juan C. SanMiguel 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04772v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval and generation system that allows users to find and generate clinically relevant images from trustworthy sources by mapping both text and images to a shared latent space, enabling semantically meaningful queries. The system is based on a fine-tuned medical version of CLIP (MedICaT-ROCO), trained with the ROCO dataset, obtained from PubMed Central. MIRAGE allows users to give prompts to retrieve images, generate synthetic ones through a medical diffusion model (Prompt2MedImage) and receive enriched descriptions from a large language model (Dolly-v2-3b). It also supports a dual search option, enabling the visual comparison of different medical conditions. A key advantage of the system is that it relies entirely on publicly available pretrained models, ensuring reproducibility and accessibility. Our goal is to provide a free, transparent and easy-to-use didactic tool for medical students, especially those without programming skills. The system features an interface that enables interactive and personalized visual learning through medical image retrieval and generation. The system is accessible to medical students worldwide without requiring local computational resources or technical expertise, and is currently deployed on Kaggle: http://www-vpu.eps.uam.es/mirage

190. ❌ Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation

作者: Anjith George, Sebastien Marcel 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04769v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Heterogeneous Face Recognition (HFR) aims at matching face images captured across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, enhancing the usability of face recognition systems in challenging real-world conditions. Although recent HFR methods have achieved significant improvements in performance, many rely on computationally expensive models, making them impractical for deployment on resource-limited edge devices. In this work, we introduce a lightweight yet effective HFR framework by adapting a hybrid CNN-Transformer model originally developed for RGB homogeneous face recognition. Our approach enables efficient end-to-end training with only a small amount of paired heterogeneous data, while still maintaining strong performance on standard RGB face recognition benchmarks. This makes it suitable for both homogeneous and heterogeneous settings. Comprehensive experiments on several challenging HFR and face recognition benchmarks show that our method achieves state-of-the-art or competitive performance while keeping computational requirements low.

191. ❌ Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction

作者: Berk Sezer, Ali Görkem Küçük, Erol Şahin, Sinan Kalkan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04770v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smaller and less varied sets, failing to assess true robustness. To bridge these gaps, we introduce Gaze4HRI, a large-scale dataset (50+ subjects, 3,000+ videos, 600,000+ frames) designed to evaluate state-of-the-art performance against critical HRI variables: illumination, head-gaze conflict, as well as the motion of camera and gaze target in video. Our benchmark reveals that all evaluated methods fail in at least one condition, identifying steeply-downward gaze as a universal failure point. Notably, PureGaze trained on the ETH-X-Gaze dataset uniquely maintains resilience across all other conditions. These results challenge the recent focus in the literature on complex spatial-temporal modeling and Transformer-based architectures. Instead, our findings suggest that extensive data diversity, as exemplified by the ETH-X-Gaze dataset, serves as the primary driver of zero-shot robustness in unconstrained environments, while resilience-enhancing frameworks, such as PureGaze’s self-adversarial loss for gaze feature purification, provide a substantial further improvement. Ultimately, this study establishes a rigorous benchmark that provides practical guidelines for practitioners as well as reshaping future research. The dataset and codes are available at https://gazeforhri.github.io.

192. ❌ VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision

作者: Yasod Ginige, Ransika Gunasekara, Darsha Hewavitharana, Manjula Ariyarathne, Peshala Jayasekara, Ranga Rodrigo 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04750v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Identification of less-articulated objects using single-channel images, such as thermal images, is important in many applications, such as surveillance. However, in this domain, existing methods show poor performance due to high similarity among objects of the same category in the absence of color information (overlooking shape information) and de-emphasized texture information. Furthermore, variability in viewpoint adds more complexity as the features vary from side to side. We address these issues by constructing viewpoint-conditioned feature vectors and area-specific feature comparisons in separate feature spaces. These interventions enable leveraging the advancements of existing RGB-pre-trained ViT feature extractors while effectively adapting them to address the challenges specific to the thermal domain. We test our system with RGBNT100 (IR) vehicle dataset and a thermal maritime dataset acquired by us. Our results surpass the state-of-the-art methods by 19.7% and 12.8% for the above datasets in mAP scores, respectively. We also plan to make our thermal dataset available, the first of its kind for maritime vessel identification.

193. ❌ Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation

作者: Jinzhen Han, JinByeong Lee, Jisung Kim, HongSik Yun 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04731v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and propose MorphoFormer, a joint BH/BF estimation framework built around two complementary mechanisms: (i) a BF-Guided Task Decoder (BGTD) that gates the height branch via cross-attention on a footprint-derived morphology context, and (ii) a Morphology Consistency Loss (MCL) that supervises a height-from-footprint surrogate against the ground-truth BH, indirectly forcing the BF feature to encode height-correlated structure. The encoder is a single-stage Swin backbone fed by Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs, trained and evaluated on a geo-blocked split of 54 cities. Against a Swin-MTL baseline at identical receptive field, MorphoFormer reduces BH test RMSE from 3.39 to 3.15 m (R^2 improves 0.62 -> 0.67) with BF R^2 stable at 0.80. Controlled ablations at identical capacity attribute most of this 0.24 m improvement to the two proposed mechanisms: removing BGTD raises BH RMSE by 0.11 m and removing MCL raises it by 0.11 m, with the residual approximately 0.02 m falling within the noise floor of encoder-side variations. Because both mechanisms act on cross-task representations rather than pixels, the design carries no intrinsic dependence on input resolution.

194. ❌ ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting

作者: Yingdong Gu, Shaocheng Yan, Zhenjun Zhao, Yuan Kou, Jianxin Luo, Pengcheng Shi, Jiayuan Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04730v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Visual localization is a core technology for augmented reality and autonomous navigation. Recent methods combine the efficient rendering of 3D Gaussian Splatting (3DGS) with feature-based localization. These methods rely on direct matching between 2D query features and the 3D Gaussian feature field, but this often results in mismatches due to an inherent bias in the learned Gaussian feature. We theoretically analyze the feature learning process in 3DGS, revealing that the widely adopted $α$-blending optimization inherently introduces bias into 3D point features. This bias stems from the entanglement between individual Gaussians and their neighboring Gaussians, making the learned features unsuitable for precise matching tasks. Motivated by these findings, we propose ULF-Loc, an unbiased landmark feature framework that replaces biased feature optimization with geometry-weighted feature fusion. We further introduce keypoint-consensus landmark sampling to select reliable Gaussians and local geometric consistency verification to reject mismatches caused by rendering artifacts. On the Cambridge Landmarks dataset, ULF-Loc reduces the mean median translation error by 17% compared to the state-of-the-art, while achieving superior efficiency with only 1/10 the training time and 1/6 the GPU memory of STDLoc.

195. ❌ Anny-Fit: All-Age Human Mesh Recovery

作者: Laura Bravo-Sánchez, Matthieu Armando, Romain Brégier, Grégory Rogez, Serena Yeung-Levy, Fabien Baradel 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04728v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals directly in the camera coordinate system, enforcing global spatial consistency. At the core of our approach is the use of multiple forms of expert knowledge – including metric depth maps, instance segmentation, 2D keypoints, and, VLM-derived semantic attributes such as age and gender – each obtained from dedicated off-the-shelf networks. These complementary signals jointly guide the optimization, constraining the depth-scale ambiguity characteristic of all-age scenes. Across diverse datasets, Anny-Fit consistently improves 2D reprojection accuracy (+13 to 16), relative depth ordering (+6 to 7), 3D estimation error (-9 to -29) and shape estimation (+25 to +82), producing more coherent scenes. Finally, we show that VLM-based semantic knowledge can be distilled into an HMR model via the pseudo-ground-truth annotations produced by Anny-Fit on training data, enabling it to learn semantically meaningful shape parameters while improving HMR performance. Our approach bridges adult-only and all-age modeling by enabling zero-shot adaptation of adult-trained HMR pipelines to the full age spectrum without retraining. Code is publicly available at https://github.com/naver/anny-fit.

196. ❌ Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition

作者: Alexander Vedernikov 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04713v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machine unlearning as a post-hoc sanitization mechanism for engagement recognition. Starting from a baseline trained on all subjects, we rank candidate harmful subjects using a model-dependent proxy, apply a lightweight approximate unlearning update, and compare the result against an oracle model retrained from scratch on the retained subjects only. We instantiate this protocol on DAiSEE and EngageNet using Tensor-Convolution and Convolution-Transformer Network (TCCT-Net) as a fixed platform and evaluate three matched model states under the same removal scenario: baseline, unlearned, and oracle. In representative K=3 forget-set settings, the unlearned model recovers 89.3% and 92.5% of the oracle gain on EngageNet and DAiSEE, respectively, at roughly one quarter of retraining cost. Across the tested small-audit regimes, effectiveness is strongest at an intermediate forget-set size, indicating that approximate subject-level unlearning is a useful low-cost correction mechanism, but one whose benefit depends on subject selection quality and removal regime.

197. ❌ HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction

作者: Keunho Byeon, Jin Tae Kwak 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04682v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, and point-wise regression objectives often yield over-smoothed gene expression profiles, obscuring gene-specific spatial heterogeneity. To address these, we propose HEXST, a geometry-aligned Transformer for spatial gene expression prediction from histology. HEXST operates directly on hexagonal spot coordinates to enable efficient local-to-global contextual modeling via tailored shifted-window attention mechanism and hexagonal rotary positional encoding. To enhance gene-wise spatial contrast, HEXST complements point-wise regression with a contrast-sensitive differential objective and transcriptomic priors from a pretrained single-cell foundation model during training. Across seven spatial transcriptomics datasets, HEXST consistently outperforms state-of-the-art models, providing accurate and robust spatial gene expression predictions while preserving gene-wise contrast and spatial heterogeneity.

198. ❌ From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models

作者: Yihan Lin, Haoyang Li, Yang Li, Haitao Shen, Yihan Zhao, Chao Shao, Jing Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04678v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified VLA baseline, we instantiate and compare four representative integration strategies. Our results reveal a formulation-task correspondence: image-based latent actions benefit long-horizon reasoning and scene-level generalization, whereas action-based latent actions excel at complex motor coordination. Furthermore, we find that directly supervising the VLM with discrete latent action tokens yields the most effective performance. Finally, our experiments offer initial insights into the benefits of latent action supervision in mixed-data, suggesting a promising direction for VLA training. Code is available at https://github.com/RUCKBReasoning/From_Pixels_to_Tokens.

199. ❌ Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern

作者: Xiaopei Zhu, Guanning Zeng, Zhanhao Hu, Jun Zhu, Xiaolin Hu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04675v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0$^{\circ}$–360$^{\circ}$) RGB-T attacks, we construct 3D RGB-T models for human and adversarial clothing. NORP is a new adversarial pattern design using distinct visible and thermal materials without overlap, avoiding the light reduction in overlapping RGB-T patterns (ORP). To optimize the NORP on adversarial clothing, we propose a spatial discrete-continuous optimization (SDCO) method. We systematically evaluated our method on RGB-T detectors with different fusion architectures, demonstrating high attack success rates both in the digital and physical worlds. Additionally, we introduce a fusion-stage ensemble method that enhances the transferability of adversarial attacks across unseen RGB-T detectors with different fusion architectures.

200. ❌ Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling

作者: Xuhai Chen, Zhi Cen, Huaijin Pi, Sida Peng, Xiaowei Zhou, Yong Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04662v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}k$ (8.89 vs. 25.30) and $\text{FID}{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.

201. ❌ CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering

作者: Qiming Li, Zekai Ye, Xiaocheng Feng, Weihong Zhong, Libo Qin, Ruihan Chen, Lei Huang, Baohang Li, Kui Jiang, Yaowei Wang, Ting Liu, Bing Qin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04641v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce contents that deviate from visual information, leading to object hallucination. To tackle this, recent works mostly depend on expensive manual annotations and training cost, or decoding strategies which significantly increase inference time. In this work, we observe that LVLMs’ attention to visual information is significantly enhanced when answering caption queries compared to non-caption queries. Inspired by this phenomenon, we propose Caption-guided Visual Attention Steering (CAST), a training-free, plug-and-play hallucination mitigation method that leverages the attention activation pattern corresponding to caption queries to enhance LVLMs’ visual perception capability. Specifically, we use probing techniques to identify attention heads that are highly sensitive to caption queries and estimate optimized steering directions for their outputs. This steering strengthens LVLM’s fine-grained visual perception capabilities, thereby effectively mitigating object hallucination. CAST reduced object hallucination by an average of 6.03% across five widely used LVLMs and five benchmarks including both discriminative and generative tasks, demonstrating state-of-the-art performance while adding little inference cost and preserving other foundational capabilities.

202. ❌ UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection

作者: Huan Zhang, Lianghong Tan, Yichu Xu, Jiangzhong Cao, Huanqi Wu, Linwei Zhu, Xu Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04635v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation applies FiLM-style spatially-adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis. On the detection side, an Inverted Residual Shift Attention couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves mAP@0.5 of 98.0% and mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and SSIM of 0.619, outperforming existing conditional generation approaches.

203. ❌ Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition

作者: Zishu Zhou, Zaipeng Xie, Xuanyao Jie 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04617v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Wearable human activity recognition (WHAR) models often suffer from performance degradation under real-world cross-user distribution shifts. Test-time adaptation (TTA) mitigates this degradation by adapting models online using unlabeled test streams, yet existing methods largely inherit assumptions from vision tasks and underexploit the inherent inter-window temporal structure in WHAR streams. In this paper, we revisit such temporal structure as a feature-conditioned inference signal rather than merely an output-space smoothing prior. We derive the insight that temporal continuity and observation-induced feature deviations provide complementary cues for determining when to preserve or release temporal inertia and where to route prediction refinement during likely transitions. Building upon this insight, we propose SIGHT, a lightweight and backpropagation-free TTA framework for WHAR, enabling real-time edge deployment. SIGHT estimates predictive surprise by comparing the current feature with a prototype-based expected state, and then uses the resulting feature deviation to guide geometry-aware transition routing based on prototype alignment and stream-level marginal habit tracking. Evaluations on real-world datasets confirm that SIGHT outperforms existing TTA baselines while reducing computational and memory costs.

204. ❌ Advancing Aesthetic Image Generation via Composition Transfer

作者: Kai Zou, Zhiwei Zhao, Bin Liu, Nenghai Yu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04609v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice, composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model composition in a semantic-agnostic manner. First, it supports composition transfer by extracting key composition-aware representations from a reference image and leveraging a tailored conditional guidance module to control composition based on pre-trained diffusion models. Second, when users specify only text themes without a composition reference, Composer supports theme-driven composition retrieval by leveraging the in-context learning capabilities of Large Vision-Language Models (LVLMs), achieving explicit composition planning. To enhance composition in a reference-free mode, we conduct text-to-composition fine-tuning on the trained control module to enable implicit composition planning. Furthermore, we curated a high-quality dataset comprising 2 million image-text pairs using state-of-the-art generative models to support model training. Experimental results demonstrate that Composer significantly enhances aesthetic quality in text-to-image tasks and facilitates personalized composition control and transfer, offering users precision and flexibility in the creative process.

205. ❌ DiCLIP: Diffusion Model Enhances CLIP’s Dense Knowledge for Weakly Supervised Semantic Segmentation

作者: Zhiwei Yang, Pengfei Song, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04593v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically leverages Class Activation Maps (CAMs) to achieve pixel-level predictions. Recently, Contrastive Language-Image Pre-training (CLIP) has been introduced to generate CAMs in WSSS. However, previous WSSS methods solely adopt CLIP’s vision-language paired property for dense localization, neglecting its inherently limited dense knowledge across both visual and text modalities, which renders CAM generation suboptimal. In this work, we propose DiCLIP, a novel WSSS framework that leverages the generative diffusion model to enhance CLIP’s dense knowledge across two modalities. Specifically, Visual Correlation Enhancement (VCE) and Text Semantic Augmentation (TSA) modules are proposed for dense prediction enhancement. To improve the spatial awareness of visual features, our VCE module utilizes diffusion’s reliable spatial consistency to mitigate the over-smoothing issue in CLIP’s attention. It designs the Attention Clustering Refinement (ACR) module to reliably extract diverse correlation maps from the diffusion model. The correlation maps act as a diversity bias for CLIP’s self-attention, recursively pushing its visual features towards a more discriminative dense distribution. To augment the semantics of text embeddings, our TSA module argues that a single text modality is insufficient to encompass the variability of visual categories. Thus, we leverage diffusion’s generative power to maintain a dynamic key-value cache model, shifting CAM generation from a patch-text matching mechanism to a novel visual knowledge retrieval paradigm. With these enhancements, DiCLIP not only outperforms state-of-the-art methods on PASCAL VOC and MS COCO but also significantly reduces training costs. Code is publicly available at https://github.com/zwyang6/DiCLIP.

206. ❌ GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution

作者: Kunyu Li, Fei Wang, Lichao Zhang, Junjie Liu, Bihong Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04581v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better exploit LF geometry. For the NTIRE 2026 fidelity tracks, we use GTF as the main model, while a lightweight GTF-Tiny variant targets the efficiency track. On five standard LF SR benchmarks covering both real-captured and synthetic scenes, GTF reaches 32.78 dB without inference-time enhancement, and stronger inference settings with EPSW and test-time augmentation further improve performance. Under the NTIRE 2026 efficiency constraint, GTF-Tiny attains 32.57 dB with only 0.915M parameters and 19.81 GFLOPs. In the NTIRE 2026 Light Field Image Super-Resolution Challenge, our submissions rank 3rd on Track 1 and Track 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses further support the effectiveness of diagonal EPI modeling, directional fusion, and the lightweight design.

207. ❌ VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking

作者: Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04574v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

UAV-ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two-stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to tracking unreliability. To address these limitations, we propose VL-UniTrack, a fully unified framework enhanced by visual-language prompts. By encoding features from both views within a single shared encoder, our method breaks the barrier of feature isolation to facilitate sufficient cross-view interaction. To overcome the ambiguity caused by relying solely on appearance matching, we design visual-language geometric prompting module, which fuses language descriptions with visual features to generate learnable prompts. These prompts are then fed into our prompt-guided cross-view adapter module to enable sufficient cross-view feature interaction and to guide the learning of view-specific feature representations. Furthermore, a confidence-modulated mutual distillation loss is proposed to regularize the training by mitigating noise propagation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the latest benchmark. The code can be downloaded in https://github.com/xuboyue1999/VL-UniTrack.git

208. ❌ Lightning Unified Video Editing via In-Context Sparse Attention

作者: Shitong Shao, Zikai Zhou, Haopeng Li, Yingwei Song, Wenliang Zhong, Lichen Bai, Zeke Xie 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04569v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that Query sharpness correlates with approximation error. Motivated by these findings, ISA implements an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error ones to a computationally efficient 0-th order Taylor sparse attention. Furthermore, we build \textbf{\texttt{LIVEditor}} , a novel lightning video editing model via ISA and a proposed video-editing data pipeline that curated a 1.7M high-quality dataset. Extensive experiments demonstrate that LIVEditor achieves a $\sim$60% reduction in attention-module latency while surpassing state-of-the-art methods across EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.

209. ❌ SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression

作者: Jiaqian Zhang, Hao Wei, Chenyang Ge, Yanhui Zhou 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04560v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To this end, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular focus on Mamba. Unlike existing methods that rely on an inherently fixed scanning order and consequently impair semantic continuity and spatial correlation, we develop a semantic-aware Mamba block (SAMB) to enable scanning guided by dynamically clustered semantic features, thereby alleviating the strict causality constraints and long-range information decay inherent to Mamba. Inspired by singular value decomposition, we design an SVD-inspired redundancy reduction module (SVD-RRM) that performs a low-rank approximation on the latent features by introducing a learnable soft threshold, leading to channel-wise redundancy information reduction. The proposed SAMB is integrated into both the encoder and decoder of the compression framework, whereas the SVD-RRM is incorporated only in the encoder. Extensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of rate-distortion-perception tradeoff and model complexity. The source code and pretrained models will be available at https://github.com/Jasmine-aiq/SAMIC.

210. ❌ InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery

作者: Kaili Zheng, Kaiwen Wang, Xun Zhu, Chenyi Guo, Ji Wu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04554v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environment interaction information into human mesh recovery pipeline. By leveraging a human-object interaction detector, InterMesh enriches query representations with structured interaction semantics, enabling more accurate pose and shape estimation. We design lightweight modules, Contextual Interaction Encoder and Interaction-Guided Refiner, to integrate these features into existing HMR architectures with minimal overhead. We validate our approach through extensive experiments on 3DPW, MuPoTS, CMU Panoptic, Hi4D, and CHI3D datasets, demonstrating remarkable improvements over state-of-the-art methods. Notably, InterMesh reduces MPJPE by 9.9% on CMU Panoptic and 8.2% on Hi4D, highlighting its effectiveness in scenarios with complex human-object and inter-human interactions.

211. ❌ Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection

作者: Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04541v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation,grasping, and localization. Existing deep learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Points (PnP) methods may struggle to achieve accurate results. To address this limitation, we propose Angle-I2P, an outlier rejection network that leverages angle-consistent geometric constraints and hierarchical attention. First, we design a scale-invariant, crossmodality geometric constraint based on angular consistency. This explicit geometric constraint guides the model in distinguishing inliers from outliers. Furthermore, we propose a global-tolocal hierarchical attention mechanism that effectively filters out geometrically inconsistent matches under rigid transformation, thereby improving the Inlier Ratio (IR) and Registration Recall (RR). Experimental results demonstrate that our method achieves state-of-the-art performance on the 7Scenes, RGBD Scenes V2, and a self-collected dataset, with consistent improvements across all benchmarks.

212. ❌ Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

作者: Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li, Changyi Ma, Shuaifeng Li, Yitong Qin, Hongbin Liu, Jiebo Luo, Zhen Lei 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04531v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and shifted visual embeddings of region proposals. While recent test-time adaptive object detection methods for VLM-based either rely on costly backpropagation or bypass semantic misalignment via external memory, none directly and efficiently align text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings as candidate variants, evaluates them via cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open source upon publication.

213. ❌ Velox: Learning Representations of 4D Geometry and Appearance

作者: Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel, Jen-Hao Rick Chang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04527v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We introduce a framework for learning latent representations of 4D objects which are descriptive, faithfully capturing object geometry and appearance; compressive, aiding in downstream efficiency; and accessible, requiring minimal input, i.e., an unstructured dynamic point cloud, to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised using two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution capturing the geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians, helping learn appearance. To demonstrate the utility of our representation, we evaluate it across three downstream tasks – video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation – and observe strong performances in all settings.

214. ❌ High-Fidelity Single-Image Head Modeling with Industry-Grade Topology

作者: Yunmu Wang, Zoubin Bi, Bowen Cai, Chenchu Rong, Jinlong Wang, Junchen Deng, Aocheng Huang, Jidong Jia, Huan Fu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04524v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We present a single-image head mesh reconstruction framework that addresses the longstanding challenge of simultaneously preserving facial identity and producing industry-grade topology. Our framework adopts a coarse-to-fine optimization pipeline that refines a rigged template across three stages – rig, joint, and vertex – achieving stable convergence and consistent topology. To mitigate the ill-posed nature of single-image 3D face reconstruction and ensure identity preservation, we employ a normal consistency objective jointly with landmark alignment. To further preserve local surface structure and enforce topological regularity, we introduce geometry-aware constraints based on Gaussian curvature and conformal consistency, along with auxiliary regularizations that correct fine artifacts such as lip seams and eyelid discontinuities. Our hierarchical optimization with geometry-aware regularization yields meshes with semantically meaningful edge flow and industry-grade topology. After geometry reconstruction, we extract UV-space texture and normal maps to preserve appearance details for visualization and downstream use. In a user study with 22 professional technical artists, our results were assessed as approaching industry-grade usability, and 95% of participants ranked our method as the top-performing approach, underscoring its effectiveness for real-world digital human production.

作者: Nand Kumar Mishra, Dhruv Mishra, Dr Manu Pratap Singh 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04518v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Automatic brain tumor segmentation from multi-modal MRI remains challenging because volumetric models often incur substantial computational cost. This paper presents DALight-3D, a compact 3D U-Net variant that combines depthwise separable 3D convolutions, identifier-conditioned normalization, cross-slice attention, and adaptive skip fusion. The method is evaluated on the Medical Segmentation Decathlon Task01 BrainTumour benchmark under matched optimization settings against standard 3D U-Net, Attention U-Net, Residual 3D U-Net, and V-Net baselines. In the reported 50-epoch comparison, DALight-3D achieves a mean Dice of 0.727 with 2.22M parameters, compared with 0.710 Dice and 3.20M parameters for Residual 3D U-Net. Component-wise ablations show consistent performance degradation when SepConv, identifier-conditioned normalization, CSA, or SSFB is removed. These results indicate that DALight-3D offers a favorable accuracy-efficiency trade-off within the present benchmark setting.

216. ❌ From Priors to Perception: Grounding Video-LLMs in Physical Reality

作者: Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04515v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only in anti-physics anomalies but also in counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we propose the Unified Attribution Theory: this dual failure stems not from perception deficiency, but from Semantic Prior Dominance – the reasoning mechanism is deeply hijacked by internal narrative scripts. To address this, we construct the Programmatic Adversarial Curriculum (PACC), the first high-fidelity adversarial video dataset synthesized based on physical laws, thoroughly decoupling visual artifacts from logical errors. Concurrently, we design the Visual-Anchored Reasoning Chain (VARC) to force models to explicitly ground their judgments in low-level visual facts prior to logical adjudication. Experiments demonstrate that without invasive architectural modifications, standard LoRA fine-tuning with the PACC curriculum effectively neutralizes prior interference in state-of-the-art (SOTA) models, yielding a substantial leap in physical reasoning capabilities.

217. ❌ Example-Based Object Detection

作者: ZhiXin Sun 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04501v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets. However, despite these advancements, false positives and false negatives still occur. In practical engineering applications, persistent misdetections or missed detections of the same object are unacceptable. Yet retraining the model every time such errors occur incurs substantial costs in terms of human effort, computational resources, and time. Therefore, how to leverage existing false positive and false negative samples to prevent such errors from recurring remains a highly challenging and urgent problem. To address this issue, we propose EBOD (Example-Based Object Detection), which integrates a prompt-based detector (SAM3) with robust feature matching modules (DINOv3 and LightGlue). The proposed framework effectively suppresses the repeated occurrence of false positives and false negatives by leveraging previous error examples, without requiring additional model retraining. Code is available at https://github.com/sunzx97/examples_based_object_detection.

218. ❌ Towards General Preference Alignment: Diffusion Models at Nash Equilibrium

作者: Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04494v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Reinforcement learning from human feedback (RLHF) has been popular for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human preferences can be adequately modeled by the Bradley–Terry (BT) model, which may fail to capture the full complexity of human preferences. In this paper, we formulate diffusion alignment from a game-theoretic perspective. We propose Diffusion Nash Preference Optimization (Diff.-NPO), an intuitive general preference framework for diffusion alignment. Diff.-NPO encourages the current policy to play against itself to achieve self improvement and lead to a better alignment. Empirically, we demonstrate the effectiveness of Diff.-NPO on the text-to-image generation task via various metrics. Diff.-NPO consistently outperforms existing preference-based diffusion alignment methods.

219. ❌ Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding

作者: Shuo Liu, Lei Shi, Haowen Liu, Jing Xu, Yufei Gao, Yucheng Shi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04475v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.

220. ❌ Stream-T1: Test-Time Scaling for Streaming Video Generation

作者: Yijing Tu, Shaojin Wu, Mengqi Huang, Wenchuan Wang, Yuxin Wang, Chunxiao Liu, Zhendong Mao 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04461v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

While Test-Time Scaling (TTS) offers a promising direction to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We identify that its chunk-level synthesis and few denoising steps are intrinsically suited for TTS, significantly lowering computational overhead while enabling fine-grained temporal control. Driven by this insight, we introduced Stream-T1, a pioneering comprehensive TTS framework exclusively tailored for streaming video generation. Specifically, Stream-T1 is composed of three units: (1) Stream -Scaled Noise Propagation, which actively refines the initial latent noise of the generating chunk using historically proven, high-quality previous chunk noise, effectively establishes temporal dependency and utilizing the historical Gaussian prior to guide the current generation; (2) Stream -Scaled Reward Pruning, which comprehensively evaluates generated candidates to strike an optimal balance between local spatial aesthetics and global temporal coherence by integrating immediate short-term assessments with sliding-window-based long-term evaluations; (3) Stream-Scaled Memory Sinking, which dynamically routes the context evicted from KV-cache into distinct updating pathways guided by the reward feedback, ensuring that previously generated visual information effectively anchors and guides the subsequent video stream. Evaluated on both 5s and 30s comprehensive video benchmarks, Stream-T1 demonstrates profound superiority, significantly improving temporal consistency, motion smoothness, and frame-level visual quality.

221. ❌ StableI2I: Spotting Unintended Changes in Image-to-Image Transition

作者: Jiayang Li, Shuo Cao, Xiaohui Li, Zhizhen Zhang, Kaiwen Zhu, Yule Duan, Yu Qiao, Jian Zhang, Yihao Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04453v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre–post consistency across a wide range of I2I tasks without requiring reference images, including image editing and image restoration. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of MLLMs on such fidelity and consistency assessment tasks. Extensive experimental results demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency, with strong correlations to human subjective judgments. Our framework serves as a practical and reliable evaluation tool for diagnosing content consistency and benchmarking model performance in real-world I2I systems.

222. ❌ RemoteZero: Geospatial Reasoning with Zero Human Annotations

作者: Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min, Shimin Di, Yuhui Zheng 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04451v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabeled remote sensing data. To break this bottleneck, we introduce RemoteZero, a box-supervision-free framework for geospatial reasoning. RemoteZero is motivated by a simple asymmetry: an MLLM is typically better at verifying whether a region satisfies a query than at directly generating precise coordinates. Leveraging this stronger discriminative ability, RemoteZero replaces geometric supervision with intrinsic semantic verification and enables GRPO training without box annotations. The resulting framework further supports iterative self-evolution, allowing the model to improve from unlabeled remote sensing imagery through its own verification signal. Experiments show that RemoteZero achieves competitive performance against strong supervised methods, demonstrating the potential of self-verifying training for geospatial reasoning localization.

223. ❌ Deep Reprogramming Distillation for Medical Foundation Models

作者: Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang, Hui Lin, Ya Zhang, Yanfeng Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04447v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Medical foundation models pre-trained on large-scale datasets have shown powerful versatile performance. However, when adapting medical foundation models for specific medical scenarios, it remains the inevitable challenge due to the gap induced by the discrepancy between pre-training and downstream tasks, the real-world computation, and speed constraints. Relevant techniques that probably handle this challenge more or less suffer from some intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve model structures and training strategies inconsistencies between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to combat the general adaptation challenge. Specifically, DRD introduces the novel reprogramming module that on the one side overcomes the domain and task discrepancy between pretraining and downstream scenarios, and on the other side builds the student-friendly efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.

224. ❌ LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection

作者: Yutong Xiao, Ran Ran, Jiwei Wei, Shuchang Zhou, Ke Liu, Zheng Ziqiang, Caiyan Qin 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04445v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, thereby creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases. This severely undermines model generalization. In contrast, focusing only on unique artifacts tends to cause overfitting to specific forgery patterns. To address this challenge, we propose LEGO (LoRA-Enabled Generator-Oriented Framework). The core mechanism of LEGO employs an MLP to modulate multiple LoRA (Low-Rank Adaptation) blocks, each pretrained to capture the unique artifacts of a specific generator, followed by attention-based feature fusion. Unlike conventional methods that seek a single universal solution, LEGO delegates unique artifact extraction to specialized LoRA modules by dividing its training procedure into two stages. Each LoRA module is individually trained on a single-generator dataset to learn generator-specific representations, then MLP and attention layers are trained on mixed datasets to dynamically regulate the contribution of each module. Benefiting from its modular yet robust design, LEGO can be naturally extended by incorporating new LoRA modules for adaptation to newly emerging next-generation datasets, while still achieving substantially better performance than prior SOTA methods with fewer than 30,000 training images, less than 10% of their training data, and only 5 epochs in each training stage.

作者: Chunwei Tian, Jingyuan Xie, Qi Zhang, Chao Li, Wangmeng Zuo, Shichao Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04439v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Deep neural networks enriched with structural information have been widely employed for facial expression recognition tasks. However, these methods often depend on hierarchical information rather than face property to finish expression recognition. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). CMNet can respectively learn expression information via face symmetry on a whole face, left and right half faces to extract complementary facial features. To prevent negative effect of biological and structural information fusion, a salient facial information refinement module can obtain salient facial expression information to improve stability of an obtained facial expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism is designed to align obtained expression information of learned left and right half faces. Our experimental results demonstrate that CMNet outperforms several novel methods, i.e., SCN and LAENet-SA for facial expression recognition. Codes can be obtained at https://github.com/hellloxiaotian/CMNet.

226. ❌ Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes

作者: Shuo Wang, Jilin Mei, Fuyang Liu, Wenfei Guan, Fanjie Kong, Zhihua Zhao, Shuai Wang, Chen Min, Yu Hu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04435v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded 4D feedforward framework for pose-free off-road reconstruction. The key idea is to resolve temporal conflicts through spatially localized conditioning. Specifically, we introduce voxel-grounded temporal Gaussian aggregation, which partitions the canonical Gaussian space into spatial voxels and performs query-conditioned temporal attention within each voxel. Intra-voxel softmax normalization ensures that temporal selectivity and spatial occupancy become mutually reinforcing rather than conflicting. We furthermore introduce surface normal cues as auxiliary geometric guidance to regularize the geometry of Gaussian primitives. Extensive experiments on ORAD-3D and RELLIS-3D demonstrate that Ground4D consistently outperforms existing feedforward methods in reconstruction quality and generalizes zero-shot to unseen off-road domains. Project page and code:https://github.com/wsnbws/Ground4D.

227. ❌ Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning

作者: Yating Wang, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04425v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability to downstream tasks. Our framework is plug-and-play, allowing seamless integration with existing prompt learning methods. Extensive experiments on multiple benchmarks show that IPL consistently improves both interpretability and accuracy across five representative prompt learning methods, providing an effective and scalable extension to existing frameworks.

228. ❌ Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion

作者: Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04412v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from a single or multiple images. Building on this capability, enabling style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly or even fails. To address this limitation, we introduce $\textbf{DiLAST}$: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representation for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse, OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.

229. ❌ Evaluation Cards for XAI Metrics

作者: Rokas Gipiškis, Olga Kurasova 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04410v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The evaluation of explainable AI (XAI) methods is affected by a lack of standardization. Metrics are inconsistently defined, incompletely reported, and rarely validated against common baselines. In this paper, we identify transparency of evaluation reporting as a central, under-addressed problem. We propose the XAI Evaluation Card, a documentation template analogous to model cards, designed to accompany any study that introduces an XAI evaluation metric. The card covers explicit declaration of target properties, grounding levels, metric assumptions, validation evidence, gaming risks, and known failure cases. We argue that adopting this template as a community norm would reduce evaluation fragmentation, support meta-analysis, and improve accountability in XAI research.

230. ❌ UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model

作者: Yupeng Gao, Tianyu Li, Guoqing Wang, Yang Yang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04409v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.

231. ❌ Detecting Deepfakes via Hamiltonian Dynamics

作者: Harry Cheng, Ming-Hui Liu, Tianyi Wang, Weili Guan, Liqiang Nie, Mohan Kankanhalli 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04405v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-energy equilibria. In contrast, generative models optimize for statistical similarity to real images but do not explicitly enforce structural constraints such as geometric smoothness, leaving deepfakes more likely to occupy unstable, high-energy states. To operationalize this, we introduce Hamiltonian Action Anomaly Detection (HAAD), comprising three contributions: \textbf{i)} We model the image latent manifold as a potential energy surface. Under this hypothesis, real images are expected to produce basin-like low-energy responses, whereas fake images are more likely to induce high-potential, high-gradient responses. \textbf{ii)} We employ Hamiltonian-inspired dynamics as a stability probe. By releasing latent states from rest, samples near stable regions remain bounded, while high-gradient samples produce larger trajectory responses. \textbf{iii)} We quantify these dynamic behaviors through two trajectory statistics, \ie, Hamiltonian action and energy dissipation. Extensive experiments show that HAAD outperforms evaluated state-of-the-art baselines on challenging cross-dataset transfer benchmarks, supporting a physics-inspired stability prior for digital forensics.

232. ❌ Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring

作者: Jieying Wang, Xinqi Cai, Caifeng Shan, Wenjin Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04397v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by the highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure controlling during video acquisition – most existing systems rely on either fixed exposure settings or camera build-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting conditions during driving. To address this gap, we propose a highly-adaptive exposure controlling framework that proactively adjusts exposure parameters based on predictive modeling of historical skin reflections. Unlike standard auto-exposure, our method is specifically optimized for rPPG measurement, ensuring the skin region of interest (ROI) remains within the optimal dynamic range for rPPG signal extraction. As an important contribution of this study, we introduce ExpDrive, a public in-vehicle physiological monitoring dataset comprising synchronized facial video and reference ECG from 48 subjects captured under real driving conditions. Extensive experiments demonstrate that our method consistently outperforms fixed exposure and standard auto-exposure strategies. Specifically, it reduces the Mean Absolute Error (MAE) by 6.31 bpm (from 14.1 to 7.79 bpm) and significantly increases the success rate by 32.3 percentage points (p < 0.001) (from 24.9% to 57.2%) across challenging driving scenarios. Notably, it clearly improved the performance of non-contact heart-rate monitoring in both low-light (rainy) and high-glare (sunny) conditions, validating the efficacy of exposure-aware acquisition design.

233. ❌ Intermediate Representations are Strong AI-Generated Image Detectors

作者: Zhenhan Huang, Pin-Yu Chen, Tejaswini Pedapati, Jianxi Gao 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04358v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The rapid advancement in generative AI models has enabled the creation of photorealistic images. At the same time, there are growing concerns about the potential misuse and dangers of generated content, as well as a pressing need for effective AI-generated image detectors. However, current training-based detection techniques are typically computationally costly and can hardly be generalized to unseen data domains, while training-free methods fall short in detection performance. To bridge this gap, we propose a search-based method employing data embedding sensitivity in intermediate layers to detect AI-generated images. Given a set of real and AI-generated images, our method examines the similarity between original image embeddings and perturbed image embeddings, and detects AI-generated images based on the similarity. We examine the proposed method on two comprehensive benchmarks: GenImage and Forensics Small. Our method exhibits improved performance across different datasets compared to both training-free and training-based state-of-the-art methods. On average, our method achieves the largest performance gain on the Forensics Small benchmark by 39.61% compared to the best training-free method and 5.14% compared to the best training-based method in AUROC score.

234. ❌ InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making

作者: Mustafa Sakhaia, Kaung Sithua, Min Khant Soe Okea, Maciej Wielgosza 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04355v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Autonomous driving systems rely heavily on robust sensor fusion to perceive complex envi- ronments. Traditional setups using RGB cameras and LiDAR often struggle in high-dynamic- range scenes or high-speed scenarios due to motion blur and latency. Dynamic Vision Sensors (DVS), or event cameras, offer a paradigm shift by capturing asynchronous brightness changes with microsecond temporal resolution and high dynamic range. In this paper, we propose an extended architecture of the state-of-the-art InterFuser model, integrating DVS as an additional modality to enhance perception reliability. We introduce a novel token-based fusion strategy that incorporates accumulated event frames into the transformer-based backbone of InterFuser. Our method leverages the complementary nature of RGB, LiDAR, and DVS data. We evaluate our approach on the Car Learning to Act (CARLA) Leaderboard benchmarks, demonstrating that the inclusion of DVS improves the robustness of the driving agent, achieving a competitive Driving Score of 77.2 and a superior Route Completion of 100%. The results indicate that event-based vision is a promising direction for improving safety and performance in adverse lighting and dynamic conditions.

作者: Rihuan Ke 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04332v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This work proposes a learning-based statistical refinement method for improving the denoising results of a given denoiser without knowing the precise noise distribution or accessing clean images or calibration data. While there are many existing successful denoising approaches for handling different kinds of noise, they typically require accurate modelling of the images and the noise (implicitly or explicitly), and hence the denoising results can be suboptimal due to different practical factors such as imperfect models, unreliable noise assumptions, or low quality data. In particular, when clean image samples are not available and there is a lack of knowledge of the underlying noise distribution, which is the case in various practical situations, the results may not well align with the noise statistics. The unawareness of the useful statistical information leads to suboptimal results. This work aims to make the best use of the statistical information to improve the consistency between the given denoising results and the noise statistics, under the assumption that the noise is conditionally pixel-wise independent given the clean signal. A method, based on a Bayesian formulation of an auxiliary signal in the noisy data, is proposed for evaluating the consistency of the denoising results, without precise information on noise distribution. By leveraging the statistical information from noisy data, the method enhances the statistical noise consistency and improves denoising quality.

236. ❌ Covariance-Aware Goodness for Scalable Forward-Forward Learning

作者: Xiaoyi Jiang, Bashir M. Al-Hashimi, Kai Xu 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04346v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The Forward-Forward algorithm eliminates global gradient flow and full network activations storage. However, in convolutional settings, existing BP-free FF methods significantly under-perform backpropagation on complex benchmarks such as ImageNet-100 and Tiny-ImageNet. We identify this gap as a structural bottleneck in goodness extraction: standard sum-of-squares formulation collapses feature volumes into channel-wise activation energies which omits critical second-order dependencies. To address this, we propose a framework centered on three key components. First, Bi-axis Covariance Goodness(BiCovG) explicitly augments the standard goodness function with structured second-order information along two axes: cross-channel projections that model inter-feature covariance, and nested multi-scale aggregation that encodes spatial correlation statistics. This provides a tractable approximation to covariance-aware goodness without the prohibitive O(C^2) complexity of explicit matrix estimation. Second, a lightweight Logistic Fusion module aggregates layer-wise predictions, amplifying the contribution of deeper representations. Third, the Feature Alignment Layer(FAL) introduces a zero-initialized correction at block boundaries to mitigate representation misalignment in deep locally trained networks. By introducing these three components, we effectively double the depth of viable Forward-Forward learning, extending robust layer utilization from shallow baselines to 16 layer architectures like VGG-16. The resulting BP-free model achieves 73.01% on ImageNet-100 and 50.30% on Tiny-ImageNet. As a practical extension, Hybrid Goodness Blocks control the scope of gradient propagation via configurable block sizes, further narrowing the ImageNet-100 gap to 3.6% and matching BP on Tiny-ImageNet, while still reducing peak memory by approximately 50% relative to BP.

237. ❌ Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles

作者: Maryam Sadat Hosseini Azad, Shahriar Baradaran Shokouhi 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04299v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Scene understanding is a vital part of autonomous driving systems, which requires the use of deep learning models. Deep learning methods are intrinsically black box models, which lack transparency and safety in autonomous driving. To make these systems transparent, multi-task visual understanding has become crucial for explainable autonomous driving perception systems, where simultaneous prediction of multiple driving behaviors and their underlying explanations is essential for safe navigation and human trust in autonomous vehicles. In order to design an accurate and cross-cultural explainable autonomous driving system, we introduce a comprehensive confidence threshold sensitivity analysis that evaluates various threshold values to identify optimal decision boundaries for different tasks. Our analysis demonstrates that traditional fixed threshold approaches are suboptimal for multi-task scenarios. Through extensive evaluation, we demonstrate that our adaptive threshold selection methodology improves F1-scores across different tasks. In addition, we introduce IUST-XAI-AD, a novel dataset consisting of 958 images with human annotations for driving decisions and corresponding reasoning. This dataset addresses the critical gap in domain-specific evaluation benchmarks for distinct driving contexts and provides a more challenging test environment compared to existing datasets. Experimental results demonstrate that confidence threshold sensitivity analysis can significantly improve model performance, while the introduction of the IUST-XAI-AD dataset reveals important insights about cross-cultural driving behavior patterns. The combined contributions of this work provide both methodological advances and practical evaluation tools that can accelerate the development of more reliable, explainable, and culturally-adaptive autonomous driving systems for global deployment.

238. ❌ Estimating the expected output of wide random MLPs more efficiently than sampling

作者: Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton, Paul Christiano 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05179v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.

239. ❌ Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

作者: Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05189v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\toα$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.

240. ❌ Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer

作者: Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05176v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.

241. ❌ Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting

作者: Freyaa Chawla, Ahan Chawla, Rishi Singh, Joe Germino, Grigorii Khvatskii 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05144v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

This paper reflects on a AI research project carried out by a team of high-school and early-undergraduate students under the mentorship of graduate researchers and ably assisted by AI tools. We share our experience in not only on the learning experience for the high school students, but also on how AI tools accelerated the process that enabled the high school students to focus on higher order problem formulation and solution. Although the participants entered the project with limited background in both AI and finance, they showed strong enthusiasm for technical market analysis and ETF price prediction. Traditional learning settings would first teach the necessary methods in a classroom setting and only later let students apply them. In contrast, our project emphasized workflow design: students identified the sequence of steps needed to address the problem and then used AI-driven tools to execute each step. We note that the high school students developed the necessary code through iterating with the AI tools, and we used our daily stand-ups to debug and answer conceptual questions. Each of the student was able to dig deeper into their area of interest whether computer science or finance, while collaboratively making a significant advance over the summer of 2025. This project was an important pedagogical exercise on how AI tools can be used for mentoring high school students, allowing them to focus on their specific interests and using the daily stand-ups to focus on problem definition and conceptual understanding. Despite their limited technical qualifications, the students were able to leverage AI tools to build meaningful models with real-world application.

242. ❌ Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction

作者: Dan Wilson, Mohamed Akrout 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05134v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model’s latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three data benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.

243. ❌ Transformed Latent Variable Multi-Output Gaussian Processes

作者: Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05133v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions, such as employing low-rank or sum-of-separable kernels, which can limit expressiveness. We propose the Transformed Latent Variable MOGP (T-LVMOGP), a novel framework that scales MOGPs to a massive number of outputs while preserving the capacity to capture meaningful inter-output dependencies. T-LVMOGP constructs a flexible multi-output deep kernel by mapping inputs and output-specific latent variables into an embedding space using a Lipschitz-regularised neural network. Combined with stochastic variational inference, our model effectively scales to high-dimensional output settings. Across diverse benchmarks, including climate modelling with over 10,000 outputs and zero-inflated spatial transcriptomics data, T-LVMOGP outperforms baselines in both predictive accuracy and computational efficiency.

244. ❌ Conditional outlier detection for clinical alerting

作者: Milos Hauskrecht, Michal Valko, Shyam Visweswaran, Iyad Batal, Gilles Clermont, Gregory Cooper 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05124v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac surgical patients. We base the evaluation on the opinions of a panel of experts. The results support that anomaly-based alerting can have reasonably low false alert rates and that stronger anomalies are correlated with higher alert rates.

245. ❌ Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals

作者: Sahar Askari, Mohammad Mahdi Mirza Ali Mohammadi, Fatemeh Ensafdoust, Amin Golnari, Saeid Sanei 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05120v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

An interpretable and scalable framework for decoding driving behaviors from multimodal physiological signals is proposed in this study. We utilize multimodal physiological driving behavior large-scale dataset comprising synchronized electroencephalogram (EEG), electromyography (EMG), and galvanic skin response (GSR) signals. Our approach involves rigorous preprocessing followed by a domain-specific feature extraction pipeline targeting time-domain, frequency-domain, and derived physiological indices. To address high dimensionality, we employ SHAP-based elite feature selection, retaining the top 250 features to reduce computational overhead while preserving predictive power. Hyperparameter optimization for extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) models is conducted using Bayesian optimization via Optuna. Finally, a weighted soft-voting ensemble is constructed to leverage the complementary strengths of both gradient boosting frameworks. The results demonstrate that the proposed ensemble achieves a test accuracy of 80.91% and a macro-F1 score of 0.79, significantly outperforming single-modality baselines and traditional machine learning models. Ablation studies confirm an 8% performance gain over the best single modality (EEG), validating the necessity of multimodal fusion. SHAP analysis further validates the physiological plausibility of the model, revealing that the EEG contributes the majority of predictive weight, GSR and EMG features provide critical discriminatory signals for high-arousal and motor-intensive maneuvers.

246. ❌ On the Hardness of Junking LLMs

作者: Marco Rando, Samuel Vaiter 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05116v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated prompt design. Recently, it has been observed that harmful behaviors can be elicited even without the adversarial prompt, relying solely on optimized token sequences. This suggests the existence of natural backdoors, i.e., token sequences naturally emerged during LLMs training that trigger unsafe outputs without any meaningful instruction. However, despite these observations, this setting remains largely unexplored, and in particular the hardness of finding natural backdoors has not been assessed yet. In this work, we provide a first proof-of-concept study investigating the hardness of this task, which we refer to as the junking problem. We formalize it as the problem of finding token sequences that maximize the probability of generating a target prefix of harmful responses, propose a greedy random-search method to assess is such sequences can be discovered easily. Our results show that this problem is harder than standard jailbreak attacks, confirming the importance of semantic information in prompt design. At the same time, we find that our simple strategy is sufficient to solve it with a high success rate, suggesting that natural backdoors are present and easily recoverable. Finally, through perplexity analysis, we observe that the discovered token sequences lie in low-probability regions of the model distribution, supporting the hypothesis that they emerged implicitly from the training process.

247. ❌ How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences

作者: Mariia Seleznova 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05113v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime $t=o(\sqrt n)$, in which the infinite-width approximation remains valid; (ii) a critical regime $t\sim c\sqrt n$, in which non-negligible deviations from infinite-width predictions appear and a nontrivial joint scaling limit emerges; and (iii) a supercritical regime $t\gg \sqrt n$, in which finite-width effects dominate. Thus, our results pinpoint the precise recurrent depth scale at which infinite-width theory breaks down in long-range linear recurrences. In turn, this shows when standard initialization schemes, such as Glorot, become unstable. More broadly, our results demonstrate that finite-width effects accumulate more rapidly with depth in recurrent models than in feedforward ones, leading to qualitatively different signal propagation behavior.

248. ❌ Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

作者: Daniel Wurgaft, Can Rager, Matthew Kowal, Vasudev Shyam, Sheridan Feucht, Usha Bhalla, Tal Haklay, Eric Bigelow, Raphael Sarfati, Thomas McGrath, Owen Lewis, Jack Merullo, Noah Goodman, Thomas Fel, Atticus Geiger, Ekdeep Singh Lubana 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05115v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Neural representations carry rich geometric structure; but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries, and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and a behavior manifold $M_y$ to output probability distributions. We then test the link $M_h \leftrightarrow M_y$ via interventions: we find that steering along $M_h$, which we term manifold steering, yields behavioral trajectories that follow $M_y$, while linear steering – which assumes a Euclidean geometry – cuts through off-manifold regions and hence produces unnatural outputs. Moreover, optimizing interventions in activation space to produce paths along $M_y$ recovers activation trajectories that trace the curvature of $M_h$. We demonstrate this bidirectional relationship between the geometry of representation and behavior across tasks and modalities. In language models, we use reasoning tasks with cyclic and sequential geometries as well as in-context learning tasks with more complex graph geometries. In a video world model, we use a task with geometry corresponding to physical dynamics. Overall, our work shows that geometry in neural representation is not merely incidental, but is in fact the proper object for enabling principled control via intervention on internals. This recasts the core problem of steering from finding the right direction to finding the right geometry.

249. ❌ Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime

作者: Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu, Dawei Yin, Dou Shen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05112v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame this inefficiency as a pass-rate control problem and show that a 50% pass rate is the most informative operating point: it maximizes reward entropy, the probability of surviving group filtering, RLOO advantage energy under GRPO, and success–failure contrastive structure. Guided by this principle, we propose Prefix Sampling (PS), which replays trajectory prefixes to steer skewed groups toward this regime: successful prefixes serve as head starts for mostly failing groups, while failing prefixes serve as handicaps for mostly passing groups. In stateful agent environments, prefix states are reconstructed through replay while replayed tokens are excluded from the loss, restricting optimization to continuations generated by the current policy. On SWE-bench-style agentic RL, PS delivers end-to-end wall-clock speedups of 2.01x on Qwen3-14B and 1.55x on Qwen3-32B while preserving or improving final verified performance. For 14B, the SWE-bench Verified peak rises from the baseline peak of 0.273 to 0.295 under PS. Additional mathematical reasoning experiments on AIME 2025 show the same pass-rate control pattern and decompose the gains into replay, bidirectional coverage, and adaptive control.

250. ❌ Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning

作者: Harin Lee, Min-hwan Oh 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05102v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $δ\in (0,1]$, thereby characterizing the regret distribution across the full range of $δ$. We present a simple UCBVI-style algorithm with exploration bonus $\min{c_{1,k}/N, c_{2,k}/\sqrt{N}}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/δ))$, confirming the conjecture of Lattimore & Szepesvári (2020, Section 17.1) for the first time.

251. ❌ Proximal Projection for Doubly Sparse Regularized Models

作者: Jia Wei He, R. Ayesha Ali, Gerarda Darlington 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05093v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Regularization is often used in high-dimensional regression settings to generate a sparse model, which can save tremendous computing resources and identify predictors that are most strongly associated with the response. When the predictors can be represented by a Gaussian graphical model, the structure of the predictor graph can be exploited during regularization. Our proposed model exploits this underlying predictor graph structure by decomposing the estimated coefficient vector into a sum of latent variables that correspond to the sum of each node contribution to the coefficient vector. Regularization is then performed on the latent variables rather than on the coefficient vector directly. We use a penalty function that permits a clear user-defined trade-off between the L1 and L2 penalties and propose a novel proximal projection during optimization. Further, our implementation computes the projection operator for the intersection of selected groups, which conserves more computing resources compared to predictor duplication methods, especially for high-dimensional data. Through simulation, we evaluate the performance of our approach under different graph structures and node counts, and present results on real-world data. Results suggest that our method exhibits stable performance relative to other singly or doubly sparse graphical regression models.

252. ❌ Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis

作者: Yunfei Bai, Aaron Tesfa Tsion, Raul Rosales, Barbara Shollock, Wei He 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05088v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Achieving resilient and sustainable cities requires scalable approaches to decarbonising residential buildings, which account for about 20% of UK greenhouse gas emissions and 25% of energy-related emissions in the European Union. Energy Performance Certificates (EPCs) support regulation and retrofit planning, but their reliance on on-site inspections limits timely city-scale assessment. This study introduces a gated multimodal model to predict Standard Assessment Procedure (SAP) energy efficiency and Environmental Impact (EI) scores by integrating EPC tabular variables, assessor-written free text, and Geographic Information System (GIS)-derived spatial features describing footprint geometry, height, area, and orientation. Sample-wise gating learns property-specific modality weights, while an auxiliary band classification head stabilises training. In a Westminster, London case study, the model predicts SAP and EI scores with MAEs of 4.03 and 4.76 points and R2 values of 0.757 and 0.748, respectively, achieving a mean MAE of 4.39. Ablation results show that full multimodal fusion outperforms unimodal and bimodal baselines for both score prediction and band-level classification. Interpretability analyses provide decision-relevant evidence: gating weights indicate strong reliance on assessor text; SHAP highlights main fuel, built form, and construction age band; text occlusion prioritises roof and wall fields; and spatial attribution is dominated by height and footprint area, with sensitivity to footprint shape. The validated framework is further applied to retrofit scenarios for wall insulation, roof insulation, and window glazing upgrades, indicating projected improvements in SAP, EI, annual energy cost, and equivalent CO2 emissions. Overall, the framework provides scalable property-level evidence for retrofit screening, intervention prioritisation, and net-zero housing transitions.

253. ❌ Order Matters: Improving Domain Adaptation by Reordering Data

作者: Andrea Napoli, Paul White 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05084v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique which reduces the discrepancy estimation error by optimising the order in which the training data are sampled. We consider two specific domain discrepancy losses (correlation alignment and the maximum mean discrepancy), formulate their stochastic estimation error as a function of the data sampling order, and propose a practical optimisation algorithm. Our simulations demonstrate reduced variance compared to related methods, and experiments on two domain shift image classification benchmarks show improved target domain accuracy.

254. ❌ Provable imitation learning for control of instability in partially-observed Vlasov–Poisson equations

作者: Xiaofan Xia, Qin Li, Wenlong Mou 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05081v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We consider the stabilization of Vlasov–Poisson plasma dynamics, a central control problem in nuclear fusion. Our focus is the gap between what an ideal controller would use and what experiments can actually observe: while optimal policy may rely on the full phase-space state, practical feedback is typically limited to sparse macroscopic diagnostics. We therefore study imitation learning methods that distill a fully observed expert policy into controllers operating only on macroscopic measurements. We show the stability guarantees of the learned policy, where the error floor depends on the minimal behavior cloning loss achievable under the observation constraints. We further characterize this minimal loss in terms of a notion of entropy that quantifies the complexity of the initial distribution. Our results demonstrates the theoretical feasibility of learning stabilizing feedback policies for kinetic plasma dynamics from macroscopic observations, and exhibits the adaptivity of the learning approach to low-complexity structures. Through extensive numerical experiments, we validate our theory and show that the learned policies can stabilize the system using only macroscopic observations, within a significantly longer time horizon than non-adaptive baseline controllers.

255. ❌ Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry

作者: Jules Exbrayat, Renan Bouis, Elie Sezestre, Viorel Balan, Arnaud Cornelis, Damien Hebras, Catherine Euvrard 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05062v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

As time-to-market is crucial in the Integrated Circuit (IC) industry, speeding up layout manufacturability verifi-cation is essential. Chemical-Mechanical Polishing (CMP) plays a vital role in IC fabrication but is significantly influenced by Layout-Dependent Effects (LDE). An accurate and efficient CMP model enables design teams to correct surface unevenness before fabrication, reducing costs and accelerating the design phase. However, existing models often rely on Density Step Height (DSH) modeling, which is time-consuming for calibration and requires substantial hardware resources for fine-grained predictions. In this paper, we propose combining the advantages of two surface analysis techniques, White Light Interfer-ometry (WLI) and Atomic Force Microscopy (AFM), to train a deep learning model. This model aims to predict full-chip post-CMP nanotopography with nanometer-scale accuracy. Our deep learning model is based on a Convolutional Neural Network (CNN) and follows a two-step pipeline. The model is trained on each technique separately, resulting in a detailed full-chip CMP model.

256. ❌ Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data

作者: Eni Solomon Laughter 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05050v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity - a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver behavioral patterns), applied to 1,060,119 valid car-following observations from the NGSIM trajectory dataset (2,932 vehicles). Six kinematic features are extracted, and deceleration events are detected under two threshold conditions (-0.5 m/s^2 and -0.3 m/s^2). K-means clustering identifies behavioral modes, and one-way ANOVA with eta-squared effect sizes ranks each feature’s discriminative power. Three key findings emerge: (1) threshold selection fundamentally shapes behavioral inference - the stricter threshold yields three interpretable modes while the permissive threshold collapses these to two; (2) hard braking prioritizes gap-closing rate (eta^2 = 0.715) while moderate braking emphasizes visual looming (eta^2 = 0.574); and (3) spacing headway is negligible (eta^2 <= 0.014) across both thresholds. These findings provide empirically grounded candidates for perceptual cue prioritization and have direct implications for ADAS warning system design and autonomous vehicle control.

257. ❌ The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence

作者: Kejun Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05029v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We report a systematic failure mode in predictive representation learning. Across 2695 neural network configurations trained to predict linear-Gaussian dynamics, the optimal encoder tracks the environment rather than the system it is meant to model. The mean causal fidelity – the fraction of encoder sensitivity allocated to system degrees of freedom – is 0.49, and only 2.5% of configurations exceed 0.70. The failure intensifies with dimension: at N=100, the optimal encoder becomes causally blind (fidelity ~10^{-8}) while achieving 92% lower prediction error than the causal representation. We prove this is not an optimization artifact but a structural property of the predictive objective: when environment modes are slower or less noisy than system modes, every minimizer of the population risk encodes the former. The set of dynamics exhibiting this predictive-causal gap is open and of positive measure in parameter space. In a nonlinear Duffing-GRU sweep, unconstrained predictors learn environment-dominant representations in 55% of tasks (95% CI 41–68%) versus 24% under operational grounding (p=2.3e-3); the median out-of-distribution MSE inflation under environment shift is 1.82x versus 1.00x. Operational grounding – restricting the loss to system observables – partially suppresses the gap, but causal fidelity is never recovered without an explicit system-environment boundary. The results identify the predictive-causal gap as a structural limit of learning, with implications for self-supervised representation learning, world models, and the scaling paradigm.

258. ❌ Hypergraph Generation via Structured Stochastic Diffusion

作者: Christopher Nemeth 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05024v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose \HEDGE, a generative model defined directly on relaxed incidence matrices via a structured stochastic diffusion. The forward process combines a hypergraph-specific two-sided heat operator with an Ornstein–Uhlenbeck component, preserving structure-aware noising near the data while yielding an explicit Gaussian terminal law. Conditional on an observed hypergraph, this forward process is linear-Gaussian, so conditional means, covariances, scores, and reverse-drift targets are available in closed form. We therefore learn a permutation-equivariant state-only reverse-drift field in incidence space by regressing onto exact conditional targets, and generate samples by simulating a learned reverse-time SDE from the Gaussian base law. We establish exactness in the ideal state-only setting together with finite-horizon stability guarantees, and empirically show improved hypergraph generation quality relative to strong baselines.

259. ❌ CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels

作者: Xing Ma, Yangjie Zhou, Wu Sun, Zihan Liu, Jingwen Leng, Yun Lin, Shixuan Sun, Minyi Guo, Jin Song Dong 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05023v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention. We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.

260. ❌ Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning

作者: Shawn Ray 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05020v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yield an unbiased Horvitz-Thompson estimator and a normalized sample mean with $O(1/\sqrt{m})$ concentration in the sampled edge count $m$. For fixed sparse graphs we prove forwarding-index distortion bounds for expanders and a spectral refinement under low-rank distance structure; for random $d$-regular graphs we prove an unconditional probabilistic $\widetilde{\mathcal{O}}(D_{\max}/\sqrt{n})$ bound. On VMAS we verify recovery, unbiasedness, concentration, and wall-clock scaling, with a PettingZoo TVD panel checking non-Gaussian transfer. In a 500-iteration $n=100$ PPO run, Bernoulli-$0.1$ Graph-SND tracks full SND while reducing per-call metric time by about $10\times$, and frozen-policy GPU timing up to $n=500$ follows the predicted $\binom{n}{2}/|E|$ speedup. Random $d$-regular expanders empirically achieve $\mathrm{SND}_{G}^{\mathrm{u}}/\mathrm{SND} \in [0.9987, 1.0013]$ at $Θ(n \log n)$ edges. In DiCo diversity control at $n=50$, Bernoulli-$0.1$ Graph-SND preserves set-point tracking with paired reward differences indistinguishable from zero across nine matched cells while cutting per-call metric cost by ${\sim}9.5\times$. Together, these results show that the SND aggregation bottleneck can be removed without changing the metric’s semantics, yielding a drop-in sparse alternative that scales beyond complete-graph SND and supports both passive measurement and closed-loop diversity control.

261. ❌ Learned Neighbor Trust for Collaborative Deployment in Model-Agnostic Decentralized Learning

作者: Michael Lanier, Luise Ge, Sastry Kompella, Yevgeniy Vorobeychik 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05009v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Many decentralized distillation methods are designed around training-time coordination, yet deploy each node in isolation even when more capable neighbors remain available at inference time. This is an incomplete objective for settings such as IoT, where devices are heterogeneous, data is scarce and skewed, and a node’s strongest neighbors may far exceed its own local capacity. We study how nodes should train so that their predictions compose well at deployment, and how each node should learn whom to trust. Under a server-free, model-agnostic protocol where nodes exchange only queries and soft predictions, we propose Learned Neighbor Trust (LNTrust) wherein each node learns a compact trust function over its neighborhood from local validation evidence. This trust function gates auxiliary distillation during training and defines a deployment ensemble at inference, so that collaboration learned during training transfers directly to deployment. Across datasets and topologies, LNTrust improves deployed accuracy over the strongest output-only baseline by large margins while using significantly less communication than previous methods.

262. ❌ Scalable inference of spatial regions and temporal signatures from time series

作者: Jiayu Weng, Alec Kirkley 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05008v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a priori either explicitly or implicitly. Utilizing the minimum description length principle from information theory, here we propose an efficient and fully nonparametric framework for the regionalization of spatial time series. Our method jointly infers a spatial partition along with a set of representative time series archetypes (“drivers”) that best compress a spatiotemporal dataset, with a runtime log-linear in the number of time series. We demonstrate that this method can accurately recover planted regional structure and drivers in synthetic time series, and can extract meaningful structural regularities in large-scale empirical air quality and vegetation index records. Our method provides a principled and scalable framework for spatially contiguous partitioning, allowing interpretable temporal patterns and homogeneous regions to emerge directly from the data itself.

263. ❌ Agentic Vulnerability Reasoning on Windows COM Binaries

作者: Hwiwon Lee, Jongseong Kim, Lingming Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.05000v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agents the static context, COM activation metadata, and debugger feedback needed to move from vulnerability discovery to verified PoC generation. On a benchmark of 20 COM objects covering 40 vulnerability cases, SLYP achieves 0.973 F1, outperforming production coding agents by up to 0.208 F1 and the state-of-the-art static analyzer by 3.3x in bug discovery. For PoC generation, production coding agents in their default setup (without our COM inspection and dynamic debugging tools) verify essentially no cases on either frontier model, whereas SLYP’s interactive toolsets enable it to autonomously synthesize working PoCs for 67.5% of cases on the strongest configuration. Deployed on production Windows services, SLYP discovers 28 previously unknown vulnerabilities across nine COM services, all confirmed by the Microsoft Security Response Center (MSRC) with 16 CVEs assigned and $140,000 in bounties. Furthermore, SLYP is designed with generalizable binary analysis and debugging interfaces, making it readily applicable to other commercial off-the-shelf (COTS) binaries beyond Windows COM services.

264. ❌ Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

作者: Jinju Lee 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04998v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generation as a standalone task and addresses a question that arises whenever such a model is adapted across genres: how much old-domain data must be retained during fine-tuning to acquire a new domain without forgetting the old? I study jazz fine-tuning starting from a pop-pretrained 25M-parameter Music Transformer (84.24% top-1 chord accuracy on a held-out pop test set). The available jazz corpus is an order of magnitude smaller than the pop corpus, so every fine-tune run uses all 1,513 jazz training sequences. The swept variable is the volume of pop “rehearsal” data mixed alongside, taking values in {0, 1K, 2.5K, 5K, 10K}. Every fine-tuned model gains 7 to 9 points of jazz top-1. Pop accuracy collapses by 2.14 points under jazz-only fine-tuning, recovers to baseline at approximately 2.5K rehearsal samples (1.65x the jazz volume), and saturates beyond that point. A complementary observation: the metric-best run (F3, 2.5K mix) is not always the perceptually preferred one. The pop-leaning (10K) and jazz-leaning (1K) endpoints carry more committed stylistic identities that the author more often selects as finished output in informal listening. I discuss what this suggests for music co-creation tools but make no perceptual claim, since no formal listening study has been conducted. All six checkpoints are released on the HuggingFace Hub at https://huggingface.co/PearlLeeStudio.

265. ❌ DualTCN: A Physics-Constrained Temporal Convolutional Network for 2 Time-Domain Marine CSEM Inversion

作者: Khaled Ahmed, Ghada Omar 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04997v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

DualTCN is the first deep-learning framework for inverting time-domain marine controlled-source electromagnetic (MCSEM) transient data. Moving away from traditional subsurface discretization, the framework regresses four earth-model parameters – $σ_1$, $σ_2$, $d_1$, $d_2$ – and reconstructs conductivity-depth profiles using a differentiable soft-step decoder. The optimized architecture (379K parameters) features a Temporal Convolutional Network (TCN) encoder paired with a late-time branch and an auxiliary seafloor-depth head. This design achieves a 25.3% loss reduction over baseline models, with high predictive accuracy ($R^2 = 0.898$ for $σ_2$) and an inversion speed of 3.5~ms per sample on an A100 GPU. The framework demonstrates high robustness to noise through curriculum-based amplitude augmentation, maintaining a mean $\bar{R}^2$ of 0.858 at $\pm2%$ random amplitude error, compared to $0.363$ without augmentation. DualTCN generalizes effectively to three-layer extensions (seawater/resistive layer/basement), accurately resolving basement conductivity ($R^2 \approx 0.88$), though thin-layer resolution remains a physical limitation ($R^2 \approx 0.23$). In comparative benchmarks, DualTCN significantly outperforms traditional local optimization methods like Levenberg-Marquardt and L-BFGS-B, yielding a mean $\bar{R}^2 = 0.877$ versus 0.129-0.439 for multi-start baselines, while operating at up to 21,000$\times$ lower computational cost. Finally, the framework incorporates uncertainty quantification via Monte Carlo (MC) Dropout. While well-calibrated for $σ_1$ (PICP90 = 0.944), inherent signal limitations at short offsets (200m) lead to under-coverage for $d_2$ (PICP90 = 0.572), which can be mitigated through post-hoc temperature scaling or split conformal prediction.

266. ❌ Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning

作者: Anastasis Kratsios, A. Martina Neuman, Philipp Petersen 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04995v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, this advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.

267. ❌ Delving into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series

作者: Ruichao Guo, Xingyao Han, Luo Wenshui, Zhe Liu, Chen Gong, Hesheng Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04957v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Point forecasting for graph-structured multivariate time series is a fundamental problem, but rigorous uncertainty quantification for such predictions is still underexplored. Conformal prediction (CP) offers uncertainty estimation with a solid coverage guarantee under the exchangeability assumption, which requires the joint data distribution to be unchanged under permutation. However, in graph-structured time series, inherent cross-node coupling can violate the exchangeability condition, making direct application of CP unreliable. Inspired by the spectral graph theory, such coupling resides in global trends and can be characterized by the low-frequency components, while high-frequency components are nearly exchangeable. Therefore, we propose a novel concept named Spectral Graph Conditional Exchangeability (SGCE), which conditions exchangeable high-frequency components on low-frequency ones to preserve global trends and enable effective CP in the spectral domain. Based on SGCE, we further propose Spectral Conformal prediction via wAveLEt transform (SCALE). SCALE uses graph wavelets to decompose low/high-frequency components and conformalizes high-frequency residuals via adaptive gating over a low-frequency embedding. Experimental results on real-world traffic datasets show that SCALE not only achieves valid coverage but also consistently improves the coverage-efficiency trade-off over the state-of-the-art CP methods.

268. ❌ Order-based Rehearsal Learning

作者: Yu-Xuan Tao, Tian-Zuo Wang, Zhi-Hua Zhou 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04955v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and propose the first order-based rehearsal learning method. Although an order is less informative than a graph, it can be sufficient to identify the influence of decisions from observational data, suggesting that learning the entire graph is not always necessary. To learn the order, we develop an information-theoretic method that imposes no restrictions on the form of structural functions or the type of noise distributions. For AUF decision-making, we construct an order-based sampler to approximate the influence of decisions and, combined with a surrogate objective for maximizing the post-decision success probability, reduce the AUF task to a differentiable optimization problem. Experiments show that our order learning method outperforms existing methods, and that our AUF approach not only surpasses methods relying on learned graphs or learned orders, but also matches or even exceeds oracle baselines that are given the true graph.

269. ❌ KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels

作者: Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04956v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than method design. Category explains nearly three times more variance in semantic correctness than method (9.4% vs 3.3% explained deviance), and 72% of Fusion tasks fail across all five methods while Math tasks are solved consistently. Second, iterative refinement improves correctness, but not performance. Across GEAK iterations, compile rate rises from 52.3% to 68.8% while average speedup declines from $1.58\times$ to $1.44\times$; newly rescued kernels consistently underperform persistently correct ones ($1.16\times$ vs $1.58\times$ speedup in round~0$\to$1). Third, correctness does not imply efficiency. 46.6% of correct kernels are slower than the PyTorch eager baseline, and cross-hardware speedup variance reaches $21.4\times$. Besides, quantization remains completely unsolved (0/30 successes) despite non-trivial compilation rates, revealing systematic misunderstanding of numerical computation contracts rather than surface-level syntax errors. These findings suggest that future progress depends on handling global coordination, explicitly modeling numerical precision, and incorporating hardware efficiency into generation. The code is available at https://github.com/BonnieW05/KernelBenchX

270. ❌ On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

作者: Koen van der Blom, Diederick Vermetten 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04954v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

271. ❌ Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

作者: Klaus-Rudolf Kladny, Maximilian Mordig, Bernhard Schölkopf, Michael Muehlebach 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04952v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ). In a first stage, AIR-MoE performs coarse shortlisting by assigning tokens to VQ codewords to construct a candidate set of experts. In a second stage, fine scoring computes exact routing scores restricted to this shortlist. This two-stage procedure approximates true top-k routing while avoiding full expert scoring and, in contrast to prior work, imposing no structural constraints on expert parameters. AIR-MoE serves as a drop-in replacement for standard routers and requires no modifications to the model architecture or loss function. We further provide a lower bound on the mass recall achieved by AIR-MoE that yields insights into its inner workings. Empirically, we demonstrate that AIR-MoE achieves improved performance compared to existing routing approaches in granular MoE settings.

272. ❌ Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks

作者: Xuan Qi, Yi Wei, Fanqi Yu, Furao shen, Vittorio Murino, Cigdem Beyan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04946v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.

273. ❌ Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift

作者: Jonathan R. Landers 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04932v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality reduces temporal risk volatility to derivative energy, and a Jacobian-velocity theorem identifies directional tangent energy along the deployment path as the governing quantity under explicit along-path regularity and domination assumptions. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regularization (DTR) and a matched monitoring proxy. Rather than smoothing the network isotropically, DTR penalizes sensitivity only along estimated drift directions. We validate the theorem-to-method pipeline in four experiments: a synthetic benchmark for the time-domain inequality, a controlled synthetic comparison against isotropic Jacobian regularization, and two frozen-deployment studies on the UCI Air Quality and Tetouan power-consumption datasets. DTR reduces risk volatility and directional gain in the controlled low-rank regime, beats isotropic smoothing there, and gives validation-selected deployment gains on both real datasets when the Air Quality drift subspace is estimated from target-orthogonal sensor motion. Moderate drift-subspace misspecification is tolerable while orthogonal misspecification largely removes the benefit.

274. ❌ Neural Discovery of Strichartz Extremizers

作者: Nicolás Valenzuela, Ricardo Freire, Claudio Muñoz 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04918v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Strichartz inequalities are a cornerstone of the modern theory of dispersive PDEs, but their extremizers are known explicitly only in a handful of sharp cases. The non-convexity of the underlying functional makes the problem hard, and to our knowledge no systematic numerical attack has been attempted. We propose a simple neural-network-based pipeline that searches for extremizers as critical points of the Strichartz ratio, and apply it in three settings. First, on the Schrödinger group we recover the Gaussian extremizers of Foschi and Hundertmark–Zharnitsky in dimensions $d=1,2$ to within $10^{-3}$ relative error, with no analytical prior. Second, on $59$ further admissible pairs in $d=1$ where the answer is conjectural, the method consistently finds Gaussians, supporting the conjecture that Gaussians are the universal extremizers in the admissible range. Third, on the critical Airy–Strichartz inequality at $γ=1/q$, where existence is open, the optimization does not converge to any $L^2$ profile: instead, the iterates organize themselves as mKdV breathers $B(0,\cdot;α,1,0,0)$ with growing internal frequency $α$, and the discovered ratio approaches the Frank–Sabin universal lower bound $\widetilde A_{q,r}$ from below with a power-law gap $\simα^{-0.9}$. We confirm the same picture with an independent Hermite-basis ansatz. We propose a precise conjecture: the supremum equals $\widetilde A_{q,r}$ and is approached, but not attained, along the breather family. The pipeline thus serves both as a validator on known cases and as a discovery tool when no extremizer exists.

275. ❌ Koopman Identification of Nonlinear Systems via Reservoir Liftings

作者: Weibin Gu, Chen Yang, Lu Shi 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04917v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees well-posedness and favorable numerical conditioning of the lifted Koopman approximation. A correlation-based spectral radius selection algorithm aligns reservoir memory with dominant system timescales. Analysis reveals how the finite memory of the reservoir determines which Koopman eigenfunctions remain observable from the lifted features. Evaluation on synthetic benchmarks demonstrates that RC-Koopman achieves a favorable balance between reconstruction accuracy of the underlying nonlinear dynamics and dynamical stability, compared to Extended Dynamic Mode Decomposition (EDMD) and Hankel-based lifting approaches. Code available at: https://github.com/NEAR-the-future/RC-Koopman.git

276. ❌ Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning

作者: Xinyan Han, Yan Lu, Xiaoyu Lin, Yuanyuan Jiang, Yuanrui Wang, Xuanyue Li, Wenchao Zou, Xingxuan Zhang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04911v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Tabular data synthesis aims to generate high-quality data while preserving privacy. However, we find that existing tabular generative models exhibit a clear tradeoff in the small-data regime: improving data quality typically comes at the cost of increased memorization of training samples, thereby weakening privacy protection. This tradeoff arises because small training sets make it difficult for dataset-specific generative models to distinguish generalizable structure from sample-specific patterns. To address this, we propose DiffICL, which formulates tabular data generation as an in-context learning problem. Instead of fitting each dataset from scratch,DiffICL leverages pretrained structural priors learned from a large collection of datasets, enabling it to infer data distributions from limited context rather than memorizing individual samples. We evaluate DiffICL on 14 real-world datasets. Results show that DiffICL improves both data quality and privacy, and generate synthetic data that provides effective data augmentation. Our findings suggest that the quality-privacy tradeoff can be improved through better training paradigms.

277. ❌ Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features

作者: Mehrab Mahdian, Ferenc Ender, Tamas Pardy 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04905v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Electrospinning is a highly sensitive fabrication process in which small variations in operating parameters can significantly influence fiber morphology and material performance. Machine learning (ML) methods are increasingly employed to model these process-structure relationships and to identify the relative importance of processing variables. However, most existing studies rely on a single ML model, implicitly assuming that the resulting feature importance is robust and reproducible. In this study, the consistency of feature importance across multiple ML model families was systematically evaluated using a curated dataset of 96 polyvinyl alcohol (PVA) electrospinning experiments. Twenty-one ML models representing linear, tree-based, kernel-based, neural network, and instance-based approaches were trained and compared. To provide a unified interpretability framework, SHAP (SHapley Additive exPlanations) values were used to calculate feature importance consistently across all models. A rank-based statistical analysis was then performed to quantify inter-model agreement and assess the robustness of parameter rankings. The results demonstrate that predictive performance and interpretive reliability are fundamentally distinct properties. Although several models achieved comparable predictive accuracy, substantial differences were observed in their feature importance rankings. Solution concentration emerged as the most robust and consistently influential parameter (variability = 0), whereas flow rate and applied voltage exhibited high ranking variability (variability > 0.9), indicating strong model dependence. These findings suggest that feature importance derived from a single ML model may be unreliable, particularly for small experimental datasets, and highlight the importance of cross-model validation for achieving trustworthy interpretation in ML-assisted electrospinning research.

278. ❌ A geometric relation of the error introduced by sampling a language model’s output distribution to its internal state

作者: Albert F. Modenbach 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04899v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf instruction-tuned model, with transformations clustering by board region and respecting piece importance. Our findings suggest that token space geometry directly reflects how models internally represent problems.

279. ❌ Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

作者: Noel Thomas 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04895v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from NeurIPS, ICML, ICLR, AISTATS, UAI, TMLR, JMLR, and AutoML-Conf (2022-2025) finds that 98% never vary B/|A| as a controlled axis. On the same GDSC2 benchmark, changing only the budget reverses the ranking: at B=50, Greedy outperforms UCB by 0.050 Hit@1, while at B=100, UCB outperforms Greedy by 0.035. We capture this transition with the Portable Regime Score PRS=(B/|A|)(1-rho), where rho is the prior rank correlation and can be estimated from pilot contexts before the main comparison. Across 79 conditions spanning chemistry, drug-response biology, and HPO, a hierarchical model gives beta=0.50 (p=1.1e-9), and 19% of conditions fall in an equivalence zone where |advantage|<0.01 Hit@1. In five published reversal cases, PRS predicts the winner from pre-comparison observables. A No-Free-Leaderboard proposition explains why unconditional rankings are unstable: when CATE changes sign across regimes, the reported ATE becomes a function of benchmark mixture. RegimePlanner, which estimates rho online and switches acquisition accordingly, wins all 16 HPO-B search spaces at B=100 and exceeds the matched {Greedy,UCB} per-context oracle on GDSC2 by 18%. Pre-registered predictions achieve 27/40=67.5% overall accuracy and above 90% within EMA prior families. The practical protocol is simple: report B/|A|, rho, K, and metric alongside any claimed acquisition advantage.

280. ❌ To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition

作者: Yangchen Yu, Qian Chen, Jia Li, Zhenzhen Hu, Jinpeng Hu, Lizi Liao, Erik Cambria, Richang Hong 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04877v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Multimodal emotion recognition (MER) benefits from combining text, audio, and vision, yet standard fusion often fails when modalities conflict. Crucially, conflicts differ in resolvability: benign conflicts stem from missing, weak, or ambiguous cues and can be mitigated by cross-modal calibration, while severe conflicts arise from intrinsically contradictory (e.g., sarcasm) or misleading signals, for which forced fusion may amplify errors. Recognizing this, we propose Dual-Path Conflict Resolution (DCR), a unified framework that learns when to fuse and when to drop modalities. Path I (Affective Fusion Distiller, AFD) performs reverse distillation from audio/visual teachers to a textual student using temporally weighted class evidence, thereby enhancing representation-level calibration and improving fusion when alignment is beneficial. Path II (Affective Discernment Agent, ADA) formulates MER as a contextual bandit that selects among fusion and unimodal predictions based on a dual-view state and a calibration-aware reward, enabling decision-level arbitration under irreconcilable conflicts without requiring per-modality reliability labels. By taking into account the full multimodal context and coupling soft calibration with hard arbitration, DCR reconciles conflicts that can be aligned while bypassing misleading modalities when fusion is harmful. Across five benchmarks covering both dialogue-level and clip-level MER, DCR consistently outperforms competitive baselines or achieves highly competitive results. Further ablations, conflict-specific subset evaluation, and modality-selection analysis verify that AFD and ADA are complementary and jointly improve robust conflict-aware emotion recognition.

281. ❌ Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations

作者: Zhangyong Liang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04853v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver’s structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling on the neural correction ensures that its Lipschitz contribution remains $\mathcal{O}(τ)$, yielding a Gronwall stability factor bounded uniformly in the step size and independent of the spatial resolution. The network is trained end-to-end through a solver-in-the-loop objective that unrolls the full iteration and penalises trajectory error in a Bourgain-type norm, aligning learning with multi-step solver dynamics rather than isolated one-step targets. Under stated assumptions, the global error satisfies $C(\varepsilon_{net}+δ),τ^γ\ln(1/τ)$, where $\varepsilon_{net}$ measures the network approximation quality and $δ$ the training shortfall. Experiments on three dispersive benchmarks with rough data show that HIN-LRI improves accuracy over analytical integrators, splitting methods, and neural PDE surrogates, with stable spatial refinement, effective out-of-distribution transfer, and modest online overhead.

282. ❌ PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data

作者: Thomas S. Robinson, Ranjit Lall 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04838v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The standard constraint-based paradigm for causal discovery with incomplete data – impute first, test second – is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compares cross-validated models that include and exclude the candidate variable while receiving the same imputed conditioning set, forcing imputation error to cancel in their loss difference rather than contaminate the test statistic. A provably consistent variance estimator jointly accounts for uncertainty arising from cross-validation and multiple imputation – to our knowledge, the first formal unification of these two inferential frameworks. In simulations, existing imputation-based CI tests exhibit false positive rates of 28–45% when data are missing not at random (MNAR), whereas PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms. These gains are largest in nonlinear settings and grow with causal graph size: when integrated into the PC algorithm, PAIR-CI reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable equivalents, and up to 44% on the 56-variable HAILFINDER network, with stable performance in all settings.

283. ❌ Bridging Input Feature Spaces Towards Graph Foundation Models

作者: Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua, Bruno Ribeiro, Carola-Bibiane Schönlieb 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04834v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Unlike vision and language domains, graph learning lacks a shared input space, as input features differ across graph datasets not only in semantics, but also in value ranges and dimensionality. This misalignment prevents graph models from generalizing across datasets, limiting their use as foundation models. In this work, we propose ALL-IN, a simple and theoretically grounded method that enables transferability across datasets with different input features. Our approach projects node features into a shared random space and constructs representations via covariance-based statistics, thus eliminating dependence on the original feature space. We show that the computed node-covariance operators and the resulting node representations are invariant in distribution to permutations of the input features. We further demonstrate that the expected operator exhibits invariance to general orthogonal transformations of the input features. Empirically, ALL-IN achieves strong performance across diverse node- and graph-level tasks on unseen datasets with new input features, without requiring architecture changes or retraining. These results point to a promising direction for input-agnostic, transferable graph models.

284. ❌ Replay-Based Continual Learning for Physics-Informed Neural Operators

作者: Yizheng Wang, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04832v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Neural operators generally demonstrate strong predictive performance on in-distribution (ID) problems. However, a critical limitation of existing methods is their significant performance degradation when encountering out-of-distribution (OOD) data. To address this issue, this work introduces continual learning into physics-informed neural operators, with particular emphasis on neural operators built upon the Transolver architecture, and proposes a simple yet effective replay-based continual learning strategy. The proposed method is fully physics-informed and does not require labeled data, relying solely on input fields together with physical constraints for training. When new OOD data become available, a small number of past data are incorporated through a distillation-based constraint to preserve previously acquired knowledge and alleviate catastrophic forgetting. Meanwhile, a transfer learning LoRA is employed to enable rapid adaptation to the new data. The proposed framework is systematically validated on three representative physical problems, including the Darcy flow problem in fluid mechanics, a two-dimensional hyperelastic brain tumor problem in biomechanics, and a three-dimensional linear elastic Triply Periodic Minimal Surfaces problem in solid mechanics. The results demonstrate that the proposed method effectively mitigates catastrophic forgetting on previously learned data while maintaining fast adaptability to new data. Compared with conventional joint training strategies, the proposed method significantly improves training efficiency while reducing additional memory usage and computational cost.

285. ❌ Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models

作者: Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04830v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether two notions of such phase transitions are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of the generation trajectory, we observe a near-simultaneous occurrence of the non-locality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.

286. ❌ Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity

作者: Junxiang Wu, Zhiqiang Kou, Hongwei Zeng, Wenke Huang, Biao Liu, Hanlin Gu, Yuheng Jia, Di Jiang, Yang Liu, Xin Geng, Qiang Yang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04827v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Label Distribution Learning (LDL) models supervision as an instance-wise probability distribution, enabling fine-grained learning under inherent ambiguity, but its success relies on high-fidelity label distributions that are costly to obtain and thus often noisy. Motivated by privacy-sensitive applications, we study Federated Label Distribution Learning (Fed-LDL), where data isolation further induces heterogeneous annotation quality across clients, making local updates unevenly reliable and breaking sample-size-based aggregation (e.g., FedAvg). To address this trust dilemma, we propose FedQual, a quality-aware Fed-LDL framework with two coupled mechanisms: (i) quality-adaptive client training guided by a global semantic anchor that calibrates low-quality clients while preserving high-quality autonomy, and (ii) reliability-aware server aggregation that reweights client contributions by effective reliable information rather than raw sample size. To enable rigorous evaluation, we construct four new Fed-LDL benchmarks (FER-LDL, FI-LDL, PIPAL-LDL, and KADID-LDL) with controlled annotation quality disparity. We further provide a theoretical guarantee showing that under heterogeneous supervision quality, client-specific calibration is strictly better than any uniform calibration. Extensive experiments on the proposed benchmarks demonstrate the effectiveness of FedQual.

287. ❌ Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding

作者: Taiga Hayashi, Yuya Seki, Kotaro Terada, Yosuke Mukasa, Shuta Kikuchi, Shu Tanaka 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04825v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Factorization machine with quadratic-optimization annealing (FMQA) is a black-box optimization method that combines a factorization machine (FM) surrogate with QUBO-based search by an Ising machine. When FMQA is applied to integer or discretized continuous variables via one-hot encoding, uniform random initial sampling can leave many binary variables never active in the initial training data, and the corresponding FM parameters receive no direct gradient updates from the observed responses. We address this by designing the initial training data to achieve complete marginal bit coverage, namely, ensuring that every binary variable obtained by one-hot encoding takes the value one at least once. We use two space-filling sampling methods, Latin hypercube sampling (LHS) and the Sobol’ sequence, yielding LHS-FMQA and Sobol’-FMQA. On the human-powered aircraft wing-shape optimization benchmark with 17 and 32 design variables, both proposed methods achieved numerically higher mean final cruising speeds than the baseline FMQA, with the advantage more pronounced on the 32-variable problem.

288. ❌ Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs

作者: Zhenchao Sun, Shuai Ma, Ping Lu, Chongyang Tao 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04819v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in modeling intrinsic polarity-related properties of SAT, such as the complementary relationship between the positive and negative literals of a variable. To address these limitations, we propose a polarity-aware representation learning framework over clause-literal hypergraphs. We model SAT formulas as clause-literal hypergraphs augmented with a clause incidence graph to capture higher-order structural interactions. We then introduce a polarity-aware decomposed mechanism that separates variable representations into polarity invariant and equivariant components, explicitly modeling the relationship between positive and negative literals, with the resulting literal representations propagated along the hypergraph structure. We further incorporate a polarity-inversion consistency regularization to reinforce polarity-consistent representations during training. Experimental results on multiple SAT datasets demonstrate the effectiveness of the proposed approach.

289. ❌ A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction

作者: Wenjing Liu, Yujia Lei, Qu Wang 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04813v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic dependencies in user-service interactions, which limits their prediction performance. To address this issue, this paper proposes a dynamic QoS prediction framework based on the Biased Nonnegative Block Term Tensor Decomposition Model, termed BNBT. Specifically, the proposed framework is developed from three aspects: (1) block term tensor decomposition is employed to enhance the representation capability of latent feature learning; (2) linear bias terms are incorporated to further improve prediction accuracy; and (3) a tensor-oriented single-element-dependent nonnegative multiplicative update algorithm, called SLF-NMUT, is designed for efficient parameter estimation. Extensive experiments on real-world QoS datasets demonstrate that the proposed BNBT framework consistently outperforms several state-of-the-art QoS prediction methods in terms of prediction accuracy.

290. ❌ Bilinear Mamba-Koopman Neural MPC for Varying Dynamics

作者: Matan Pagi, Zohar Sorek 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04793v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution. We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds less than 1% parameters through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone-descent and KKT convergence results under standard trust-region assumptions. Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provide a simple and effective mechanism for robust MPC under varying conditions.

291. ❌ AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures

作者: Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04754v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains entirely unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants: Hard MoE, Soft MoE, and Cluster MoE against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on Tiny ImageNet-200 dataset, using eight 8-bit signed multipliers (including one exact baseline) from the EvoApproxLib library. Results show that, without retraining, the Dense baseline is the most resilient topology across all CNN architectures, whereas on ViT-Small, all topologies degrade at comparable rates regardless of routing strategy. After approximate-aware retraining, recovery varies substantially across architectures, topologies, and multipliers. ResNet-20 achieves full recovery across the entire multiplier range, whereas VGG architectures recover at moderate multipliers but fail irreversibly at aggressive ones for all topologies except Cluster MoE on VGG11_bn; on ViT-Small, Hard MoE outperforms Dense under aggressive approximation at equal normalized inference cost. These results pave the way for future approximate MoE hardware-software co-design strategies.

292. ❌ MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction

作者: Aike Potze, Fred van Eeuwijk, Ioannis N. Athanasiadis 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04744v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Plant breeding underpins global food security through incremental, accumulating improvements in crop yield, quality and sustainability, achieved via repeated cycles of crop ranking, selection and crossing. Climate change disrupts this process by altering local growing conditions, thereby shifting the relative performance of crop genotypes. Predicting these relative changes in yield is critical for food security. Yet, this problem remains an open challenge in plant breeding, and relatively unexplored within the AI community. We propose MixINN, an approach that first isolates high-quality genotype-environment interaction labels using mixed models, and then predicts these interactions for new crop varieties in future environmental conditions with a deep neural network. We evaluate our method on a corn multi-environment trial across the continental United States and show improved prediction of genotype ranking over current plant breeding methods. MixINN demonstrated superior performance in identifying the 20% most productive corn genotypes, leading to a 5.8% higher average yield, which further improved to 7.2% when targeting specific growing environments. These are competitive results for real-world breeding programs, demonstrating the potential of AI research in accelerating the development of climate-adapted crops, and improving future food security under climate change.

293. ❌ OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization

作者: Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04738v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and rotation, attempt to address this issue, the performance remains unsatisfactory. In this paper, we propose Outlier Self-Absorption Quantization (OSAQ), which performs additive weight suppression guided by the second-order low-rank property for low-bit weight-only quantization of LLMs. Specifically, we observe that the Hessian exhibits low-rank consistency across different inputs, with certain directions consistently showing vanishing curvature. Leveraging this property, we identify a stable null space of the Hessian and then construct an additive weight transformation by linearly combining the vectors within this null space, thereby suppressing weight outliers without affecting the task loss. This additive transformation can be absorbed into the weights offline, requiring no inter-layer transformations and introducing no inference overhead. Moreover, the construction is efficiently achieved by a closed-form solution, without resource-intensive training or iterative procedures. Extensive experiments demonstrate that OSAQ effectively suppresses outliers and enhances low-bit quantization performance. For instance, in 2-bit quantization, OSAQ, when integrated with GPTQ, achieves over 40% lower perplexity compared to vanilla GPTQ.

294. ❌ Using Common Random Numbers for Simulation-based Planning with Rollouts

作者: Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04732v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Simulation-based planning with rollouts is a widely-deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.

295. ❌ Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols

作者: Jaewook Kim, Hyeoncheol Kim 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04727v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction practices. We identify issues in attention dimension settings that affect performance estimates, and demonstrate that improper ordering of student attempts, such as ignoring ServerTimestamp, can violate temporal causality and lead to overly optimistic results. To ensure consistent evaluation, hyperparameters are selected via grid search guided by a single designated fold and then fixed uniformly across all folds during cross-validation. We further analyze the role of assignment-wise characteristics and systematically explore the impact of maximum sequence length. Using this protocol, we re-evaluate PKT models on the CodeWorkout dataset. Our results show that, under controlled and consistent settings, the performance gap between attention-enhanced models and standard DKT is significantly reduced, and increased architectural complexity does not consistently translate into superior performance. Beyond individual model comparisons, this work provides practical guidance for reliable and comparable evaluation in programming knowledge tracing.

296. ❌ Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation

作者: Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04723v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user histories. This constraint restricts the model’s capacity to fully capture long-term user preferences. In some scenarios, modeling item interactions purely through attention may also not be the most effective approach to extract sequential patterns. In this work, we propose ConvRec, an alternative method with linear computational and memory complexity that employs convolutional layers in a hierarchical, down-scaled fashion to generate compact, yet expressive sequence representations. To further enhance the model’s ability to capture diverse sequential patterns, each layer aggregates the neighboring items gradually to reach a comprehensive sequence representation. Extensive experiments on four real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation models, highlighting the potential of convolution-based architectures for efficient and effective sequence modeling in recommendation systems. Our implementation code and datasets are available here https://github.com/ismll-research/ConvRec.

297. ❌ SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

作者: Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04712v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity, one expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves average success under continual RL by 133% and 50% over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.

298. ❌ ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC

作者: Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04709v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner’s long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.

299. ❌ A Zero-Inflated Beta Mixture Model for Marginal Mediation Analysis with Compositional Microbiome Mediators

作者: Seungjun Ahn, Quran Wu, Alicia Yang, Zhigang Li 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04372v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

The role of the microbiome in disease pathogenesis is an emerging field with strong evidence suggesting that dysbiosis is associated with precancerous and cancerous states. Microbiome data present substantial challenges for causal mediation analysis due to sparsity, compositional constraints, and latent heterogeneity. To address these issues, we propose a zero-inflated beta mixture (ZIBM) method for mediation analysis with compositional microbiome mediators. The proposed method accommodates excess zeros through a zero-inflation component and captures heterogeneity in non-zero relative abundances using a beta mixture distribution. Within the potential-outcomes framework, the ZIBM provides estimates of marginal microbiome-mediated causal effects, and model parameters are estimated using an expectation-maximization algorithm. Simulation studies demonstrate that the ZIBM yields more accurate estimation and reliable inference under conditions commonly observed in microbiome data, compared with existing approaches. An application to a real microbiome study further illustrates its practical utility. These results indicate that the proposed method provides a more flexible and robust statistical framework for mediation analysis involving compositional microbiome data.

300. ❌ Benchmarking open-source tools for in silico antiviral drug discovery

作者: Daniel C. Elton, Preston W. Estep 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04265v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for antiviral drug discovery and designing antiviral combinations. We present a survey of open source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom dataset we fine-tuned the DrugFormDTA binding affinity prediction model (Khokhlov et al. 2025). We then benchmarked 15 open-source binding affinity prediction tools on a custom test set of 853 antiviral compounds spread across 16 different protein targets from 10 virus species. Models tested include Boltz-2, GNINA, FlowDock, Interformer, AutoDock-GPU, and others. We found that Boltz-2 and DrugFormDTA ranked highest overall among ML-based approaches, and GNINA did best among docking approaches, with notable variance across specific viral proteins. Fine-tuning DrugFormDTA on our custom cleaned antiviral dataset boosted performance from $r=0.5$ to $r=0.7$. As part of this work we also compiled a library of approved drugs and a comprehensive list of investigational and approved antiviral drugs that can be viewed at https://antivirals-database.radvac.org. Together, this work provides a foundation for future work towards new tools and platforms for rapid drug repurposing and rapid design of antiviral combinations.

301. ❌ ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation

作者: Cong Liu, Milong Ren, Jiaqi Guan, Chengyue Gong, Jinyuan Sun, Xinshi Chen, Wenzhi Xiao 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04118v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce ProtDBench, a standardized and throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour budget, as well as cluster-level success criteria to account for structural diversity. Together, these results expose systematic differences induced by filtering rules, success definitions, and throughput-aware evaluation between computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic and controlled comparison of protein binder design methods under realistic evaluation settings.

302. ❌ Tree-Conditioned Edit Flows for Ancestral Sequence Reconstruction

作者: Emil Sharafutdinov, Ingemar André 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04119v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Ancestral sequence reconstruction (ASR) aims to infer extinct protein sequences at internal nodes of a phylogenetic tree. Classical ASR methods are typically based on continuous-time Markov substitution models, but they treat sites largely independently and handle insertions and deletions only weakly or not at all. We introduce a tree-conditioned edit-flow model for variable-length ASR. Given two descendant sequences and their branch distances to a shared ancestor, the model reconstructs the ancestor through paired bidirectional edit trajectories constrained to agree on a common ancestral state. On a benchmark of experimentally evolved sequences with only context-independent substitutions, the model does not match the accuracy of the best classical method, yet still achieves reasonable performance despite being trained on natural sequences that include insertions, deletions, and substitutions. On a benchmark of natural homologous sequences with abundant insertions and deletions, the model most accurately localizes inferred evolutionary change.

303. ❌ Multistate Coupled Diabatic Neural Network potential for the quantum non-adiabatic Photofragmentation of CH$_2^+$

作者: Pablo del Mazo-Sevillano, Susana Gomez-Carrasco, Alfredo Aguado, Octavio Roncero 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04756v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Tracking the complex non-adiabatic transitions in far-ultraviolet photodissociation demands highly accurate diabatic potential energy matrices (PEMs) across numerous excited states. To address this, we introduce a fully automated diabatization method that leverages artificial neural networks to fit PEMs. Our approach divides the PEM into a physically grounded zeroth-order diagonal term, which is then corrected by a neural network matrix to capture electronic couplings. By enforcing symmetry constraints on off-diagonal elements and sharing degenerate diabatic states between the $A’$ and $A’’$ irreducible representations, the { diabatization} process becomes completely automatic. We validate this method using time-dependent wavepacket calculations to simulate the photodissociation of CH$_2^+$, incorporating relevant states up to $\approx 13.6$~eV. Finally, we compute partial cross-sections for all fragmentation channels – including total and partial fragmentation yielding \ce{CH+}, \ce{CH}, \ce{H2}, and \ce{H2+} diatoms – revealing a notably high cross-section for the formation of the \ce{CH} radical.

304. ❌ On Electropolymerized Fingerprints and their Potential for Identification and Encryption

作者: Antoine Baron, Luc Brulin, Corentin Scholaert, Yannick Coffinier, Fabien Alibart, Sébastien Pecqueur 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04599v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

While human technology is ruled by determinism, biological systems exploit a subtle balance of control and stochasticity. This balance, evident in the morphogenesis of textural patterns imprinted on leaves, fur or skin can help hierarchize organisms both as a representative of their species and as unique individuals. In this study, we identified that, by exploiting electrochemistry, it is possible to generate such versatile but specific textures, to imprint patterns of a conducting polymer on a conducting substrate. It is shown that the 1D morphogenesis of conducting polymer dendrites on wires translates, on 2D surfaces, as highly heterogeneous coatings of dark spots, rosettes or marbled patterns. Despite their inherent stochasticity, these patterns are characteristic of the physical conditions they grew in, and particularly of the chemical content of the electroactive solution used for their electropolymerization. A statistical study demonstrates that these patterns could be used as fingerprints to physically tag the identity of a solution within a specific class. By the identification of a new electrochemical process which allows generating physical fingerprints with optical, electrical and chemical contrast on an electrode, this research paves the way toward a disruptive low-cost technology which could allow any end-user to generate personal tags on a glass slide or on a micro-chip, to engrave physically-encrypted personal information for various applications.

305. ❌ Angular Gausslets

作者: Steven R. White 期刊/来源: arxiv 发布日期: 2026-05-06 arXiv链接: http://arxiv.org/abs/2605.04517v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Gausslets are one of the few basis constructions for electronic structure that combine locality, orthonormality, variable resolution, and an accurate diagonal approximation for the electron-electron interaction, but the original construction is tied to one dimension. Radial gausslets extended this idea to atoms while leaving the angular degrees of freedom in spherical harmonics, so the atomic interaction remained only partially diagonal in the combined basis. Here we introduce generalized gausslets on the sphere and combine them shell by shell with radial gausslets to form an atom-centered basis in which the electron-electron interaction takes a two-index integral-diagonal form. The angular basis starts from localized spherical Gaussians and uses injection to make a low-$\ell$ spherical-harmonic subspace exact. Tests of the kinetic spectrum, low-$\ell$ Coulomb matrix elements, spherium, first-row Hartree–Fock calculations, and He exact diagonalization show systematic convergence with increasing angular resolution. We also develop DMRG methods for this basis, including compact MPOs, correlated small-space starting states, Givens-rotation transfers between nearby angular sizes, and embedded sampled variance extrapolation (ESVE). We show that this combination of ingredients can be used to solve the Be atom, with extrapolations in the number of angular functions but with fixed radial resolution, to within about 0.1 mH of the complete basis set limit exact energy. This shows that DMRG calculations of first row atoms which include both static and accurate dynamic correlation on the same footing are feasible.

306. ❌ Macromolecular tribology at flowing solid/liquid interfaces

作者: Malo Velay, Jean Comtet 期刊/来源: arxiv 发布日期: 2026-05-05 arXiv链接: http://arxiv.org/abs/2605.04318v1

评分: 0.0 / 26.6 ❌

评分详情

关键词	权重	相关度
“Large Language Models” OR “LLMs” OR “Foundation Models”	1.0	0.0/10
“Mixture of Experts” OR “MoE” OR “Sparse Models”	1.0	0.0/10
“Small Language Models” OR “SLMs” OR “On-device AI”	1.0	0.0/10
“Scaling Laws” AND “Data Quality”	1.0	0.0/10
“Pre-training” OR “Continual Pre-training” OR “Domain Adaptation”	1.0	0.0/10
“Post-training” OR “Supervised Fine-tuning” OR “SFT”	1.0	0.0/10
“Instruction Tuning” OR “Alignment” OR “Value Alignment”	1.0	0.0/10
“RLHF” OR “RLAIF” OR “Direct Preference Optimization” OR “DPO”	1.0	0.0/10
“PEFT” OR “LoRA” OR “Parameter-efficient Fine-tuning”	1.0	0.0/10
“Retrieval-Augmented Generation” OR “RAG” OR “Retrieval-Generation”	1.0	0.0/10
“Context Window Extension” OR “Long Context LLMs”	1.0	0.0/10
“KV Cache Compression” OR “Linear Attention” OR “FlashAttention”	1.0	0.0/10
“Chain of Thought” OR “CoT Reasoning” OR “Multi-step Reasoning”	1.0	0.0/10
“System 2 Thinking” OR “Slow Thinking” OR “In-depth Reasoning”	1.0	0.0/10
“Monte Carlo Tree Search” OR “MCTS” AND “LLM”	1.0	0.0/10
“Self-Correction” OR “Self-Improvement” OR “Self-Reflection”	1.0	0.0/10
“LLM Agents” OR “Autonomous Agents” OR “Agentic Workflow”	1.0	0.0/10
“Tool Use” OR “Function Calling” OR “API Tool Use”	1.0	0.0/10
“Multi-agent Systems” OR “Agent Coordination”	1.0	0.0/10
“Quantization” OR “Model Compression” OR “Low-bit Weights”	1.0	0.0/10
“Speculative Decoding” OR “Inference Acceleration”	1.0	0.0/10
“Hallucination Mitigation” OR “Factuality” OR “Truthfulness”	1.0	0.0/10
“Mechanistic Interpretability” OR “Explainable AI”	1.0	0.0/10
“World Models” AND “General World Models”	1.0	0.0/10
“Model Merging” OR “Model Soups” OR “Weight Averaging”	1.0	0.0/10
“In-context Learning” OR “Many-shot Learning”	1.0	0.0/10
“AI for Science” OR “Bioinformatics” OR “Cheminformatics”	1.0	0.0/10

评分理由: 评分失败: Error code: 402 - {’error’: {‘message’: ‘Insufficient Balance’, ’type’: ‘unknown_error’, ‘param’: None, ‘code’: ‘invalid_request_error’}}

!!! tip deepseek-chat TL;DR

评分失败，无法生成摘要

摘要 (Abstract)

Molecular-scale interactions between solvated macromolecules and solid surfaces govern a large number of processes, from biology to engineering. Yet, despite extensive characterization at the macroscopic level, our molecular understanding of polymer/surface interactions remains limited, particularly under out-of-equilibrium conditions. Here, we combine wide-field single-molecule microscopy with microfluidic transport to directly track the nanoscale dynamics of individual fluorescently tagged macromolecular PEG adsorbates, and investigate their subtle couplings with interfacial hydrodynamic flows. At equilibrium, we evidence marked surface dependence, with macromolecular dynamics switching from heterogeneous non-Brownian diffusion on hydrophilic glass to bidimensional Brownian-like transport in an interfacial physisorbed state on hydrophobic self-assembled monolayers. While for hydrophilic glass, the effect of the flow is restricted to an advective contribution during solvent-mediated flights, we uncover for the hydrophobic surfaces a peculiar regime of mixed macromolecular friction, whereby the adsorbed chain rubs on the solid wall while being continuously dragged by the near-surface hydrodynamic flow through interfacial slippage. Through joint analysis of equilibrium and out-of-equilibrium transport, we finely disentangle these molecular level frictional interactions with both the solid surface and the interfacial liquid. Beyond population-averaged dynamics, we further unveil a broad distribution of friction coefficients associated to individual chains, which we attribute conformational heterogeneities with sluggish reorganization timescale. By enabling direct observations of molecular-scale interfacial dynamics, our approach provides a novel molecular picture of macromolecular friction and adsorbate/surface interactions at flowing solid/liquid interfaces.

Token 消耗统计

总计: 1,330,854 tokens（输入 1,330,854 / 输出 0）