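Advanced Manufacturing Knowledge Service Platform
Machinery Branch of the National Science and Technology Library  MIIT Public Service Platform for Industrial Technology Foundations  National Demonstration Platform for Public Services for Small and Medium-sized Enterprises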

Conference Proceedings


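Proceedings title: AAAI Special Track (AI Alignment)
Conference name: 39th AAAI Conference on Artificial Intelligence (AAAI-25), 37th Conference on Innovative Applications of Artificial Intelligence (IAAI-25), 15th Symposium on Educational Advances in Artificial Intelligence (EAAI-25)
Translated title (from the Chinese record): 39th AAAI Conference on Artificial Intelligence, 37th Conference on Innovative Applications of Artificial Intelligence, 15th Symposium on Educational Advances in Artificial Intelligence, Volume 26
Organization: Association for the Advancement of Artificial Intelligence (AAAI)
Conference dates: 25 February - 4 March 2025
Conference location: Philadelphia, Pennsylvania, USA
Publication year: 2025
Holding number: 358223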


Title | Authors | Year
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models | Somnath Banerjee; Sayan Layek; Soham Tripathy; Shanu Kumar; Animesh Mukherjee; Rima Hazra | 2025
Bridging the Knowledge Gap: Understanding User Expectations for Trustworthy LLM Standards | Michaela Benk; Leane Wettstein; Nadine Schlicker; Florian von Wangenheim; Nicolas Scharowski | 2025
Scaling Trends for Data Poisoning in LLMs | Dillon Bowen; Brendan Murphy; Will Cai; David Khachaturov; Adam Gleave; Kellin Pelrine | 2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels | Benedikt Bruckner; Alessio Lomuscio | 2025
Risk Controlled Image Retrieval | Kaiwen Cai; Chris Xiaoxuan Lu; Xingyu Zhao; Wei Huang; Xiaowei Huang | 2025
Political Bias Prediction Models Focus on Source Cues, Not Semantics | Selin Chun; Daejin Choi; Taekyoung Kwon | 2025
Searching for Unfairness in Algorithms' Outputs: Novel Tests and Insights | Ian Davidson; S. S. Ravi | 2025
In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search | Emir Demirovic; Christian Schilling; Anna Lukina | 2025
Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution | Carlos Eiras-Franco; Anna Hedstrom; Marina M.-C. Hohne | 2025
Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning | Karl Elbakian; Samuel Carton | 2025
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets | Duanyu Feng; Bowen Qin; Chen Huang; Youcheng Huang; Zheng Zhang; Wenqiang Lei | 2025
SMLE: Safe Machine Learning via Embedded Overapproximation | Matteo Francobaldi; Michele Lombardi | 2025
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector | Wenjie Fu; Huandong Wang; Chen Gao; Guanghua Liu; Yong Li; Tao Jiang | 2025
The Partially Observable Off-Switch Game | Andrew Garber; Rohan Subramani; Linus Luu; Mark Bedaywi; Stuart Russell; Scott Emmons | 2025
UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models | Zihan Guan; Mengxuan Hu; Sheng Li; Anil Kumar Vullikanti | 2025
Robust Multi-Objective Preference Alignment with Online DPO | Raghav Gupta; Ryan Sullivan; Yunxuan Li; Samrat Phatale; Abhinav Rastogi | 2025
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models | Xiaomeng Hu; Pin-Yu Chen; Tsung-Yi Ho | 2025
Joint Scoring Rules: Competition Between Agents Avoids Performative Prediction | Rubi Hudson | 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates | Fengqing Jiang; Zhangchen Xu; Luyao Niu; Bill Yuchen Lin; Radha Poovendran | 2025
Dynamic Algorithm Termination for Branch-and-Bound-based Neural Network Verification | Konstantin Kaulen; Matthias Konig; Holger H. Hoos | 2025