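Advanced Manufacturing Knowledge Service Platform
National Science and Technology Library (NSTL) Mechanical Branch; MIIT Public Service Platform for Industrial Technology Foundations; National Demonstration Platform for Public Services for Small and Medium-Sized Enterprises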
Conference Proceedings
Proceedings title
AAAI Special Track (AI Alignment)
Conference name
39th AAAI Conference on Artificial Intelligence (AAAI-25), 37th Conference on Innovative Applications of Artificial Intelligence (IAAI-25), 15th Symposium on Educational Advances in Artificial Intelligence (EAAI-25)
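Chinese title (translated)
39th AAAI Conference on Artificial Intelligence, 37th Conference on Innovative Applications of Artificial Intelligence, 15th Symposium on Educational Advances in Artificial Intelligence, Volume 26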
Organization
Association for the Advancement of Artificial Intelligence (AAAI)
Conference dates
25 February - 4 March 2025
Conference location
Philadelphia, Pennsylvania, USA
Publication year
2025
Holding number
358223
Title
Authors
Publication year
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
Somnath Banerjee; Sayan Layek; Soham Tripathy; Shanu Kumar; Animesh Mukherjee; Rima Hazra
2025
Bridging the Knowledge Gap: Understanding User Expectations for Trustworthy LLM Standards
Michaela Benk; Leane Wettstein; Nadine Schlicker; Florian von Wangenheim; Nicolas Scharowski
2025
Scaling Trends for Data Poisoning in LLMs
Dillon Bowen; Brendan Murphy; Will Cai; David Khachaturov; Adam Gleave; Kellin Pelrine
2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
Benedikt Bruckner; Alessio Lomuscio
2025
Risk Controlled Image Retrieval
Kaiwen Cai; Chris Xiaoxuan Lu; Xingyu Zhao; Wei Huang; Xiaowei Huang
2025
Political Bias Prediction Models Focus on Source Cues, Not Semantics
Selin Chun; Daejin Choi; Taekyoung Kwon
2025
Searching for Unfairness in Algorithms' Outputs: Novel Tests and Insights
Ian Davidson; S. S. Ravi
2025
In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search
Emir Demirovic; Christian Schilling; Anna Lukina
2025
Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution
Carlos Eiras-Franco; Anna Hedstrom; Marina M.-C. Hohne
2025
Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning
Karl Elbakian; Samuel Carton
2025
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Duanyu Feng; Bowen Qin; Chen Huang; Youcheng Huang; Zheng Zhang; Wenqiang Lei
2025
SMLE: Safe Machine Learning via Embedded Overapproximation
Matteo Francobaldi; Michele Lombardi
2025
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector
Wenjie Fu; Huandong Wang; Chen Gao; Guanghua Liu; Yong Li; Tao Jiang
2025
The Partially Observable Off-Switch Game
Andrew Garber; Rohan Subramani; Linus Luu; Mark Bedaywi; Stuart Russell; Scott Emmons
2025
UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models
Zihan Guan; Mengxuan Hu; Sheng Li; Anil Kumar Vullikanti
2025
Robust Multi-Objective Preference Alignment with Online DPO
Raghav Gupta; Ryan Sullivan; Yunxuan Li; Samrat Phatale; Abhinav Rastogi
2025
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
Xiaomeng Hu; Pin-Yu Chen; Tsung-Yi Ho
2025
Joint Scoring Rules: Competition Between Agents Avoids Performative Prediction
Rubi Hudson
2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang; Zhangchen Xu; Luyao Niu; Bill Yuchen Lin; Radha Poovendran
2025
Dynamic Algorithm Termination for Branch-and-Bound-based Neural Network Verification
Konstantin Kaulen; Matthias Konig; Holger H. Hoos
2025