Paper Digest: NeurIPS 2024 Papers & Highlights
The Conference on Neural Information Processing Systems (NeurIPS) is one of the top machine learning conferences. In 2024, it will be held in Vancouver. To help the community quickly catch up on the work presented at this conference, the Paper Digest team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights to quickly get the main idea of each paper.
Note: NeurIPS-2024 accepted more than 4,500 papers; this page only includes 500 of them, selected by our daily paper digest ranking algorithm. To browse all accepted papers or learn more about the NeurIPS-2024 statistics, readers can read All 4,500 NeurIPS-2024 accepted papers on a separate page, which takes quite some time to load. On this page, readers are also able to filter papers by keywords. For example, using ‘related code’ as the filter keyword will produce a list of all papers with code available to download.
To search or review papers within NIPS-2024 related to a specific topic, please use the search by venue (NIPS-2024), review by venue (NIPS-2024) and question answering by venue (NIPS-2024) services. To browse papers by author, here is a list of all ~17,000 authors (NIPS-2024). You may also like to explore our “Best Paper” Digest (NeurIPS), which lists the most influential NeurIPS papers since 1987.
This list is created by the Paper Digest Team. Experience the cutting-edge capabilities of Paper Digest, an innovative AI-powered research platform that empowers you to write, review, get answers and more. Try us today and unlock the full potential of our services for free!
TABLE 1: Paper Digest: NeurIPS 2024 Papers & Highlights
Paper | Author(s) |
---|---|
1 | SGLang: Efficient Execution of Structured Language Model Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce SGLang, a system for efficient execution of complex language model programs. |
Lianmin Zheng; Liangsheng Yin; Zhiqiang Xie; Chuyue (Livia) Sun; Jeff Huang; Cody Hao Yu; Shiyi Cao; Christos Kozyrakis; Ion Stoica; Joseph Gonzalez; Clark Barrett; Ying Sheng; |
2 | You Don’t Need Data-Augmentations in Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we challenge the importance of invariance and data-augmentation in JEAs at scale. |
Théo Moutakanni; Maxime Oquab; Marc Szafraniec; Maria Vakalopoulou; Piotr Bojanowski; |
3 | The Mamba in The Llama: Distilling and Accelerating Hybrid Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent research suggests that state-space models (SSMs) like Mamba can be competitive with Transformer models for language modeling with advantageous deployment characteristics. Given the focus and expertise on training large-scale Transformer models, we consider the challenge of converting these pretrained models into SSMs for deployment. |
Junxiong Wang; Daniele Paliotta; Avner May; Alexander Rush; Tri Dao; |
4 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop three main techniques to speed up attention on Hopper GPUs: exploiting asynchrony of the Tensor Cores and TMA to (1) overlap overall computation and data movement via warp-specialization and (2) interleave block-wise matmul and softmax operations, and (3) block quantization and incoherent processing that leverages hardware support for FP8 low-precision. |
Jay Shah; Ganesh Bikshandi; Ying Zhang; Vijay Thakkar; Pradeep Ramani; Tri Dao; |
5 | Improving Alignment and Robustness with Short Circuiting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that short-circuits models as they respond with harmful outputs. |
Andy Zou; Long Phan; Justin Wang; Derek Duenas; Maxwell Lin; Maksym Andriushchenko; J. Zico Kolter; Matt Fredrikson; Dan Hendrycks; |
6 | Repurposing Language Models Into Embedding Models: Finding The Compute-Optimal Recipe Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study how to contrastively train text embedding models in a compute-optimal fashion, given a suite of pretrained decoder-only language models. |
Albert Q. Jiang; Alicja Ziarko; Bartosz Piotrowski; Wenda Li; Mateja Jamnik; Piotr Miłoś; |
7 | Multi-language Diversity Benefits Autoformalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we create mma, a large, flexible, multi-language, and multi-domain dataset of informal-formal pairs, by using a language model to translate in the reverse direction, that is, from formal mathematical statements into corresponding informal ones. |
Albert Q. Jiang; Wenda Li; Mateja Jamnik; |
8 | The FineWeb Datasets: Decanting The Web for The Finest Text Data at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. |
Guilherme Penedo; Hynek Kydlíček; Loubna Ben Allal; Anton Lozhkov; Margaret Mitchell; Colin Raffel; Leandro Von Werra; Thomas Wolf; |
9 | Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In particular, we propose Hydra, a natural bidirectional extension of the Mamba model, parameterized as a quasiseparable matrix mixer, which demonstrates superior performance over other sequence models including Transformers on non-causal tasks. |
Sukjun Hwang; Aakash Lahoti; Ratish Puduppully; Tri Dao; Albert Gu; |
10 | Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method that is able to distill a pre-trained Transformer architecture into alternative architectures such as state space models (SSMs). |
Aviv Bick; Kevin Li; Eric Xing; J. Zico Kolter; Albert Gu; |
11 | Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we find that DAA methods deteriorate not only across a wide range of KL-budgets, but also often before even a single epoch of the dataset is completed. Through extensive empirical experimentation this work formulates the reward over-optimization or hacking problem for DAAs and explores its consequences across objectives, training regimes, and model scales. |
Rafael Rafailov; Yaswanth Chittepu; Ryan Park; Harshit Sushil Sikchi; Joey Hejna; Brad Knox; Chelsea Finn; Scott Niekum; |
12 | MINT-1T: Scaling Open-Source Multimodal Data By 10x: A Multimodal Dataset with One Trillion Tokens Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In response, we introduce MINT-1T, the most extensive and diverse open-source Multimodal INTerleaved dataset to date. |
Anas Awadalla; Le Xue; Oscar Lo; Manli Shu; Hannah Lee; Etash Guha; Sheng Shen; Mohamed Awadalla; Silvio Savarese; Caiming Xiong; Ran Xu; Yejin Choi; Ludwig Schmidt; |
13 | QUEEN: QUantized Efficient ENcoding for Streaming Free-viewpoint Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel framework for QUantized and Efficient ENcoding (QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). |
Sharath Girish; Tianye Li; Amrita Mazumdar; Abhinav Shrivastava; David Luebke; Shalini De Mello; |
14 | Yo’LLaVA: Your Personalized Language and Vision Assistant Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Similarly, when looking at a friend’s image, the interest lies in seeing their activities (e.g., *my friend* is holding a cat), rather than merely observing generic human actions (e.g., *a man* is holding a cat). In this paper, we introduce the novel task of personalizing LMMs, so that they can have conversations about a specific subject. |
Thao Nguyen; Haotian Liu; Yuheng Li; Mu Cai; Utkarsh Ojha; Yong Jae Lee; |
15 | Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce MEGALODON, a neural architecture for efficient sequence modeling with unlimited context length. |
Xuezhe Ma; Xiaomeng Yang; Wenhan Xiong; Beidi Chen; Lili Yu; Hao Zhang; Jonathan May; Luke Zettlemoyer; Omer Levy; Chunting Zhou; |
16 | Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite its widespread use, the way preference-based learning is applied varies wildly, with differing data, learning algorithms, and evaluations used, making disentangling the impact of each aspect difficult. In this work, we identify four core aspects of preference-based learning: preference data, learning algorithm, reward model, and policy training prompts, systematically investigate the impact of these components on downstream model performance, and suggest a recipe for strong learning for preference feedback. |
Hamish Ivison; Yizhong Wang; Jiacheng Liu; Zeqiu Wu; Valentina Pyatkin; Nathan Lambert; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
17 | ReVideo: Remake A Video with Motion and Content Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion. |
Chong Mou; Mingdeng Cao; Xintao Wang; Zhaoyang Zhang; Ying Shan; Jian Zhang; |
18 | LLM Circuit Analyses Are Consistent Across Training and Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we track how model mechanisms, operationalized as circuits, emerge and evolve across 300 billion tokens of training in decoder-only LLMs, in models ranging from 70 million to 2.8 billion parameters. |
Curt Tigges; Michael Hanna; Qinan Yu; Stella Biderman; |
19 | Stylus: Automatic Adapter Selection for Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt’s keywords. |
Michael Luo; Justin Wong; Brandon Trabucco; Yanping Huang; Joseph Gonzalez; Zhifeng Chen; Ruslan Salakhutdinov; Ion Stoica; |
20 | VHELM: A Holistic Evaluation of Vision Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs to present the Holistic Evaluation of Vision Language Models (VHELM). |
Tony Lee; Haoqin Tu; Chi Heem Wong; Wenhao Zheng; Yiyang Zhou; Yifan Mai; Josselin Roberts; Michihiro Yasunaga; Huaxiu Yao; Cihang Xie; Percy Liang; |
21 | Neural Model Checking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a machine learning approach to model checking hardware designs. |
Mirco Giacobbe; Daniel Kroening; Abhinandan Pal; Michael Tautschnig; |
22 | Observational Scaling Laws and The Predictability of Language Model Performance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an alternative, $observational$ approach that bypasses model training and instead builds scaling laws from $\sim$80 publicly available models. |
Yangjun Ruan; Chris Maddison; Tatsunori Hashimoto; |
23 | LocCa: Visual Pretraining with Location-aware Captioners Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This opens up the largely-unexplored potential of using natural language as a flexible and powerful interface for handling diverse pretraining tasks. In this paper, we demonstrate this with a novel visual pretraining paradigm, LocCa, that incorporates location-aware tasks into captioners to teach models to extract rich information from images. |
Bo Wan; Michael Tschannen; Yongqin Xian; Filip Pavetic; Ibrahim Alabdulmohsin; Xiao Wang; André Susano Pinto; Andreas Steiner; Lucas Beyer; Xiaohua Zhai; |
24 | Parameter-Inverted Image Pyramid Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). |
Xizhou Zhu; Xue Yang; Zhaokai Wang; Hao Li; Wenhan Dou; Junqi Ge; Lewei Lu; Yu Qiao; Jifeng Dai; |
25 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. |
Shengbang Tong; Ellis Brown; Lvhui Chen; Sanghyun Woo; Adithya Jairam Vedagiri Iyer; Sai Charitha Akula; Shusheng Yang; Jihan Yang; Manoj Middepogu; Ziteng Wang; Xichen Pan; Rob Fergus; Yann LeCun; Saining Xie; |
26 | Chain-of-Thought Reasoning Without Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study takes a novel approach by asking: Can LLMs reason effectively without any prompting? |
Xuezhi Wang; Denny Zhou; |
27 | Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we unveil a new vulnerability: the privacy backdoor attack. |
Yuxin Wen; Leo Marchyok; Sanghyun Hong; Jonas Geiping; Tom Goldstein; Nicholas Carlini; |
28 | Humanoid Locomotion As Next Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We cast real-world humanoid control as a next token prediction problem, akin to predicting the next word in language. |
Ilija Radosavovic; Bike Zhang; Baifeng Shi; Jathushan Rajasegaran; Sarthak Kamat; Trevor Darrell; Koushil Sreenath; Jitendra Malik; |
29 | Image2Struct: A Benchmark for Evaluating Vision-Language Models in Extracting Structured Information from Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce three tasks in the domain of web pages, LaTeX, and music and two new metrics that allow efficient and automatic comparison between a pair of images. |
Josselin Roberts; Tony Lee; Chi Heem Wong; Michihiro Yasunaga; Yifan Mai; Percy Liang; |
30 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite this progress, few public benchmarks are available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark that features video-language interleaved inputs up to an hour long. |
Haoning Wu; Dongxu Li; Bei Chen; Junnan Li; |
31 | MAmmoTH2: Scaling Instructions from The Web Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a paradigm to efficiently harvest 10 million naturally existing instruction data from the pre-training web corpus to enhance LLM reasoning. |
Xiang Yue; Tianyu Zheng; Ge Zhang; Wenhu Chen; |
32 | Learning-to-Cache: Accelerating Diffusion Transformer Via Layer Caching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through introducing a caching mechanism, can be readily removed even without updating the model parameters. |
Xinyin Ma; Gongfan Fang; Michael Bi Mi; Xinchao Wang; |
33 | The Art of Saying No: Contextual Noncompliance in Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should *not* comply with user requests. |
Faeze Brahman; Sachin Kumar; Vidhisha Balachandran; Pradeep Dasigi; Valentina Pyatkin; Abhilasha Ravichander; Sarah Wiegreffe; Nouha Dziri; Khyathi Chandu; Jack Hessel; Yulia Tsvetkov; Noah Smith; Yejin Choi; Hannaneh Hajishirzi; |
34 | Efficient LLM Jailbreak Via Adaptive Dense-to-sparse Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a novel token-level attack method, Adaptive Dense-to-Sparse Constrained Optimization (ADC), which has been shown to successfully jailbreak multiple open-source LLMs. |
Kai Hu; Weichen Yu; Tianjun Yao; Xiang Li; Wenhe Liu; Lijun Yu; Yining Li; Kai Chen; Zhiqiang Shen; Matt Fredrikson; |
35 | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: And third, numerous works are not reproducible, as they withhold adversarial prompts, involve closed-source code, or rely on evolving proprietary APIs. To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as *jailbreak artifacts*; (2) a jailbreaking dataset comprising 100 behaviors—both original and sourced from prior work—which align with OpenAI’s usage policies; (3) a standardized evaluation framework at https://github.com/JailbreakBench/jailbreakbench that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard at https://jailbreakbench.github.io/ that tracks the performance of attacks and defenses for various LLMs. |
Patrick Chao; Edoardo Debenedetti; Alexander Robey; Maksym Andriushchenko; Francesco Croce; Vikash Sehwag; Edgar Dobriban; Nicolas Flammarion; George J. Pappas; Florian Tramer; Hamed Hassani; Eric Wong; |
36 | Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate an alternative approach involving multiple experts for denoising, and introduce Remix-DiT, a novel method designed to enhance output quality at a low cost. |
Gongfan Fang; Xinyin Ma; Xinchao Wang; |
37 | MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or “N:M”) Sparsity in LLMs, aimed at reducing computational overhead during inference. |
Gongfan Fang; Hongxu Yin; Saurav Muralidharan; Greg Heinrich; Jeff Pool; Jan Kautz; Pavlo Molchanov; Xinchao Wang; |
38 | Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These methods require efficient approximations, and although learning a network that directly predicts the desired output is a promising solution, training such models with exact labels is often infeasible. We therefore explore training amortized models with noisy labels, and we find that this is inexpensive and surprisingly effective. |
Ian Covert; Chanwoo Kim; Su-In Lee; James Zou; Tatsunori Hashimoto; |
39 | GenAI Arena: An Open Evaluation Platform for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes GenAI Arena, an open platform to evaluate different image and video generative models, where users can actively participate in evaluating these models. |
Dongfu Jiang; Max Ku; Tianle Li; Yuansheng Ni; Shizhuo Sun; Rongqi Fan; Wenhu Chen; |
40 | TaskBench: Benchmarking Large Language Models for Task Automation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there is a lack of systematic and standardized benchmarks to promote the development of LLMs in task automation. To address this, we introduce TaskBench to evaluate the capability of LLMs in task automation. |
Yongliang Shen; Kaitao Song; Xu Tan; Wenqi Zhang; Kan Ren; Siyu Yuan; Weiming Lu; Dongsheng Li; Yueting Zhuang; |
41 | SafeSora: Towards Safety Alignment of Text2Video Generation Via A Human Preference Dataset Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the risk of harmful outputs from large vision models (LVMs), we introduce the *SafeSora* dataset to promote research on aligning text-to-video generation with human values. |
Josef Dai; Tianle Chen; Xuyao Wang; Ziran Yang; Taiye Chen; Jiaming Ji; Yaodong Yang; |
42 | What Matters When Building Vision-language Models? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. |
Hugo Laurençon; Leo Tronchon; Matthieu Cord; Victor Sanh; |
43 | Rethinking Score Distillation As A Bridge Between Image Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, SDS has a number of characteristic artifacts that limit its utility in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving an optimal-cost transport path from some current source distribution to a target distribution. |
David McAllister; Songwei Ge; Jia-Bin Huang; David Jacobs; Alexei Efros; Aleksander Holynski; Angjoo Kanazawa; |
44 | TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the practical application of 3D-SBDD generative models is hampered by their slow processing speeds. To address this bottleneck, we introduce TurboHopp, an accelerated pocket-conditioned 3D scaffold hopping model that merges the strategic effectiveness of traditional scaffold hopping with rapid generation capabilities of consistency models. |
Kiwoong Yoo; Owen Oertell; Junhyun Lee; Sanghoon Lee; Jaewoo Kang; |
45 | Graph-based Uncertainty Metrics for Long-form Language Model Generations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty — which represents the relationship between LLM generations and claims within them as a bipartite graph and estimates the claim-level uncertainty with a family of graph centrality metrics. |
Mingjian Jiang; Yangjun Ruan; Prasanna Sattigeri; Salim Roukos; Tatsunori Hashimoto; |
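The bipartite-graph idea in the highlight above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: the `claim_confidence` function and the toy `supports` data are made up, and plain degree centrality stands in for the family of centrality metrics the authors study.

```python
# Toy bipartite graph: sampled generations on one side, extracted claims on
# the other; an edge means a generation supports a claim. Degree centrality
# (the fraction of generations supporting a claim) serves as a confidence
# score, so low centrality signals high claim-level uncertainty.

def claim_confidence(supports, n_generations):
    """supports maps each claim to the set of generation indices entailing it."""
    return {claim: len(gens) / n_generations for claim, gens in supports.items()}

# Claim "A" is supported by all three sampled generations, claim "B" by one,
# so "B" carries higher uncertainty.
supports = {"A": {0, 1, 2}, "B": {1}}
conf = claim_confidence(supports, n_generations=3)
```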
46 | Fractal Patterns May Illuminate The Success of Next-Token Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the fractal structure of language, aiming to provide a precise formalism for quantifying properties that may have been previously suspected but not formally shown. |
Ibrahim Alabdulmohsin; Vinh Tran; Mostafa Dehghani; |
47 | Can LLMs Implicitly Learn Numeric Parameter Constraints in Data Science APIs? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this assumption has not been rigorously studied in the literature. In this paper, we empirically investigate the proficiency of LLMs to handle these implicit numerical constraints when generating DS programs. |
Yinlin Deng; Chunqiu Steven Xia; Zhezhen Cao; Meiziniu Li; LINGMING ZHANG; |
48 | Iterative Reasoning Preference Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we develop an iterative approach that optimizes the preference between competing generated Chain-of-Thought (CoT) candidates by optimizing for winning vs. losing reasoning steps that lead to the correct answer. |
Richard Yuanzhe Pang; Weizhe Yuan; He He; Kyunghyun Cho; Sainbayar Sukhbaatar; Jason Weston; |
49 | Geometric-Averaged Preference Optimization for Soft Preference Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. |
Hiroki Furuta; Kuang-Huei Lee; Shixiang (Shane) Gu; Yutaka Matsuo; Aleksandra Faust; Heiga Zen; Izzeddin Gur; |
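As a hedged numeric sketch of how a soft label can enter the DPO loss: if the preferred and dispreferred likelihoods are combined by a weighted geometric average with weights p and 1 − p, the log-space margin simply picks up a (2p − 1) factor, so a near-tied label contributes almost no preference signal. The function below is illustrative; the names and the exact form are assumptions, not the paper's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_dpo_loss(logp_w, logp_l, ref_w, ref_l, p, beta=0.1):
    """DPO-style loss with a soft preference label p in [0, 1] (illustrative).

    Weighting the chosen/rejected likelihoods by p and 1 - p (a weighted
    geometric average in probability space) rescales the usual DPO margin by
    (2p - 1): p = 1 recovers hard-label DPO, while p = 0.5 yields a constant
    loss and hence no gradient signal.
    """
    margin = (logp_w - ref_w) - (logp_l - ref_l)
    return -math.log(sigmoid(beta * (2 * p - 1) * margin))

# Same log-likelihoods, different label confidence: a tied label (p = 0.5)
# gives the no-signal loss -log(0.5), a confident label gives a lower loss.
loss_hard = soft_dpo_loss(-1.0, -3.0, -2.0, -2.0, p=1.0)
loss_tied = soft_dpo_loss(-1.0, -3.0, -2.0, -2.0, p=0.5)
```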
50 | Fully Transparent Self-Alignment for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. |
Yuxiang Wei; Federico Cassano; Jiawei Liu; Yifeng Ding; Naman Jain; Zachary Mueller; Harm de Vries; Leandro Von Werra; Arjun Guha; LINGMING ZHANG; |
51 | Are More LLM Calls All You Need? Towards The Scaling Properties of Compound AI Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we initiate the study of scaling properties of compound inference systems. |
Lingjiao Chen; Jared Quincy Davis; Boris Hanin; Peter Bailis; Ion Stoica; Matei A Zaharia; James Zou; |
52 | Large Scale Transfer Learning for Tabular Data Via Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, while recent foundation models have reduced the need for developing task-specific datasets and predictors in domains such as language modeling and computer vision, this transfer learning paradigm has not had similar impact in the tabular domain. In this work, we seek to narrow this gap and present TabuLa-8B, a language model for tabular prediction. |
Josh Gardner; Juan Perdomo; Ludwig Schmidt; |
53 | DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we introduce DiscoveryWorld, a virtual environment that enables benchmarking an agent’s ability to perform complete cycles of novel scientific discovery in an inexpensive, simulated, multi-modal, long-horizon, and fictional setting. |
Peter A Jansen; Marc-Alexandre Côté; Tushar Khot; Erin Bransom; Bhavana Dalvi Mishra; Bodhisattwa Prasad Majumder; Oyvind Tafjord; Peter Clark; |
54 | Learning to Reason Via Program Generation, Emulation, and Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To adapt the COGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. |
Nathaniel Weir; Muhammad Khalifa; Linlu Qiu; Orion Weller; Peter Clark; |
55 | QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. |
Saleh Ashkboos; Amirkeivan Mohtashami; Maximilian Croci; Bo Li; Pashmina Cameron; Martin Jaggi; Dan Alistarh; Torsten Hoefler; James Hensman; |
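The rotation trick above rests on a simple identity: for any orthogonal Q, x·W = (xQ)(QᵀW), so activations can be rotated before quantization without changing the layer's output, while the rotation smears an outlier channel across all channels. A minimal numpy illustration follows; the random QR-based rotation is a stand-in for the Hadamard-type rotations QuaRot actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Random orthonormal rotation via QR decomposition (a stand-in for the
# Hadamard-based rotations used in practice).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
W = rng.standard_normal((d, d))        # toy weight matrix

x = np.zeros(d)
x[3] = 100.0                           # one extreme outlier channel

y_ref = x @ W                          # original layer output
y_rot = (x @ Q) @ (Q.T @ W)            # rotate activations, counter-rotate weights

outlier_before = np.abs(x).max()
outlier_after = np.abs(x @ Q).max()    # smaller: the outlier mass is spread out
```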
56 | Depth Anything V2 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. |
Lihe Yang; Bingyi Kang; Zilong Huang; Zhen Zhao; Xiaogang Xu; Jiashi Feng; Hengshuang Zhao; |
57 | I Don’t Know: Explicit Modeling of Uncertainty with An [IDK] Token Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel calibration method that can be used to combat hallucinations. |
Roi Cohen; Konstantin Dobler; Eden Biran; Gerard de Melo; |
58 | DFBA: Data Free Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. |
Bochuan Cao; Jinyuan Jia; Chuxuan Hu; Wenbo Guo; Zhen Xiang; Jinghui Chen; Bo Li; Dawn Song; |
59 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present *Tree of Attacks with Pruning* (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. |
Anay Mehrotra; Manolis Zampetakis; Paul Kassianik; Blaine Nelson; Hyrum Anderson; Yaron Singer; Amin Karbasi; |
60 | VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. |
Xiang Li; Jian Ding; Mohamed Elhoseiny; |
61 | Quantifying The Bitter Lesson: How Safety Benchmarks Measure Capabilities Instead of Safety Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the spirit of the Bitter Lesson, we ask whether such effort is wasteful. To quantify this, we leverage spectral analysis to measure an underlying capabilities component, the direction in benchmark-performance-space which explains most variation in model performance. |
Richard Ren; Steven Basart; Adam Khoja; Alexander Pan; Alice Gatti; Long Phan; Xuwang Yin; Mantas Mazeika; Gabe Mukobi; Ryan Kim; Stephen Fitz; Dan Hendrycks; |
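The "capabilities component" above can be illustrated with synthetic data: when one latent capability score drives a models × benchmarks score matrix, the leading singular direction of the centered matrix explains most of the variance. Everything below (the fabricated scores and the SVD recipe) illustrates the general spectral idea, not the paper's data or exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_benchmarks = 20, 6
capability = rng.standard_normal(n_models)        # one latent score per model
loadings = rng.uniform(0.5, 1.0, n_benchmarks)    # per-benchmark sensitivity
# Rank-1 "capability" signal plus a little noise.
scores = np.outer(capability, loadings) + 0.05 * rng.standard_normal((n_models, n_benchmarks))

centered = scores - scores.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s[0] ** 2 / (s ** 2).sum()  # variance share of the top direction
capability_direction = vt[0]            # the "capabilities component"
```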
62 | Smoothie: Label Free Language Model Routing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. |
Neel Guha; Mayee Chen; Trevor Chow; Ishan Khare; Christopher Ré; |
63 | Even Sparser Graph Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish theoretical conditions under which a narrow network’s attention scores can match those of a wide network, and show that Spexphormer achieves good performance with drastically reduced memory requirements on various graph datasets. |
Hamed Shirzad; Honghao Lin; Balaji Venkatachalam; Ameya Velingker; David Woodruff; Danica J. Sutherland; |
64 | Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. |
Michael Wornow; Avanika Narayan; Ben Viggiano; Ishan Khare; Tathagat Verma; Tibor Thompson; Miguel Hernandez; Sudharsan Sundar; Chloe Trujillo; Krrish Chawla; Rongfei Lu; Justin Shen; Divya Nagaraj; Joshua Martinez; Vardhan Agrawal; Althea Hudson; Nigam Shah; Christopher Ré; |
65 | SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate how interface design affects the performance of language model agents. As a result of this exploration, we introduce SWE-agent: a system that facilitates language model agents to autonomously use computers to solve software engineering tasks. |
John Yang; Carlos Jimenez; Alexander Wettig; Kilian Lieret; Shunyu Yao; Karthik Narasimhan; Ofir Press; |
66 | What Can Foundation Models’ Embeddings Do? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models’ embeddings with unified image and dataset-level understanding spanning modality and granularity. |
Xueyan Zou; Linjie Li; Jianfeng Wang; Jianwei Yang; Mingyu Ding; Junyi Wei; Zhengyuan Yang; Feng Li; Hao Zhang; Shilong Liu; Arul Aravinthan; Yong Jae Lee; Lijuan Wang; |
67 | 3DCoMPaT200: Language Grounded Large-Scale 3D Vision Dataset for Compositional Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To foster richer and fine-grained part-level 3D understanding, we introduce 3DCoMPaT200, a large-scale dataset tailored for compositional understanding of object parts and materials, with 200 object categories, an object vocabulary approximately 5 times larger than 3DCoMPaT’s, and almost 4 times as many part categories. |
Mahmoud Ahmed; Xiang Li; Arpit Prajapati; Mohamed Elhoseiny; |
68 | You Only Cache Once: Decoder-Decoder Architectures for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. |
Yutao Sun; Li Dong; Yi Zhu; Shaohan Huang; Wenhui Wang; Shuming Ma; Quanlu Zhang; Jianyong Wang; Furu Wei; |
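The memory argument behind a "cache once" design can be sketched with toy accounting: a standard decoder stores a KV cache per layer, while a design that caches key-value pairs once keeps a single shared cache. All numbers and the helper below are illustrative, not YOCO's actual configuration.

```python
# Hedged sketch: count the floats a KV cache holds under per-layer versus
# shared ("cache once") caching. Shapes and sizes are made up for illustration.
def kv_cache_floats(n_layers, seq_len, n_heads, head_dim, shared):
    per_layer = 2 * seq_len * n_heads * head_dim  # keys + values
    return per_layer * (1 if shared else n_layers)

dense = kv_cache_floats(32, 4096, 32, 128, shared=False)      # per-layer cache
yoco_like = kv_cache_floats(32, 4096, 32, 128, shared=True)   # one shared cache
# In this toy accounting the shared cache is n_layers times smaller.
```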
69 | Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose end-to-end (e2e) sparse dictionary learning, a method for training SAEs that ensures the features learned are functionally important by minimizing the KL divergence between the output distributions of the original model and the model with SAE activations inserted. |
Dan Braun; Jordan Taylor; Nicholas Goldowsky-Dill; Lee Sharkey; |
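A minimal numeric sketch of the shape of the e2e objective described above: a KL term between the original output distribution and the distribution produced when the SAE reconstruction replaces the activations, plus a sparsity penalty on the SAE features. The tiny "model" (a single unembedding matrix), all shapes, and the penalty weight are fabrications for illustration only.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
W_out = rng.standard_normal((4, 5))           # toy unembedding
acts = rng.standard_normal(4)                 # original residual activations
feats = rng.standard_normal(8) * (rng.random(8) < 0.3)   # sparse SAE features
W_dec = rng.standard_normal((8, 4)) * 0.1
recon = acts + feats @ W_dec                  # SAE-reconstructed activations

p = softmax(acts @ W_out)                     # original output distribution
q = softmax(recon @ W_out)                    # distribution with SAE inserted
loss = kl(p, q) + 0.1 * np.abs(feats).sum()   # e2e KL + L1 sparsity, schematically
```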
70 | Benchmarking LLMs Via Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current evaluation platforms, such as the widely recognized HuggingFace open LLM leaderboard, neglect a crucial aspect — uncertainty, which is vital for thoroughly assessing LLMs. To bridge this gap, we introduce a new benchmarking approach for LLMs that integrates uncertainty quantification. |
Fanghua Ye; Mingming Yang; Jianhui Pang; Longyue Wang; Derek Wong; Emine Yilmaz; Shuming Shi; Zhaopeng Tu; |
71 | Learning Segmentation from Point Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a way to train a segmentation network using long-term point trajectories as a supervisory signal to complement optical flow. |
Laurynas Karazija; Iro Laina; Christian Rupprecht; Andrea Vedaldi; |
72 | Sparse Maximal Update Parameterization: A Holistic Approach to Sparse Training Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Without stable dynamics and effective training recipes, it is costly to test sparsity at scale, which is key to surpassing dense networks and making the business case for sparsity acceleration in hardware. A holistic approach is needed to tackle these challenges, and we propose SμPar as one such approach. SμPar ensures activations, gradients, and weight updates all scale independently of sparsity level. |
Nolan Dey; Shane Bergsma; Joel Hestness; |
73 | DataComp-LM: In Search of The Next Generation of Training Sets for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DataComp for Language Models, a testbed for controlled dataset experiments with the goal of improving language models. |
Amro Abbas; Alon Albalak; Kushal Arora; Hritik Bansal; Yonatan Bitton; Yair Carmon; Khyathi Chandu; Mayee Chen; Giannis Daras; Achal Dave; Alex Dimakis; Alaaeldin El-Nouby; Fartash Faghri; Alex Fang; Samir Yitzhak Gadre; Josh Gardner; Saurabh Garg; Dhruba Ghosh; Aaron Gokaslan; Dirk Groeneveld; Etash Guha; Suchin Gururangan; Reinhard Heckel; Cheng-Yu Hsieh; Gabriel Ilharco; Maor Ivgi; Jenia Jitsev; Matt Jordan; Sham Kakade; Sedrick Scott Keh; Maciej Kilian; Pang Wei Koh; Thomas Kollar; Jeffrey Li; Kyle Lo; Kalyani Marathe; Jean Mercat; Niklas Muennighoff; Marianna Nezhurina; Thao Nguyen; Sewoong Oh; Hadi Pouransari; Sarah Pratt; Sunny Sanyal; Ludwig Schmidt; Vaishaal Shankar; Rulin Shao; Georgios Smyrnis; Luca Soldaini; Shuran Song; Alexander Toshev; Igor Vasiljevic; Stephanie Wang; Mitchell Wortsman; Rui Xin; Luke Zettlemoyer; Hanlin Zhang; Jieyu Zhang; |
74 | A Careful Examination of Large Language Model Performance on Grade School Arithmetic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1000 (GSM1k). |
Hugh Zhang; Jeff Da; Dean Lee; Vaughn Robinson; Catherine Wu; William Song; Tiffany Zhao; Pranav Raja; Charlotte Zhuang; Dylan Slack; Qin Lyu; Sean Hendryx; Russell Kaplan; Michele Lunati; Summer Yue; |
75 | Neural Gaffer: Relighting Any Object Via Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. |
Haian Jin; Yuan Li; Fujun Luan; Yuanbo Xiangli; Sai Bi; Kai Zhang; Zexiang Xu; Jin Sun; Noah Snavely; |
76 | Achieving Efficient Alignment Through Learned Correction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce *Aligner*, a novel and simple alignment paradigm that learns the correctional residuals between preferred and dispreferred answers using a small model. |
Jiaming Ji; Boyuan Chen; Hantao Lou; Donghai Hong; Borong Zhang; Xuehai Pan; Tianyi (Alex) Qiu; Juntao Dai; Yaodong Yang; |
77 | WizardArena: Post-training Large Language Models Via Simulated Offline Chatbot Arena Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To mitigate the manual and temporal costs associated with post-training, this paper introduces a Simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. |
Haipeng Luo; Qingfeng Sun; Can Xu; Pu Zhao; Qingwei Lin; Jian-Guang Lou; Shifeng Chen; Yansong Tang; Weizhu Chen; |
78 | Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our key insight is that an evaluator (reward model) trained on supervisions for easier tasks can be effectively used for scoring candidate solutions of harder tasks and hence facilitating easy-to-hard generalization over different levels of tasks. |
Zhiqing Sun; Longhui Yu; Yikang Shen; Weiyang Liu; Yiming Yang; Sean Welleck; Chuang Gan; |
79 | Interpreting The Weight Space of Customized Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate the space of weights spanned by a large collection of customized diffusion models. |
Amil Dravid; Yossi Gandelsman; Kuan-Chieh Wang; Rameen Abdal; Gordon Wetzstein; Alexei Efros; Kfir Aberman; |
80 | Make Your LLM Fully Utilize The Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents **information-intensive (IN2) training**, a purely data-driven solution to overcome lost-in-the-middle. |
Shengnan An; Zexiong Ma; Zeqi Lin; Nanning Zheng; Jian-Guang Lou; Weizhu Chen; |
81 | Multistep Distillation of Diffusion Models Via Moment Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method for making diffusion models faster to sample. |
Tim Salimans; Emiel Hoogeboom; Thomas Mensink; Jonathan Heek; |
82 | Query-Based Adversarial Prompt Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve on prior work with a query-based attack that leverages API access to a remote language model to construct adversarial examples that cause the model to emit harmful strings with (much) higher probability than with transfer-only attacks. |
Jonathan Hayase; Ema Borevković; Nicholas Carlini; Florian Tramer; Milad Nasr; |
83 | Evaluating Copyright Takedown Methods for Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model’s ability to retain uncopyrightable factual knowledge from the copyrighted content, and how well the model maintains its general utility and efficiency. |
Boyi Wei; Weijia Shi; Yangsibo Huang; Noah Smith; Chiyuan Zhang; Luke Zettlemoyer; Kai Li; Peter Henderson; |
84 | Exploring Context Window of Large Language Models Via Decomposed Positional Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the positional information within and beyond the context window for deciphering the underlying mechanism of LLMs. |
Zican Dong; Junyi Li; Xin Men; Xin Zhao; Bingning Wang; Zhen Tian; weipeng chen; Ji-Rong Wen; |
85 | Visual Autoregressive Modeling: Scalable Image Generation Via Next-Scale Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine next-scale prediction or next-resolution prediction, diverging from the standard raster-scan next-token prediction. |
Keyu Tian; Yi Jiang; Zehuan Yuan; BINGYUE PENG; Liwei Wang; |
86 | Vision Model Pre-training on Interleaved Image-Text Data Via Latent Compression Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent success of compression learning in natural language processing, we propose a novel vision model pre-training method called Latent Compression Learning (LCL) for interleaved image-text data. |
CHENYU YANG; Xizhou Zhu; Jinguo Zhu; Weijie Su; Junjie Wang; Xuan Dong; Wenhai Wang; Bin Li; Jie Zhou; Yu Qiao; Jifeng Dai; |
87 | Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning Via The Lens of Representation Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work investigates the potential hierarchy of representation complexity among these RL paradigms. |
Guhao Feng; Han Zhong; |
88 | Video Diffusion Models Are Training-free Motion Interpreter and Controller Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Leveraging MOFT, we propose a novel training-free video motion control framework. |
Zeqi Xiao; Yifan Zhou; Shuai Yang; Xingang Pan; |
89 | DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: With this question in mind, we further conduct qualitative and quantitative pre-experiments, which validate the negative impact of the detection-segmentation imbalance issue on model performance. To address this issue, this paper proposes the DI-MaskDINO model, the core idea of which is to improve the final performance by alleviating the detection-segmentation imbalance. |
Zhixiong Nan; Li Xianghong; Tao Xiang; Jifeng Dai; |
90 | Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To reduce the within-class similarity, we introduce class-wise supervision during the image synthesizing process by batching the samples within classes, instead of across classes. |
Lingao Xiao; Yang He; |
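The batching change described in the highlight above is easy to sketch: instead of drawing a batch across classes, each batch holds samples of a single class, so supervision during synthesis is class-wise. The helper and the toy label list below are illustrative, not the paper's code.

```python
def classwise_batches(labels, batch_size):
    """Group sample indices by class, then cut each class into batches,
    so every batch contains samples of exactly one class."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    batches = []
    for y in sorted(by_class):
        idxs = by_class[y]
        for i in range(0, len(idxs), batch_size):
            batches.append(idxs[i:i + batch_size])
    return batches

labels = [0, 1, 0, 1, 0, 1]
batches = classwise_batches(labels, batch_size=2)
# Every batch is single-class: [[0, 2], [4], [1, 3], [5]].
```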
91 | VisionLLM V2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. |
Jiannan Wu; Muyan Zhong; Sen Xing; Zeqiang Lai; Zhaoyang Liu; Wenhai Wang; Zhe Chen; Xizhou Zhu; Lewei Lu; Tong Lu; Ping Luo; Yu Qiao; Jifeng Dai; |
92 | Tactile DreamFusion: Exploiting Tactile Sensing for 3D Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked in albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. |
Ruihan Gao; Kangle Deng; Gengshan Yang; Wenzhen Yuan; Jun-Yan Zhu; |
93 | NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks focus on outdated content and limited fields, facing difficulties in real-time updating and leaving new terms unexplored. To address this problem, we propose an adaptive benchmark, NewTerm, for real-time evaluation of new terms. |
Hexuan Deng; Wenxiang Jiao; Xuebo Liu; Min Zhang; Zhaopeng Tu; |
94 | Learning 1D Causal Visual Representation with De-focus Attention Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The issue of over-focus hinders the model’s ability to extract diverse visual features and to receive effective gradients for optimization. To address this, we propose De-focus Attention Networks, which employ learnable bandpass filters to create varied attention patterns. |
Tao Chenxin; Xizhou Zhu; Shiqian Su; Lewei Lu; Changyao Tian; Xuan Luo; Gao Huang; Hongsheng Li; Yu Qiao; Jie Zhou; Jifeng Dai;
95 | Boosting Text-to-Video Generative Model with MLLMs Feedback | Highlight: Building upon this finding, we utilize MLLMs to perform fine-grained video preference annotations across two dimensions, resulting in the creation of VideoPrefer, which includes 135,000 preference annotations. Utilizing this dataset, we introduce VideoRM, the first general-purpose reward model tailored for video preference in the text-to-video domain. |
Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
96 | Multi-Head Mixture-of-Experts | Highlight: In this paper, we propose Multi-Head Mixture-of-Experts (MH-MoE). |
Xun Wu; Shaohan Huang; Wenhui Wang; Shuming Ma; Li Dong; Furu Wei; |
97 | Multimodal Large Language Models Make Text-to-Image Generative Models Align Better | Highlight: Despite these advances, current human preference datasets are either prohibitively expensive to construct or suffer from a lack of diversity in preference dimensions, resulting in limited applicability for instruction tuning in open-source text-to-image generative models and hindering further exploration. To address these challenges and promote the alignment of generative models through instruction tuning, we leverage multimodal large language models to create VisionPrefer, a high-quality and fine-grained preference dataset that captures multiple preference aspects. |
Xun Wu; Shaohan Huang; Guolong Wang; Jing Xiong; Furu Wei; |
98 | Mind’s Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models (Related Code) | Highlight: Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind’s Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. |
Wenshan Wu; Shaoguang Mao; Yadong Zhang; Yan Xia; Li Dong; Lei Cui; Furu Wei;
99 | Learning Scene-specific Descriptions Via Adaptive Renormalization for Open-vocabulary Scene Graph Generation | Highlight: Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP and follow a standard zero-shot pipeline – computing similarity between the query image and the text embeddings for each category (i.e., text classifiers). In this work, we argue that the text classifiers adopted by existing OVSGG methods, i.e., category-/part-level prompts, are scene-agnostic as they remain unchanged across contexts. |
Guikun Chen; Jin Li; Wenguan Wang;
100 | Unlocking The Potential of Global Human Expertise | Highlight: However, it is difficult to identify, combine, and refine complementary information in an increasingly large and diverse knowledge base. This paper argues that artificial intelligence (AI) can play a crucial role in this process. |
Elliot Meyerson; Olivier Francon; Darren Sargent; Babak Hodjat; Risto Miikkulainen; |
101 | Visual Sketchpad: Sketching As A Visual Chain of Thought for Multimodal Language Models | Highlight: In this work, we introduce Sketchpad, a framework that gives multimodal LMs a visual sketchpad and tools to draw on the sketchpad. |
Yushi Hu; Weijia Shi; Xingyu Fu; Dan Roth; Mari Ostendorf; Luke Zettlemoyer; Noah Smith; Ranjay Krishna;
102 | RL-GPT: Integrating Reinforcement Learning and Code-as-policy | Highlight: To seamlessly integrate both modalities, we introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. |
Shaoteng Liu; Haoqi Yuan; Minda Hu; Yanwei Li; Yukang Chen; Shu Liu; Zongqing Lu; Jiaya Jia;
103 | LLM Evaluators Recognize and Favor Their Own Generations | Highlight: In this paper, we investigate if self-recognition capability contributes to self-preference. |
Arjun Panickssery; Samuel Bowman; Shi Feng;
104 | SimPO: Simple Preference Optimization with A Reference-Free Reward (Related Code) | Highlight: In this work, we propose SimPO, a simpler yet more effective approach. |
Yu Meng; Mengzhou Xia; Danqi Chen;
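SimPO's reference-free reward is the length-normalized log-probability of a response scaled by a factor beta, compared across a preference pair with a target margin gamma. A minimal per-pair sketch of that objective (the toy log-probabilities in the usage below are illustrative, not from the paper):

```python
import math

def simpo_loss(logp_chosen, len_chosen, logp_rejected, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO objective for one preference pair.

    The implicit reward is the length-normalized sequence
    log-probability scaled by beta; the loss asks the chosen
    response to beat the rejected one by a margin gamma.
    No reference model appears anywhere.
    """
    r_chosen = beta * logp_chosen / len_chosen
    r_rejected = beta * logp_rejected / len_rejected
    margin = r_chosen - r_rejected - gamma
    # -log sigmoid(margin) written as log1p(exp(-margin))
    return math.log1p(math.exp(-margin))
```

As expected, the loss shrinks as the chosen response becomes relatively more likely per token, e.g. `simpo_loss(-5.0, 10, -20.0, 10)` is smaller than `simpo_loss(-10.0, 10, -20.0, 10)`.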
105 | Finding Transformer Circuits With Edge Pruning (Related Code) | Highlight: In this paper, we frame circuit discovery as an optimization problem and propose Edge Pruning as an effective and scalable solution. |
Adithya Bhaskar; Alexander Wettig; Dan Friedman; Danqi Chen; |
106 | Dissecting The Failure of Invariant Learning on Graphs | Highlight: In this paper, we develop a Structural Causal Model (SCM) to theoretically dissect the performance of two prominent invariant learning methods–Invariant Risk Minimization (IRM) and Variance-Risk Extrapolation (VREx)–in node-level OOD settings. |
Qixun Wang; Yifei Wang; Yisen Wang; Xianghua Ying; |
107 | CALVIN: Improved Contextual Video Captioning Via Instruction Tuning | Highlight: Scene descriptions, especially in movies, require a deeper contextual understanding, unlike general-purpose video captioning. To address this challenge, we propose a model, CALVIN, a specialized video LLM that leverages previous movie context to generate fully contextual scene descriptions. |
Gowthami Somepalli; Arkabandhu Chowdhury; Jonas Geiping; Basri Ronen; Tom Goldstein; David Jacobs; |
108 | Algorithmic Capabilities of Random Transformers | Highlight: To what extent do they depend on the supervisory signal provided to models, and to what extent are they attributable to behavior already present in models at the beginning of training? To investigate these questions, we study what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized, so that the only input–output mappings learnable from data are those already implemented (up to a choice of encoding scheme) by the randomly initialized model. |
Ziqian Zhong; Jacob Andreas;
109 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs (Related Code) | Highlight: In this work, we propose CharXiv, a comprehensive evaluation suite involving 2,323 natural, challenging, and diverse charts from scientific papers. |
Zirui Wang; Mengzhou Xia; Luxi He; Howard Chen; Yitao Liu; Richard Zhu; Kaiqu Liang; Xindi Wu; Haotian Liu; Sadhika Malladi; Chevalier; Sanjeev Arora; Danqi Chen; |
110 | Chain of Thoughtlessness? An Analysis of CoT in Planning | Highlight: This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain, and examines the performance of two state-of-the-art LLMs across two axes: generality of examples given in prompt, and complexity of problems queried with each prompt. |
Kaya Stechly; Karthik Valmeekam; Subbarao Kambhampati; |
111 | Warped Diffusion: Solving Video Inverse Problems with Image Diffusion Models | Highlight: Using image models naively for solving inverse video problems often suffers from flickering, texture-sticking, and temporal inconsistency in generated videos. To tackle these problems, in this paper, we view frames as continuous functions in the 2D space, and videos as a sequence of continuous warping transformations between different frames. |
Giannis Daras; Weili Nie; Karsten Kreis; Alex Dimakis; Morteza Mardani; Nikola Kovachki; Arash Vahdat; |
112 | HYDRA: Model Factorization Framework for Black-Box LLM Personalization | Highlight: Existing solutions have primarily focused on prompt design to incorporate user-specific profiles and behaviors; however, such approaches often struggle to generalize effectively due to their inability to capture shared knowledge among all users. To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation. |
Yuchen Zhuang; Haotian Sun; Yue Yu; Rushi Qiang; Qifan Wang; Chao Zhang; Bo Dai; |
113 | WildVision: Evaluating Vision-Language Models in The Wild with Human Preferences | Highlight: Our comprehensive analysis of 20K real-world interactions reveals important insights into the failure cases of top-performing VLMs. |
Yujie Lu; Dongfu Jiang; Wenhu Chen; William Yang Wang; Yejin Choi; Bill Yuchen Lin; |
114 | BitDelta: Your Fine-Tune May Only Be Worth One Bit (Related Code) | Highlight: We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. |
James Liu; Guangxuan Xiao; Kai Li; Jason Lee; Song Han; Tri Dao; Tianle Cai; |
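The decomposition above can be sketched concretely: keep the shared base weights in full precision and compress only the fine-tune delta to one bit per weight plus a scale. The flattened weight lists and the mean-absolute-value scale below are simplifications for illustration (the paper further calibrates the scales by distillation):

```python
def bitdelta_compress(w_finetuned, w_base):
    """Compress the fine-tune delta to a sign list plus one scale.

    delta = W_ft - W_base is replaced by sign(delta) * mean(|delta|),
    i.e. one bit per weight plus a single scalar per weight matrix.
    """
    deltas = [wf - wb for wf, wb in zip(w_finetuned, w_base)]
    scale = sum(abs(d) for d in deltas) / len(deltas)
    signs = [1 if d >= 0 else -1 for d in deltas]
    return signs, scale

def bitdelta_decompress(w_base, signs, scale):
    """Reconstruct approximate fine-tuned weights from base + 1-bit delta."""
    return [wb + s * scale for wb, s in zip(w_base, signs)]
```

Serving many fine-tunes of one base model then only requires storing the base weights once plus a 1-bit delta per fine-tune.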
115 | MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | Highlight: This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. |
Yubo Wang; Xueguang Ma; Ge Zhang; Yuansheng Ni; Abhranil Chandra; Shiguang Guo; Weiming Ren; Aaran Arulraj; Xuan He; Ziyan Jiang; Tianle Li; Max KU; Wang; Alex Zhuang; Rongqi Fan; Xiang Yue; Wenhu Chen; |
116 | Building on Efficient Foundations: Effective Training of LLMs with Structured Feedforward Layers | Highlight: Our study focuses on transformer-based LLMs, specifically targeting the computationally intensive feedforward networks (FFN), which are less studied than attention blocks. |
Xiuying Wei; Skander Moalla; Razvan Pascanu; Caglar Gulcehre; |
117 | Be Like A Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs (Related Code) | Highlight: To mitigate training data exposure without sacrificing model performance, we introduce a simple but subtle modification to the standard next-token prediction objective for autoregressive LLMs that we call the goldfish loss. |
Abhimanyu Hans; John Kirchenbauer; Yuxin Wen; Neel Jain; Hamid Kazemi; Prajwal Singhania; Siddharth Singh; Gowthami Somepalli; Jonas Geiping; Abhinav Bhatele; Tom Goldstein; |
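The gist of the goldfish loss is to exclude a pseudorandom subset of tokens from the next-token loss, so the model is never supervised on a complete, memorizable sequence. A minimal sketch, where the hash-of-preceding-context masking rule is a simplified stand-in for the paper's exact scheme and `k` controls the drop rate (roughly 1-in-k tokens dropped):

```python
import hashlib

def goldfish_mask(tokens, k, context=4):
    """Decide, per position, whether the token enters the loss.

    A token is dropped (mask=0) when a hash of the preceding
    `context` tokens selects it. Hashing keeps the decision
    deterministic, so the same passage is masked identically
    every time it is seen during training.
    """
    mask = []
    for i in range(len(tokens)):
        window = tuple(tokens[max(0, i - context):i])
        digest = hashlib.sha256(repr(window).encode()).digest()
        mask.append(0 if digest[0] % k == 0 else 1)
    return mask

def masked_nll(token_losses, mask):
    """Average next-token loss over unmasked positions only."""
    kept = [l for l, m in zip(token_losses, mask) if m]
    return sum(kept) / len(kept)
```

Because the dropped positions are a fixed function of local context rather than of the training step, repeated exposure to the same document never fills in the held-out tokens.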
118 | Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning | Highlight: Despite its success, two primary limitations have been identified: the inefficacy of Exponential Moving Average (EMA) from I-JEPA in preventing entire collapse and the inadequacy of I-JEPA prediction in accurately learning the mean of patch representations. Addressing these challenges, this study introduces a novel framework, namely C-JEPA (Contrastive-JEPA), which integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. |
Shentong Mo; Shengbang Tong; |
119 | Crafting Interpretable Embeddings By Asking LLMs Questions (Related Code) | Highlight: We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. |
Vinamra Benara; Chandan Singh; John Morris; Richard Antonello; Ion Stoica; Alexander Huth; Jianfeng Gao; |
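The QA-Emb construction is easy to sketch: each embedding coordinate is the yes/no answer to one natural-language question about the text, so every dimension is human-readable. The `answer_fn` below stands in for a call to an LLM, and `toy_answer` is a purely illustrative keyword matcher, not part of the paper's API:

```python
def qa_embedding(text, questions, answer_fn):
    """Build an interpretable embedding from yes/no answers.

    Each coordinate is the answer to one question about `text`.
    `answer_fn(text, question) -> bool` is a placeholder for an
    LLM call in the real pipeline.
    """
    return [1.0 if answer_fn(text, q) else 0.0 for q in questions]

def toy_answer(text, question):
    """Toy stand-in: answer "Does it mention 'X'?" by substring match."""
    keyword = question.split("'")[1]
    return keyword in text.lower()
```

Swapping `toy_answer` for a real LLM query yields the paper's setting, where the question set determines which semantic properties the embedding exposes.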
120 | Who’s Asking? User Personas and The Mechanics of Latent Misalignment | Highlight: Despite investments in improving model safety, studies show that misaligned capabilities remain latent in safety-tuned models. In this work, we shed light on the mechanics of this phenomenon. |
Asma Ghandeharioun; Ann Yuan; Marius Guerard; Emily Reif;
121 | MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Highlight: In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. |
Zengzhi Wang; Xuefeng Li; Rui Xia; Pengfei Liu; |
122 | Self-Retrieval: End-to-End Information Retrieval with One Large Language Model | Highlight: In this paper, we introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture. |
Qiaoyu Tang; Jiawei Chen; Zhuoqun Li; Bowen Yu; Yaojie Lu; ChengFu; Haiyang Yu; Hongyu Lin; Fei Huang; Ben He; Xianpei Han; Le Sun; Yongbin Li; |
123 | Knowledge Circuit in Transformers | Highlight: In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. |
Yunzhi Yao; Ningyu Zhang; Zekun Xi; Mengru Wang; Ziwen Xu; Shumin Deng; Huajun Chen; |
124 | MInference: Accelerating Pre-filling for Long-Context LLMs Via Dynamic Sparse Attention | Highlight: Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference, a sparse calculation method designed to accelerate pre-filling for long-sequence processing. |
Huiqiang Jiang; Yucheng LI; Chengruidong Zhang; Qianhui Wu; Xufang Luo; Surin Ahn; Zhenhua Han; Amir Abdi; Dongsheng Li; Chin-Yew Lin; Yuqing Yang; Lili Qiu; |
125 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset (Related Code) | Highlight: Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. |
Shubham Toshniwal; Ivan Moshkov; Sean Narenthiran; Daria Gitman; Fei Jia; Igor Gitman; |
126 | UltraEdit: Instruction-based Fine-Grained Image Editing at Scale | Highlight: This paper presents UltraEdit, a large-scale (~ 4M editing samples), automatically generated dataset for instruction-based image editing. |
Haozhe Zhao; Xiaojian (Shawn) Ma; Liang Chen; Shuzheng Si; Rujie Wu; Kaikai An; Peiyu Yu; Minjia Zhang; Qing Li; Baobao Chang; |
127 | Refusal in Language Models Is Mediated By A Single Direction (Related Code) | Highlight: In this work, we show that refusal is mediated by a one-dimensional subspace, across 13 popular open-source chat models up to 72B parameters in size. |
Andy Arditi; Oscar Obeso; Aaquib Syed; Nina Panickssery; Daniel Paleka; Wes Gurnee; Neel Nanda;
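The one-dimensional-subspace claim has a simple operational form: if refusal is mediated by a single direction r in activation space, then projecting that direction out of a hidden state removes the mediator while leaving the orthogonal components untouched. A minimal sketch of that directional ablation on plain Python vectors (finding r itself, e.g. from mean activation differences, is the paper's contribution and is not shown):

```python
def ablate_direction(hidden, direction):
    """Remove the component of `hidden` along `direction`.

    Computes h' = h - (h . r_hat) r_hat, where r_hat is the
    normalized direction. The result is exactly orthogonal to
    the ablated direction.
    """
    norm = sum(d * d for d in direction) ** 0.5
    unit = [d / norm for d in direction]
    proj = sum(h * u for h, u in zip(hidden, unit))
    return [h - proj * u for h, u in zip(hidden, unit)]
```

Conversely, adding a multiple of the same direction back into the residual stream is the standard way to induce the mediated behavior.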
128 | JiuZhang3.0: Efficiently Improving Mathematical Reasoning By Training Small Data Synthesis Models (Related Code) | Highlight: To reduce the cost, based on openly available texts, we propose an efficient approach that trains a small LLM for math problem synthesis, efficiently generating sufficient high-quality pre-training data. |
Kun Zhou; Beichen Zhang; jiapeng wang; Zhipeng Chen; Xin Zhao; Jing Sha; Zhichao Sheng; Shijin Wang; Ji-Rong Wen; |
129 | Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders | Highlight: We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. |
Senthooran Rajamanoharan; Arthur Conmy; Lewis Smith; Tom Lieberum; Vikrant Varma; Janos Kramar; Rohin Shah; Neel Nanda; |
130 | IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering | Highlight: In this work, we introduce IQA-EVAL, an automated framework for Interactive Question Answering evaluation. More specifically, we introduce an LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; (2) automatically evaluate the generated interactions. |
Ruosen Li; Ruochen Li; Barry Wang; Xinya Du; |
131 | MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations | Highlight: In this paper, we introduce a novel semi-automatic question generation strategy by composing event structures from information extraction (IE) datasets and present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. |
Ruosen Li; Zimu Wang; Son Tran; Lei Xia; Xinya Du;
132 | Stabilize The Latent Space for Image Autoregressive Modeling: A Unified Perspective | Highlight: This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. |
Yongxin Zhu; Bocheng Li; Hang Zhang; Xin Li; Linli Xu; Lidong Bing; |
133 | How Do Large Language Models Handle Multilingualism? | Highlight: To verify MWork, we introduce Parallel Language-specific Neuron Detection (PLND) to identify activated neurons for inputs in different languages without any labeled data. |
Yiran Zhao; Wenxuan Zhang; Guizhen Chen; Kenji Kawaguchi; Lidong Bing; |
134 | KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization (Related Code) | Highlight: Quantization is a promising approach for compressing KV cache activations; however, existing solutions fail to represent activations accurately in sub-4-bit precision. Our work, KVQuant, facilitates low precision KV cache quantization by incorporating several novel methods: (i) Per-Channel Key Quantization, where we adjust the dimension along which we quantize the Key activations to better match the distribution; (ii) Pre-RoPE Key Quantization, where we quantize Key activations before the rotary positional embedding to mitigate its impact on quantization; (iii) Non-Uniform KV Cache Quantization, where we derive per-layer sensitivity-weighted non-uniform datatypes that better represent the distributions; and (iv) Per-Vector Dense-and-Sparse Quantization, where we isolate outliers separately for each vector to minimize skews in quantization ranges. |
Coleman Hooper; Sehoon Kim; Hiva Mohammadzadeh; Michael Mahoney; Sophia Shao; Kurt Keutzer; Amir Gholami;
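Technique (i) above, Per-Channel Key Quantization, can be illustrated with a plain asymmetric integer quantizer that fits one (scale, zero-point) pair per channel rather than per token, matching the observation that Key outliers concentrate in fixed channels. This toy uses uniform min-max quantization for brevity; KVQuant's non-uniform datatypes and outlier handling are not reproduced here:

```python
def quantize_per_channel(keys, bits=3):
    """Per-channel asymmetric integer quantization of Key activations.

    `keys` is a list of key vectors (tokens x channels). Each channel
    gets its own min/max range, so an outlier channel cannot blow up
    the quantization step of the others.
    """
    n_levels = 2 ** bits - 1
    channels = list(zip(*keys))  # transpose to channels x tokens
    quantized, params = [], []
    for ch in channels:
        lo, hi = min(ch), max(ch)
        scale = (hi - lo) / n_levels or 1.0  # guard constant channels
        quantized.append([round((x - lo) / scale) for x in ch])
        params.append((scale, lo))
    return quantized, params

def dequantize_per_channel(quantized, params):
    """Invert the quantizer and transpose back to tokens x channels."""
    channels = [[q * scale + lo for q in ch]
                for ch, (scale, lo) in zip(quantized, params)]
    return [list(t) for t in zip(*channels)]
```

With per-token ranges instead, a single large-magnitude channel would dominate every token's range and wash out the small channels; quantizing along the channel axis avoids that.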
135 | SIRIUS: Contextual Sparsity with Correction for Efficient LLMs | Highlight: This paper introduces Sirius, an efficient correction mechanism, which enables accurate LLM inference with contextual sparsity. |
Yang Zhou; Zhuoming Chen; Zhaozhuo Xu; Victoria Lin; Beidi Chen;
136 | S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning By Structured Sparsity | Highlight: To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Utilizing this key insight, we propose a family of Structured Sparse Fine-Tuning (S2FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. |
Xinyu Yang; Jixuan Leng; Geyang Guo; Jiawei Zhao; Ryumei Nakada; Linjun Zhang; Huaxiu Yao; Beidi Chen; |
137 | Sequoia: Scalable and Robust Speculative Decoding | Highlight: This paper introduces Sequoia, a scalable and robust algorithm for speculative decoding. |
Zhuoming Chen; Avner May; Ruslan Svirschevski; Yu-Hsun Huang; Max Ryabinin; Zhihao Jia; Beidi Chen; |
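Sequoia builds on speculative decoding: a cheap draft model proposes tokens that the target model then verifies, so several tokens can be committed per target pass. Sequoia's contribution is a scalable tree of speculations; the linear-chain greedy variant below is only the baseline scheme it generalizes, with `target_next` and `draft_next` as toy greedy next-token functions:

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Linear-chain speculative decoding with greedy verification.

    The draft proposes a chain of k tokens; the target keeps the
    longest prefix it agrees with and emits its own token at the
    first disagreement. The output therefore always matches pure
    target-only greedy decoding.
    """
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft phase: propose k tokens autoregressively.
        chain, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            chain.append(t)
            ctx.append(t)
        # Verify phase: accept until the first mismatch.
        for t in chain:
            if len(out) - len(prompt) >= n_tokens:
                break
            expected = target_next(out)
            out.append(expected)
            if expected != t:
                break  # the rest of the draft chain is now invalid
    return out[len(prompt):]
```

The speedup comes from the verify phase: in a real system all k draft positions are scored by the target in one batched forward pass, not k sequential ones as the scalar loop here suggests.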
138 | Confidence Regulation Neurons in Language Models | Highlight: This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. |
Alessandro Stolfo; Ben Wu; Wes Gurnee; Yonatan Belinkov; Xingyi Song; Mrinmaya Sachan; Neel Nanda;
139 | FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner (Related Code) | Highlight: In this paper, we propose a framework called FlowTurbo to accelerate the sampling of flow-based models while still enhancing the sampling quality. |
Wenliang Zhao; Minglei Shi; Xumin Yu; Jie Zhou; Jiwen Lu;
140 | Counterfactual PPO Enhanced Shared Reflector for LLM-based Multi-agent Collaboration | Highlight: In this paper, we propose a novel framework, named COPPER, to enhance the collaboration ability of multi-agent systems through a learnable self-reflection mechanism. |
Xiaohe Bo; Zeyu Zhang; Quanyu Dai; Xueyang Feng; Lei Wang; Rui Li; Xu Chen; Ji-Rong Wen;
141 | OneBit: Towards Extremely Low-bit Large Language Models (Related Code) | Highlight: This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. For this target, we introduce a 1-bit model compression framework named OneBit, including a novel 1-bit parameter representation method to better quantize LLMs as well as an effective parameter initialization method based on matrix decomposition to improve the convergence speed of the quantization framework. |
Yuzhuang Xu; Xu Han; Zonghan Yang; Shuo Wang; Qingfu Zhu; Zhiyuan Liu; Weidong Liu; Wanxiang Che; |
142 | One-Shot Safety Alignment for Large Language Models Via Optimal Dualization | Highlight: This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. |
Xinmeng Huang; Shuo Li; Edgar Dobriban; Osbert Bastani; Hamed Hassani; Dongsheng Ding; |
143 | Super Consistency of Neural Network Landscapes and Learning Rate Transfer | Highlight: From an optimization perspective, this phenomenon is puzzling, as it implies that the loss landscape is consistently similar across very different model sizes. In this work, we study the landscape through the lens of the Hessian, with a focus on its largest eigenvalue (i.e. the sharpness), and find that certain spectral properties under μP are largely independent of the width and depth of the network along the training trajectory. |
Lorenzo Noci; Alexandru Meterez; Thomas Hofmann; Antonio Orvieto; 洛伦佐·诺奇;亚历山德鲁·梅特雷兹;托马斯·霍夫曼;安东尼奥·奥尔维耶托; |
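The sharpness the highlight tracks, the largest Hessian eigenvalue, is typically estimated with power iteration on Hessian-vector products; a minimal sketch on a toy quadratic (where the Hessian is known, so the answer can be checked) follows.

```python
import numpy as np

def sharpness(hvp, dim, iters=200, seed=0):
    """Estimate the largest Hessian eigenvalue (the 'sharpness') by
    power iteration, given only a Hessian-vector product oracle."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(v)
        lam = float(v @ hv)          # Rayleigh quotient estimate
        v = hv / np.linalg.norm(hv)
    return lam

# Toy quadratic f(w) = 0.5 * w @ H @ w, whose Hessian is H.
H = np.diag(np.arange(1.0, 11.0))    # eigenvalues 1..10
lam = sharpness(lambda v: H @ v, dim=10)
assert abs(lam - 10.0) < 1e-6
```

For a real network the oracle `hvp` would be an autodiff Hessian-vector product; the quadratic here only makes the check verifiable.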
144 | Not All Tokens Are What You Need for Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1.
Zhenghao Lin; Zhibin Gou; Yeyun Gong; Xiao Liu; yelong shen; Ruochen Xu; Chen Lin; Yujiu Yang; Jian Jiao; Nan Duan; Weizhu Chen;
145 | Trajectory Flow Matching with Applications to Clinical Time Series Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current algorithms for training Neural SDEs require backpropagation through the SDE dynamics, greatly limiting their scalability and stability. To address this, we propose \textbf{Trajectory Flow Matching} (TFM), which trains a Neural SDE in a \textit{simulation-free} manner, bypassing backpropagation through the dynamics.
Xi (Nicole) Zhang; Yuan Pu; Yuki Kawamura; Andrew Loza; Yoshua Bengio; Dennis Shung; Alexander Tong; |
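The simulation-free trick TFM builds on is standard flow matching: regress a network onto interpolant velocities, so no SDE/ODE solver runs inside the training loop. A minimal sketch of constructing those regression targets (generic conditional flow matching, not TFM's clinical variant):

```python
import numpy as np

def flow_matching_targets(x0, x1, t):
    """Simulation-free regression pair used in flow matching: the
    interpolant x_t = (1 - t) * x0 + t * x1 and its velocity
    u_t = x1 - x0. A network v(x_t, t) is then fit to u_t by plain
    regression; no solver is run during training."""
    xt = (1 - t)[:, None] * x0 + t[:, None] * x1
    ut = x1 - x0
    return xt, ut

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))        # source (e.g. noise) samples
x1 = rng.normal(size=(4, 2)) + 5.0  # target (data) samples
t = rng.uniform(size=4)             # one time per pair
xt, ut = flow_matching_targets(x0, x1, t)
assert xt.shape == (4, 2)
```

TFM adapts this recipe so the learned trajectories match observed clinical time series, per the highlight; the linear interpolant here is only the simplest instance.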
146 | Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To guide progress in interpretable dictionary learning, we introduce a new SAE training technique, $p$-annealing, which demonstrates improved performance on our metric.
Adam Karvonen; Benjamin Wright; Can Rager; Rico Angell; Jannik Brinkmann; Logan Smith; Claudio Mayrink Verdun; David Bau; Samuel Marks;
147 | GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel approach for single-shot novel view synthesis, a semantic-preserving generative warping framework that enables T2I generative models to learn where to warp and where to generate, through augmenting cross-view attention with self-attention.
Junyoung Seo; Kazumi Fukuda; Takashi Shibuya; Takuya Narihira; Naoki Murata; Shoukang Hu; Chieh-Hsin Lai; Seungryong Kim; Yuki Mitsufuji; |
148 | Scaling Sign Language Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions.
Biao Zhang; Garrett Tanzer; Orhan Firat; |
149 | ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present ZipCache, an accurate and efficient KV cache quantization method for large language models (LLMs).
Yefei He; Luoming Zhang; Weijia Wu; Jing Liu; Hong Zhou; Bohan Zhuang;
150 | Amortized Planning with Large-Scale Transformers: A Case Study on Chess Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper uses chess, a landmark planning problem in AI, to assess transformers’ performance on a planning task where memorization is futile – even at large scale.
Anian Ruoss; Grégoire Delétang; Sourabh Medapati; Jordi Grau-Moya; Kevin Li; Elliot Catt; John Reid; Cannada Lewis; Tim Genewein; Joel Veness; |
151 | DoFIT: Domain-aware Federated Instruction Tuning with Alleviated Catastrophic Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This leads to domain-information catastrophic forgetting in collaborative training and therefore makes the model perform sub-optimally on the individual domain. To address this issue, we introduce DoFIT, a new Domain-aware FIT framework that alleviates catastrophic forgetting through two new designs.
Binqian Xu; Xiangbo Shu; Haiyang Mei; Zechen Bai; Basura Fernando; Mike Zheng Shou; Jinhui Tang; |
152 | HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data.
Zhenzhi Wang; Yixuan Li; Yanhong Zeng; Youqing Fang; Yuwei Guo; Wenran Liu; Jing Tan; Kai Chen; Bo Dai; Tianfan Xue; Dahua Lin; |
153 | Scalable Optimization in The Modular Norm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the natural norm particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture.
Jeremy Bernstein; Tim Large; Yang Liu; Jacob Huh; Hyojin Bahng; Phillip Isola;
154 | Paloma: A Benchmark for Evaluating Language Model Fit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Perplexity Analysis for Language Model Assessment (Paloma), a benchmark to measure LM fit to 546 English and code domains, instead of assuming perplexity on one distribution extrapolates to others.
Ian Magnusson; Akshita Bhagia; Valentin Hofmann; Luca Soldaini; Ananya Harsh Jha; Oyvind Tafjord; Dustin Schwenk; Evan Walsh; Yanai Elazar; Kyle Lo; Dirk Groeneveld; Iz Beltagy; Hannaneh Hajishirzi; Noah Smith; Kyle Richardson; Jesse Dodge;
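Measuring fit per domain rather than pooled, as the highlight describes, is straightforward to sketch: compute perplexity, the exponential of the mean token-level negative log-likelihood, separately for each domain. Domain names and NLL values below are made up for illustration.

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean token-level negative log-likelihood)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

def fit_by_domain(domain_nlls):
    """Paloma-style reporting sketch: one perplexity per domain,
    instead of assuming one pooled number transfers across domains."""
    return {d: perplexity(nlls) for d, nlls in domain_nlls.items()}

scores = fit_by_domain({
    "web": [2.0, 2.5, 3.0],   # hypothetical per-token NLLs
    "code": [1.0, 1.2, 0.8],
})
assert scores["code"] < scores["web"]  # better fit on code here
```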
155 | AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to analytical evaluation of LLM agents.
Ma Chang; Junlei Zhang; Zhihao Zhu; Cheng Yang; Yujiu Yang; Yaohui Jin; Zhenzhong Lan; Lingpeng Kong; Junxian He;
156 | Analysing The Generalisation and Reliability of Steering Vectors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that steering vectors have substantial limitations both in- and out-of-distribution.
Daniel Tan; David Chanin; Aengus Lynch; Brooks Paige; Dimitrios Kanoulas; Adrià Garriga-Alonso; Robert Kirk;
157 | Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose multiple metrics to rigorously quantify agents’ performance and alignment with the assigned role.
Sahar Abdelnabi; Amr Gomaa; Sarath Sivaprasad; Schönherr; Mario Fritz;
158 | Invisible Image Watermarks Are Provably Removable Using Generative AI Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: They also prevent people from misusing images, especially those generated by AI models. We propose a family of regeneration attacks to remove these invisible watermarks.
Xuandong Zhao; Kexun Zhang; Zihao Su; Saastha Vasan; Ilya Grishchenko; Christopher Kruegel; Giovanni Vigna; Yu-Xiang Wang; Lei Li;
159 | A Universal Growth Rate for Learning with Smooth Surrogate Losses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification.
Anqi Mao; Mehryar Mohri; Yutao Zhong;
160 | Turning Indirect Knowledge Into Direct Demonstrations for Computer Agents at Scale Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Synatra, an approach that effectively transforms indirect knowledge into direct supervision at scale.
Tianyue Ou; Frank F. Xu; Aman Madaan; Jiarui Liu; Robert Lo; Abishek Sridhar; Sudipta Sengupta; Dan Roth; Graham Neubig; Shuyan Zhou; |
161 | Connecting The Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Could an LLM infer the dangerous knowledge by piecing together these hints? As a step towards answering this question, we study \textit{inductive out-of-context reasoning} (OOCR).
Johannes Treutlein; Dami Choi; Jan Betley; Cem Anil; Samuel Marks; Roger Grosse; Owain Evans;
162 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following.
Rudolf Laine; Bilal Chughtai; Jan Betley; Kaivalya Hariharan; Mikita Balesni; Jérémy Scheurer; Marius Hobbhahn; Alexander Meinke; Owain Evans;
163 | Fine-grained Analysis of In-context Linear Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a stronger characterization of the optimization and generalization landscape of ICL through contributions on architectures, low-rank parameterization, and correlated designs: (1) We study the landscape of 1-layer linear attention and 1-layer H3, a state-space model.
Yingcong Li; Ankit Rawat; Samet Oymak;
164 | Metric Flow Matching for Smooth Interpolations on The Data Manifold Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Metric Flow Matching (MFM), a novel simulation-free framework for conditional flow matching where interpolants are approximate geodesics learned by minimizing the kinetic energy of a data-induced Riemannian metric.
Kacper Kapusniak; Peter Potaptchik; Teodora Reu; Leo Zhang; Alexander Tong; Michael Bronstein; Joey Bose; Francesco Di Giovanni; |
165 | Large Language Model Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study how to perform unlearning, i.e. forgetting undesirable (mis)behaviors, on large language models (LLMs).
Yuanshun Yao; Xiaojun Xu; Yang Liu;
166 | Large Language Model-Driven Audio Codec Is A Few-Shot Audio Task Learner Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel LLM-driven audio codec model, LLM-Codec, to transfer the audio modality into the textual space, \textit{i.e.} representing audio tokens with words or sub-words in the vocabulary of LLMs, while keeping high audio reconstruction quality.
Dongchao Yang; Haohan Guo; Yuanyuan Wang; Rongjie Huang; Xiang Li; Xu Tan; Xixin Wu; Helen Meng;
167 | Preference Learning Algorithms Do Not Learn Preference Rankings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via *ranking accuracy*.
Angelica Chen; Sadhika Malladi; Lily Zhang; Xinyi Chen; Qiuyi (Richard) Zhang; Rajesh Ranganath; Kyunghyun Cho; |
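The *ranking accuracy* the highlight refers to has a simple operational form: the fraction of preference pairs where the model assigns higher log-likelihood to the preferred completion. A minimal sketch (log-probabilities below are made-up numbers):

```python
import numpy as np

def ranking_accuracy(logp_chosen, logp_rejected):
    """Fraction of preference pairs where the model assigns a higher
    log-likelihood to the preferred completion than the rejected one."""
    logp_chosen = np.asarray(logp_chosen, dtype=float)
    logp_rejected = np.asarray(logp_rejected, dtype=float)
    return float(np.mean(logp_chosen > logp_rejected))

# Toy check: 3 of the 4 pairs are ranked correctly.
acc = ranking_accuracy([-1.0, -2.0, -0.5, -3.0],
                       [-1.5, -1.0, -0.9, -4.0])
assert acc == 0.75
```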
168 | The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About The Subjective and Multicultural Alignment of Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs.
Hannah Rose Kirk; Alexander Whitefield; Paul Rottger; Andrew M. Bean; Katerina Margatina; Rafael Mosquera; Juan Ciro; Max Bartolo; Adina Williams; He He; Bertie Vidgen; Scott Hale;
169 | Segment Anything Without Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Unsupervised SAM (UnSAM), a segment anything model for interactive and automatic whole-image segmentation which does not require human annotations.
Xudong Wang; Jingfeng Yang; Trevor Darrell; |
170 | Smoothed Energy Guidance: Guiding Diffusion Models By Attenuating Energy Curvature of Attention Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose Smoothed Energy Guidance (SEG), a novel training- and condition-free approach that leverages the energy-based perspective of the self-attention mechanism to enhance image generation.
Susung Hong; |
171 | Transcoders Find Interpretable LLM Feature Circuits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To address this we explore **transcoders**, which seek to faithfully approximate a densely activating MLP layer with a wider, sparsely-activating MLP layer. We successfully train transcoders on language models with 120M, 410M, and 1.4B parameters, and find them to perform at least on par with SAEs in terms of sparsity, faithfulness, and human-interpretability.
Jacob Dunefsky; Philippe Chlenski; Neel Nanda;
172 | LiT: Unifying LiDAR Languages with LiDAR Translator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These gaps, akin to language barriers, hinder the synergistic use of diverse LiDAR datasets, limiting the scalability and unification of perception models. To address this challenge, we present the \textit{LiDAR Translator (LiT)}, a novel framework designed to unify LiDAR data into a single target “language”.
Yixing Lao; Tao Tang; Xiaoyang Wu; Peng Chen; Kaicheng Yu; Hengshuang Zhao;
173 | Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel black-box approach for producing a diverse collection of adversarial prompts.
Mikayel Samvelyan; Sharath Chandra Raparthy; Andrei Lupu; Eric Hambro; Aram Markosyan; Manish Bhatt; Yuning Mao; Minqi Jiang; Jack Parker-Holder; Jakob Foerster; Tim Rocktäschel; Roberta Raileanu;
174 | PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents’ actions.
Yijia Shao; Tianshi Li; Weiyan Shi; Yanchen Liu; Diyi Yang; |
175 | Out-of-Distribution Detection with A Single Unconditional Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To that end, we introduce our method, Diffusion Paths (DiffPath), in this work.
Alvin Heng; alexandre thiery; Harold Soh;
176 | Scaling Transformer Neural Networks for Skillful and Reliable Medium-range Weather Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone.
Tung Nguyen; Rohan Shah; Hritik Bansal; Troy Arcomano; Romit Maulik; Rao Kotamarthi; Ian Foster; Sandeep Madireddy; Aditya Grover; |
177 | The Road Less Scheduled Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing learning rate schedules that do not require specification of the optimization stopping step $T$ are greatly outperformed by learning rate schedules that depend on $T$. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems.
Aaron Defazio; Xingyu Yang; Ahmed Khaled; Konstantin Mishchenko; Harsh Mehta; Ashok Cutkosky;
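One common way to write the schedule-free idea (a hedged reading of the approach; constants and the exact update form here are illustrative): take gradients at an interpolation y of a fast iterate z and a running average x, and return x, so no decaying schedule or preset stopping step T is needed.

```python
import numpy as np

def schedule_free_sgd(grad, w0, lr=0.1, beta=0.9, steps=3000):
    """Schedule-free SGD sketch: z takes plain SGD steps, gradients are
    evaluated at y = (1 - beta) * z + beta * x, and the returned
    solution x is a uniform running average of the z iterates.
    Constants are illustrative, not tuned recommendations."""
    z = np.asarray(w0, dtype=float).copy()
    x = z.copy()
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x  # gradient evaluation point
        z = z - lr * grad(y)           # base SGD step on z
        x = x + (z - x) / t            # uniform average, no schedule
    return x

# Toy check on f(w) = 0.5 * ||w||^2, minimized at w = 0.
w = schedule_free_sgd(lambda w: w, w0=np.ones(3))
assert np.linalg.norm(w) < 0.05
```

The averaging replaces the role a decaying learning rate usually plays, which is why no stopping step has to be chosen in advance.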
178 | Diversity Is Not All You Need: Training A Robust Cooperative Agent Needs Specialist Partners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a principled method for quantifying both the diversity and specialization of a partner population based on the concept of mutual information.
Rujikorn Charakorn; Poramate Manoonpong; Nat Dilokthanakul; |
179 | Enhancing Large Language Models Through Adaptive Tokenizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose a simple but effective method to learn tokenizers specifically engineered for seamless integration with LLMs.
Mengyu Zheng; Hanting Chen; Tianyu Guo; Chong Zhu; Binfan Zheng; Chang Xu; Yunhe Wang;
180 | An Image Is Worth 32 Tokens for Reconstruction and Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these 2D tokenizations face challenges in managing the inherent redundancies present in images, where adjacent regions frequently display similarities. To overcome this issue, we introduce **T**ransformer-based 1-D**i**mensional **Tok**enizer (TiTok), an innovative approach that tokenizes images into 1D latent sequences.
Qihang Yu; Mark Weber; Xueqing Deng; Xiaohui Shen; Daniel Cremers; Liang-Chieh Chen; |
181 | From An Image to A Scene: Learning to Imagine The World from A Million 360° Videos Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we introduce 360-1M, a 360° video dataset consisting of 1 million videos, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
Matthew Wallingford; Anand Bhattad; Aditya Kusupati; Vivek Ramanujan; Matt Deitke; Aniruddha Kembhavi; Roozbeh Mottaghi; Wei-Chiu Ma; Ali Farhadi;
182 | Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To fulfill the paramount need of comprehensive, realistic, and fair testing environments for Full Self-Driving (FSD), we present Bench2Drive, the first benchmark for evaluating E2E-AD systems’ multiple abilities in a closed-loop manner.
Xiaosong Jia; Zhenjie Yang; Qifeng Li; Zhiyuan Zhang; Junchi Yan;
183 | TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Notably, TabPFN achieves very strong performance on small tabular datasets but is not designed to make predictions for datasets of size larger than 1000. In this work, we overcome these limitations and substantially improve the performance of PFNs via context optimization.
Benjamin Feuer; Robin Schirrmeister; Valeriia Cherepanova; Chinmay Hegde; Frank Hutter; Micah Goldblum; Niv Cohen; Colin White;
184 | Recurrent Neural Networks: Vanishing and Exploding Gradients Are Not The End of The Story Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The recent success of state-space models (SSMs), a subclass of RNNs, in overcoming such difficulties challenges our theoretical understanding. In this paper, we delve into the optimization challenges of RNNs and discover that, as the memory of a network increases, changes in its parameters result in increasingly large output variations, making gradient-based learning highly sensitive, even without exploding gradients.
Nicolas Zucchet; Antonio Orvieto;
185 | DARG: Dynamic Evaluation of Large Language Models Via Adaptive Reasoning Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Zhehao Zhang; Jiaao Chen; Diyi Yang; |
186 | Transformers Can Do Arithmetic with The Right Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number.
Sean McLeish; Arpit Bansal; Alex Stein; Neel Jain; John Kirchenbauer; Brian Bartoldson; Bhavya Kailkhura; Abhinav Bhatele; Jonas Geiping; Avi Schwarzschild; Tom Goldstein;
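The fix described in the highlight is easy to sketch: give every digit an id equal to its offset from the start of its number, then add an embedding looked up by that id to the token embedding. The helper below is illustrative, not the paper's code.

```python
def digit_position_ids(text):
    """For each character, return the digit's 0-based offset from the
    start of its number, or -1 for non-digits. An embedding table
    indexed by these ids would be added to the token embeddings so the
    model can track where each digit sits within its number."""
    ids, run = [], 0
    for ch in text:
        if ch.isdigit():
            ids.append(run)
            run += 1
        else:
            ids.append(-1)
            run = 0  # a non-digit ends the current number
    return ids

# '1','2' are digits 0,1 of the first number; '3','4','5' of the second.
assert digit_position_ids("12+345=") == [0, 1, -1, 0, 1, 2, -1]
```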
187 | InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents InterpBench, a collection of semi-synthetic yet realistic transformers with known circuits for evaluating these techniques.
Rohan Gupta; Iván Arcuschin Moreno; Thomas Kwa; Adrià Garriga-Alonso;
188 | FasterDiT: Towards Faster Diffusion Transformers Training Without Architecture Modification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to accelerate DiT training without any architectural modification.
Jingfeng Yao; Cheng Wang; Wenyu Liu; Xinggang Wang;
189 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability. To address this issue, we introduce OSWorld, the first-of-its-kind scalable, real computer environment for multimodal agents, supporting task setup, execution-based evaluation, and interactive learning across various operating systems such as Ubuntu, Windows, and macOS.
Tianbao Xie; Danyang Zhang; Jixuan Chen; Xiaochuan Li; Siheng Zhao; Ruisheng Cao; Jing Hua Toh; Zhoujun Cheng; Dongchan Shin; Fangyu Lei; Yitao Liu; Yiheng Xu; Shuyan Zhou; Silvio Savarese; Caiming Xiong; Victor Zhong; Tao Yu;
190 | Many-Shot In-Context Learning Highlight: While promising, many-shot ICL can be bottlenecked by the available amount of human-generated outputs. To mitigate this limitation, we explore two new settings: (1) Reinforced ICL that uses model-generated chain-of-thought rationales in place of human rationales, and (2) Unsupervised ICL where we remove rationales from the prompt altogether, and prompt the model only with domain-specific inputs.
Rishabh Agarwal; Avi Singh; Lei Zhang; Bernd Bohnet; Luis Rosias; Stephanie Chan; Biao Zhang; Ankesh Anand; Zaheer Abbas; Azade Nova; John Co-Reyes; Eric Chu; Feryal Behbahani; Aleksandra Faust; Hugo Larochelle; |
191 | ReFT: Representation Finetuning for Language Models (Related Code) Highlight: However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods.
Zhengxuan Wu; Aryaman Arora; Zheng Wang; Atticus Geiger; Dan Jurafsky; Christopher D Manning; Christopher Potts;
192 | FLAME: Factuality-Aware Alignment for Large Language Models Highlight: In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL).
Sheng-Chieh Lin; Luyu Gao; Barlas Oguz; Wenhan Xiong; Jimmy Lin; Scott Yih; Xilun Chen;
193 | CorDA: Context-Oriented Decomposition Adaptation of Large Language Models Highlight: In this paper, we propose CorDA, a Context-oriented Decomposition Adaptation method that builds learnable adapters from weight decomposition oriented by the context of downstream task or world knowledge.
Yibo Yang; Xiaojie Li; Zhongzhu Zhou; Shuaiwen Song; Jianlong Wu; Liqiang Nie; Bernard Ghanem; |
194 | Continual Audio-Visual Sound Separation Highlight: In this paper, we introduce a novel continual audio-visual sound separation task, aiming to continuously separate sound sources for new classes while preserving performance on previously learned classes, with the aid of visual guidance.
Weiguo Pian; Yiyang Nan; Shijian Deng; Shentong Mo; Yunhui Guo; Yapeng Tian; |
195 | DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving (Related Code) Highlight: Hypothesizing that difficult queries are crucial to learn complex reasoning, we propose Difficulty-Aware Rejection Tuning (DART), a method that allocates difficult queries more trials during the synthesis phase, enabling more extensive training on difficult samples.
Yuxuan Tong; Xiwen Zhang; Rui Wang; Ruidong Wu; Junxian He;
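The allocation idea in the DART highlight can be illustrated with a toy scheme (hypothetical: the difficulty scores and the proportional rule below are illustration-only assumptions, not the paper's actual metric or allocation rule):

```python
def allocate_trials(difficulties, total_trials):
    """Give harder queries more synthesis trials.

    Toy proportional allocation; `difficulties` are hypothetical
    per-query difficulty scores, not the paper's measure.
    """
    total = sum(difficulties)
    # each query gets at least one trial, harder queries get more
    return [max(1, round(d / total * total_trials)) for d in difficulties]

# the hardest query (0.6) receives the largest share of the 100 trials
print(allocate_trials([0.1, 0.3, 0.6], 100))  # → [10, 30, 60]
```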
196 | MotionBooth: Motion-Aware Customized Text-to-Video Generation Highlight: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.
Jianzong Wu; Xiangtai Li; Yanhong Zeng; Jiangning Zhang; Qianyu Zhou; Yining Li; Yunhai Tong; Kai Chen;
197 | LLM Dataset Inference: Detect Datasets, Not Strings Highlight: Instead, we propose a new *dataset inference* method to accurately identify the datasets used to train large language models.
Pratyush Maini; Hengrui Jia; Nicolas Papernot; Adam Dziedzic; |
198 | BAKU: An Efficient Transformer for Multi-Task Policy Learning Highlight: In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies.
Siddhant Haldar; Zhuoran Peng; Lerrel Pinto; |
199 | Revisiting Self-Supervised Heterogeneous Graph Learning from Spectral Clustering Perspective Highlight: However, while existing SHGL methods share a similar essence with clustering approaches, they encounter two significant limitations: (i) noise in graph structures is often introduced during the message-passing process to weaken node representations, and (ii) cluster-level information may be inadequately captured and leveraged, diminishing the performance in downstream tasks. In this paper, we address these limitations by theoretically revisiting SHGL from the spectral clustering perspective and introducing a novel framework enhanced by rank and dual consistency constraints.
Yujie Mo; Zhihe Lu; Runpeng Yu; Xiaofeng Zhu; Xinchao Wang;
200 | Reprogramming Pretrained Target-Specific Diffusion Models for Dual-Target Drug Design Highlight: We propose to design dual-target drugs with diffusion models that are trained on single-target protein-ligand complex pairs.
Xiangxin Zhou; Jiaqi Guan; Yijia Zhang; Xingang Peng; Liang Wang; Jianzhu Ma;
201 | Make-it-Real: Unleashing Large Multimodal Model for Painting 3D Objects with Realistic Materials Highlight: In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs), particularly GPT-4V, to present a novel approach, Make-it-Real: 1) We demonstrate that GPT-4V can effectively recognize and describe materials, allowing the construction of a detailed material library.
Ye Fang; Zeyi Sun; Tong Wu; Jiaqi Wang; Ziwei Liu; Gordon Wetzstein; Dahua Lin;
202 | MADiff: Offline Multi-agent Learning with Diffusion Models (Related Code) Highlight: However, despite the effectiveness shown for single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where agents can hardly complete teamwork without good coordination by independently modeling each agent’s trajectories. In this paper, we propose MADiff, a novel generative multi-agent learning framework to tackle this problem.
Zhengbang Zhu; Minghuan Liu; Liyuan Mao; Bingyi Kang; Minkai Xu; Yong Yu; Stefano Ermon; Weinan Zhang;
203 | Text-space Graph Foundation Models: Comprehensive Benchmarks and New Insights (Related Code) Highlight: Despite the great potential of these text-space GFMs, current research in this field is hampered by two problems. First, the absence of a comprehensive benchmark with unified problem settings hinders a clear understanding of the comparative effectiveness and practical value of different text-space GFMs. Second, there is a lack of sufficient datasets to thoroughly explore the methods’ full potential and verify their effectiveness across diverse settings. To address these issues, we conduct a comprehensive benchmark providing novel text-space datasets and comprehensive evaluation under unified problem settings.
Zhikai Chen; Haitao Mao; Jingzhe Liu; Yu Song; Bingheng Li; Wei Jin; Bahare Fatemi; Anton Tsitsulin; Bryan Perozzi; Hui Liu; Jiliang Tang; |
204 | xLSTM: Extended Long Short-Term Memory (Related Code) Highlight: Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures.
Maximilian Beck; Korbinian Pöppel; Markus Spanring; Andreas Auer; Oleksandra Prudnikova; Michael Kopp; Günter Klambauer; Johannes Brandstetter; Sepp Hochreiter;
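The matrix-memory covariance update mentioned in the xLSTM highlight can be sketched in a few lines (a simplified illustration, not the paper's exact stabilized formulation; the gate parameterizations and the crude cap on the exponential gate are assumptions for this sketch):

```python
import math

def mlstm_step(C, n, k, v, q, f_pre, i_pre):
    """One simplified mLSTM-style recurrence step.

    The matrix memory C accumulates outer products v k^T (a covariance-style
    update), gated by a sigmoid forget gate and an exponential input gate;
    the normalizer n keeps the readout bounded. Illustrative only.
    """
    d = len(k)
    f = 1.0 / (1.0 + math.exp(-f_pre))   # forget gate in (0, 1)
    i = math.exp(min(i_pre, 0.0))        # exponential input gate, crudely capped here
    C = [[f * C[r][c] + i * v[r] * k[c] for c in range(d)] for r in range(d)]
    n = [f * n[r] + i * k[r] for r in range(d)]
    num = [sum(C[r][c] * q[c] for c in range(d)) for r in range(d)]
    denom = max(abs(sum(n[c] * q[c] for c in range(d))), 1.0)
    h = [x / denom for x in num]         # hidden readout for query q
    return C, n, h

# store value v=[0,1] under key k=[1,0], then retrieve it with query q=[1,0]
C, n, h = mlstm_step([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0],
                     [1.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                     f_pre=100.0, i_pre=0.0)
print(h)  # → [0.0, 1.0]
```

Because the update is a sum of outer products, all timesteps can be computed in parallel, which is the property the highlight refers to.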
205 | Learning to Cooperate with Humans Using Generative Agents Highlight: By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents.
Yancheng Liang; Daphne Chen; Abhishek Gupta; Simon Du; Natasha Jaques; |
206 | FinBen: An Holistic Financial Benchmark for Large Language Models Highlight: In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, and decision-making.
Qianqian Xie; Weiguang Han; Zhengyu Chen; Ruoyu Xiang; Xiao Zhang; Yueru He; Mengxi Xiao; Dong Li; Yongfu Dai; Duanyu Feng; Yijing Xu; Haoqiang Kang; Ziyan Kuang; Chenhan Yuan; Kailai Yang; Zheheng Luo; Tianlin Zhang; Zhiwei Liu; Guojun Xiong; Zhiyang Deng; Yuechen Jiang; Zhiyuan Yao; Haohang Li; Yangyang Yu; Gang Hu; Huang Jiajia; Xiaoyang Liu; Alejandro Lopez-Lira; Benyou Wang; Yanzhao Lai; Hao Wang; Min Peng; Sophia Ananiadou; Jimin Huang;
207 | Catastrophic Goodhart: Regularizing RLHF with KL Divergence Does Not Mitigate Heavy-tailed Reward Misspecification Highlight: However, if error is heavy-tailed, some policies obtain arbitrarily high reward despite achieving no more utility than the base model, a phenomenon we call catastrophic Goodhart. We adapt a discrete optimization method developed for adversarial attacks to measure the tails of open-source reward models, finding that they are consistent with light-tailed error.
Thomas Kwa; Adrià Garriga-Alonso;
208 | G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering (Related Code) Highlight: In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning.
Xiaoxin He; Yijun Tian; Yifei Sun; Nitesh Chawla; Thomas Laurent; Yann LeCun; Xavier Bresson; Bryan Hooi; |
209 | Harmony4D: A Video Dataset for In-The-Wild Close Human Interactions Highlight: We propose a novel markerless algorithm to track 3D human poses in severe occlusion and close interaction to obtain our annotations with minimal manual intervention.
Rawal Khirodkar; Jyun-Ting Song; Jinkun Cao; Zhengyi Luo; Kris Kitani; |
210 | Learn More, But Bother Less: Parameter Efficient Continual Learning Highlight: In this paper, we propose a novel parameter-efficient approach for continual learning in LLMs, which empirically investigates knowledge transfer from previously learned tasks to new tasks through low-rank matrix parameters, enhancing the learning of new tasks without significant interference.
Fuli Qiao; Mehrdad Mahdavi; |
211 | Aligning to Thousands of Varying Preferences Via System Message Generalization Highlight: A major challenge in adopting a more individualized approach to LLM alignment is scalability, as it involves repeatedly acquiring preference data and training new reward models and LLMs for each individual’s preferences. To address these challenges, we propose a new paradigm where users specify what they value most within the system messages, steering the LLM’s generation behavior to better align with the user’s intentions.
Seongyun Lee; Sue Park; Seungone Kim; Minjoon Seo; |
212 | InterDreamer: Less Supervision for More Generalizable Text-Driven 3D Human-Object Interaction Synthesis Highlight: This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled.
Ziyin Wang; Sirui Xu; Yu-Xiong Wang; Liangyan Gui;
213 | No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO (Related Code) Highlight: In this work, we empirically study representation dynamics in Proximal Policy Optimization (PPO) on the Atari and MuJoCo environments, revealing that PPO agents are also affected by feature rank deterioration and loss of plasticity.
Skander Moalla; Andrea Miele; Razvan Pascanu; Caglar Gulcehre;
214 | RoPINN: Region Optimized Physics-Informed Neural Networks Highlight: However, since PDEs are usually defined on continuous domains, solely optimizing models on scattered points may be insufficient to obtain an accurate solution for the whole domain. To mitigate this inherent deficiency of the default scatter-point optimization, this paper proposes and theoretically studies a new training paradigm, region optimization.
Haixu Wu; Huakun Luo; Yuezhou Ma; Jianmin Wang; Mingsheng Long;
215 | LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low Resource and Extinct Languages Highlight: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models.
Andrew M. Bean; Simeon Hellsten; Harry Mayne; Jabez Magomere; Ethan Chi; Ryan Chi; Scott Hale; Hannah Rose Kirk;
216 | Normalization and Effective Learning Rates in Reinforcement Learning Highlight: We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project (NaP), which couples the insertion of normalization layers with weight projection, ensuring that the effective learning rate remains constant throughout training.
Clare Lyle; Zeyu Zheng; Khimya Khetarpal; James Martens; Hado van Hasselt; Razvan Pascanu; Will Dabney; |
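The weight-projection half of the NaP highlight can be illustrated with a minimal sketch (an assumption-level illustration: when a layer's output is normalized, only the weight direction matters, so rescaling the weights to a fixed norm after each optimizer step stops the effective learning rate from drifting as the norm grows; the normalization layers themselves are omitted here):

```python
import math

def project_to_norm(w, target_norm):
    """Rescale a weight vector back to a fixed norm after an optimizer step.

    Minimal sketch of the projection step in a Normalize-and-Project-style
    update; illustrative, not the paper's full procedure.
    """
    norm = math.sqrt(sum(x * x for x in w))
    return [x * (target_norm / norm) for x in w]

w = [3.0, 4.0]                                     # ||w|| = 5
grad = [1.0, -1.0]
w = [wi - 0.1 * gi for wi, gi in zip(w, grad)]     # plain SGD step: norm drifts
w = project_to_norm(w, 5.0)                        # project back onto the sphere
print(round(math.sqrt(sum(x * x for x in w)), 6))  # → 5.0
```

Without the projection, growing weight norms shrink the effective step size of a fixed learning rate, which is exactly the implicit schedule the method makes explicit.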
217 | Faster Neighborhood Attention: Reducing The O(n^2) Cost of Self Attention at The Threadblock Level (Related Code) Highlight: In this work, we aim to massively improve upon existing infrastructure by providing two new methods for implementing neighborhood attention.
Ali Hassani; Wen-Mei Hwu; Humphrey Shi;
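For readers unfamiliar with the operation, a naive 1D reference version of neighborhood attention looks like the following (illustrative only: the paper's contribution is efficient threadblock-level kernels, not this math, and scalar queries/keys are a simplification):

```python
import math

def neighborhood_attention_1d(q, k, v, window=1):
    """Each query attends only to keys within `window` positions,
    so cost grows with n * window instead of n^2 (naive reference)."""
    n = len(q)
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = [q[i] * k[j] for j in range(lo, hi)]
        m = max(scores)                      # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w / z * v[j] for w, j in zip(weights, range(lo, hi))))
    return out

# with uniform scores, each output is the mean of its neighborhood's values
out = neighborhood_attention_1d([0.0] * 4, [1.0] * 4, [1.0, 2.0, 3.0, 4.0])
print([round(x, 6) for x in out])  # → [1.5, 2.0, 3.0, 3.5]
```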
218 | Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models Highlight: We show that it is possible to manipulate the internal model representations as well as edit model weights, based on the mechanism we discover, in order to significantly improve performance on our synthetic Laundry List task, which requires recall from a list, often improving task accuracy by over 20%.
Jack Merullo; Carsten Eickhoff; Ellie Pavlick;
219 | Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing Highlight: In this paper we present Vitron, a universal pixel-level vision LLM designed for comprehensive understanding, generating, segmenting, and editing of both static images and dynamic videos.
Hao Fei; Shengqiong Wu; Hanwang Zhang; Tat-Seng Chua; Shuicheng Yan;
220 | Code Agents Are State of The Art Software Testers Highlight: To this end, we propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth patches, and golden tests.
Niels Mündler; Mark Müller; Jingxuan He; Martin Vechev; |
221 | Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion (Related Code) Highlight: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels.
Boyuan Chen; Diego Martí Monsó; Yilun Du; Max Simchowitz; Russ Tedrake; Vincent Sitzmann; |
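The per-token noise idea in the Diffusion Forcing highlight can be sketched as follows (an illustration with scalar "tokens" and a made-up linear noise scale, not the paper's actual parameterization or schedule):

```python
import random

def corrupt_per_token(tokens, num_levels=10, seed=0):
    """Assign each token its own independently sampled noise level.

    Full-sequence diffusion would share one level across all tokens;
    plain next-token prediction would use none. Toy linear scale.
    """
    rng = random.Random(seed)
    levels = [rng.randrange(num_levels) for _ in tokens]
    noisy = [t + (lvl / num_levels) * rng.gauss(0.0, 1.0)
             for t, lvl in zip(tokens, levels)]
    return noisy, levels

# tokens whose sampled level is 0 pass through unchanged
noisy, levels = corrupt_per_token([0.0, 0.0, 0.0, 0.0, 0.0])
```

Training against such independently corrupted sequences is what lets one model interpolate between next-token prediction and full-sequence diffusion.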
222 | Optimal Multiclass U-Calibration Error and Beyond Highlight: We consider the problem of online multiclass U-calibration, where a forecaster aims to make sequential distributional predictions over $K$ classes with low U-calibration error, that is, low regret with respect to all bounded proper losses simultaneously.
Haipeng Luo; Spandan Senapati; Vatsal Sharan; |
223 | BertaQA: How Much Do Language Models Know About Local Culture? (Related Code) Highlight: This raises the question of how well these models perform on topics relevant to other cultures, whose presence on the web is not that prominent. To address this gap, we introduce BertaQA, a multiple-choice trivia dataset that is parallel in English and Basque.
Julen Etxaniz; Gorka Azkune; Aitor Soroa; Oier Lacalle; Mikel Artetxe; |
224 | Mr.Bean: A Comprehensive Meta-Reasoning Benchmark for Analyzing Large Language Models Highlight: To this end, we present a process-based benchmark Mr. Bean that demands a meta-reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps.
Zhongshen Zeng; Yinhong Liu; Yingjia Wan; Jingyao Li; Pengguang Chen; Jianbo Dai; Yuxuan Yao; Rongwu Xu; Zehan Qi; Wanru Zhao; Linling Shen; Jianqiao Lu; Haochen Tan; Yukang Chen; Hao Zhang; Zhan Shi; Bailin Wang; Zhijiang Guo; Jiaya Jia;
225 | On The Inductive Bias of Stacking Towards Improving Reasoning Highlight: Although efficient for training, the model biases induced by such growing approaches are largely unexplored. In this work, we examine this fundamental aspect of gradual stacking, going beyond its efficiency benefits.
Nikunj Saunshi; Stefani Karp; Shankar Krishnan; Sobhan Miryoosefi; Sashank Jakkam Reddi; Sanjiv Kumar; |
226 | Near-Minimax-Optimal Distributional Reinforcement Learning with A Generative Model Highlight: We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions in the generative model regime (up to logarithmic factors), the first result of this kind for any distributional RL algorithm.
Mark Rowland; Kevin Li; Remi Munos; Clare Lyle; Yunhao Tang; Will Dabney;
227 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Highlight: To address the need for pluralistic alignment, we develop a novel class of multi-modal RLHF methods.
Sriyash Poddar; Yanming Wan; Hamish Ivison; Abhishek Gupta; Natasha Jaques; |
228 | TAPVid-3D: A Benchmark for Tracking Any Point in 3D (Related Code) Highlight: We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D).
Skanda Koppula; Ignacio Rocco; Yi Yang; Joseph Heyward; Joao Carreira; Andrew Zisserman; Gabriel Brostow; Carl Doersch;
229 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks (Related Code) Highlight: While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses.
Andy Zhou; Bo Li; Haohan Wang;
230 | Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Highlight: In this paper, we explore a variety of discrete latent representations, including textual descriptions, detection bounding boxes, object blobs, and visual tokens.
Jiatao Gu; Ying Shen; Shuangfei Zhai; Yizhe Zhang; Navdeep Jaitly; Joshua Susskind;
231 | A General Protocol to Probe Large Vision Models for 3D Physical Understanding (Related Code) Highlight: Our objective in this paper is to probe large vision models to determine to what extent they ‘understand’ different physical properties of the 3D scene depicted in an image.
Guanqi Zhan; Chuanxia Zheng; Weidi Xie; Andrew Zisserman;
232 | Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting Highlight: Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models better capture scene and object configuration than LLMs, we address these challenges by introducing $\textit{Architect}$, a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting.
Yian Wang; Xiaowen Qiu; Jiageng Liu; Zhehuan Chen; Jiting Cai; Yufei Wang; Tsun-Hsuan Johnson Wang; Zhou Xian; Chuang Gan; |
233 | ActionAtlas: A VideoQA Benchmark for Fine-grained Action Recognition Highlight: Our world is full of varied actions and moves in specialized fields that we, as humans, seek to identify and learn about. To evaluate the effectiveness of multi-modal models in helping us recognize such fine-grained actions, we introduce ActionAtlas, a video question answering (VideoQA) benchmark on fine-grained action recognition with short videos across various sports.
Mohammadreza (Reza) Salehi; Jae Sung Park; Aditya Kusupati; Ranjay Krishna; Yejin Choi; Hannaneh Hajishirzi; Ali Farhadi; |
234 | A Practitioner’s Guide to Real-World Continual Multimodal Pretraining Highlight: How to best update foundation models, in cases beyond small edits but not warranting re-pretraining, remains unclear. This work aims to provide extensive guidance on effective continual model updates in such scenarios.
Karsten Roth; Vishaal Udandarao; Sebastian Dziadzio; Ameya Prabhu; Mehdi Cherti; Oriol Vinyals; Olivier Henaff; Samuel Albanie; Matthias Bethge; Zeynep Akata;
235 | Normalization Layer Per-Example Gradients Are Sufficient to Predict Gradient Noise Scale in Transformers Highlight: Observing the tensor contractions required to compute them, we propose a method with minimal FLOPs in 3D or greater tensor regimes by simultaneously computing the norms while computing the parameter gradients.
Gavia Gray; Aman Tiwari; Shane Bergsma; Joel Hestness;
236 | Iteratively Refined Behavior Regularization for Offline Reinforcement Learning Highlight: In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration. |
Yi Ma; Jianye Hao; Xiaohan Hu; Yan Zheng; Chenjun Xiao;
237 | Offline Behavior Distillation Highlight: We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy. |
Shiye Lei; Sen Zhang; Dacheng Tao; |
238 | Motion Graph Unleashed: A Novel Approach to Video Prediction Highlight: We introduce motion graph, a novel approach to address the video prediction problem, i.e., predicting future video frames from limited past data. |
Yiqi Zhong; Luming Liang; Bohan Tang; Ilya Zharkov; Ulrich Neumann; |
239 | T2V-Turbo: Breaking The Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Related Code Highlight: In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation. |
Jiachen Li; Weixi Feng; Tsu-Jui Fu; Xinyi Wang; S Basu; Wenhu Chen; William Yang Wang;
240 | CV-VAE: A Compatible Video VAE for Latent Generative Video Models Related Code Highlight: Moreover, since current diffusion-based approaches are often implemented using pre-trained text-to-image (T2I) models, directly training a video VAE without considering the compatibility with existing T2I models will result in a latent space gap between them, which will take huge computational resources for training to bridge the gap even with the T2I models as initialization. To address this issue, we propose a method for training a video VAE of latent video models, namely CV-VAE, whose latent space is compatible with that of a given image VAE, e.g., the image VAE of Stable Diffusion (SD). |
Sijie Zhao; Yong Zhang; Xiaodong Cun; Shaoshu Yang; Muyao Niu; Xiaoyu Li; Wenbo Hu; Ying Shan;
241 | Neural Network Learns Low-dimensional Polynomials with SGD Near The Information-theoretic Limit Highlight: We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol{\theta}\rangle\right)$ under isotropic Gaussian data in $\mathbb{R}^d$, where the link function $\sigma_*:\mathbb{R}\to\mathbb{R}$ is an unknown degree-$q$ polynomial with information exponent $p$ (defined as the lowest degree in the Hermite expansion). |
Kazusato Oko; Denny Wu; Jason Lee; Taiji Suzuki; |
242 | LM-HT SNN: Enhancing The Performance of SNN to ANN Counterpart Through Learnable Multi-hierarchical Threshold Model Highlight: In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. |
Zecheng Hao; Xinyu Shi; Yujia Liu; Zhaofei Yu; Tiejun Huang; |
243 | No-Regret Learning for Fair Multi-Agent Social Welfare Optimization Highlight: Given the fundamental role of Nash social welfare (NSW) in the fairness literature, it is more than natural to ask whether no-regret fair learning with NSW as the objective is possible. In this work, we provide a complete answer to this question in various settings. |
Mengxiao Zhang; Ramiro Deo-Campo Vuong; Haipeng Luo;
244 | Contextual Multinomial Logit Bandits with General Value Functions Highlight: Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. |
Mengxiao Zhang; Haipeng Luo;
245 | Zero-shot Image Editing with Reference Imitation Highlight: In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. |
Xi Chen; Yutong Feng; Mengting Chen; Yiyang Wang; Shilong Zhang; Yu Liu; Yujun Shen; Hengshuang Zhao; |
246 | Implicit Bias of Mirror Flow on Separable Data Highlight: We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. |
Scott Pesme; Radu-Alexandru Dragomir; Nicolas Flammarion;
247 | WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs Related Code Highlight: We introduce WildGuard, an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. |
Seungju Han; Kavel Rao; Allyson Ettinger; Liwei Jiang; Bill Yuchen Lin; Nathan Lambert; Nouha Dziri; Yejin Choi; |
248 | GSDF: 3DGS Meets SDF for Improved Neural Rendering and Reconstruction Highlight: Although both neural implicit surfaces and explicit Gaussian primitives have advanced with neural rendering techniques, current methods impose strict constraints on density fields or primitive shapes, which enhances the affinity for geometric reconstruction at the sacrifice of rendering quality. To address this dilemma, we introduce GSDF, a dual-branch architecture combining 3D Gaussian Splatting (3DGS) and neural Signed Distance Fields (SDF). |
Mulin Yu; Tao Lu; Linning Xu; Lihan Jiang; Yuanbo Xiangli; Bo Dai; |
249 | Universal In-Context Approximation By Prompting Fully Recurrent Models Related Code Highlight: We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. |
Aleksandar Petrov; Tom Lamb; Alasdair Paren; Philip Torr; Adel Bibi;
250 | Understanding Emergent Abilities of Language Models from The Loss Perspective Highlight: In this paper, we propose to study emergent abilities through the lens of pre-training loss, instead of model size or training compute. |
Zhengxiao Du; Aohan Zeng; Yuxiao Dong; Jie Tang;
251 | Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning Highlight: Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in deep learning. |
Chia-Hsiang Kao; Bharath Hariharan; |
252 | Revisiting Few-Shot Object Detection with Vision-Language Models Related Code Highlight: In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. |
Anish Madan; Neehar Peri; Shu Kong; Deva Ramanan; |
253 | Prospective Representation Learning for Non-Exemplar Class-Incremental Learning Highlight: Instead, we propose a Prospective Representation Learning (PRL) scheme to prepare the model for handling conflicts in advance. |
Wuxuan Shi; Mang Ye; |
254 | Fast Sampling Via Discrete Non-Markov Diffusion Models Highlight: In this paper, we propose a discrete non-Markov diffusion model, which admits accelerated reverse sampling for discrete data generation. |
Zixiang Chen; Angela Yuan; Yongqian Li; Yiwen Kou; Junkai Zhang; Quanquan Gu; |
255 | GrootVL: Tree Topology Is All You Need in State Space Model Related Code Highlight: However, constrained by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree topology based on spatial relationships and input features. |
Yicheng Xiao; Lin Song; Shaoli Huang; Jiangshan Wang; Siyu Song; Yixiao Ge; Xiu Li; Ying Shan;
256 | UQE: A Query Engine for Unstructured Databases Highlight: In particular, we propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections. |
Hanjun Dai; Bethany Wang; Xingchen Wan; Bo Dai; Sherry Yang; Azade Nova; Pengcheng Yin; Mangpo Phothilimthana; Charles Sutton; Dale Schuurmans;
257 | Exploring Molecular Pretraining Model at Scale Highlight: In this work, we present an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. |
ji xh; Zhen Wang; Zhifeng Gao; Hang Zheng; Linfeng Zhang; Guolin Ke; |
258 | Adaptive Proximal Gradient Method for Convex Optimization Related Code Highlight: In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). |
Yura Malitsky; Konstantin Mishchenko;
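As background for the highlight above, the classical (non-adaptive) proximal gradient method that this line of work builds on can be sketched in a few lines. The lasso instance, data, and fixed 1/L step size below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_grad(A, b, lam, step, iters=2000):
    # ProxGD for min_x 0.5*||Ax - b||^2 + lam*||x||_1 with a fixed step size.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)              # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true                                # noiseless sparse regression data
step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = largest eigenvalue of A^T A
x_hat = prox_grad(A, b, lam=0.1, step=step)   # recovers the sparse signal
```

The fixed 1/L step here is the textbook baseline; the paper's contribution concerns choosing the step size adaptively rather than from a global Lipschitz constant.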
259 | Language Models As Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models Highlight: We examine the property by considering lossless gradient compression — a critical application in distributed learning — that depends heavily on precise probability modeling. To achieve this, we introduce LM-GC, a novel method that integrates LLMs with arithmetic coding. |
Hui-Po Wang; Mario Fritz;
260 | Towards Visual Text Design Transfer Across Languages Highlight: We introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. |
Yejin Choi; Jiwan Chung; Sumin Shim; Giyeong Oh; Youngjae Yu; |
261 | Efficient Lifelong Model Evaluation in An Era of Rapid Progress Highlight: However, with repeated testing, the risk of overfitting grows as algorithms over-exploit benchmark idiosyncrasies. In our work, we seek to mitigate this challenge by compiling ever-expanding large-scale benchmarks called Lifelong Benchmarks. |
Ameya Prabhu; Vishaal Udandarao; Philip Torr; Matthias Bethge; Adel Bibi; Samuel Albanie; |
262 | ARC: A Generalist Graph Anomaly Detector with In-Context Learning Highlight: However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a “one-for-all” GAD model to detect anomalies across various graph datasets on-the-fly. |
Yixin Liu; Shiyuan Li; Yu Zheng; Qingfeng Chen; Chengqi Zhang; Shirui Pan; |
263 | Adam with Model Exponential Moving Average Is Effective for Nonconvex Optimization Highlight: In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). |
Kwangjun Ahn; Ashok Cutkosky; |
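A model EMA of the kind analyzed in the highlight above keeps a shadow copy of the parameters that is updated multiplicatively at every optimizer step. This minimal NumPy sketch is a generic illustration; the class name, decay value, and toy iterates are assumptions, not taken from the paper:

```python
import numpy as np

class ModelEMA:
    """Minimal exponential moving average over a flat parameter vector."""
    def __init__(self, params, decay=0.99):
        self.decay = decay
        self.shadow = np.array(params, dtype=float)   # EMA copy used for evaluation

    def update(self, params):
        # shadow <- decay * shadow + (1 - decay) * params
        self.shadow = self.decay * self.shadow + (1 - self.decay) * np.asarray(params, dtype=float)

# Toy illustration: oscillating "optimizer" iterates around 2.0 are smoothed toward 2.0.
ema = ModelEMA(np.zeros(3), decay=0.99)
for step in range(2000):
    params = 2.0 + 0.5 * (-1.0) ** step * np.ones(3)  # noisy iterates: 2.5, 1.5, 2.5, ...
    ema.update(params)
```

The point of such an averaging scheme is that evaluation uses `ema.shadow` rather than the noisy last iterate; the paper's analysis concerns Adam combined with exactly this kind of weight averaging.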
264 | TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools Highlight: Unexpectedly, we discover that each component presents substantial challenges for current LLMs. These insights lead us to propose a focused modeling framework, which we refer to as IE as a tool. |
Avi Caciularu; Alon Jacovi; Eyal Ben-David; Sasha Goldshtein; Tal Schuster; Jonathan Herzig; Gal Elidan; Amir Globerson; |
265 | Vivid-ZOO: Multi-View Video Generation with Diffusion Model Highlight: To this end, we propose a novel diffusion-based pipeline that generates high-quality multi-view videos centered around a dynamic 3D object from text. |
Bing Li; Cheng Zheng; Wenxuan Zhu; Jinjie Mai; Biao Zhang; Peter Wonka; Bernard Ghanem; |
266 | OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents Highlight: This paper presents OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction-following agents in open-world Minecraft. |
Zihao Wang; Shaofei Cai; Zhancun Mu; Haowei Lin; Ceyao Zhang; Xuejie Liu; Qing Li; Anji Liu; Xiaojian (Shawn) Ma; Yitao Liang;
267 | Understanding The Differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks Related Code Highlight: While connections between these approaches exist, such models are commonly developed in isolation, and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, which greatly influence performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. |
Jerome Sieber; Carmen Amo Alonso; Alexandre Didier; Melanie Zeilinger; Antonio Orvieto;
268 | SemCoder: Training Code Language Models with Comprehensive Semantics Highlight: This paper aims to bridge the gap between Code LLMs’ reliance on static text data and the need for thorough semantic understanding for complex tasks like debugging and program repair. |
Yangruibo Ding; Jinjun Peng; Marcus Min; Gail Kaiser; Junfeng Yang; Baishakhi Ray;
269 | MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Related Code Highlight: We propose a new variant of the Adam optimizer called MicroAdam that specifically minimizes memory overheads, while maintaining theoretical convergence guarantees. |
Ionut-Vlad Modoranu; Mher Safaryan; Grigory Malinovsky; Eldar Kurtić; Thomas Robert; Peter Richtarik; Dan Alistarh; |
270 | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Highlight: On a linear model with cross-entropy loss, we show that class imbalance leads to imbalanced, correlated gradients and Hessians that have been hypothesized to benefit Adam. |
Frederik Kunstner; Robin Yadav; Alan Milligan; Mark Schmidt; Alberto Bietti;
271 | Universal Neural Functionals Related Code Highlight: This work proposes an algorithm that automatically constructs permutation equivariant models, which we refer to as universal neural functionals (UNFs), for any weight space. |
Allan Zhou; Chelsea Finn; James Harrison; |
272 | Enhancing Protein Mutation Effect Prediction Through A Retrieval-Augmented Framework Highlight: However, existing models struggle to effectively extract mutation-related local structure motifs from protein databases, which hinders their predictive accuracy and robustness. To tackle this problem, we design a novel retrieval-augmented framework for incorporating similar structure information in known protein structures. |
Ruihan Guo; Rui Wang; Ruidong Wu; Zhizhou Ren; Jiahan Li; Shitong Luo; Zuofan Wu; Qiang Liu; Jian Peng; Jianzhu Ma;
273 | Aligning LLM Agents By Learning Latent Preference from User Edits Related Code Highlight: We propose a learning framework, PRELUDE, that infers a description of the user’s latent preference based on historic edit data and uses it to define a prompt policy that drives future response generation. |
Ge Gao; Alexey Taymanov; Eduardo Salinas; Paul Mineiro; Dipendra Misra; |
274 | AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery Highlight: A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce AllClear, the largest public dataset for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. |
Hangyu Zhou; Chia-Hsiang Kao; Cheng Perng Phoo; Utkarsh Mall; Bharath Hariharan; Kavita Bala; |
275 | Cardinality-Aware Set Prediction and Top-$k$ Classification Highlight: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. |
Corinna Cortes; Anqi Mao; Christopher Mohri; Mehryar Mohri; Yutao Zhong; |
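One simple way to see the accuracy-versus-cardinality trade-off the highlight describes is a confidence-thresholded set predictor that returns the smallest top-ranked label set covering a target probability mass, capped at $k$. This is a generic illustration of the idea, not the paper's algorithm; the function name and threshold are assumptions:

```python
import numpy as np

def cardinality_aware_topk(probs, k_max, tau=0.9):
    """Smallest top-ranked label set whose cumulative mass reaches tau, capped at k_max."""
    order = np.argsort(probs)[::-1]           # labels sorted by descending probability
    cum = np.cumsum(probs[order])
    k = int(np.searchsorted(cum, tau) + 1)    # first prefix whose mass reaches tau
    return order[:min(k, k_max)].tolist()

probs = np.array([0.70, 0.20, 0.05, 0.05])
print(cardinality_aware_topk(probs, k_max=3, tau=0.85))  # [0, 1]
```

A confident prediction thus returns a small set, while an uncertain one grows toward the cap `k_max`, which is exactly the kind of variable-cardinality behavior a fixed top-$k$ predictor cannot express.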
276 | Statistical Estimation in The Spiked Tensor Model Via The Quantum Approximate Optimization Algorithm Highlight: In this paper, we analyze the performance of the QAOA on the spiked tensor model, a statistical estimation problem that exhibits a large computational-statistical gap classically. |
Leo Zhou; Joao Basso; Song Mei; |
277 | Deep Graph Mating Highlight: In this paper, we introduce the first learning-free model reuse task within the non-Euclidean domain, termed Deep Graph Mating (Grama). |
Yongcheng Jing; Seok-Hee Hong; Dacheng Tao;
278 | SpeechAlign: Speech Language Models Can Self-Improve Via Preference Optimization Highlight: We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. |
Dong Zhang; Zhaowei Li; Shimin Li; Xin Zhang; Pengyu Wang; Yaqian Zhou; Xipeng Qiu;
279 | Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates Related Code Highlight: Through extensive experiments on several chat models (Meta’s Llama 2-Chat, Mistral AI’s Mistral 7B Instruct v0.2, and OpenAI’s GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the “Pure Tuning, Safe Testing” (PTST) strategy: fine-tune models without a safety prompt, but include it at test time. |
Kaifeng Lyu; Haoyu Zhao; Xinran Gu; Dingli Yu; Anirudh Goyal; Sanjeev Arora;
280 | Calibrated Self-Rewarding Vision Language Models Related Code Highlight: These approaches are resource-intensive and may not effectively reflect the target LVLM’s preferences, making the curated preferences easily distinguishable. Our work addresses these challenges by proposing the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. |
Yiyang Zhou; Zhiyuan Fan; Dongjie Cheng; Sihan Yang; Zhaorun Chen; Chenhang Cui; Xiyao Wang; Yun Li; Linjun Zhang; Huaxiu Yao;
281 | Transfer Q-star: Principled Decoding for LLM Alignment Highlight: In this work, we propose $\texttt{Transfer Q}^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $\rho_{\texttt{BL}}$ aligned with a baseline reward $r_{\texttt{BL}}$ (which can be different from the target reward $r$). |
Souradip Chakraborty; Soumya Suvra Ghosal; Ming Yin; Dinesh Manocha; Mengdi Wang; Amrit Singh Bedi; Furong Huang; |
282 | Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions Highlight: In this paper, we propose an innovative framework termed Image Textualization, which automatically produces high-quality image descriptions by leveraging existing multi-modal large language models (MLLMs) and multiple vision expert models in a collaborative manner. |
Renjie Pi; Jianshu Zhang; Jipeng Zhang; Rui Pan; Zhekai Chen; Tong Zhang; |
283 | SyncVIS: Synchronized Video Instance Segmentation Highlight: Despite remarkable progress, existing works follow asynchronous designs, which model video sequences via video-level queries only or adopt query-sensitive cascade structures, resulting in difficulties when handling complex and challenging video scenarios. In this work, we analyze the cause of this phenomenon and the limitations of current solutions, and propose to conduct synchronized modeling via a new framework named SyncVIS. |
Rongkun Zheng; Lu Qi; Xi Chen; Yi Wang; Kun Wang; Yu Qiao; Hengshuang Zhao;
284 | Training-Free Visual Prompt Learning for Multimodal Large Language Models Highlight: In this work, we propose a training-free method to inject visual referring into Multimodal Large Language Models (MLLMs) through learnable visual token optimization. |
Mingrui Wu; Xinyue Cai; Jiayi Ji; Jiale Li; Oucheng Huang; Gen Luo; Hao Fei; Guannan Jiang; Xiaoshuai Sun; Rongrong Ji;
285 | Decoupling Semantic Similarity from Spatial Alignment for Neural Networks Highlight: Revisiting the established similarity calculations for RSMs, we expose their sensitivity to spatial alignment. In this paper, we propose to solve this through semantic RSMs, which are invariant to spatial permutation. |
Tassilo Wald; Constantin Ulrich; Priyank Jaini; Gregor Koehler; David Zimmerer; Stefan Denner; Fabian Isensee; Michael Baumgartner; Klaus Maier-Hein;
286 | Variational Distillation of Diffusion Policies Into Mixture of Experts 扩散策略的变分蒸馏为专家混合模型 Related Papers Related Patents Related Grants Related Venues Related Experts View 相关论文 相关专利 相关资助 相关场所 相关专家 查看 Highlight: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. 亮点:本研究介绍了一种新颖的方法——变分扩散蒸馏(Variational Diffusion Distillation, VDD),该方法通过变分推断将去噪扩散策略蒸馏为专家混合模型(Mixtures of Experts, MoE)。 |
Hongyi Zhou; Denis Blessing; Ge Li; Onur Celik; Xiaogang Jia; Gerhard Neumann; Rudolf Lioutikov; 周鸿毅; 丹尼斯·布莱辛; 李戈; 奥努尔·切利克; 贾晓刚; 盖尔哈德·诺伊曼; 鲁道夫·柳季科夫; |
287 | GTBench: Uncovering The Strategic Reasoning Capabilities of LLMs Via Game-Theoretic Evaluations GTBench:通过博弈论评估揭示 LLMs 的战略推理能力 Related Papers Related Patents Related Grants Related Venues Related Experts View 相关论文 相关专利 相关资助 相关场所 相关专家 查看 Highlight: This paper evaluates LLMs’ reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. 亮点:本文通过博弈论任务评估LLMs在竞争环境中的推理能力,例如需要纯逻辑和战略推理以与对手竞争的棋类和纸牌游戏。 |
Jinhao Duan; Renming Zhang; James Diffenderfer; Bhavya Kailkhura; Lichao Sun; Elias Stengel-Eskin; Mohit Bansal; Tianlong Chen; Kaidi Xu; 段金浩;张仁明;詹姆斯·迪芬德费尔;巴维亚·凯尔库拉;孙立超;埃利亚斯·斯滕格尔-埃斯金;莫希特·班萨尔;陈天龙;徐凯迪; |
288 | Probing The Decision Boundaries of In-context Learning in Large Language Models 探究大语言模型中上下文学习的决策边界 Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View 相关论文 相关专利 相关资助 相关场所 相关专家 相关代码 查看 Highlight: In this work, we propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification. 亮点:在本研究中,我们提出了一种新机制,从决策边界的角度探讨和理解上下文学习,以进行上下文二元分类。 |
Siyan Zhao; Tung Nguyen; Aditya Grover; |
289 | MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions (Related Code) Highlight: However, existing publicly available datasets are inadequate for generating Sora-like videos, as they mainly contain short videos with low motion intensity and brief captions. To address these issues, we propose MiraData, a high-quality video dataset that surpasses previous ones in video duration, caption detail, motion strength, and visual quality. |
Xuan Ju; Yiming Gao; Zhaoyang Zhang; Ziyang Yuan; Xintao Wang; AILING ZENG; Yu Xiong; Qiang Xu; Ying Shan; |
290 | Who Evaluates The Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) (Related Code) Highlight: We introduce T2IScoreScore (TS2), a curated set of semantic error graphs containing a prompt and a set of increasingly erroneous images. |
Michael Saxon; Fatima Jahara; Mahsa Khoshnoodi; Yujie Lu; Aditya Sharma; William Yang Wang; |
291 | Improved Distribution Matching Distillation for Fast Image Synthesis (Related Code) Highlight: This is not only computationally expensive for large-scale text-to-image synthesis, but it also limits the student's quality, tying it too closely to the teacher's original sampling paths. We introduce DMD2, a set of techniques that lift this limitation and improve DMD training. |
Tianwei Yin; Michaël Gharbi; Taesung Park; Richard Zhang; Eli Shechtman; Fredo Durand; Bill Freeman; |
292 | Mixture of Tokens: Continuous MoE Through Cross-Example Aggregation (Related Code) Highlight: Motivated by the observation that the adaptation of fully continuous methods has been an overarching trend in deep learning, we develop Mixture of Tokens (MoT), a simple, continuous architecture that is capable of scaling the number of parameters similarly to sparse MoE models. |
Szymon Antoniak; Michał Krutul; Maciej Pióro; Jakub Krajewski; Jan Ludziejewski; Kamil Ciebiera; Krystian Król; Tomasz Odrzygóźdź; Marek Cygan; Sebastian Jaszczur; |
293 | Visual CoT: Advancing Multi-Modal Language Models with A Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning (Related Code) Highlight: Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. |
Hao Shao; Shengju Qian; Han Xiao; Guanglu Song; ZHUOFAN ZONG; Letian Wang; Yu Liu; Hongsheng Li; |
294 | Learning Action and Reasoning-Centric Image Editing from Videos and Simulation Highlight: Instead, we propose a new automatic metric that focuses on discriminative understanding. |
Benno Krojer; Dheeraj Vattikonda; Luis Lara; Varun Jampani; Eva Portelance; Chris Pal; Siva Reddy; |
295 | The Iterative Optimal Brain Surgeon: Faster Sparse Recovery By Leveraging Second-Order Information Highlight: Yet, these results still lack a solid theoretical understanding, and it is unclear whether they can be improved by leveraging connections to the wealth of work on sparse recovery algorithms. In this paper, we draw new connections between these two areas and present new sparse recovery algorithms inspired by the OBS framework that come with theoretical guarantees under reasonable assumptions and have strong practical performance. |
Diyuan Wu; Ionut-Vlad Modoranu; Mher Safaryan; Denis Kuznedelev; Dan Alistarh; |
296 | Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models Highlight: However, these methods typically focus solely on adapting VLMs from a single modality and fail to accumulate task-specific knowledge as more samples are processed. To address this, we introduce Dual Prototype Evolving (DPE), a novel test-time adaptation approach for VLMs that effectively accumulates task-specific knowledge from multi-modalities. |
Ce Zhang; Simon Stepputtis; Katia Sycara; Yaqi Xie; |
297 | Predicting Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data Highlight: Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. |
Alexander Havrilla; Wenjing Liao; |
298 | CALE: Continuous Arcade Learning Environment Highlight: We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. |
Jesse Farebrother; Pablo Samuel Castro; |
299 | HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting Highlight: This paper proposes an efficient framework, dubbed HiCoM, with three key components. |
Qiankun Gao; Jiarui Meng; Chengxiang Wen; Jie Chen; Jian Zhang; |
300 | Vector Quantization Prompting for Continual Learning Highlight: However, these prompts are continuous and lack sufficient abstraction for task knowledge representation, making them less effective for continual learning. To address these challenges, we propose VQ-Prompt, a prompt-based continual learning method that incorporates Vector Quantization (VQ) into end-to-end training of a set of discrete prompts. |
Li Jiao; Qiuxia LAI; YU LI; Qiang Xu; |
301 | Transformer Efficiently Learns Low-dimensional Target Functions In-context Highlight: We study ICL of a nonlinear function class via transformer with nonlinear MLP layer: given a class of single-index target functions $f_*(x) = \sigma_*(\langle x,\beta\rangle)$, where the index features $\beta\in\mathbb{R}^d$ are drawn from a rank-$r$ subspace, we show that a nonlinear transformer optimized by gradient descent on the empirical loss learns $f_*$ in-context with a prompt length that only depends on the dimension of the function class $r$; in contrast, an algorithm that directly learns $f_*$ on the test prompt yields a statistical complexity that scales with the ambient dimension $d$. |
Kazusato Oko; Yujin Song; Taiji Suzuki; Denny Wu; |
302 | Diffusion of Thought: Chain-of-Thought Reasoning in Diffusion Language Models Highlight: In this work, we propose Diffusion-of-Thought (DoT), a novel approach that integrates diffusion models with Chain-of-Thought, a well-established technique for improving the reasoning ability of autoregressive language models. |
Jiacheng Ye; Shansan Gong; Liheng Chen; Lin Zheng; Jiahui Gao; Han Shi; Chuan Wu; Xin Jiang; Zhenguo Li; Wei Bi; Lingpeng Kong; |
303 | When Does Perceptual Alignment Benefit Vision Representations? Highlight: Here, we investigate how aligning vision model representations to human perceptual judgments impacts their usability in standard computer vision tasks. |
Shobhita Sundaram; Stephanie Fu; Lukas Muttenthaler; Netanel Tamir; Lucy Chai; Simon Kornblith; Trevor Darrell; Phillip Isola; |
304 | Is Value Function Learning Really The Main Bottleneck of Offline RL? Highlight: In this work, we aim to understand bottlenecks in current offline RL algorithms. |
Seohong Park; Kevin Frans; Sergey Levine; Aviral Kumar; |
305 | SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation (Related Code) Highlight: We present a novel approach for digitizing real-world objects by estimating their geometry, material properties, and environmental lighting from a set of posed images with fixed lighting. |
Jesus Zarzar; Bernard Ghanem; |
306 | InfLLM: Training-Free Long-Context Extrapolation for LLMs with An Efficient Context Memory (Related Code) Highlight: In this paper, we unveil the intrinsic capacity of LLMs for understanding extremely long sequences without any fine-tuning. |
Chaojun Xiao; Pengle Zhang; Xu Han; Guangxuan Xiao; Yankai Lin; Zhengyan Zhang; Zhiyuan Liu; Maosong Sun; |
307 | Gorilla: Teaching LLMs to Use Tools Highlight: We develop Gorilla, a finetuned LLaMA model that surpasses the performance of GPT-4 on writing API calls. |
Shishir G Patil; Tianjun Zhang; Xin Wang; Joseph Gonzalez; |
308 | Reranking Laws for Language Generation: A Communication-Theoretic Perspective Highlight: A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reranker to choose the best one. In this paper, we draw a parallel between this strategy and the use of redundancy to decrease the error rate in noisy communication channels. |
António Farinhas; Haau-Sing Li; André Martins; |
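The channel-coding analogy in this highlight can be made concrete with a toy calculation (our illustration, not taken from the paper): if each of $n$ independently generated hypotheses is wrong with probability $p$, a majority-vote "reranker" behaves like a repetition code, and its failure probability shrinks as $n$ grows.

```python
from math import comb

def majority_vote_error(p: float, n: int) -> float:
    """Probability that a majority vote over n independent hypotheses fails,
    when each individual hypothesis is wrong with probability p (n odd)."""
    assert n % 2 == 1, "use an odd n so the vote cannot tie"
    # The vote fails exactly when more than half of the hypotheses are wrong.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

# One sample is wrong 30% of the time; adding redundancy drives the
# error rate down, just as repetition coding does on a noisy channel.
for n in (1, 5, 17):
    print(n, majority_vote_error(0.3, n))
```

Real rerankers score hypotheses rather than vote, but the qualitative scaling with the number of hypotheses is the point of the analogy.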
309 | Robust Reinforcement Learning from Corrupted Human Feedback Highlight: For various reasons, e.g., personal bias, context ambiguity, lack of training, etc., human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach, $R^3M$, which models the potentially corrupted preference label as sparse outliers. |
Alexander Bukharin; Ilgee Hong; Haoming Jiang; Zichong Li; Qingru Zhang; Zixuan Zhang; Tuo Zhao; |
310 | Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Highlight: Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. |
Zhengfei Kuang; Shengqu Cai; Hao He; Yinghao Xu; Hongsheng Li; Leonidas Guibas; Gordon Wetzstein; |
311 | Needle In A Multimodal Haystack (Related Code) Highlight: In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. |
Weiyun Wang; Shuibo Zhang; Yiming Ren; Yuchen Duan; Tiantong Li; Shuo Liu; Mengkang Hu; Zhe Chen; Kaipeng Zhang; Lewei Lu; Xizhou Zhu; Ping Luo; Yu Qiao; Jifeng Dai; Wenqi Shao; Wenhai Wang; |
312 | A Tractable Inference Perspective of Offline RL Highlight: While it is still possible to approximate such queries, we observe that such crude estimates undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern tractable generative models to bridge the gap between good sequence models and high expected returns at evaluation time. |
Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang; |
313 | StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation (Related Code) Highlight: In this paper, we propose a simple but effective self-attention mechanism, termed Consistent Self-Attention, that boosts the consistency between the generated images. |
Yupeng Zhou; Daquan Zhou; Ming-Ming Cheng; Jiashi Feng; Qibin Hou; |
314 | PaGoDA: Progressive Growing of A One-Step Generator from A Low-Resolution Diffusion Teacher Highlight: In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique to progressively grow the resolution of the generator beyond that of the original teacher DM. |
Dongjun Kim; Chieh-Hsin Lai; Wei-Hsiang Liao; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon; |
315 | VMamba: Visual State Space Model (Related Code) Highlight: In this paper, we transplant Mamba, a state-space language model, into VMamba, a vision backbone that works in linear time complexity. |
Liu Yue; Yunjie Tian; Yuzhong Zhao; Hongtian Yu; Lingxi Xie; Yaowei Wang; Qixiang Ye; Jianbin Jiao; Yunfan Liu; |
Si-An Chen; Lesly Miculicich; Julian Eisenschlos; Zifeng Wang; Zilong Wang; Yanfei Chen; YASUHISA FUJII; Hsuan-Tien Lin; Chen-Yu Lee; Tomas Pfister; |
317 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings Highlight: We present ‘ODGS’, which includes a new rasterization appropriate for omnidirectional image projection. |
Suyoung Lee; Jaeyoung Chung; Jaeyoo Huh; Kyoung Mu Lee; |
318 | LSH-MoE: Communication-efficient MoE Training Via Locality-Sensitive Hashing Highlight: In this paper, we propose LSH-MoE, a communication-efficient MoE training framework using locality-sensitive hashing (LSH). |
Xiaonan Nie; Liu Qibin; Fangcheng Fu; Shenhan Zhu; Xupeng Miao; Xiaoyang Li; Yang Zhang; Shouda Liu; Bin CUI; |
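The LSH-MoE framework itself is not reproduced here, but the locality-sensitive hashing primitive it builds on is easy to illustrate (a toy sketch under our own assumptions, not the authors' implementation): random-hyperplane LSH gives similar token embeddings the same bucket signature with high probability, so near-duplicate tokens could share one communication payload instead of each crossing the network.

```python
import random

def lsh_signature(vec, hyperplanes):
    """Sign pattern of vec against each random hyperplane: vectors with a
    small angle between them fall on the same side of most hyperplanes."""
    return tuple(int(sum(v * h for v, h in zip(vec, plane)) >= 0)
                 for plane in hyperplanes)

random.seed(0)
dim, n_planes = 8, 6
hyperplanes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

a = [1.0] * dim                 # a token embedding
b = [0.5 * x for x in a]        # same direction as a (a near-duplicate)
c = [-x for x in a]             # points the opposite way

# a and b land in the same bucket; c gets the complementary sign pattern.
print(lsh_signature(a, hyperplanes) == lsh_signature(b, hyperplanes))
print(lsh_signature(a, hyperplanes) == lsh_signature(c, hyperplanes))
```

Bucketing tokens by such signatures before all-to-all dispatch is one plausible way hashing can cut expert-routing communication; the paper's actual grouping and aggregation scheme may differ.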
319 | WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models (Related Code) Highlight: We introduce WildTeaming, an automatic red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes selections of multiple mined tactics for systematic exploration of novel and even more challenging jailbreaks. |
Liwei Jiang; Kavel Rao; Seungju Han; Allyson Ettinger; Faeze Brahman; Sachin Kumar; Niloofar Mireshghallah; Ximing Lu; Maarten Sap; Nouha Dziri; Yejin Choi; |
320 | Towards Flexible Visual Relationship Segmentation Highlight: Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. In this work, we propose Flex-VRS, a single model that seamlessly integrates the above three aspects in standard and promptable visual relationship segmentation, and further possesses the capability for open-vocabulary segmentation to adapt to novel scenarios. |
Fangrui Zhu; Jianwei Yang; Huaizu Jiang; |
321 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control (Related Code) Highlight: However, commonly used contrastively trained representations such as in CLIP have been shown to fail at enabling embodied agents to gain a sufficiently fine-grained scene understanding—a capability vital for control. To address this shortcoming, we consider representations from pre-trained text-to-image diffusion models, which are explicitly optimized to generate images from text prompts and as such, contain text-conditioned representations that reflect highly fine-grained visuo-spatial information. |
Gunshi Gupta; Karmesh Yadav; Yarin Gal; Dhruv Batra; Zsolt Kira; Cong Lu; Tim G. J. Rudner; |
322 | Rethinking Weight Decay for Robust Fine-Tuning of Foundation Models Highlight: This paper proposes a new weight decay technique, Selective Projection Decay (SPD), that selectively imposes a strong penalty on certain layers while allowing others to change freely. |
Junjiao Tian; Chengyue Huang; Zsolt Kira; |
323 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Highlight: In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples. |
Baiqi Li; Zhiqiu Lin; WENXUAN PENG; Jean de Dieu Nyandwi; Daniel Jiang; Zixian Ma; Simran Khanuja; Ranjay Krishna; Graham Neubig; Deva Ramanan; |
324 | UNITS: A Unified Multi-Task Time Series Model (Related Code) Highlight: We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. |
Shanghua Gao; Teddy Koker; Owen Queen; Tom Hartvigsen; Theodoros Tsiligkaridis; Marinka Zitnik; |
325 | WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models Highlight: In this paper, we systematically explore how model weights interact with unlearning processes in LLMs, and we design the weight attribution-guided LLM unlearning method, WAGLE, which unveils the interconnections between the 'influence' of weights and the 'influence' of data to forget and retain in LLM generation. |
Jinghan Jia; Jiancheng Liu; Yihua Zhang; Parikshit Ram; Nathalie Baracaldo; Sijia Liu; |
326 | Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge Highlight: We introduce the Constrained Human-AI Cooperation (CHAIC) challenge, an inclusive embodied social intelligence challenge for testing social perception and cooperation in embodied agents. |
Weihua Du; Qiushi Lyu; Jiaming Shan; Zhenting Qi; Hongxin Zhang; Sunli Chen; Andi Peng; Tianmin Shu; Kwonjoon Lee; Behzad Dariush; Chuang Gan; |
327 | Lorentz-Equivariant Geometric Algebra Transformers for High-Energy Physics (Related Code) Highlight: We propose the Lorentz Geometric Algebra Transformer (L-GATr), a new multi-purpose architecture for high-energy physics. |
Jonas Spinner; Victor Breso; Pim de Haan; Tilman Plehn; Jesse Thaler; Johann Brehmer; |
328 | Conservative Fine-Tuning of Diffusion Models from Offline Data Highlight: In offline scenarios, existing approaches tend to suffer from overoptimization, as they may be misled by the reward model in out-of-distribution regions. To address this, we introduce a conservative fine-tuning approach, BRAID, by optimizing a conservative reward model, which includes additional penalization outside of offline data distributions. |
Masatoshi Uehara; Yulai Zhao; Ehsan Hajiramezanali; Gabriele Scalia; Gokcen Eraslan; Avantika Lal; Sergey Levine; Tommaso Biancalani; |
329 | Few-Shot Task Learning Through Inverse Generative Modeling Highlight: Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning, and present our approach, Few-Shot Task Learning Through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. |
Aviv Netanyahu; Yilun Du; Jyothish Pari; Josh Tenenbaum; Tianmin Shu; Pulkit Agrawal; |
330 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers (Related Code) Highlight: However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. To address this limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the first large-scale QA dataset specifically designed to interpret complex figures and tables within the context of scientific research articles across various domains of computer science. |
Shraman Pramanick; Rama Chellappa; Subhashini Venugopalan; |
331 | Theoretical Guarantees in KL for Diffusion Flow Matching Highlight: The main contribution of this paper is to provide relatively mild assumptions on $\nu^\star$, $\mu$ and $\pi$ to obtain non-asymptotic guarantees for Diffusion Flow Matching (DFM) models, using as bridge the conditional distribution associated with the Brownian motion. |
Marta Gentiloni Silveri; Alain Durmus; Giovanni Conforti; |
332 | TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation Highlight: In this study, we introduce a novel model framework, TransVIP, that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. |
Chenyang Le; Yao Qian; Dongmei Wang; Long Zhou; Shujie LIU; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Michael Zeng; |
333 | Neural Isometries: Taming Transformations for Equivariant ML | Highlight: In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. |
Thomas Mitchel; Michael Taylor; Vincent Sitzmann; |
334 | Is Behavior Cloning All You Need? Understanding Horizon in Imitation Learning | Highlight: Through a new analysis of BC with the logarithmic loss, we show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. |
Dylan J Foster; Adam Block; Dipendra Misra; |
335 | Multiple Physics Pretraining for Spatiotemporal Surrogate Models | Highlight: We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. |
Michael McCabe; Bruno Régaldo-Saint Blancard; Liam Parker; Ruben Ohana; Miles Cranmer; Alberto Bietti; Michael Eickenberg; Siavash Golkar; Geraud Krawezik; Francois Lanusse; Mariel Pettee; Tiberiu Tesileanu; Kyunghyun Cho; Shirley Ho; |
336 | Decoupled Kullback-Leibler Divergence Loss | Related Code | Highlight: In this paper, we delve deeper into the Kullback–Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error ($\mathbf{w}$MSE) loss and 2) a Cross-Entropy loss incorporating soft labels. |
Jiequan Cui; Zhuotao Tian; Zhisheng Zhong; Xiaojuan Qi; Bei Yu; Hanwang Zhang; |
337 | Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | Highlight: The paper gives evidence that LLMs also have metacognitive knowledge, including the ability to name skills and procedures to apply given a task. |
Aniket Didolkar; Anirudh Goyal; Nan Rosemary Ke; Siyuan Guo; Michal Valko; Timothy Lillicrap; Danilo Jimenez Rezende; Yoshua Bengio; Michael Mozer; Sanjeev Arora; |
338 | ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty | Highlight: We propose ConceptMix, a scalable, controllable, and customizable benchmark consisting of two stages: (a) With categories of visual concepts (e.g., objects, colors, shapes, spatial relationships), it randomly samples an object and $k$-tuples of visual concepts to generate text prompts with GPT-4o for image generation. |
Xindi Wu; Dingli Yu; Yangsibo Huang; Olga Russakovsky; Sanjeev Arora; |
339 | Base of RoPE Bounds Context Length | Highlight: We revisit the role of RoPE in LLMs and propose a novel property of long-term decay; we derive that the \textit{base of RoPE bounds context length}: there is an absolute lower bound on the base value required to obtain a given context length capability. |
Xin Men; Mingyu Xu; Qingyu Zhang; Bingning Wang; Hongyu Lin; Xianpei Han; weipeng chen; |
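The highlight above turns on a simple mechanical fact about RoPE: each rotary frequency is $\theta_i = \text{base}^{-2i/d}$, so the base sets the wavelengths over which positions stay distinguishable, and a longer target context demands a larger base. A minimal sketch (not the paper's bound, just the wavelength computation it builds on):

```python
import math

def rope_wavelengths(dim: int, base: float) -> list[float]:
    """Wavelengths (in tokens) of the rotary frequencies theta_i = base^(-2i/dim).

    A position offset t rotates pair i by angle t * theta_i, so the slowest
    pair only completes a full rotation after 2*pi / theta_i tokens.
    """
    return [2 * math.pi * base ** (2 * i / dim) for i in range(dim // 2)]

# The longest wavelength grows with the base: to keep distant positions
# distinguishable, the base must be large enough that even the slowest
# rotation does not wrap within the target context length.
small_base = max(rope_wavelengths(dim=64, base=10_000))
large_base = max(rope_wavelengths(dim=64, base=500_000))
assert large_base > small_base
```

The dimension 64 and the two base values are arbitrary illustrations; the paper derives the precise lower bound on the base for a given context length.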
340 | Fight Back Against Jailbreaking Via Prompt Adversarial Tuning | Related Code | Highlight: In this paper, motivated by adversarial training paradigms for achieving reliable robustness, we propose an approach named **Prompt Adversarial Tuning (PAT)** that trains a prompt control attached to the user prompt as a guard prefix. |
Yichuan Mo; Yuji Wang; Zeming Wei; Yisen Wang; |
341 | PointMamba: A Simple State Space Model for Point Cloud Analysis | Related Code | Highlight: In this paper, we propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks. |
Dingkang Liang; Xin Zhou; Wei Xu; xingkui zhu; Zhikang Zou; Xiaoqing Ye; Xiao Tan; Xiang Bai; |
342 | Breaking Determinism: Fuzzy Modeling of Sequential Recommendation Using Discrete State Space Diffusion Model | Highlight: Inspired by fuzzy information processing theory, this paper introduces the DDSR model, which uses fuzzy sets of interaction sequences to overcome the limitations and better capture the evolution of users’ real interests. |
Wenjia Xie; Hao Wang; Luankang Zhang; Rui Zhou; Defu Lian; Enhong Chen; |
343 | Approaching Human-Level Forecasting with Language Models | Highlight: In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. |
Danny Halawi; Fred Zhang; Chen Yueh-Han; Jacob Steinhardt; |
344 | InterControl: Zero-shot Human Interaction Generation By Controlling Every Joint | Related Code | Highlight: We introduce a novel controllable motion generation method, InterControl, to encourage synthesized motions to maintain the desired distance between joint pairs. |
Zhenzhi Wang; Jingbo Wang; Yixuan Li; Dahua Lin; Bo Dai; |
345 | Reversing The Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference | Related Code | Highlight: In this paper, we propose a novel unlearning framework called Unlearning from Logit Difference (ULD), which introduces an assistant LLM that aims to achieve the opposite of the unlearning goals: remembering the forget documents and forgetting the retain knowledge. |
Jiabao Ji; Yujian Liu; Yang Zhang; Gaowen Liu; Ramana Kompella; Sijia Liu; Shiyu Chang; |
346 | Emu3D: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials | Highlight: We present Emu3D, a significant advancement in text-to-3D which produces faithful, high-quality meshes with full material control. |
Yawar Siddiqui; Filippos Kokkinos; Tom Monnier; Mahendra Kariya; Yanir Kleiman; Emilien Garreau; Oran Gafni; Natalia Neverova; Andrea Vedaldi; David Novotny; Roman Shapovalov; |
347 | One-Step Effective Diffusion Network for Real-World Image Super-Resolution | Related Code | Highlight: Meanwhile, the random noise introduces uncertainty in the output, which is unfriendly to image restoration tasks. To address these issues, we propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem. |
Rongyuan Wu; Lingchen Sun; Zhiyuan Ma; Lei Zhang; |
348 | Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models | Highlight: Motivated by the long-tail distribution of singular values in the delta weights, we propose a delta quantization approach using mixed-precision. |
Bowen Ping; Shuo Wang; Hanqing Wang; Xu Han; Yuzhuang Xu; Yukun Yan; Yun Chen; Baobao Chang; Zhiyuan Liu; Maosong Sun; |
349 | CosAE: Learnable Fourier Series for Image Restoration | Highlight: In this paper, we introduce Cosine Autoencoder (CosAE), a novel, generic Autoencoder that seamlessly leverages the classic Fourier series with a feed-forward neural network. |
Sifei Liu; Shalini De Mello; Jan Kautz; |
350 | Explaining Text Datasets with Language Parameters | Highlight: To make model parameters directly interpretable, we introduce a family of statistical models—including clustering, time-series, and classification models—parameterized by *natural language predicates*. |
Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt; |
351 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | Highlight: We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. |
Zechen Bai; Tong He; Haiyang Mei; Pichao WANG; Ziteng Gao; Joya Chen; liulei; Zheng Zhang; Mike Zheng Shou; |
352 | MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model | Highlight: In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. |
Minghua Liu; Chong Zeng; Xinyue Wei; Ruoxi Shi; Linghao Chen; Chao Xu; Mengqi Zhang; Zhaoning Wang; Xiaoshuai Zhang; Isabella Liu; Hongzhi Wu; Hao Su; |
353 | Q-VLM: Post-training Quantization for Large Vision-Language Models | Highlight: In this paper, we propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference. |
Changyuan Wang; Ziwei Wang; Xiuwei Xu; Yansong Tang; Jie Zhou; Jiwen Lu; |
354 | FuseFL: One-Shot Federated Learning Through The Lens of Causality with Progressive Model Fusion | Highlight: However, the performance of advanced OFL methods is far behind the normal FL. In this work, we provide a causal view to find that this performance drop of OFL methods comes from the isolation problem, which means that models trained in isolation on local data in OFL may easily fit spurious correlations due to the data heterogeneity. |
Zhenheng Tang; Yonggang Zhang; Peijie Dong; Yiu-ming Cheung; Amelie Zhou; Bo Han; Xiaowen Chu; |
355 | TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning | Related Code | Highlight: Even though SRL has become the foundation of almost all geospatial artificial intelligence (GeoAI) research, we have not yet seen significant efforts to develop an extensive deep learning framework and benchmark to support SRL model development and evaluation. To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. |
Nemin Wu; Qian Cao; Zhangyu Wang; Zeping Liu; Yanlin Qi; Jielu Zhang; Joshua Ni; X. Yao; Hongxu Ma; Lan Mu; Stefano Ermon; Tanuja Ganu; Akshay Nambi; Ni Lao; Gengchen Mai; |
356 | Generalizable Implicit Motion Modeling for Video Frame Interpolation | Highlight: Existing paradigms either simply consider linear combinations of bidirectional flows or directly predict bilateral flows with the condition of timestamps, lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. |
Zujin Guo; Wei Li; Chen Change Loy; |
357 | Weak-to-Strong Search: Align Large Language Models Via Searching Over Small Language Models | Related Code | Highlight: In this work, we introduce *weak-to-strong search*, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model. |
Zhanhui Zhou; Zhixuan Liu; Jie Liu; Zhichen Dong; Chao Yang; Yu Qiao; |
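The core scoring rule in the highlight above — rank candidates by the tuned-minus-untuned log-likelihood gap of small models while the frozen large model proposes tokens — can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: all three "models" are stand-in functions over a three-token vocabulary.

```python
def weak_to_strong_step(candidates, small_tuned_logp, small_untuned_logp):
    """Pick the next token greedily.

    `candidates` would come from the frozen large model; each is scored by the
    log-likelihood gap between a small tuned and a small untuned model, so the
    search steers the large model toward the tuned model's preferences.
    Both scoring functions here are stand-ins for real language models.
    """
    return max(candidates, key=lambda tok: small_tuned_logp(tok) - small_untuned_logp(tok))

# Toy stand-in log-probabilities over a three-token vocabulary.
tuned = {"safe": -0.2, "neutral": -1.5, "risky": -3.0}
untuned = {"safe": -1.0, "neutral": -1.0, "risky": -1.0}
choice = weak_to_strong_step(
    candidates=["safe", "neutral", "risky"],
    small_tuned_logp=lambda tok: tuned[tok],
    small_untuned_logp=lambda tok: untuned[tok],
)
assert choice == "safe"  # largest gap: -0.2 - (-1.0) = 0.8
```

The gap acts like an implicit reward model: tokens the tuning process made more likely get positive scores without ever touching the large model's weights.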
358 | Pandora’s Box: Towards Building Universal Attackers Against Real-World Large Vision-Language Models | Highlight: Motivated by the research gap and practical demands, in this paper, we make the first attempt to build a universal attacker against real-world LVLMs, focusing on two critical aspects: (i) restricting access to only the LVLM inputs and outputs. |
Daizong Liu; Mingyu Yang; Xiaoye Qu; Pan Zhou; Xiang Fang; Keke Tang; Yao Wan; Lichao Sun; |
359 | Semi-Truths: A Large-Scale Dataset for Testing Robustness of AI-Generated Image Detectors | Highlight: Do these detectors exhibit biases towards specific scenes or data distributions? To address these questions, we introduce Semi-Truths, featuring 27,635 real images, 245,360 masks, and 850,226 AI-augmented images featuring varying degrees of targeted and localized edits, created using diverse augmentation methods, diffusion models, and data distributions. |
Anisha Pal; Julia Kruk; Mansi Phute; Manognya Bhattaram; Diyi Yang; Duen Horng Chau; Judy Hoffman; |
360 | Demystify Mamba in Vision: A Linear Attention Perspective | Related Code | Highlight: In this paper, we explore the similarities and disparities between Mamba and linear attention Transformer, providing comprehensive analyses to demystify the key factors behind Mamba’s success. |
Dongchen Han; Ziyi Wang; Zhuofan Xia; Yizeng Han; Yifan Pu; Chunjiang Ge; Jun Song; Shiji Song; Bo Zheng; Gao Huang; |
361 | Bridging The Divide: Reconsidering Softmax and Linear Attention | Highlight: In this paper, we take a step forward to close the gap between the linear and Softmax attention with novel theoretical analyses, which demystify the core factors behind the performance deviations. |
Dongchen Han; Yifan Pu; Zhuofan Xia; Yizeng Han; Xuran Pan; Xiu Li; Jiwen Lu; Shiji Song; Gao Huang; |
362 | Segment Any Change | Related Code | Highlight: In this paper, we propose the segment any change models (AnyChange), a new type of change detection model that supports zero-shot prediction and generalization on unseen change types and data distributions. |
Zhuo Zheng; Yanfei Zhong; Liangpei Zhang; Stefano Ermon; |
363 | Clustering in Causal Attention Masking | Highlight: This work presents a modification of the self-attention dynamics proposed in Geshkovski et al. to better reflect the practically relevant causally masked attention used in transformer architectures for generative AI. |
Nikita Karagodin; Yury Polyanskiy; Philippe Rigollet; |
364 | MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | Related Code | Highlight: In this paper, we introduce \textit{model-aware data selection with data influence models (MATES)}, where a data influence model continuously adapts to the evolving data preferences of the main pretraining model, thus selecting data most effective for the model’s current learning progress. |
Zichun Yu; Spandan Das; Chenyan Xiong; |
365 | NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | Related Code | Highlight: This has resulted in an inability to draw clear conclusions from the rapidly growing body of research on end-to-end autonomous driving. In this paper, we present NAVSIM, a middle ground between these evaluation paradigms, where we use large datasets in combination with a non-reactive simulator to enable large-scale real-world benchmarking. |
Daniel Dauner; Marcel Hallgarten; Tianyu Li; Xinshuo Weng; Zhiyu Huang; Zetong Yang; Hongyang Li; Igor Gilitschenski; Boris Ivanovic; Marco Pavone; Andreas Geiger; Kashyap Chitta; |
366 | DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ | Related Code | Highlight: Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. |
Jonas Belouadi; Simone Ponzetto; Steffen Eger; |
367 | Learning from Highly Sparse Spatio-temporal Data | Highlight: We provide a theoretical analysis revealing that such iterative models are not only susceptible to data sparsity but also to graph sparsity, causing unstable performance on different datasets. To overcome these limitations, we introduce a novel method named One-step Propagation and Confidence-based Refinement (OPCR). |
Leyan Deng; Chenwang Wu; Defu Lian; Enhong Chen; |
368 | Generative Hierarchical Materials Search | Highlight: In this work, we formulate end-to-end language-to-structure generation as a multi-objective optimization problem, and propose Generative Hierarchical Materials Search (GenMS) for controllable generation of crystal structures. |
Sherry Yang; Simon Batzner; Ruiqi Gao; Muratahan Aykol; Alexander Gaunt; Brendan C McMorrow; Danilo Jimenez Rezende; Dale Schuurmans; Igor Mordatch; Ekin Dogus Cubuk; |
369 | CAPE: Context-Adaptive Positional Encoding for Length Extrapolation | Related Code | Highlight: In this paper, we propose a Context-Adaptive Positional Encoding (CAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. |
Chuanyang Zheng; Yihang Gao; Han Shi; Minbin Huang; Jingyao Li; Jing Xiong; Xiaozhe Ren; Michael Ng; Xin Jiang; Zhenguo Li; Yu Li; |
370 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | Highlight: In this paper, we propose a new framework for zero-shot object navigation. |
Hang Yin; Xiuwei Xu; Zhenyu Wu; Jie Zhou; Jiwen Lu; |
371 | Small Steps No More: Global Convergence of Stochastic Gradient Bandits for Arbitrary Learning Rates | Highlight: We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using \emph{any} constant learning rate. |
Jincheng Mei; Bo Dai; Alekh Agarwal; Sharan Vaswani; Anant Raj; Csaba Szepesvari; Dale Schuurmans; |
372 | QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation | Related Code | Highlight: In this paper, we address the problem of sampling a set of high-quality and diverse translations. |
Gonçalo Faria; Sweta Agrawal; António Farinhas; Ricardo Rei; José de Souza; André Martins; |
373 | Optimal Ablation for Model Internals | Highlight: We argue for the adoption of optimal ablation of activations for studying model internals and show that it has theoretical and empirical advantages over popular methods for component ablation. |
Maximilian Li; Lucas Janson; |
374 | Probabilistic Emulation of A Global Climate Model with Spherical DYffusion | Highlight: Here, we present the first conditional generative model able to produce global climate ensemble projections that are accurate and physically consistent. |
Salva Rühling Cachay; Brian Henn; Oliver Watt-Meyer; Christopher S. Bretherton; Rose Yu; |
375 | Can Models Learn Skill Composition from Examples? | Highlight: In this paper, we employ a setup akin to Skill-Mix to evaluate the capacity of smaller models to learn compositional generalization from examples. |
Haoyu Zhao; Simran Kaur; Dingli Yu; Anirudh Goyal; Sanjeev Arora; |
376 | Image Understanding Makes for A Good Tokenizer for Image Generation | Highlight: However, the potential of IU models to improve IG performance remains uncharted. We address this issue using a token-based IG framework, which relies on effective tokenizers to project images into token sequences. |
Luting Wang; Yang Zhao; Zijian Zhang; Jiashi Feng; Si Liu; Bingyi Kang; |
377 | FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models | Highlight: In this work, we formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, letting users apply characteristics like lighting, texture, and dynamics from different images. |
Tong Wu; Yinghao Xu; Ryan Po; Mengchen Zhang; Guandao Yang; Jiaqi Wang; Ziwei Liu; Dahua Lin; Gordon Wetzstein; |
378 | FilterNet: Harnessing Frequency Filters for Time Series Forecasting | Highlight: In this paper, we explore a novel signal-processing perspective on deep time series forecasting. |
Kun Yi; Wei Fan; Qi Zhang; Hui He; Jingru Fei; Shufeng Hao; Defu Lian; |
379 | PeRFlow: Piecewise Rectified Flow As Universal Plug-and-Play Accelerator | Related Code | Highlight: We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. |
Hanshu Yan; Xingchao Liu; Jiachun Pan; Jun Hao Liew; Qiang Liu; Jiashi Feng; |
380 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Related Code | Highlight: In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. |
ZHUOFAN ZONG; Bingqi Ma; Dazhong Shen; Guanglu Song; Hao Shao; DONGZHI JIANG; Hongsheng Li; Yu Liu; |
381 | MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models | Related Code | Highlight: In this work, we propose Meta-Objective Aligner (MetaAligner), the first policy-agnostic and generalizable method for multi-objective preference alignment. |
Kailai Yang; Zhiwei Liu; Qianqian Xie; Jimin Huang; Tianlin Zhang; Sophia Ananiadou; |
382 | Consistency Purification: Effective and Efficient Diffusion Purification Towards Certified Robustness | Highlight: In this work, we demonstrate that an ideal purification pipeline should, in a single step for efficiency, generate purified images on the data manifold that are as semantically aligned with the original images as possible for effectiveness. |
Yiquan Li; Zhongzhu Chen; Kun Jin; Jiongxiao Wang; Bo Li; Chaowei Xiao; |
383 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To tackle the two challenges, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with the image-to-text concept matching mechanism.
DONGZHI JIANG; Guanglu Song; Xiaoshi Wu; Renrui Zhang; Dazhong Shen; ZHUOFAN ZONG; Yu Liu; Hongsheng Li; |
384 | Panacea: Pareto Alignment Via Preference Adaptation for LLMs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Panacea, an innovative approach that reframes alignment as a multi-dimensional preference optimization problem.
Yifan Zhong; Chengdong Ma; Xiaoyuan Zhang; Ziran Yang; Haojun Chen; Qingfu Zhang; Siyuan Qi; Yaodong Yang; |
385 | Adaptive Preference Scaling for Reinforcement Learning with Human Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO), designed to address this uncertainty in preference strength.
Ilgee Hong; Zichong Li; Alexander Bukharin; Yixiao Li; Haoming Jiang; Tianbao Yang; Tuo Zhao; |
386 | CogVLM: Visual Expert for Pretrained Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce CogVLM, a powerful open-source visual language foundation model.
Weihan Wang; Qingsong Lv; Wenmeng Yu; Wenyi Hong; Ji Qi; Yan Wang; Junhui Ji; Zhuoyi Yang; Lei Zhao; Song XiXuan; Jiazheng Xu; Keqin Chen; Bin Xu; Juanzi Li; Yuxiao Dong; Ming Ding; Jie Tang;
387 | MambaTalk: Co-Speech Gesture Generation with Selective State Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the potential of state space models (SSMs).
Zunnan Xu; Yukang Lin; Haonan Han; Sicheng Yang; Ronghui Li; Yachao Zhang; Xiu Li; |
388 | Data-Efficient Learning with Neural Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present an algorithm for learning neural programs, called ISED, that only relies on input-output samples of black-box components.
Alaia Solko-Breslin; Seewon Choi; Ziyang Li; Neelay Velingker; Rajeev Alur; Mayur Naik; Eric Wong; |
389 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present deep prior assembly, a novel framework that assembles diverse deep priors from large models for scene generation from single images in a zero-shot manner.
Junsheng Zhou; Yu-Shen Liu; Zhizhong Han;
390 | Unveiling Encoder-Free Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we bridge the gap between encoder-based and encoder-free models and present a simple yet effective training recipe towards pure LVLMs.
Haiwen Diao; Yufeng Cui; Xiaotong Li; Yueze Wang; Huchuan Lu; Xinlong Wang;
391 | Selective Attention: Enhancing Transformer Through Principled Context Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While self-attention has enjoyed major success, it notably treats all queries $q$ in the same way by applying the mapping $V^\top\text{softmax}(Kq)$, where $V,K$ are the value and key respectively. In this work, we argue that this uniform treatment hinders the ability to control contextual sparsity and relevance.
Xuechen Zhang; Xiangyu Chang; Mingchen Li; Amit Roy-Chowdhury; Jiasi Chen; Samet Oymak; |
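The per-query mapping quoted in this highlight can be sketched in a few lines of NumPy. The shapes and random values below are illustrative assumptions, not the paper's setup: keys and values are stored one row per context token, so `K @ q` scores the query against each key and `V.T @ weights` mixes the values.

```python
import numpy as np

# Illustrative sizes: d-dimensional embeddings, n context tokens (not from the paper).
d, n = 4, 6
rng = np.random.default_rng(0)
K = rng.normal(size=(n, d))   # keys, one row per token
V = rng.normal(size=(n, d))   # values, one row per token
q = rng.normal(size=(d,))     # a single query

scores = K @ q                       # similarity of q to each key, shape (n,)
weights = np.exp(scores - scores.max())
weights /= weights.sum()             # softmax(Kq), a distribution over tokens
out = V.T @ weights                  # V^T softmax(Kq), shape (d,)
```

Every query is pushed through this same fixed mapping, which is the uniform treatment the authors argue limits control over contextual sparsity.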
392 | Stochastic Optimal Control Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models.
Carles Domingo i Enrich; Jiequn Han; Brandon Amos; Joan Bruna; Ricky T. Q. Chen; |
393 | Improved Off-policy Training of Diffusion Samplers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of training diffusion models to sample from a distribution with a given unnormalized density or energy function.
Marcin Sendera; Minsu Kim; Sarthak Mittal; Pablo Lemos; Luca Scimeca; Jarrid Rector-Brooks; Alexandre Adam; Yoshua Bengio; Nikolay Malkin;
394 | The Fine-Grained Complexity of Gradient Computation for Training Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show nearly identical results for the seemingly harder problem of computing the gradient of the loss function of a one-layer attention network, and thus for the entire process of LLM training.
Josh Alman; Zhao Song; |
395 | Scalable and Effective Arithmetic Tree Generation for RL-Driven Adder and Multiplier Designs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To boost computing performance, this work focuses on the two most common and fundamental arithmetic modules, adders and multipliers.
Yao Lai; Jinxin Liu; David Pan; Ping Luo;
396 | GenRec: Unifying Video Generation and Recognition with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition.
Zejia Weng; Xitong Yang; Zhen Xing; Zuxuan Wu; Yu-Gang Jiang; |
397 | SafeWorld: Geo-Diverse Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On top of it, we propose a multi-dimensional automatic safety evaluation framework that assesses the contextual appropriateness, accuracy, and comprehensiveness of responses.
Da Yin; Haoyi Qiu; Kung-Hsiang Huang; Kai-Wei Chang; Nanyun Peng;
398 | BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents.
Zikang Zhou; HU Haibo; Xinhong Chen; Jianping Wang; Nan Guan; Kui Wu; Yung-Hui Li; Yu-Kai Huang; Chun Jason Xue;
399 | EGODE: An Event-attended Graph ODE Framework for Modeling Rigid Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach named Event-attend Graph ODE (EGODE) for effective rigid dynamics modeling.
Jingyang Yuan; Gongbo Sun; Zhiping Xiao; Hang Zhou; Xiao Luo; Junyu Luo; Yusheng Zhao; Wei Ju; Ming Zhang; |
400 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, traditional techniques such as global feature alignment or vision-language model distillation tend to impose only approximate correspondence, struggling notably with delineating fine-grained segmentation boundaries. To address this gap, we propose a more meticulous mask-level alignment between 3D features and the 2D-text embedding space through a cross-modal mask reasoning framework, XMask3D.
Ziyi Wang; Yanbo Wang; Xumin Yu; Jie Zhou; Jiwen Lu;
401 | Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges.
Hadi Pouransari; Chun-Liang Li; Jen-Hao Chang; Pavan Kumar Anasosalu Vasu; Cem Koc; Vaishaal Shankar; Oncel Tuzel;
402 | Towards Neuron Attributions in Multi-Modal Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, while neuron attribution has made significant progress in deciphering text-only LLMs, its application to Multimodal LLMs (MLLMs) remains less explored. To address this gap, we propose a novel Neuron Attribution method tailored for MLLMs, termed NAM.
Junfeng Fang; Zac Bi; Ruipeng Wang; Houcheng Jiang; Yuan Gao; Kun Wang; An Zhang; Jie Shi; Xiang Wang; Tat-Seng Chua;
403 | Metric Transforms and Low Rank Representations of Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new linear-algebraic tool based on Abelian group representation theory, and use it to address three key problems in machine learning.
Timothy Chu; Josh Alman; Gary L. Miller; Shyam Narayanan; Mark Sellke; Zhao Song;
404 | Differentiable Structure Learning with Partial Orders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main difficulty lies in adapting these constraints, typically suited for the space of total orderings, to the continuous optimization context of structure learning in the graph space. To bridge this gap, this paper formalizes a set of equivalent constraints that map partial orders onto graph spaces and introduces a plug-and-play module for their efficient application.
Taiyu Ban; Lyuzhou Chen; Xiangyu Wang; Xin Wang; Derui Lyu; Huanhuan Chen; |
405 | R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By carefully deriving X-ray rasterization functions, we discover a previously unknown \emph{integration bias} in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians.
Ruyi Zha; Tao Jun Lin; Yuanhao Cai; Jiwen Cao; Yanhao Zhang; Hongdong Li;
406 | GPT As Visual Explainer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Language Model as Visual Explainer (\texttt{LVX}), a systematic approach for interpreting the internal workings of vision models using a tree-structured linguistic explanation, without the need for model training.
Xingyi Yang; Xinchao Wang;
407 | A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation.
Heyang Zhao; Jiafan He; Quanquan Gu; |
408 | Unveiling The Power of Diffusion Features For Personalized Segmentation and Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a significant flaw in these models is evident: they struggle to locate a desired instance when other instances within the same class are presented. In this paper, we explore text-to-image diffusion models for these tasks.
Dvir Samuel; Rami Ben-Ari; Matan Levy; Nir Darshan; Gal Chechik; |
409 | Empowering and Assessing The Utility of Large Language Models in Crop Science Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs’ professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs’ understanding of the domain knowledge.
Hang Zhang; Jiawei SUN; Renqi Chen; Wei Liu; Zhonghang Yuan; Xinzhe Zheng; Zhefan Wang; Zhiyuan Yang; Hang Yan; Han-Sen Zhong; Xiqing Wang; Fan Yang; Nanqing Dong; Wanli Ouyang;
410 | Can LLMs Learn By Teaching? A Preliminary Study Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: If yes, we can potentially unlock the possibility of continuously advancing the models without solely relying on human-produced data or stronger models. In this paper, we provide a preliminary exploration of this ambitious agenda.
Xuefei Ning; Zifu Wang; Shiyao Li; Zinan Lin; Peiran Yao; Tianyu Fu; Matthew Blaschko; Guohao Dai; Huazhong Yang; Yu Wang; |
411 | Alleviating Distortion in Image Generation Via Multi-Resolution Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization.
Qihao Liu; Zhanpeng Zeng; Ju He; Qihang Yu; Xiaohui Shen; Liang-Chieh Chen;
412 | SimGen: Simulator-conditioned Driving Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world.
Yunsong Zhou; Michael Simon; Zhenghao (Mark) Peng; Sicheng Mo; Hongzi Zhu; Minyi Guo; Bolei Zhou; |
413 | Parallelizing Linear Transformers with The Delta Rule Over Sequence Length Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work describes a hardware-efficient algorithm for training a generalized variant of linear Transformers (of which DeltaNet is a special case) which exploits the WY representation for computing products of Householder matrices.
Songlin Yang; Bailin Wang; Yu Zhang; Yikang Shen; Yoon Kim;
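The WY representation this highlight mentions can be checked numerically in a short NumPy sketch (sizes and vectors below are illustrative, not from the paper): a product of Householder-style factors $I - \beta_i v_i v_i^\top$ is maintained in the compact form $I - W Y^\top$, updating $W$ and $Y$ one column at a time instead of multiplying full matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, t = 5, 3  # illustrative dimension and number of factors
vs = [rng.normal(size=(d,)) for _ in range(t)]
betas = [2.0 / (v @ v) for v in vs]  # beta = 2/||v||^2 makes each factor a reflection

# Direct product of the Householder factors
Q_direct = np.eye(d)
for v, b in zip(vs, betas):
    Q_direct = Q_direct @ (np.eye(d) - b * np.outer(v, v))

# WY accumulation: the running product is kept as I - W Y^T
W = np.zeros((d, 0))
Y = np.zeros((d, 0))
for v, b in zip(vs, betas):
    Qv = v - W @ (Y.T @ v)            # apply (I - W Y^T) to v without forming Q
    W = np.column_stack([W, b * Qv])  # append column beta * Q v
    Y = np.column_stack([Y, v])       # append column v

assert np.allclose(Q_direct, np.eye(d) - W @ Y.T)
```

The update follows from $Q(I - \beta v v^\top) = I - [W,\ \beta Qv][Y,\ v]^\top$, which is what makes the product amenable to matrix-multiply-heavy (hence hardware-efficient) evaluation.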
414 | A Closer Look at Deep Learning Phenomena Through A Telescoping Lens Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that this model presents a pedagogical formalism allowing us to isolate components of the training process even in complex contemporary settings, providing a sharp lens to reason about the effects of design choices such as architecture and optimization strategy, and reveals surprising parallels between neural network learning and gradient boosting.
Alan Jeffares; Alicia Curth; Mihaela van der Schaar;
415 | Communication Bounds for The Distributed Experts Problem Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the experts problem in the distributed setting where an expert’s cost needs to be aggregated across multiple servers.
Zhihao Jia; Qi Pang; Trung Tran; David Woodruff; Zhihao Zhang; Wenting Zheng;
416 | Implicit Multimodal Alignment: On The Generalization of Frozen LLMs to Multimodal Inputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we expose frozen LLMs to image, video, audio and text inputs and analyse their internal representation with the attempt to understand their generalization beyond textual inputs.
Mustafa Shukor; Matthieu Cord;
417 | Mitigating Fine-tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks.
Jiongxiao Wang; Jiazhao LI; Yiquan Li; Xiangyu Qi; Junjie Hu; Sharon Li; Patrick McDaniel; Muhao Chen; Bo Li; Chaowei Xiao; |
418 | Game-Traversal-Benchmark: Evaluating Planning Abilities Of Large Language Models Via Traversing 2D Game Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They have also shown potential outside the natural language domain, but can LLMs plan? There has been a debate around this question. We contribute to this debate by proposing Game-Traversal-Benchmark (GTB), a benchmark consisting of diverse 2D grid-based game maps to evaluate the planning and reasoning abilities of an LLM.
Muhammad Umair Nasir; Steven James; Julian Togelius;
419 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining.
Hoyeon Chang; Jinho Park; Seonghyeon Ye; Sohee Yang; Youngkyung Seo; Du-Seong Chang; Minjoon Seo; |
420 | One-shot Federated Learning Via Synthetic Distiller-Distillate Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, they may encounter scalability issues with complex datasets due to inherent two-step information loss: first, during local training (from data to model), and second, when transferring knowledge to the server model (from model to inversed data). In this paper, we propose FedSD2C, a novel and practical one-shot FL framework designed to address these challenges.
JUNYUAN ZHANG; Songhua Liu; Xinchao Wang; |
421 | RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve perception robustness, we leverage the recent advances in automotive radars and introduce a novel approach that utilizes 4D imaging radar sensors for 3D occupancy prediction.
Fangqiang Ding; Xiangyu Wen; Yunzhou Zhu; Yiming Li; Chris Xiaoxuan Lu;
422 | FlexPlanner: Flexible 3D Floorplanning Via Deep Reinforcement Learning in Hybrid Action Space with Multi-Modality Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Besides, they typically face difficulties in aligning the cross-die modules in 3D ICs due to their heuristic representations, which could potentially result in severe data transfer failures. To address these issues, we propose FlexPlanner, a flexible learning-based method in hybrid action space with multi-modality representation to simultaneously handle position, aspect ratio, and alignment of blocks.
Ruizhe Zhong; Xingbo Du; Shixiong Kai; Zhentao Tang; Siyuan Xu; Jianye Hao; Mingxuan Yuan; Junchi Yan; |
423 | Learning Cooperative Trajectory Representations for Motion Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a forecasting-oriented representation paradigm to utilize motion and interaction features from cooperative information.
Hongzhi Ruan; Haibao Yu; Wenxian Yang; Siqi Fan; Zaiqing Nie;
424 | Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions.
Ke Wang; Junting Pan; Weikang Shi; Zimu Lu; Houxing Ren; Aojun Zhou; Mingjie Zhan; Hongsheng Li;
425 | Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting.
Kaihang Pan; Zhaoyu Fan; Juncheng Li; Qifan Yu; Hao Fei; Siliang Tang; Richang Hong; Hanwang Zhang; QIANRU SUN; |
426 | LACIE: Listener-Aware Finetuning for Calibration in Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that directly models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener.
Elias Stengel-Eskin; Peter Hase; Mohit Bansal;
427 | AnonFair: A Flexible Toolkit for Algorithmic Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present AnonFair, a new open source toolkit for enforcing algorithmic fairness.
Eoin Delaney; Zihao Fu; Chris Russell; |
428 | SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce Spatial Region GPT (SpatialRGPT) to enhance VLMs’ spatial perception and reasoning capabilities.
AnChieh Cheng; Hongxu Yin; Yang Fu; Qiushan Guo; Ruihan Yang; Jan Kautz; Xiaolong Wang; Sifei Liu; |
429 | Learning Partitions from Context Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the problem of learning the structure of a discrete set of tokens from their interaction with other tokens.
Simon Buchholz;
430 | APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents APIGen, an automated data generation pipeline designed to produce verifiable high-quality datasets for function-calling applications.
Zuxin Liu; Thai Hoang; Jianguo Zhang; Ming Zhu; Tian Lan; Shirley kokane; Juntao Tan; Weiran Yao; Zhiwei Liu; Yihao Feng; Rithesh R N; Liangwei Yang; Silvio Savarese; Juan Carlos Niebles; Huan Wang; Shelby Heinecke; Caiming Xiong;
431 | Multi-Label Learning with Stronger Consistency Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of surrogate losses and algorithms for multi-label learning, supported by $H$-consistency bounds.
Anqi Mao; Yutao Zhong; Mehryar Mohri; |
432 | Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a comprehensive study of surrogate loss functions for learning to defer.
Anqi Mao; Yutao Zhong; Mehryar Mohri;
433 | CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation.
Leying Zhang; Yao Qian; Long Zhou; Shujie LIU; Dongmei Wang; Xiaofei Wang; Midia Yousefi; Yanmin Qian; Jinyu Li; Lei He; sheng zhao; Michael Zeng; |
434 | L4GM: Large 4D Gaussian Reconstruction Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input, in a single feed-forward pass that takes only a second.
Jiawei Ren; Cheng Xie; Ashkan Mirzaei; Hanxue Liang; Xiaohui Zeng; Karsten Kreis; Ziwei Liu; Antonio Torralba; Sanja Fidler; Seung Wook Kim; Huan Ling; |
435 | CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | Related Code | Highlight: In this paper, we introduce CARES and aim to comprehensively evaluate the trustworthiness of Med-LVLMs across the medical domain. |
Peng Xia; Ze Chen; Juanxi Tian; Yangrui Gong; Ruibo Hou; Yue Xu; Zhenbang Wu; Zhiyuan Fan; Yiyang Zhou; Kangyu Zhu; Wenhao Zheng; Zhaoyang Wang; Xiao Wang; Xuchao Zhang; Chetan Bansal; Marc Niethammer; Junzhou Huang; Hongtu Zhu; Yun Li; Jimeng Sun; Zongyuan Ge; Gang Li; James Zou; Huaxiu Yao; |
436 | Guiding A Diffusion Model with A Bad Version of Itself | Highlight: We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. |
Tero Karras; Miika Aittala; Tuomas Kynkäänniemi; Jaakko Lehtinen; Timo Aila; Samuli Laine; |
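The guidance rule behind this approach has the same linear-extrapolation form as classifier-free guidance, with the weak model's output standing in for the unconditional branch. A minimal sketch with made-up numbers, not the paper's implementation:

```python
def guided_output(d_main, d_weak, w):
    """Linear guidance extrapolation: push the main model's denoising
    output d_main away from the weaker model's output d_weak by
    guidance weight w. With w = 1 the main model is unchanged; w > 1
    amplifies what the strong model knows and the weak one does not."""
    return [dw + w * (dm - dw) for dm, dw in zip(d_main, d_weak)]

# Toy denoiser outputs for a 2-dimensional "image".
d_main, d_weak = [0.8, -0.2], [0.5, 0.1]
print(guided_output(d_main, d_weak, 1.0))  # reduces to d_main
print(guided_output(d_main, d_weak, 2.0))  # extrapolated past d_main
```

The paper's contribution is the choice of guiding model (a smaller, less-trained copy), not the extrapolation formula itself.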
437 | Elo Uncovered: Robustness and Best Practices in Language Model Evaluation | Highlight: We conduct extensive evaluation of Elo behaviour, illustrating that individual Elo computations exhibit volatility and investigating the impact of varying the Elo rating system’s hyperparameters. |
Meriem Boubdir; Edward Kim; Beyza Ermis; Sara Hooker; Marzieh Fadaee; |
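The standard Elo formulas are enough to reproduce the kind of volatility discussed here. The sketch below is a generic illustration, not the paper's evaluation code: the same set of match outcomes yields different final ratings depending on processing order.

```python
def elo_expected(r_a, r_b):
    # Standard Elo win probability for player A.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    # One rating update; score_a is 1 for an A win, 0.5 draw, 0 loss.
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b - k * (score_a - e_a)

def final_rating_a(outcomes, k=32):
    # Run a sequence of A-vs-B outcomes from equal starting ratings.
    r_a, r_b = 1000.0, 1000.0
    for s in outcomes:
        r_a, r_b = elo_update(r_a, r_b, s, k)
    return r_a

# Two wins and one loss for A, processed in two different orders,
# end at different ratings -- one source of Elo volatility.
print(final_rating_a([1, 1, 0]), final_rating_a([0, 1, 1]))
```

The order sensitivity shrinks as the K-factor decreases or the match count grows, which is why K is among the hyperparameters whose effect matters for model comparisons.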
438 | Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions | Highlight: In this work, we tackle a task which we call *data mixture inference*, which aims to uncover the distributional make-up of the pretraining data. |
Jonathan Hayase; Alisa Liu; Yejin Choi; Sewoong Oh; Noah Smith; |
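The underlying inference problem reduces to linear algebra: if each candidate data source has known token-pair statistics, the mixture weights that produced the observed statistics can be solved for. A deliberately tiny two-domain sketch with hypothetical numbers (the paper's actual attack works from the ordering of learned BPE merge rules via a linear program):

```python
def infer_two_domain_mixture(freqs_a, freqs_b, observed):
    """Solve observed = w * freqs_a + (1 - w) * freqs_b for the
    mixture weight w of domain A, treating the first two pair
    frequencies as a 2x2 linear system (Cramer's rule)."""
    a1, a2 = freqs_a
    b1, b2 = freqs_b
    det = a1 * b2 - a2 * b1
    w_a = (observed[0] * b2 - observed[1] * b1) / det
    return w_a, 1.0 - w_a

# Domain A favors pair 1, domain B favors pair 2; data is a 70/30 mix.
freqs_a, freqs_b = (0.9, 0.1), (0.2, 0.8)
observed = (0.7 * 0.9 + 0.3 * 0.2, 0.7 * 0.1 + 0.3 * 0.8)
print(infer_two_domain_mixture(freqs_a, freqs_b, observed))
```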
439 | Theoretical and Empirical Insights Into The Origins of Degree Bias in Graph Neural Networks | Related Code | Highlight: We validate our theoretical findings on 8 common real-world networks, and based on our theoretical and empirical insights, describe a roadmap to alleviate degree bias. |
Arjun Subramonian; Jian Kang; Yizhou Sun; |
440 | Diffusion Models Are Certifiably Robust Classifiers | Highlight: In this study, we prove that diffusion classifiers possess $O(1)$ Lipschitzness, and establish their certified robustness, demonstrating their inherent resilience. |
Huanran Chen; Yinpeng Dong; Shitong Shao; Hao Zhongkai; Xiao Yang; Hang Su; Jun Zhu; |
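The link between a Lipschitz bound and certified robustness is a one-line margin argument: if every logit moves by at most L·‖δ‖ under a perturbation δ, the top class cannot be overtaken within radius margin/(2L). A generic sketch of that certificate (constant factors vary with the norm convention; this is not the paper's exact bound):

```python
def certified_radius(logits, lipschitz):
    """Margin-based certificate for an L-Lipschitz classifier: the
    prediction cannot flip for perturbations smaller than
    (top - runner_up) / (2 * L), since each of the two competing
    logits can move by at most L * radius toward the other."""
    top, runner_up = sorted(logits, reverse=True)[:2]
    return (top - runner_up) / (2.0 * lipschitz)

print(certified_radius([3.0, 1.0, 0.5], lipschitz=1.0))  # 1.0
```

A small certified Lipschitz constant (the paper's $O(1)$ result) directly translates into non-trivial certified radii via this kind of bound.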
441 | EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models | Highlight: To explore the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources, we introduce EvolveDirector. |
Rui Zhao; Hangjie Yuan; Yujie Wei; Shiwei Zhang; Yuchao Gu; Lingmin Ran; Xiang Wang; Jay Zhangjie Wu; David Junhao Zhang; Yingya Zhang; Mike Zheng Shou; |
442 | SAFE: Slow and Fast Parameter-Efficient Tuning for Continual Learning with Pre-Trained Models | Highlight: Additionally, freezing the parameters in incremental sessions hinders models’ plasticity to novel concepts not covered in the first session. To solve the above issues, we propose a Slow And Fast parameter-Efficient tuning (SAFE) framework. |
Linglan Zhao; Xuerui Zhang; Weiran Huang; Ke Yan; Shouhong Ding; |
443 | A Simple Image Segmentation Framework Via In-Context Examples | Highlight: However, these methods still struggle with task ambiguity in in-context segmentation, as not all in-context examples can accurately convey the task information. In order to address this issue, we present SINE, a simple image $\textbf{S}$egmentation framework utilizing $\textbf{in}$-context $\textbf{e}$xamples. |
Yang Liu; Chenchen Jing; Hengtao Li; Muzhi Zhu; Hao Chen; Xinlong Wang; Chunhua Shen; |
444 | Stress-Testing Capability Elicitation With Password-Locked Models | Related Code | Highlight: In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. |
Ryan Greenblatt; Fabien Roger; Dmitrii Krasheninnikov; David Krueger; |
445 | Consistency Diffusion Bridge Models | Highlight: However, DDBM’s sampling process typically requires hundreds of network evaluations to achieve decent performance, which may impede their practical deployment due to high computational demands. In this work, inspired by the recent advance of consistency models in DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. |
Guande He; Kaiwen Zheng; Jianfei Chen; Fan Bao; Jun Zhu; |
446 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Related Code | Highlight: We present \ModelFullName (\ModelName), which processes long in-context text using visual tokens. |
Jinpeng Wang; Linjie Li; Yiqi Lin; Min Li; Lijuan Wang; Mike Zheng Shou; |
447 | $\textit{Bifr\ost}$: 3D-Aware Image Composing with Language Instructions | Highlight: This paper introduces $\textit{Bifr\ost}$, a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. |
Lingxiao Li; Kaixiong Gong; Wei-Hong Li; Xili Dai; Tao Chen; Xiaojun Yuan; Xiangyu Yue; |
448 | ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving | Highlight: In this paper, we propose a novel multi-modal Zero-shot Offboard Panoptic Perception (ZOPP) framework for autonomous driving scenes. |
Tao Ma; Hongbin Zhou; Qiusheng Huang; Xuemeng Yang; Jianfei Guo; Bo Zhang; Min Dou; Yu Qiao; Botian Shi; Hongsheng Li; |
449 | Geometric Trajectory Diffusion Models | Highlight: In this work, we propose geometric trajectory diffusion models (GeoTDM), the first diffusion model for modeling the temporal distribution of 3D geometric trajectories. |
Jiaqi Han; Minkai Xu; Aaron Lou; Haotian Ye; Stefano Ermon;
450 | Lexicon3D: Probing Visual Encoding Models for Complex 3D Scene Understanding | Highlight: However, the optimal scene encoding strategies for various scenarios remain unclear, particularly compared to their image-based counterparts. To address this issue, we present a comprehensive study that probes various visual encoding models for 3D scene understanding, identifying the strengths and limitations of each model across different scenarios. |
Yunze Man; Shuhong Zheng; Zhipeng Bao; Martial Hebert; Liangyan Gui; Yu-Xiong Wang; |
451 | PediatricsGPT: Large Language Models As Chinese Medical Assistants for Pediatric Applications | Highlight: In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. |
Dingkang Yang; Jinjie Wei; Dongling Xiao; Shunli Wang; Tong Wu; Gang Li; Mingcheng Li; Shuaibing Wang; Jiawei Chen; Yue Jiang; Qingyao Xu; Ke Li; Peng Zhai; Lihua Zhang; |
452 | HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed Via Gaussian Splatting | Related Code | Highlight: In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time. |
Yuanhao Cai; Zihao Xiao; Yixun Liang; Minghan Qin; Yulun Zhang; Xiaokang Yang; Yaoyao Liu; Alan Yuille; |
453 | Transferable Boltzmann Generators | Highlight: Recently, flow matching has been employed to train Boltzmann Generators for small molecular systems in Cartesian coordinates. We extend this work and propose a first framework for Boltzmann Generators that are transferable across chemical space, such that they predict zero-shot Boltzmann distributions for test molecules without being retrained on these systems. |
Leon Klein; Frank Noe;
454 | PromptFix: You Prompt and We Fix The Photo | Related Code | Highlight: Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. |
Yongsheng Yu; Ziyun Zeng; Hang Hua; Jianlong Fu; Jiebo Luo; |
455 | HEST-1k: A Dataset For Spatial Transcriptomics and Histology Image Analysis | Related Code | Highlight: Here, we introduce HEST-1k, a collection of 1,108 spatial transcriptomic profiles, each linked to a WSI and metadata. |
Guillaume Jaume; Paul Doucet; Andrew Song; Ming Y. Lu; Cristina Almagro Pérez; Sophia Wagner; Anurag Vaidya; Richard Chen; Drew Williamson; Ahrong Kim; Faisal Mahmood; |
456 | MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | Highlight: In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference. |
Akide Liu; Jing Liu; Zizheng Pan; Yefei He; Reza Haffari; Bohan Zhuang; |
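The depth-dimension idea can be illustrated with a toy merge that replaces each pair of adjacent layers' cached vectors with their mean, roughly halving the cache. (MiniCache's actual merge is more careful, e.g. interpolating directions while preserving magnitudes and retaining outlier tokens; this sketch only shows the depth-wise framing.)

```python
def merge_adjacent_layers(kv_per_layer):
    """Toy depth-wise KV compression: average each pair of adjacent
    layers' cached vectors, roughly halving the depth dimension of
    the cache. An odd trailing layer is kept as-is."""
    merged = []
    for i in range(0, len(kv_per_layer) - 1, 2):
        a, b = kv_per_layer[i], kv_per_layer[i + 1]
        merged.append([(x + y) / 2.0 for x, y in zip(a, b)])
    if len(kv_per_layer) % 2:
        merged.append(kv_per_layer[-1])
    return merged

cache = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(merge_adjacent_layers(cache))  # [[2.0, 3.0], [5.0, 6.0]]
```

The motivation is that adjacent transformer layers often cache highly similar KV states, so a depth-wise merge loses little information relative to the memory it saves.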
457 | Improving Context-Aware Preference Modeling for Language Models | Highlight: To this end, we contribute several \textit{context-conditioned} preference datasets and accompanying experiments that investigate the ability of language models to evaluate context-specific preference. |
Silviu Pitis; Ziang Xiao; Nicolas Le Roux; Alessandro Sordoni; |
458 | Unleashing The Potential of The Diffusion Model in Few-shot Semantic Segmentation | Highlight: Drawing from the extensive potential unveiled by the Diffusion Model in both semantic correspondence and open vocabulary segmentation, our work initiates an investigation into employing the Latent Diffusion Model for Few-shot Semantic Segmentation. |
Muzhi Zhu; Yang Liu; Zekai Luo; Chenchen Jing; Hao Chen; Guangkai Xu; Xinlong Wang; Chunhua Shen; |
459 | Self-Refining Diffusion Samplers: Enabling Parallelization Via Parareal Iterations | Highlight: In contrast, we introduce Self-Refining Diffusion Samplers (SRDS) that retain sample quality and can improve latency at the cost of additional parallel compute. |
Nikil Selvam; Amil Merchant; Stefano Ermon;
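Parareal itself is a classical parallel-in-time iteration: a cheap coarse solver G runs serially, while an expensive fine solver F is applied to the previous iterate on all intervals at once. A self-contained sketch on a scalar ODE (a generic illustration of Parareal, not the SRDS sampler):

```python
def parareal(f_fine, g_coarse, u0, n_steps, iters):
    """Parareal update U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n]).
    The F evaluations inside each iteration are independent across
    intervals (the parallelizable part); after n_steps iterations the
    result matches the serial fine solution."""
    u = [u0] * (n_steps + 1)
    for n in range(n_steps):                      # initial coarse sweep
        u[n + 1] = g_coarse(u[n])
    for _ in range(iters):
        f_vals = [f_fine(u[n]) for n in range(n_steps)]   # parallelizable
        g_vals = [g_coarse(u[n]) for n in range(n_steps)]
        new = [u0]
        for n in range(n_steps):
            new.append(g_coarse(new[n]) + f_vals[n] - g_vals[n])
        u = new
    return u

# Toy problem: u' = -u on [0, 1], split into 10 intervals.
def fine(u):                      # 10 Euler substeps per interval
    for _ in range(10):
        u -= u * 0.01
    return u

coarse = lambda u: u - u * 0.1    # 1 Euler step per interval
traj = parareal(fine, coarse, 1.0, n_steps=10, iters=10)
```

SRDS applies the same coarse/fine correction to diffusion sampling trajectories, trading extra parallel compute for lower wall-clock latency.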
460 | Text to Blind Motion | Highlight: In this work, we introduce BlindWays, the first multimodal motion benchmark for pedestrians who are blind. |
Hee Jae Kim; Kathakoli Sengupta; Masaki Kuribayashi; Hernisa Kacorri; Eshed Ohn-Bar; |
461 | Learning De-Biased Representations for Remote-Sensing Imagery | Highlight: In this paper, we propose debLoRA—a generic training approach that works with any LoRA variants to yield debiased features. |
Zichen Tian; Zhaozheng Chen; Qianru Sun; |
462 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | Related Code | Highlight: We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via 360-degree novel view synthesis. |
Wen-Hsuan Chu; Lei Ke; Katerina Fragkiadaki; |
463 | Learning 3D Garment Animation from Trajectories of A Piece of Cloth | Highlight: In this paper, instead of using garment-wise supervised learning, we adopt a disentangled scheme to learn how to animate observed garments: 1). |
Yidi Shao; Chen Change Loy; Bo Dai; |
464 | Measuring Per-Unit Interpretability at Scale Without Humans | Highlight: In this work, we introduce the first scalable method to measure the per-unit interpretability in vision DNNs. |
Roland S. Zimmermann; David Klindt; Wieland Brendel; |
465 | Spatio-Spectral Graph Neural Networks | Highlight: However, key limitations of *ℓ*-step MPGNNs are that their receptive field is typically limited to the *ℓ*-hop neighborhood of a node and that information exchange between distant nodes is limited by over-squashing. Motivated by these limitations, we propose *Spatio-Spectral Graph Neural Networks (S²GNNs)* – a new modeling paradigm for Graph Neural Networks (GNNs) that synergistically combines spatially and spectrally parametrized graph filters. |
Simon Geisler; Arthur Kosmala; Daniel Herbst; Stephan Günnemann;
466 | No Zero-Shot Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Highlight: However, it is unclear how meaningful the notion of zero-shot generalization is for such multimodal models, as it is not known to what extent their pretraining datasets encompass the downstream concepts targeted for during zero-shot evaluation. In this work, we ask: How is the performance of multimodal models on downstream concepts influenced by the frequency of these concepts in their pretraining datasets? |
Vishaal Udandarao; Ameya Prabhu; Adhiraj Ghosh; Yash Sharma; Philip Torr; Adel Bibi; Samuel Albanie; Matthias Bethge; |
467 | The Best of Both Worlds: Toward An Honest and Helpful Large Language Model | Related Code | Highlight: Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about the given query and then optimize their responses. |
Gao Chujie; Qihui Zhang; Dongping Chen; Yue Huang; Siyuan Wu; Zhengyan Fu; Yao Wan; Xiangliang Zhang; Lichao Sun; |
468 | Breaking The Multi-Task Barrier in Meta-Reinforcement Learning with Transformers | Highlight: It is difficult to scale towards more general behavior without confronting challenges in multi-task optimization, but few solutions are compatible with meta-RL’s goal of learning from large training sets of unlabeled tasks. To address this challenge, we revisit the idea that multi-task RL is bottlenecked by imbalanced training losses created by uneven return scales across different tasks. |
Jake Grigsby; Justin Sasek; Samyak Parajuli; Ikechukwu D. Adebi; Amy Zhang; Yuke Zhu; |
469 | Federated Fine-tuning of Large Language Models Under Heterogeneous Tasks and Client Resources | Highlight: While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the buckets effect in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. |
Jiamu Bai; Daoyuan Chen; Bingchen Qian; Liuyi Yao; Yaliang Li; |
470 | Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) | Related Code | Highlight: In this work, we show that the semantic structure of CLIP’s latent space can be leveraged to provide interpretability, at no cost to downstream performance, by decomposing representations into semantic concepts. |
Usha Bhalla; Alex Oesterling; Suraj Srinivas; Flavio Calmon; Himabindu Lakkaraju; |
471 | Invariant Tokenization for Language Model Enabled Crystal Materials Generation | Highlight: Prior studies used the crystallographic information framework (CIF) file stream, which fails to ensure SE(3) and periodic invariance and may not lead to unique sequence representations for a given crystal structure. Here, we propose a novel method, known as Mat2Seq, to tackle this challenge. |
Keqiang Yan; Xiner Li; Hongyi Ling; Kenna Ashen; Carl Edwards; Raymundo Arroyave; Marinka Zitnik; Heng Ji; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji; |
472 | A Simplicity Bias in The Learning Dynamics of Transformers | Highlight: To conduct this analysis, we develop a procedure to generate \textit{clones} of a given natural language data set, which capture the interactions between tokens up to a specified order. |
Riccardo Rende; Federica Gerace; Alessandro Laio; Sebastian Goldt; |
473 | CountGD: Multi-Modal Open-World Counting | Related Code | Highlight: The goal of this paper is to improve the generality and accuracy of open-vocabulary object counting in images. |
Niki Amini-Naieni; Tengda Han; Andrew Zisserman; |
474 | Poseidon: Efficient Foundation Models for PDEs | Related Code | Highlight: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. |
Maximilian Herde; Bogdan Raonic; Tobias Rohner; Roger Käppeli; Roberto Molinaro; Emmanuel de Bézenac; Siddhartha Mishra; |
475 | AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback | Highlight: We present AMOR, an agent framework based on open-source LLMs, which reasons with external knowledge bases and adapts to specific domains through human supervision of the reasoning process. |
Jian Guan; Wei Wu; Zujie Wen; Peng Xu; Hongning Wang; Minlie Huang; |
476 | CAT3D: Create Anything in 3D with Multi-View Diffusion Models | Highlight: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. |
Ruiqi Gao; Aleksander Holynski; Philipp Henzler; Arthur Brussee; Ricardo Martin Brualla; Pratul Srinivasan; Jonathan Barron; Ben Poole; |
477 | G2D: From Global to Dense Radiography Representation Learning Via Vision-Language Pre-training | Highlight: This focus hinders the learning of dense (pixel-level) visual features and is suboptimal for dense prediction tasks (e.g., medical image segmentation). To address this challenge, we propose a novel medical VLP framework, named **Global to Dense level representation learning (G2D)**, which aims to learn global and dense visual features simultaneously using only image-text pairs without extra annotations. |
Che Liu; Cheng Ouyang; Sibo Cheng; Anand Shah; Wenjia Bai; Rossella Arcucci; |
478 | HelpSteer 2: Open-source Dataset for Training Top-performing Reward Models | Highlight: To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). |
Zhilin Wang; Yi Dong; Olivier Delalleau; Jiaqi Zeng; Gerald Shen; Daniel Egert; Jimmy Zhang; Makesh Narsimhan Sreedhar; Oleksii Kuchaiev; |
479 | Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Related Code | Highlight: We propose a hierarchical taxonomy for complex instructions, including 4 constraint types, 19 constraint dimensions, and 4 composition types, and manually collect a high-quality dataset accordingly. |
Bosi Wen; Pei Ke; Xiaotao Gu; Lindong Wu; Hao Huang; Jinfeng Zhou; Wenchuang Li; Binxin Hu; Wendy Gao; Jiaxing Xu; Yiming Liu; Jie Tang; Hongning Wang; Minlie Huang; |
480 | Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | Highlight: Web-scale visual entity recognition, the task of associating images with their corresponding entities within vast knowledge bases like Wikipedia, presents significant challenges due to the lack of clean, large-scale training data. In this paper, we propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. |
Mathilde Caron; Alireza Fathi; Cordelia Schmid; Ahmet Iscen; |
481 | Learning to Assist Humans Without Inferring Rewards | Highlight: Theoretically, our work connects ideas from information theory, neuroscience, and reinforcement learning, and charts a path for representations to play a critical role in solving assistive problems. |
Vivek Myers; Evan Ellis; Benjamin Eysenbach; Sergey Levine; Anca Dragan; |
482 | UPS: Unified Projection Sharing for Lightweight Single-Image Super-resolution and Beyond | Highlight: In this work, we introduce a novel Unified Projection Sharing algorithm (UPS) to decouple the feature extraction and similarity modeling, achieving notable performance. |
Kun Zhou; Xinyu Lin; Zhonghang Liu; Xiaoguang Han; Jiangbo Lu; |
483 | Full-Atom Peptide Design with Geometric Latent Diffusion | Highlight: In this paper, we propose a generative model for full-atom Peptide design with Geometric LAtent Diffusion (PepGLAD). |
Xiangzhe Kong; Yinjun Jia; Wenbing Huang; Yang Liu; |
484 | Exocentric-to-Egocentric Video Generation | Highlight: We introduce Exo2Ego-V, a novel exocentric-to-egocentric diffusion-based video generation method for daily-life skilled human activities where sparse 4-view exocentric viewpoints are configured 360° around the scene. |
Jia-Wei Liu; Weijia Mao; Zhongcong Xu; Jussi Keppo; Mike Zheng Shou; |
485 | Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability (Related Code) | Highlight: In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability.
Shenyuan Gao; Jiazhi Yang; Li Chen; Kashyap Chitta; Yihang Qiu; Andreas Geiger; Jun Zhang; Hongyang Li; |
486 | Most Influential Subset Selection: Challenges, Promises, and Beyond | Highlight: We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses.
Yuzheng Hu; Pingbang Hu; Han Zhao; Jiaqi Ma; |
487 | SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for The Legal Domain | Highlight: In this paper, we introduce SaulLM-medium and SaulLM-large, two large language model (LLM) families tailored for the legal sector.
Pierre Colombo; Telmo Pessoa Pires; Malik Boudiaf; Rui Melo; Gabriel Hautreux; Etienne Malaboeuf; Johanne Charpentier; Dominic Culver; Michael Desa;
488 | MediQ: Question-Asking LLMs for Adaptive and Reliable Medical Reasoning | Highlight: We introduce MEDIQ, a framework to simulate realistic clinical interactions, which incorporates a Patient System and an adaptive Expert System.
Shuyue Stella Li; Vidhisha Balachandran; Shangbin Feng; Jonathan Ilgen; Emma Pierson; Pang Wei Koh; Yulia Tsvetkov; |
489 | A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding | Highlight: In this paper, we propose a novel multi-view stereo (MVS) framework that dispenses with the depth-range prior.
Yitong Dong; Yijin Li; Zhaoyang Huang; Weikang Bian; Jingbo Liu; Hujun Bao; Zhaopeng Cui; Hongsheng Li; Guofeng Zhang;
490 | Breaking The False Sense of Security in Backdoor Defense Through Re-Activation Attack | Highlight: More practically, we extend our backdoor re-activation to the black-box scenario, where the defense model can only be queried by the adversary during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks.
Mingli Zhu; Siyuan Liang; Baoyuan Wu;
491 | InfoRM: Mitigating Reward Hacking in RLHF Via Information-Theoretic Reward Modeling | Highlight: This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this problem from an information-theoretic perspective and propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective to filter out irrelevant information.
Yuchun Miao; Sen Zhang; Liang Ding; Rong Bao; Lefei Zhang; Dacheng Tao;
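The InfoRM highlight above mentions a variational information bottleneck objective that filters out reward-irrelevant information. As an illustrative sketch only (not the paper's implementation; `kl_to_standard_normal`, `ib_loss`, and the `beta` value are assumptions for exposition), the bottleneck idea can be shown with a diagonal-Gaussian encoder penalized by its KL divergence to a standard-normal prior:

```python
import math

# Illustrative sketch of an information-bottleneck penalty of the kind
# the InfoRM highlight describes. The encoder maps an input to a
# diagonal Gaussian q(z|x) = N(mu, exp(log_var)); the KL term to the
# prior N(0, I) penalizes information kept in z beyond what is needed
# to predict the reward.
def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )

def ib_loss(reward_loss, mu, log_var, beta=1e-3):
    # Total objective: fit the reward, but bottleneck the representation.
    return reward_loss + beta * kl_to_standard_normal(mu, log_var)

# The KL penalty vanishes exactly when q(z|x) already equals the prior:
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The `beta` coefficient trades reward-fitting accuracy against how much input information the representation may retain; its value here is purely a placeholder.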
492 | Fishers and Hessians of Continuous Relaxations | Highlight: In this work, we explore a technique for using the empirical Fisher matrices and Hessians of relaxations to alleviate the training bottleneck that arises from vanishing and exploding gradients in the objective function.
Felix Petersen; Christian Borgelt; Tobias Sutter; Hilde Kuehne; Oliver Deussen; Stefano Ermon;
493 | TrAct: Making First-layer Pre-Activations Trainable | Highlight: In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model.
Felix Petersen; Christian Borgelt; Stefano Ermon;
494 | Convolutional Differentiable Logic Gate Networks | Highlight: Recently, an approach for learning logic gate networks directly via a differentiable relaxation was proposed. Logic gate networks are faster than conventional neural network approaches because their inference only requires logic gate operators such as NAND, OR, and XOR, which are the underlying building blocks of current hardware and can be executed efficiently. We build on this idea, extending it with deep logic gate tree convolutions, logical OR pooling, and residual initializations.
Felix Petersen; Hilde Kuehne; Christian Borgelt; Julian Welzel; Stefano Ermon;
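The highlight above refers to learning logic gate networks via a differentiable relaxation. A minimal sketch of the underlying relaxation idea (illustrative only; the gate set, `soft_gate`, and the softmax mixture here are assumptions for exposition, not the paper's architecture): treat inputs in [0, 1] as probabilities of being 1, so each gate's real-valued output is the probability its Boolean result is 1, and let a "learned" gate be a softmax mixture over candidate gates.

```python
import math

# Probabilistic relaxations of two-input logic gates: for a, b in [0, 1]
# interpreted as P(input = 1), each function returns P(output = 1).
def soft_and(a, b):  return a * b
def soft_or(a, b):   return a + b - a * b
def soft_nand(a, b): return 1.0 - a * b
def soft_xor(a, b):  return a + b - 2.0 * a * b

GATES = [soft_and, soft_or, soft_nand, soft_xor]

def soft_gate(a, b, logits):
    """A trainable gate: softmax mixture over candidate relaxations.
    During training the logits are learned by gradient descent; at
    inference the argmax gate is kept, so only hard logic ops remain."""
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return sum(w / z * g(a, b) for w, g in zip(exps, GATES))

# At hard (0/1) inputs the relaxations agree with Boolean logic:
print(soft_xor(1.0, 0.0))   # 1.0
print(soft_nand(1.0, 1.0))  # 0.0
```

Because the mixture collapses to a single discrete gate after training, inference needs only the hardware-native operators the highlight mentions (NAND, OR, XOR, etc.).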
495 | Zero-Shot Image Segmentation Via Recursive Normalized Cut on Diffusion Features | Highlight: In this paper, we consider a diffusion UNet encoder as a foundation vision encoder and we introduce DiffCut, an unsupervised zero-shot segmentation method that solely harnesses the output features from the final self-attention block.
Paul Couairon; Mustafa Shukor; Jean-Emmanuel HAUGEARD; Matthieu Cord; Nicolas THOME;
496 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation (Related Code) | Highlight: Based on the finding that existing tokenizers are tailored to either image or video inputs, this paper presents OmniTokenizer, a transformer-based tokenizer for joint image and video tokenization.
Junke Wang; Yi Jiang; Zehuan Yuan; BINGYUE PENG; Zuxuan Wu; Yu-Gang Jiang; |
497 | InstructG2I: Synthesizing Images from Multimodal Attributed Graphs | Highlight: In this paper, we approach an overlooked yet critical task, Graph2Image: generating images from multimodal attributed graphs (MMAGs).
Bowen Jin; Ziqi Pang; Bingjun Guo; Yu-Xiong Wang; Jiaxuan You; Jiawei Han; |
498 | On The Scalability of GNNs for Molecular Graphs | Highlight: However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale, mainly due to the lower efficiency of sparse operations, large data requirements, and a lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior.
Maciej Sypetkowski; Frederik Wenkel; Farimah Poursafaei; Nia Dickson; Karush Suri; Philip Fradkin; Dominique Beaini; |
499 | Efficient LLM Scheduling By Learning to Rank | Highlight: Consequently, most LLM serving systems employ a simple first-come-first-serve (FCFS) scheduling strategy, leading to head-of-line (HOL) blocking and reduced throughput and service quality. In this paper, we reexamine this assumption: we show that, although predicting the exact generation length of each request is infeasible, it is possible to predict the relative ranks of output lengths in a batch of requests using learning to rank.
Yichao Fu; Siqi Zhu; Runlong Su; Aurick Qiao; Ion Stoica; Hao Zhang; |
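The scheduling highlight above turns on a classical fact: serving shorter requests first reduces average latency, so even a coarse predicted ranking of output lengths beats FCFS. A toy illustration (the request lengths and function names here are made up for exposition; this is not the paper's scheduler):

```python
# Toy comparison of FCFS vs. rank-based scheduling on one serving slot.
# 'sorted(...)' stands in for ordering by a learned rank prediction of
# output lengths; a perfect ranking yields shortest-job-first.
def avg_completion_time(lengths):
    """Serve requests sequentially; return the mean completion time."""
    clock, total = 0, 0
    for n in lengths:
        clock += n           # time to generate this request's output
        total += clock       # this request completes at the current clock
    return total / len(lengths)

arrival_order = [100, 5, 5, 5]             # FCFS: long request blocks the queue
by_predicted_rank = sorted(arrival_order)  # shortest-predicted-first

print(avg_completion_time(arrival_order))      # 107.5
print(avg_completion_time(by_predicted_rank))  # 36.25
```

The long request at the head of the FCFS queue delays every later request (the HOL blocking the highlight mentions); ordering by predicted rank removes that delay without needing the exact generation lengths.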
500 | Antigen-Specific Antibody Design Via Direct Energy-based Preference Optimization | Highlight: In this paper, we tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences, considering both rationality and functionality.
Xiangxin Zhou; Dongyu Xue; Ruizhe Chen; Zaixiang Zheng; Liang Wang; Quanquan Gu; |
This table only includes 500 papers selected by our daily digest algorithm. To continue with the full list (~4,500 papers), please visit Paper Digest: NeurIPS-2024 (Full List).