A quick guide to Amazon’s papers at NeurIPS 2024

While large language models and other foundation models are well represented, traditional Amazon interests such as bandit problems and new topics such as AI for automated reasoning also get their due.

The 2024 Conference on Neural Information Processing Systems (NeurIPS), the premier conference in the field of AI, begins today, and the Amazon papers accepted there display the breadth of the company’s AI research.

Large language models (LLMs) and other foundation models have dominated the field for the past few years, and Amazon’s papers reflect that trend, covering topics such as retrieval-augmented generation, the use of LLMs for code generation, commonsense reasoning, and multimodal models. Training methodology also emerges as an area of focus, with papers on memory-efficient training, reinforcement learning with human feedback, classification with rejection, and convergence rates in transformer models.

But Amazon’s papers also demonstrate an abiding interest in topics such as bandit problems (long a staple of Amazon’s NeurIPS submissions) and speech processing, as well as newer concerns such as the applications of machine learning to scientific computing and automated reasoning. And one paper, “B’MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory”, proposes a new paradigm of machine learning, rooted in the concept of transductive learning.

Automated reasoning

Neural model checking

Mirco Giacobbe, Daniel Kroening, Abhinandan Pal, Michael Tautschnig

Bandit problems

Adaptive experimentation when you can’t experiment

Yao Zhao, Kwang-Sung Jun, Tanner Fiez, Lalit Jain

Online posterior sampling with a diffusion prior

Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song

Code generation

Training LLMs to better self-debug and explain code

Nan Jiang, Xiaopeng Li, Shiqi Wang, Qiang Zhou, Baishakhi Ray, Varun Kumar, Xiaofei Ma, Anoop Deoras

The data collection and model-training framework proposed in “Training LLMs to better self-debug and explain code” (https://www.amazon.science/publications/training-llms-to-better-self-debug-and-explain-code).

Commonsense reasoning

Can language models learn to skip steps?

Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Jiayang Cheng, Yue Zhang, Xipeng Qiu, Zheng Zhang

Computational fluid dynamics

WindsorML: High-fidelity computational fluid dynamics dataset for automotive aerodynamics

Neil Ashton, Jordan B. Angel, Aditya S. Ghate, Gaetan K. W. Kenway, Man Long Wong, Cetin Kiris, Astrid Walle, Danielle Maddix Robinson, Gary Page

LLM evaluation

SetLexSem Challenge: Using set operations to evaluate the lexical and semantic robustness of language models

Bardiya Akhbari, Manish Gawali, Nicholas Dronen

To evaluate LLMs’ robustness to semantic variation in set members, Amazon researchers and their colleagues created deceptive sets by sampling pairs of hypernyms (e.g., mammal and vehicle) and, from them, extracting hyponyms under three different conditions: (1) with the hyponyms as sampled; (2) with half of the set members swapped; and (3) with random sampling. LLMs exhibit a unique failure mode under the second condition (swapped), and the mean and variance in accuracy under the first condition (not swapped) are better than in the random baseline. This figure can be found in “SetLexSem Challenge: Using set operations to evaluate the lexical and semantic robustness of language models” (https://www.amazon.science/publications/setlexsem-challenge-using-set-operations-to-evaluate-the-lexical-and-semantic-robustness-of-language-models).
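
The three sampling conditions lend themselves to a short sketch. The Python below is a minimal illustration of how such deceptive set pairs might be constructed; the hypernym-to-hyponym mapping, the function name sample_set_pair, and the set sizes are hypothetical stand-ins, not the benchmark’s actual data or code.

```python
import random

# Hypothetical hypernym -> hyponym mapping (stand-in for the benchmark's data).
HYPONYMS = {
    "mammal": ["dog", "cat", "whale", "otter", "bat", "horse"],
    "vehicle": ["car", "truck", "bicycle", "canoe", "tram", "scooter"],
}

def sample_set_pair(k=4, condition="sampled", rng=random):
    """Build two k-element sets under one of the three conditions:
    'sampled' -- each set drawn from a single hypernym, as sampled;
    'swapped' -- half the members of each set exchanged with the other;
    'random'  -- members drawn from the pooled vocabulary (baseline).
    """
    a = rng.sample(HYPONYMS["mammal"], k)
    b = rng.sample(HYPONYMS["vehicle"], k)
    if condition == "swapped":
        half = k // 2
        a[:half], b[:half] = b[:half], a[:half]  # exchange half the members
    elif condition == "random":
        pool = HYPONYMS["mammal"] + HYPONYMS["vehicle"]
        a, b = rng.sample(pool, k), rng.sample(pool, k)
    return set(a), set(b)

for cond in ("sampled", "swapped", "random"):
    s1, s2 = sample_set_pair(condition=cond)
    # A set-operation prompt would then ask an LLM for, e.g., the intersection.
    print(cond, s1, s2, s1 & s2)
```

Under the swapped condition, each set mixes hyponyms from both hypernyms, which is what exposes the failure mode described above.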

Memory management

Online weighted paging with unknown weights

Orin Levy, Aviv Rosenberg, Noam Touitou

Model architecture

B’MOJO: Hybrid state space realizations of foundation models with eidetic and fading memory

Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Ben Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

Privacy

Pre-training differentially private models with limited public data

Zhiqi Bu, Xinwei Zhang, Sheng Zha, Mingyi Hong

Reconstruction attacks on machine unlearning: Simple models are vulnerable

Martin Bertran Lopez, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu

Retrieval-augmented generation (RAG)

RAGChecker: A fine-grained framework for diagnosing retrieval-augmented generation

Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, Zheng Zhang

Speech processing

CA-SSLR: Condition-aware self-supervised learning representation for generalized speech processing

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba

The CA-SSLR scheme and its time-channel attention conditioner, as proposed in “CA-SSLR: Condition-aware self-supervised learning representation for generalized speech processing” (https://www.amazon.science/publications/ca-sslr-condition-aware-self-supervised-learning-representation-for-generalized-speech-processing). Only the conditioner and linear projections for the decoders are trainable; all other parameters are frozen during adaptation. CA-SSLR improves SSL features by integrating intermediate LID/SV conditions, keeping pretrained parameters frozen (left). The trainable time-channel attention conditioner integrates language and speaker prediction (right).
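
As a rough sketch of the adaptation pattern the caption describes (a frozen pretrained backbone with only a conditioner and decoder projections left trainable), the following PyTorch snippet shows the parameter-freezing idiom. The class, the module shapes, and the plain linear conditioner standing in for the time-channel attention conditioner are all assumptions for illustration, not the paper’s implementation.

```python
import torch
import torch.nn as nn

class ConditionAwareSSL(nn.Module):
    """Hypothetical sketch of the CA-SSLR adaptation pattern: a frozen SSL
    backbone, a small trainable conditioner that injects an intermediate
    condition (e.g., a language-ID prediction), and a trainable linear
    projection feeding the downstream decoder."""

    def __init__(self, backbone: nn.Module, feat_dim=768, n_langs=10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():  # freeze all pretrained weights
            p.requires_grad = False
        self.lid_head = nn.Linear(feat_dim, n_langs)      # intermediate LID condition
        self.conditioner = nn.Linear(n_langs, feat_dim)   # trainable conditioner
        self.decoder_proj = nn.Linear(feat_dim, feat_dim) # trainable projection

    def forward(self, x):
        feats = self.backbone(x)                     # frozen SSL features
        cond = self.lid_head(feats).softmax(dim=-1)  # predict the condition
        feats = feats + self.conditioner(cond)       # condition the features
        return self.decoder_proj(feats)

# Stand-in for a pretrained SSL encoder over 80-dim frame features.
backbone = nn.Sequential(nn.Linear(80, 768), nn.GELU())
model = ConditionAwareSSL(backbone)
out = model(torch.randn(2, 50, 80))  # (batch, frames, features)

# Only the conditioner, LID head, and projection receive gradients:
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # backbone parameters are absent
```

Freezing the backbone this way keeps adaptation cheap: the optimizer only ever sees the small conditioning and projection modules.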

Training methods

CoMERA: Computing- and memory-efficient training via rank-adaptive tensor optimization

Zi Yang, Ziyue Liu, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, Zheng Zhang

Optimal design for human preference elicitation

Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

Rejection via learning density ratios

Alexander Soen, Hisham Husain, Philip Schulz, Vu Nguyen

Unraveling the gradient descent dynamics of transformers

Bingqing Song, Boran Han, Shuai Zhang, Jie Ding, Mingyi Hong

Video

One token to seg them all: Language instructed reasoning segmentation in videos

Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou

The video object segmentation framework proposed in “One token to seg them all: Language instructed reasoning segmentation in videos” (https://www.amazon.science/publications/one-token-to-seg-them-all-language-instructed-reasoning-segmentation-in-videos).

Video token merging for long-form video understanding

Seon Ho Lee, Jue Wang, Zhikang Zhang, David Fan, Xinyu (Arthur) Li

Vision-language models

Unified lexical representation for interpretable visual-language alignment

Yifan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He

Research areas: Machine learning

Tags: NeurIPS, Vision-language models (VLMs), Video, Retrieval-augmented generation (RAG), Large language models (LLMs), Code generation
