A quick guide to Amazon's 50-plus papers at EMNLP 2024
Large language models predominate, both as a subject of research in their own right and as tools for researching topics of particular interest to Amazon, such as speech, recommendations, and information retrieval.
Large language models (LLMs) have come to dominate the field of natural-language processing, so it's no surprise that they also dominate the research that Amazon scientists are presenting at this year's Conference on Empirical Methods in Natural Language Processing (EMNLP). LLM training is the topic with the greatest number of Amazon papers, followed closely by strategies for mitigating misinformation in LLMs' outputs, including but not limited to hallucinations. At the same time, a number of papers apply LLMs to topics of traditional interest at Amazon, such as speech, recommender systems, and information retrieval. (Papers marked with asterisks were accepted to Findings of EMNLP.)
Given a simple question with clues, contrastive decoding can exhibit an obvious blind spot (e.g., assigning higher probability to an uncommon answer, such as “invertebrate”, than to the most obvious answer, “bees”). In contrast, the asymptotic probability decoding proposed in “<a href="https://www.amazon.science/publications/explaining-and-improving-contrastive-decoding-by-extrapolating-the-probabilities-of-a-huge-and-hypothetical-lm">Explaining and improving contrastive decoding by extrapolating the probabilities of a huge and hypothetical LM</a>” correctly assigns the highest probability to “bees” by leveraging the probabilities of multiple LMs of different sizes.
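In rough terms, contrastive decoding scores each candidate token by the gap between a large LM's and a small LM's log-probabilities, while the paper instead extrapolates how token probabilities change with model scale. The sketch below illustrates both ideas; the linear fit in log model size and the fixed “huge” target size are our illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def contrastive_scores(logp_large, logp_small, beta=1.0):
    """Plain contrastive decoding: reward tokens the large LM prefers
    relative to the small one. This can over-penalize an obvious answer
    ("bees") that both models already rate highly."""
    return logp_large - beta * logp_small

def asymptotic_scores(logps, sizes, huge=1e12):
    """Fit each token's log-probability as a linear function of log model
    size, then read off the value at a huge, hypothetical model size.
    logps: (num_models, vocab_size), one row per LM, smallest to largest.
    sizes: parameter counts of the corresponding LMs."""
    x = np.log(np.asarray(sizes, dtype=float))
    slope, intercept = np.polyfit(x, np.asarray(logps), 1)  # per-token fits
    return intercept + slope * np.log(huge)
```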
Yefan Tao, Chris (Luyang) Kong, Andrey Kan, Laurent Callot
The DaLLME framework proposed in “<a href="https://www.amazon.science/publications/textual-dataset-distillation-via-language-model-embedding">Textual dataset distillation via language model embedding</a>” begins by using a language model to transform raw textual data into embedding vectors. A set of distilled vectors is then derived in the embedding space through a process designed to encapsulate maximum informational content. Finally, the vec2text model translates these distilled vectors back into textual form.
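As a rough illustration of the pipeline, the sketch below stands in for each stage: an off-the-shelf clustering step plays the role of the distillation objective (the paper optimizes the distilled vectors directly for training utility), and the encoder and vec2text interfaces in the usage comments are assumptions, not the paper's APIs.

```python
import numpy as np
from sklearn.cluster import KMeans

def distill_embeddings(embeddings: np.ndarray, m: int) -> np.ndarray:
    """Summarize N text embeddings with m distilled vectors. DaLLME
    derives the distilled vectors by optimizing for informational
    content; k-means centroids serve here only as a simple stand-in."""
    return KMeans(n_clusters=m, n_init=10).fit(embeddings).cluster_centers_

# Usage sketch (encoder and vec2text interfaces are assumed):
# E = encoder.encode(corpus)            # raw text -> embedding vectors
# Z = distill_embeddings(E, m=100)      # distilled vectors in embedding space
# synthetic_texts = vec2text.invert(Z)  # distilled vectors -> text
```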
“<a href="https://www.amazon.science/publications/intent-detection-in-the-age-of-llms">Intent detection in the age of LLMs</a>” proposes a methodology for adaptive in-context learning and chain-of-thought-based intent detection using LLMs.
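A hedged sketch of what such a pipeline could look like: retrieve the labeled utterances most similar to the query and prepend them, with a chain-of-thought instruction, to the intent-detection prompt. The retrieval metric and prompt wording below are our assumptions, not the paper's.

```python
import numpy as np

def build_intent_prompt(query, query_vec, pool, pool_vecs, k=5):
    """Adaptive in-context learning (sketch): pick the k labeled
    utterances most similar to the query by cosine similarity, then
    ask the model to reason step by step before naming the intent.
    pool: list of (utterance, intent) pairs; pool_vecs: (N, d)."""
    sims = pool_vecs @ query_vec / (
        np.linalg.norm(pool_vecs, axis=1) * np.linalg.norm(query_vec))
    demos = [pool[i] for i in np.argsort(-sims)[:k]]
    lines = [f'Utterance: "{u}"\nIntent: {y}' for u, y in demos]
    return ("\n\n".join(lines)
            + f'\n\nUtterance: "{query}"\n'
              "Think step by step about what the user wants, "
              "then answer with a single intent label.")
```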
An illustration of the row- and column-wise sparse low-rank adaptation (RoseLoRA) framework proposed in “<a href="https://www.amazon.science/publications/roselora-row-and-column-wise-sparse-low-rank-adaptation-of-pre-trained-language-model-for-knowledge-editing-and-fine-tuning">RoseLoRA: Row and column-wise sparse low-rank adaptation of pre-trained language model for knowledge editing and fine-tuning</a>”.
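The core idea is that constraining a LoRA update to a few rows and columns makes it edit only a sparse block of the weight matrix. Below is a minimal PyTorch sketch under that reading; magnitude-based selection with a fixed keep ratio is our simplification of the paper's importance-based selection.

```python
import torch

def roselora_delta(B: torch.Tensor, A: torch.Tensor, keep: float = 0.1):
    """Sparse low-rank update (sketch): zero all but the top `keep`
    fraction of B's rows and A's columns by L2 norm, so the update
    B @ A touches only a sparse block of the weight matrix.
    B: (d_out, r), A: (r, d_in) -> delta W: (d_out, d_in)."""
    k_rows = max(1, int(keep * B.shape[0]))
    k_cols = max(1, int(keep * A.shape[1]))
    row_idx = B.norm(dim=1).topk(k_rows).indices   # most active output rows
    col_idx = A.norm(dim=0).topk(k_cols).indices   # most active input columns
    B_sparse, A_sparse = torch.zeros_like(B), torch.zeros_like(A)
    B_sparse[row_idx] = B[row_idx]
    A_sparse[:, col_idx] = A[:, col_idx]
    return B_sparse @ A_sparse                     # applied as W + delta
```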
In the DeCRIM pipeline proposed in “<a href="https://www.amazon.science/publications/llm-self-correction-with-decrim-decompose-critique-and-refine-for-enhanced-following-of-instructions-with-multiple-constraints">LLM self-correction with DeCRIM: Decompose, critique, and refine for enhanced following of instructions with multiple constraints</a>”, an LLM first generates a response to a user request. The Decomposer then breaks down the request into granular constraints, and the Critic model gives feedback on whether the response meets those constraints. If it does, the response is output; if not, the LLM uses the feedback to refine the response.
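The loop is straightforward to sketch. In the code below, llm, decomposer, and critic are assumed callables standing in for the pipeline's components, and the round budget is ours:

```python
def decrim(llm, decomposer, critic, request, max_rounds=3):
    """DeCRIM loop (sketch): decompose the request into constraints,
    critique the response against each, and refine until all pass or
    the round budget is exhausted. All three model arguments are
    assumed callables, not APIs from the paper."""
    response = llm(request)
    constraints = decomposer(request)  # e.g., ["under 100 words", "formal tone"]
    for _ in range(max_rounds):
        failed = [c for c in constraints if not critic(response, c)]
        if not failed:                 # every constraint satisfied
            return response
        # Refine: re-prompt the LLM with the critic's feedback.
        response = llm(request, previous=response, failed_constraints=failed)
    return response
```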
The distribution-edited model <i>(θ<sub>D</sub>)</i> described in “<a href="https://www.amazon.science/publications/dem-distribution-edited-model-for-training-with-mixed-data-distributions">DEM: Distribution edited model for training with mixed data distributions</a>” results from fine-tuning a pretrained model <i>(θ)</i> on <i>n</i> individual data distributions <i>(D<sub>i</sub>)</i> and combining the resulting models with basic element-wise vector operations. Here, the extracted distribution vectors <i>(Δθ<sub>Di</sub>)</i> are multiplied by weight coefficients, and the weighted sum is added to the base model.
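Since the combination is just element-wise vector arithmetic, it can be sketched directly over model state dicts; the weight coefficients would be tuned on validation data:

```python
def distribution_edited_model(base, finetuned, weights):
    """DEM combination step (sketch): one distribution vector per data
    distribution (fine-tuned weights minus base weights), scaled by a
    weight coefficient and summed back onto the base model. `base` and
    each entry of `finetuned` are state dicts of same-shaped tensors."""
    merged = {}
    for name, theta in base.items():
        delta = sum(w * (ft[name] - theta) for w, ft in zip(weights, finetuned))
        merged[name] = theta + delta
    return merged

# Usage sketch: three distributions, coefficients chosen on validation data.
# dem_sd = distribution_edited_model(base_sd, [sd_1, sd_2, sd_3], [0.5, 0.3, 0.2])
```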
Nithish Kannen Senthilkumar, Yao Ma, Gerrit van den Burg, Jean Baptiste Faddoul
An illustration of the GLIMPSE framework proposed in “<a href="https://www.amazon.science/publications/efficient-pointwise-pairwise-learning-to-rank-for-news-recommendation">Efficient pointwise-pairwise learning-to-rank for news recommendation</a>”. GLIMPSE adopts a multitask approach in which a pretrained language model is fine-tuned on both the relevance prediction task and the pairwise-preference task. During inference, the relevance predictions are used to produce an initial pointwise ranking, which is subsequently improved by one or more right-to-left (RTL) passes using pairwise comparisons.
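A right-to-left pass can be read as one sweep of adjacent pairwise comparisons over the pointwise ranking. A minimal sketch, assuming a prefers(a, b) interface for the pairwise model:

```python
def rtl_pass(items, prefers):
    """One right-to-left refinement pass (sketch): compare adjacent
    items from the end of the ranked list toward the front and swap
    whenever the pairwise model prefers the lower-ranked item.
    `prefers(a, b)` returning True when a should outrank b is an
    assumed interface, not the paper's API."""
    ranked = list(items)
    for i in range(len(ranked) - 2, -1, -1):
        if prefers(ranked[i + 1], ranked[i]):
            ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
    return ranked

# initial = sorted(candidates, key=relevance, reverse=True)  # pointwise ranking
# final = rtl_pass(initial, prefers=pairwise_model)          # one RTL pass
```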
“<a href="https://www.amazon.science/publications/corrsynth-a-correlated-sampling-method-for-diverse-dataset-generation-from-llms">CorrSynth: A correlated sampling method for diverse dataset generation from LLMs</a>” introduces a sampling method that uses anti-correlation between examples rather than few-shot generation.
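One way to picture anti-correlated sampling: generate K examples in parallel and, at each decoding step, push each sequence's next-token distribution away from its siblings'. The combination rule and the gamma coefficient below are our guesses at the flavor of the method, not the paper's exact formulation.

```python
import numpy as np

def anticorrelated_step(logits, gamma=0.3):
    """One decoding step over K parallel generations (sketch): contrast
    each sequence's logits against the average of the other sequences'
    logits, so sampled tokens are pushed apart across examples.
    logits: (K, vocab) next-token logits, K >= 2."""
    K = logits.shape[0]
    others = (logits.sum(axis=0, keepdims=True) - logits) / (K - 1)
    contrasted = (1 + gamma) * logits - gamma * others
    probs = np.exp(contrasted - contrasted.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return np.array([np.random.choice(len(p), p=p) for p in probs])
```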
Abstract depiction of the procedure proposed in “<a href="https://www.amazon.science/publications/synthesizrr-generating-diverse-datasets-with-retrieval-augmentation">SYNTHESIZRR: Generating diverse datasets with retrieval augmentation</a>”. The content sourcing stage retrieves <i>K</i> unique documents <i>{r<sub>1</sub>,…,r<sub>K</sub>}</i> from a large corpus for each in-context covariate <i>x<sub>ICL</sub></i>. The task-inversion stage uses a parameterized context refinement prompt, <i>P<sub>τ</sub></i>, which takes parameters <i>R<sub>inv</sub></i> (inversion instruction), <i>r<sub>k</sub></i> (a retrieved document), and <i>V(y<sub>ICL</sub>)</i> (the verbalized target label). A generalist teacher LLM autoregressively generates a synthetic covariate. Each in-context example thus produces <i>K</i> unique synthetic examples <i>{x<sub>1</sub>,…, x<sub>K</sub>}</i>, which are included in the dataset with target <i>y<sub>ICL</sub></i>.
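Strung together, the two stages amount to a retrieve-then-rewrite loop. The sketch below follows the caption's notation loosely; retrieve, llm, and verbalize are assumed interfaces, and the prompt wording is ours.

```python
def synthesizrr_examples(retrieve, llm, x_icl, y_icl, verbalize, k=3):
    """Retrieve-then-rewrite loop (sketch). Content sourcing fetches K
    unique documents for the in-context covariate x_icl; task inversion
    asks a generalist teacher LLM to write a new covariate grounded in
    each document, labeled with the verbalized target y_icl."""
    docs = retrieve(x_icl, k=k)                     # content sourcing
    template = ("Rewrite the following document as a new example that "
                "would be labeled '{label}':\n\n{doc}")
    return [(llm(template.format(label=verbalize(y_icl), doc=d)), y_icl)
            for d in docs]                          # K synthetic (x, y) pairs
```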