publications
2025
- From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones. 2025
- Context Length Alone Hurts LLM Performance Despite Perfect Retrieval. In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
- The Best Instruction-Tuning Data are Those That Fit. In Advances in Neural Information Processing Systems (NeurIPS), 2025 (spotlight)
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning. In Advances in Neural Information Processing Systems (NeurIPS), 2025
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2025
- mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules. 2025
- Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities. In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
- FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs. In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
- A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts. In Proceedings of the International Conference on Learning Representations (ICLR), 2025
- Retrieval Head Mechanistically Explains Long-Context Factuality. In Proceedings of the International Conference on Learning Representations (ICLR), 2025 (oral)
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents. In Proceedings of the International Conference on Learning Representations (ICLR), 2025
2024
- Free Process Rewards without Process Labels. In Proceedings of the International Conference on Machine Learning (ICML), 2024
- PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models. arXiv preprint, 2024
- Source-Aware Training Enables Knowledge Attribution in Language Models. In Proceedings of the Conference on Language Modeling (COLM), 2024
- Examining LLMs’ Uncertainty Expression Towards Questions Outside Parametric Knowledge. arXiv preprint, 2024
2022
- How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers. In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
- Modeling Context With Linear Attention for Scalable Document-Level Translation. In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
- Twist Decoding: Diverse Generators Guide Each Other. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
- ABC: Attention with Bounded-memory Control. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022
- Tailor: Generating and Perturbing Text with Semantic Controls. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2022
2021
- Finetuning Pretrained Transformers into RNNs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
- Random Feature Attention. In Proceedings of the International Conference on Learning Representations (ICLR), 2021 (spotlight)
- Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation. In Proceedings of the International Conference on Learning Representations (ICLR), 2021
- Contextualized Perturbation for Textual Adversarial Attack. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021
- Infusing Finetuning with Semantic Dependencies. Transactions of the Association for Computational Linguistics (TACL), 2021
2020
- A Mixture of h - 1 Heads is Better than h Heads. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2020
2019
- PaLM: A Hybrid Parser and Language Model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
- RNN Architecture Learning with Sparse Regularization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
- Text Generation with Exemplar-based Adaptive Decoding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019
2018
- Rational Recurrences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018
- Backpropagating through Structured Argmax using a SPIGOT. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2018 (best paper honorable mention)
- Learning Joint Semantic Parsers from Disjoint Data. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2018
- "You Are No Jack Kennedy": On Media Selection of Highlights from Presidential Debates. In Proceedings of The Web Conference (WWW), 2018
2017
- Deep Multitask Learning for Semantic Dependency Parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2017
2016
- A Convolutional Attention Network for Extreme Summarization of Source Code. In Proceedings of the International Conference on Machine Learning (ICML), 2016
2015
- Discriminative Neural Sentence Modeling by Tree-Based Convolution. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015
- Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015