CoLLEGe: Concept Embedding Generation for Large Language Models

New York University
COLM 2024

Abstract

Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions and often fails to convey much information about the new concepts. Classic methods for few-shot word learning in NLP, relying on global word vectors, are less applicable to large language models. In this paper, we introduce a novel approach named CoLLEGe (Concept Learning with Language Embedding Generation) to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts using a small number of example sentences or definitions. Our primary meta-learning objective is simply to enable a language model to make next-word predictions in forthcoming sentences, making it compatible with language model pretraining. We design a series of tasks to test new concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and demonstrate that our method succeeds in each setting without task-specific training.

Building Machines that Learn Concepts Quickly

Imagine a student attending a philosophy lecture on epistemology for the first time, in which the professor discusses and critiques various philosophical positions and uses unfamiliar terms for newly encountered concepts. After only a few examples, the student can quickly build an intuition about these new concepts and consolidate this knowledge.

In this way, humans can quickly infer the meaning of new words after hearing them used only a few times, whether in settings like the philosophy lecture above or when encountering slang on social media; LLMs fail to do the same.

While humans can quickly abstract from a few examples to consolidate new knowledge, LLMs attempt to directly find and re-use information in the available context.

We:

  • Develop a simple, learnable add-on module for few-shot LLM concept learning that transfers to diverse tasks with no additional finetuning
  • Build a few-shot concept learning dataset from a large pretraining dataset (The Pile)
  • Present three challenging datasets for new concept learning, CoLLEGe-GRE, CoLLEGe-DefGen, and CoLLEGe-Slang, used to measure the effectiveness of few-shot concept learning methods for LLMs. These datasets test general and complex concept knowledge, naturalistic acquisition of new concepts, and relational abstraction.

Method

[Figure: CoLLEGe architecture overview]
We design a few-shot learning method which generates a new concept embedding using support sequences containing a new concept token. This embedding is optimized to allow a frozen pretrained LLM to model a query sequence containing the same new token.
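
As a rough illustration of this design (a sketch under assumptions, not the exact implementation), the module below pools encoder representations of the support sequences and projects the result into the frozen LLM's embedding space; the encoder choice, pooling scheme, and names are illustrative.

import torch
import torch.nn as nn

class ConceptEmbeddingGenerator(nn.Module):
    """Sketch of a CoLLEGe-style generator: encode support sequences that
    contain a placeholder token and pool them into a single concept embedding.
    The encoder and pooling scheme here are illustrative assumptions."""

    def __init__(self, encoder: nn.Module, encoder_dim: int, llm_embed_dim: int):
        super().__init__()
        self.encoder = encoder                      # e.g. a pretrained sequence encoder
        self.proj = nn.Linear(encoder_dim, llm_embed_dim)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Contextual token representations: (num_support, seq_len, encoder_dim)
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.unsqueeze(-1).float()
        # Mean-pool each support sequence, then average across the support set.
        per_sequence = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        concept = per_sequence.mean(dim=0)
        # Project into the frozen LLM's input-embedding space.
        return self.proj(concept)

The resulting vector is written into the frozen LLM's embedding table at a reserved placeholder token id, and the meta-learning objective trains the generator so that this single embedding lets the LLM predict the tokens of a query sequence containing the same placeholder.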

Task-General Training for Zero-Shot Transfer

We select words to mask with new tokens based on frequency and construct few-shot support and query sequences from a large pretraining dataset, The Pile.
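
A minimal sketch of this episode construction, assuming a simple frequency threshold and a reserved <nonce> placeholder token; the paper's actual selection heuristics and filtering may differ.

import random
from collections import Counter

NEW_TOKEN = "<nonce>"  # reserved placeholder token (name is an assumption)

def build_episode(sentences, min_count=4, max_rel_freq=1e-4, num_support=2):
    """Pick a sufficiently rare word, mask its occurrences with NEW_TOKEN,
    and split the sentences containing it into support and query sequences."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    # Candidate concepts: rare enough to act as "new", frequent enough to yield examples.
    candidates = [w for w, c in counts.items()
                  if c >= min_count and c / total <= max_rel_freq]
    if not candidates:
        return None
    word = random.choice(candidates)
    # Naive substring replacement, kept simple for the sketch.
    examples = [s.replace(word, NEW_TOKEN) for s in sentences if word in s.split()]
    if len(examples) <= num_support:
        return None
    random.shuffle(examples)
    return {
        "concept": word,
        "support": examples[:num_support],   # sequences shown to the generator
        "query": examples[num_support],      # sequence the frozen LLM must model
    }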

Training mimics pretraining, allowing zero-shot transfer to diverse tasks with no additional finetuning.

We train by minimizing cross-entropy losses on positive and negative examples, as well as by distilling from the base LLM.

[Figure: CoLLEGe training losses]
We optimize using a combination of cross-entropy and distillation losses.
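
The schematic below shows one way these terms could be combined: next-token cross-entropy on the positive query, an analogous cross-entropy on a negative example, and a KL distillation term toward the frozen base LLM. The interpretation of the negative examples, the loss weights, and the HuggingFace-style model interface are assumptions, not the paper's precise formulation.

import torch
import torch.nn.functional as F

def college_style_loss(llm, base_llm, pos_ids, neg_ids, alpha=1.0, beta=1.0):
    """Schematic combined objective; weights and the exact role of the
    negative batch are assumptions."""
    # Positive: next-token cross-entropy on a query that contains the new token.
    pos_logits = llm(input_ids=pos_ids).logits[:, :-1]
    pos_ce = F.cross_entropy(pos_logits.reshape(-1, pos_logits.size(-1)),
                             pos_ids[:, 1:].reshape(-1))
    # Negative: cross-entropy on an unrelated sequence, so the injected embedding
    # does not degrade ordinary language modeling (assumed interpretation).
    neg_logits = llm(input_ids=neg_ids).logits[:, :-1]
    neg_ce = F.cross_entropy(neg_logits.reshape(-1, neg_logits.size(-1)),
                             neg_ids[:, 1:].reshape(-1))
    # Distillation: match the frozen base LLM's predictive distribution
    # (a fuller version might show the teacher the original word instead of the placeholder).
    with torch.no_grad():
        teacher_logits = base_llm(input_ids=pos_ids).logits[:, :-1]
    distill = F.kl_div(F.log_softmax(pos_logits, dim=-1),
                       F.softmax(teacher_logits, dim=-1),
                       reduction="batchmean")
    return pos_ce + alpha * neg_ce + beta * distill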
[Figure: CoLLEGe training data construction]
Our dataset consists of few-shot support and query sequences adapted from the Books3, Books2, and Pile-CC subsets of The Pile.

With CoLLEGe we can...

Solve GRE Problems...

[Figure: CoLLEGe-GRE example]

Identify Twitter Slang...

[Figure: CoLLEGe-Slang example]

& Define New Concepts

[Figure: CoLLEGe-DefGen example ("willies")]
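
As a hypothetical end-to-end sketch of how a generated embedding is used downstream, the snippet below writes a concept vector (here just a random stand-in for the generator's output) into a frozen causal LM's embedding table at a reserved token and prompts the model for a definition; the model name, token name, and prompt wording are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in base model (assumption)
llm = AutoModelForCausalLM.from_pretrained("gpt2")

# Reserve a placeholder token for the new concept.
tokenizer.add_tokens(["<nonce>"])
llm.resize_token_embeddings(len(tokenizer))
nonce_id = tokenizer.convert_tokens_to_ids("<nonce>")

# In CoLLEGe this vector would come from the embedding generator; a scaled
# random vector is used here only to show where it gets written.
concept_embedding = torch.randn(llm.get_input_embeddings().weight.size(1)) * 0.02
with torch.no_grad():
    llm.get_input_embeddings().weight[nonce_id] = concept_embedding

# Prompt the frozen LLM to define the new concept.
inputs = tokenizer("The word <nonce> means", return_tensors="pt")
output = llm.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))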

Conclusion & Future Work

CoLLEGe can learn general-purpose embeddings for new concepts that outperform in-context learning on all tasks, without additional finetuning. Using task-general training, CoLLEGe can solve GRE problems involving new concepts, identify unknown slang terms, and generate definitions from only a few examples.

This opens the door for future work focusing on online continual concept acquisition, which consists of incrementally identifying and compressing concepts from a stream of experience.

BibTeX

@inproceedings{
teehan2024college,
title={Co{LLEG}e: Concept Embedding Generation for Large Language Models},
author={Ryan Teehan and Brenden Lake and Mengye Ren},
booktitle={First Conference on Language Modeling},
year={2024},
url={https://openreview.net/forum?id=Fkr1yVUb9G}
}