Current language models are unable to quickly learn new concepts on the fly, often requiring a more involved finetuning process to learn robustly. Prompting in-context is not robust to context distractions and often fails to convey much information about new concepts. Classic few-shot word-learning methods in NLP, which rely on global word vectors, are less applicable to large language models. In this paper, we introduce CoLLEGe (Concept Learning with Language Embedding Generation), a novel approach to modernize few-shot concept learning. CoLLEGe is a meta-learning framework capable of generating flexible embeddings for new concepts from a small number of example sentences or definitions. Our primary meta-learning objective is simply to enable a language model to make next-word predictions in forthcoming sentences, making it compatible with language model pretraining. We design a series of tasks to test new-concept learning in challenging real-world scenarios, including new word acquisition, definition inference, and verbal reasoning, and we demonstrate that our method succeeds in each setting without task-specific training.
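To make the framework concrete, here is a minimal sketch, in PyTorch, of the kind of interface described above: a small generator network consumes a few support sentences containing a new token and produces a single embedding, which is spliced into a frozen language model's embedding table so the new token can be used for next-word prediction. The module and variable names (`ConceptEmbeddingGenerator`, `base_embed`, `extended_table`), toy sizes, and pooling architecture are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (hypothetical, not the authors' code): generate an embedding
# for a brand-new token from a few support sentences, then splice it into a
# frozen language model's embedding table for next-word prediction.
import torch
import torch.nn as nn

class ConceptEmbeddingGenerator(nn.Module):
    """Maps a handful of support sequences containing the new token to a
    single embedding for that token."""
    def __init__(self, d_model: int):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, support_embeds: torch.Tensor) -> torch.Tensor:
        # support_embeds: (num_support, seq_len, d_model) token embeddings of
        # the support sentences, taken from the frozen base LM.
        pooled = self.encoder(support_embeds).mean(dim=(0, 1))  # (d_model,)
        return self.proj(pooled)

d_model, vocab = 64, 1000                           # toy sizes for illustration
base_embed = nn.Embedding(vocab, d_model)           # stands in for the frozen LM's table
generator = ConceptEmbeddingGenerator(d_model)

support_ids = torch.randint(0, vocab, (4, 12))      # 4 example sentences of length 12
new_tok_embed = generator(base_embed(support_ids))  # embedding for the new token

# Extend the frozen table with the generated row; the new token gets id `vocab`
# and can now appear in query sequences fed to the language model.
extended_table = torch.cat([base_embed.weight, new_tok_embed[None]], dim=0)
```

At test time, the same generator can produce an embedding for any unseen concept from a few example sentences, with no gradient updates to the base model.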
Imagine a student attending their first philosophy lecture on epistemology, in which the professor discusses and critiques various philosophical positions, using unfamiliar terms for newly encountered concepts. After only a few examples, the student can quickly build an intuition about these new concepts and consolidate that knowledge.
In this way, humans can quickly infer the meaning of new words after hearing them used only a few times, whether in a philosophy lecture like the one above or when encountering slang on social media; LLMs fail to do the same.
While humans can quickly abstract from a few examples to consolidate new knowledge, LLMs instead attempt to directly find and reuse information in the available context.
Our approach:
We select words to mask with new tokens based on frequency and construct few-shot support and query sequences from a large pretraining dataset, the Pile.
Training mimics pretraining, allowing the learned embeddings to transfer zero-shot to diverse tasks with no additional task-specific finetuning.
We train by minimizing cross-entropy losses on positive and negative examples, as well as by distilling from the base LLM (see the sketch below).
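Below is a minimal sketch of how the objective described above might be assembled: a next-token cross-entropy on a query sequence that uses the generated concept embedding, plus a distillation (KL) term against the frozen base LLM. The negative-example loss is omitted, and the function names (`lm_logits`, `college_style_loss`), the tied-embedding stand-in for the base model, and the way the new token is hidden from the teacher are all assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch (assumptions, not the authors' code) of two of the loss
# terms named above: positive next-token cross-entropy and distillation from
# the frozen base LLM. The negative-example term is omitted.
import torch
import torch.nn.functional as F

def lm_logits(token_ids: torch.Tensor, embed_table: torch.Tensor) -> torch.Tensor:
    """Stand-in for a frozen decoder-only LM with tied input/output embeddings.
    A real setup would run a transformer between embedding and unembedding."""
    hidden = embed_table[token_ids]      # (batch, seq, d_model)
    return hidden @ embed_table.T        # (batch, seq, vocab)

def college_style_loss(query_ids, targets, extended_table, base_table,
                       kd_weight: float = 1.0):
    # `targets` are assumed to already be the next-token ids (query shifted by one).
    # Positive term: cross-entropy with the new concept row spliced into the
    # table (vocab size = base vocab + 1).
    student = lm_logits(query_ids, extended_table)
    ce = F.cross_entropy(student.flatten(0, 1), targets.flatten())

    # Distillation term: match the base LLM's distribution over the original
    # vocabulary. How the new token is shown to the teacher is an assumption;
    # here its id is simply clamped into the base vocabulary.
    base_vocab = base_table.size(0)
    with torch.no_grad():
        teacher = lm_logits(query_ids.clamp_max(base_vocab - 1), base_table)
    kd = F.kl_div(
        F.log_softmax(student[..., :base_vocab], dim=-1),
        F.log_softmax(teacher, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return ce + kd_weight * kd
```

In this setup, gradients would flow only into the generator that produced the new row of `extended_table`; the base model and its embedding table stay frozen.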
CoLLEGe learns general-purpose embeddings for new concepts that outperform in-context learning on all tasks without additional finetuning. Using task-general training, CoLLEGe can solve GRE problems involving new concepts, identify unknown slang terms, and generate definitions from only a few examples.
This opens the door for future work focusing on online continual concept acquisition, which consists of incrementally identifying and compressing concepts from a stream of experience.
@inproceedings{teehan2024college,
  title={Co{LLEG}e: Concept Embedding Generation for Large Language Models},
  author={Ryan Teehan and Brenden Lake and Mengye Ren},
  booktitle={First Conference on Language Modeling},
  year={2024},
  url={https://openreview.net/forum?id=Fkr1yVUb9G}
}