
Many-Shot In-Context Learning in Multimodal Foundation Models: Related Work


Table of Links

Abstract and 1 Introduction

2 Related Work

3 Methods and 3.1 Models

3.2 Datasets

3.3 Evaluation Metrics

4 Results and 4.1 Increasing number of demonstrating examples

4.2 Impact of batching queries

4.3 Cost and latency analysis

5 Discussion

6 Conclusion and References

A. Prompts used for ICL experiments

B. Prompt selection

C. GPT4(V)-Turbo performance under many-shot ICL

D. Performance of many-shot ICL on medical QA tasks

Acknowledgments and Disclosure of Funding

2 Related Work

Scaling ICL. The seminal work of Brown et al. [1] showed that LLM performance improves as the number of in-context examples increases, but the number of demonstrating examples tested was low (10 to 100), likely due to the limited context window (2,048 tokens for GPT-3). Scaling the number of in-context examples much further has only recently been explored by a few works [7–9]. Both Li et al. [7] and Agarwal et al. [8] scale in-context learning to more than 1,000 demonstrating examples and find performance improvements across multiple tasks. However, their experiments are limited to text-only benchmarks and do not compare performance across different models.
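
As a concrete illustration of this setup, the sketch below (not from the paper; all function and variable names are illustrative) shows how a many-shot prompt can be assembled by concatenating demonstrating examples ahead of the query, with the number of shots bounded only by the model's context window.

```python
# Minimal sketch of many-shot in-context learning prompt construction.
# Names (build_many_shot_prompt, demos, query) are illustrative, not from
# the paper or any specific library.

def build_many_shot_prompt(demos, query, n_shots):
    """Concatenate n_shots (question, answer) demonstrations before the query."""
    lines = []
    for question, answer in demos[:n_shots]:
        lines.append(f"Question: {question}")
        lines.append(f"Answer: {answer}")
    lines.append(f"Question: {query}")
    lines.append("Answer:")
    return "\n".join(lines)

# Scaling from few-shot to many-shot is just a matter of raising n_shots,
# up to whatever fits in the model's context window.
demos = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")] * 500  # 1,000 demonstrations
prompt = build_many_shot_prompt(demos, "7 + 6 = ?", n_shots=1000)
```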


Multimodal ICL. Due to the recent emergence of LMMs, research on multimodal ICL is still nascent. One prior work developed a new model that leverages complex prompts composed of multimodal inputs, allowing models to compare images [10]. Other recent works explored the generalizability of GPT-4V and Gemini to multimodal out-of-domain and out-of-distribution tasks, and found that ICL leads to performance benefits for both models across many tasks [6, 5]. However, none of these works leverage the newly expanded context windows to investigate the effect of increasing the number of demonstrating examples.
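
For readers unfamiliar with how demonstrating examples containing images can be interleaved in a single request, the sketch below is one possible construction. It uses an OpenAI-style chat message schema purely for illustration; the image URLs, labels, and helper names are placeholders and do not reflect the paper's exact prompts.

```python
# Sketch of a multimodal many-shot prompt in an OpenAI-style chat format.
# This only builds the message payload; no API call is made here.

def image_part(url):
    return {"type": "image_url", "image_url": {"url": url}}

def text_part(text):
    return {"type": "text", "text": text}

def build_multimodal_messages(demos, query_image_url):
    """demos: list of (image_url, label) pairs used as in-context examples."""
    content = []
    for url, label in demos:
        content.append(image_part(url))
        content.append(text_part(f"Label: {label}"))
    content.append(image_part(query_image_url))
    content.append(text_part("Label:"))
    return [{"role": "user", "content": content}]

messages = build_multimodal_messages(
    demos=[("https://example.com/cat1.jpg", "cat"),
           ("https://example.com/dog1.jpg", "dog")],
    query_image_url="https://example.com/query.jpg",
)
```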


Batch Querying. Multiple prior works have explored batching queries (also commonly referred to as batch prompting) for more efficient and cheaper inference. Batch prompting was first introduced by Cheng et al. [11], who show that it achieves comparable or better performance than single-query prompting while substantially reducing inference token cost and latency. Lin et al. [12] observe performance degradation with batched prompts in longer contexts and propose a variety of techniques to mitigate the performance loss. More recently, additional variations of batch prompting have been proposed, including grouping similar questions together [13], batching prompts of different tasks [14], and concatenating multiple images into a single image collage [15]. We again note that batch prompting with large numbers of demonstrating examples and queries has only become feasible due to the larger context windows of recent models.
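
To make the batching idea concrete, the sketch below (a minimal illustration with hypothetical names, not the paper's or Cheng et al.'s implementation) packs several queries into one numbered prompt and parses the numbered answers from a single model response.

```python
# Sketch of batch querying: several queries in one prompt, answers parsed
# back out by their numbers. All names are illustrative.
import re

def build_batched_prompt(demos, queries):
    lines = ["Answer each question. Number your answers to match the questions."]
    for i, (q, a) in enumerate(demos, start=1):
        lines.append(f"Example {i} Question: {q}")
        lines.append(f"Example {i} Answer: {a}")
    for i, q in enumerate(queries, start=1):
        lines.append(f"Question {i}: {q}")
    return "\n".join(lines)

def parse_batched_answers(response_text, n_queries):
    """Extract 'Answer i: ...' lines; missing answers come back as None."""
    answers = [None] * n_queries
    for match in re.finditer(r"Answer\s+(\d+):\s*(.+)", response_text):
        idx = int(match.group(1)) - 1
        if 0 <= idx < n_queries:
            answers[idx] = match.group(2).strip()
    return answers

prompt = build_batched_prompt(
    demos=[("What is 1 + 1?", "2")],
    queries=["What is 2 + 2?", "What is 5 + 7?"],
)
# answers = parse_batched_answers(model_response_text, n_queries=2)
```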


Authors:

(1) Yixing Jiang, Stanford University ([email protected]);

(2) Jeremy Irvin, Stanford University ([email protected]);

(3) Ji Hun Wang, Stanford University ([email protected]);

(4) Muhammad Ahmed Chaudhry, Stanford University ([email protected]);

(5) Jonathan H. Chen, Stanford University ([email protected]);

(6) Andrew Y. Ng, Stanford University ([email protected]).


This paper is available on arxiv under CC BY 4.0 DEED license.

