Date: 2024-08-30

Authors: (1) Suzanna Sia, Johns Hopkins University; (2) David Mueller; (3) Kevin Duh.

7. Conclusion

We present evidence that In-context Causal Decoder models locate the translation task at specific layers during forward inference. To study this, we introduced causal masking of self-attention over the context from layer ℓ onwards (Section 3). The findings generalise across models of different sizes, and across both instruction-tuned and non-instruction-tuned models. We further identify certain layers as task-critical, and show that these correspond to the task recognition point of the model (Section 4.1) and are not influenced by increasing the number of examples shown to the model (Section 6.1).
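As a minimal sketch of the masking idea summarised above, the function below builds a per-layer attention mask in plain Python. It is an illustration under stated assumptions, not the implementation from Section 3: the names `layer_attention_mask`, `context_len`, and `mask_from_layer` are hypothetical, layers are assumed to be 0-indexed, and we assume that only tokens after the context lose access to the context, while context tokens keep ordinary causal attention among themselves.

```python
def layer_attention_mask(seq_len, context_len, layer, mask_from_layer):
    """Return a seq_len x seq_len boolean matrix for one layer.

    mask[i][k] is True when query position i may attend to key position k.
    Positions 0..context_len-1 are assumed to hold the context
    (instructions and examples); the rest hold the test input and output.
    """
    # Hypothetical convention: the context is hidden at this layer and above.
    hide_context = layer >= mask_from_layer
    mask = []
    for i in range(seq_len):
        row = []
        for k in range(seq_len):
            allowed = k <= i  # standard causal constraint
            # Assumption: from mask_from_layer onwards, tokens after the
            # context can no longer attend to context positions.
            if hide_context and i >= context_len and k < context_len:
                allowed = False
            row.append(allowed)
        mask.append(row)
    return mask


# Example: 3 context tokens followed by 2 further tokens, masking from layer 2.
early = layer_attention_mask(seq_len=5, context_len=3, layer=0, mask_from_layer=2)
late = layer_attention_mask(seq_len=5, context_len=3, layer=2, mask_from_layer=2)
```

In this sketch, the last row of `early` allows attention to every earlier position, while the last row of `late` allows attention only to the positions after the context.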





Our central finding, that models do not need to maintain attention over all of the context across every layer, has direct implications for the inference efficiency of transformers, with estimated cost savings of up to 45% for the Llama model with 5 examples (Section 5).
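To give a rough sense of where such savings could come from, the sketch below uses a deliberately simple cost model. It is an assumption for illustration and not the estimate derived in Section 5: cost is taken to be proportional to the number of (token, layer) pairs processed, the quadratic attention term is ignored, and the function name, parameters, and token counts are all hypothetical.

```python
def estimated_savings(context_tokens, other_tokens, num_layers, mask_from_layer):
    """Fraction of (token, layer) computations skipped under a linear cost model.

    Assumption: context tokens are only processed in layers
    0..mask_from_layer-1, while all other tokens pass through every layer.
    """
    full = (context_tokens + other_tokens) * num_layers
    reduced = context_tokens * mask_from_layer + other_tokens * num_layers
    return 1 - reduced / full


# Hypothetical numbers: 300 context tokens, 100 other tokens, 32 layers,
# context dropped from layer 16 onwards.
savings = estimated_savings(
    context_tokens=300, other_tokens=100, num_layers=32, mask_from_layer=16
)
print(f"{savings:.3f}")  # prints 0.375
```

Under this model the savings grow with the share of the prompt taken up by context and with how early the context can be dropped, which is consistent with larger gains when more examples are shown.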









Limitations and Future Work. We have conducted extensive investigations focusing on the task of translation on a high-resource language pair, with a small extension to en ↔ pt. In future work, we hope to extend this analysis to other sequence or classification tasks, as well as to truly novel tasks.





Reproducibility. The MT dataset that we use, FLORES (Goyal et al., 2021), is fully open-source and well known in the community. The models are open-source and freely available on Huggingface (Wolf et al., 2019). We used models of "reasonable" size (3B and 7B parameters) that can be run on consumer-grade GPUs, making our work reproducible at most academic institutions. Code to reproduce all the experiments will be made available subsequently.





Impact Statement (Ethics and Societal Consequences). There are no known ethical concerns as these are exploratory studies on open-source LLMs.

ACKNOWLEDGMENTS

We would like to thank Daniel Khashabi and Marc Marone for feedback on earlier drafts.

