
Indicators for detecting under-trained tokens and Verification of candidate tokens


Table of Links

Abstract and 1. Introduction

  2. Methods

    2.1 Tokenizer analysis

    2.2 Indicators for detecting under-trained tokens and 2.3 Verification of candidate tokens

  3. Results

    3.1 Effectiveness of indicators and verification

    3.2 Common observations

    3.3 Model-specific observations

  4. Closed-source models

  5. Discussion, Acknowledgments, and References


A. Verification details

B. A short primer on UTF-8 encoding

C. Outputs for API-based verification

2.2 Indicators for detecting under-trained tokens

We propose and use model architecture-dependent indicators to identify potentially under-trained tokens. A key distinction is made based on whether a model uses the same matrix for its token embeddings E and the final model layer, the ‘unembedding’ matrix U, which converts the final internal embeddings to a probability distribution over tokens.[1] Regardless of model architecture, all weights of the unembedding matrix influence the token predictions at every training step. Specifically, the training loss is minimized when the probability of unused tokens is predicted as zero, regardless of the input, making their logits converge towards −∞. The model can achieve such an input-independent prediction via a constant vector in the residual stream, with the negative of this vector appearing in the unembedding rows of unused tokens, resulting in a constant negative contribution to their logit values. Using this intuition, we can find unused tokens from the unembedding weights as follows:


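The exact indicators are architecture-dependent; as a rough sketch of the intuition above (not the paper's precise method), the snippet below scores every unembedding row by its cosine similarity to the mean direction of a small reference set of tokens assumed to be unused. The model name, the reference token ids, and the use of plain cosine similarity are all illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any Hugging Face causal LM exposing its (un)embedding works.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Unembedding matrix U with shape (vocab_size, hidden_dim).
# Note: for tied-embedding models such as GPT-2 this is the same matrix as E.
U = model.get_output_embeddings().weight.detach()

# Reference set of token ids assumed to be unused during training.
# Placeholder values: in practice one would pick explicitly reserved or
# unreachable tokens identified by tokenizer analysis (Section 2.1).
reference_ids = torch.tensor([0, 1, 2])

# Mean direction of the reference unembedding rows: the constant negative
# contribution described above should be (roughly) shared by unused tokens.
mean_unused = U[reference_ids].mean(dim=0)
mean_unused = mean_unused / mean_unused.norm()

# Indicator: cosine similarity of every unembedding row with that direction.
# High similarity means the row looks like an unused token's row, making the
# token a candidate for verification (Section 2.3).
U_normed = U / U.norm(dim=-1, keepdim=True).clamp_min(1e-8)
scores = U_normed @ mean_unused

top_candidates = torch.topk(scores, k=20).indices.tolist()
for token_id in top_candidates:
    print(token_id, repr(tokenizer.convert_ids_to_tokens(token_id)), round(float(scores[token_id]), 3))
```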
2.3 Verification of candidate tokens

Our proposed indicators naturally provide a ranking of candidate under-trained tokens, but do not give a definitive threshold, and their relative simplicity is likely to result in a somewhat noisy relation between indicator and model behaviour. To confirm that candidate tokens indeed induce unwanted model outputs, we verify all tokens that rank among the most likely 2% according to the chosen indicator, excluding partial UTF-8 sequences and unreachable tokens. This verification process involves constructing specific repetitive prompts that induce a high output probability for normal tokens, and checking whether a candidate token instead has a very low output probability (see Appendix A for details).
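A minimal sketch of this kind of check, assuming a Hugging Face causal language model: repeat a candidate token several times and measure the probability the model assigns to that same token at the next position. The model name, prompt construction, and example strings are simplified placeholders; the paper's actual prompts and thresholds are described in Appendix A.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def repeated_token_probability(token_str: str, repeats: int = 5) -> float:
    """Probability assigned to `token_str` after seeing it repeated `repeats` times.

    A simplified stand-in for the repetitive verification prompts: a normal
    token repeated several times should be predicted with high probability at
    the next position, while an under-trained token typically is not.
    """
    token_ids = tokenizer(token_str, add_special_tokens=False).input_ids
    if len(token_ids) != 1:
        raise ValueError("candidate string must map to exactly one token")
    input_ids = torch.tensor([token_ids * repeats])
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    return float(torch.softmax(logits, dim=-1)[token_ids[0]])

# A frequent token should score high; a candidate under-trained token should
# score near zero. The example strings are illustrative and tokenizer-specific.
print(repeated_token_probability(" the"))
print(repeated_token_probability(" SolidGoldMagikarp"))
```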


Authors:

(1) Sander Land, Cohere ([email protected]);

(2) Max Bartolo, Cohere ([email protected]).


This paper is available on arxiv under CC BY-SA 4.0 DEED license.

[1] We assume the conventional final layer structure, consisting solely of the unembedding matrix without a bias.


