This story draft by @escholar has not been reviewed by an editor, YET.

Outputs for API-based verification


Table of Links

Abstract and 1. Introduction

  2. Methods

    2.1 Tokenizer analysis

    2.2 Indicators for detecting under-trained tokens and 2.3 Verification of candidate tokens

  3. Results

    3.1 Effectiveness of indicators and verification

    3.2 Common observations

    3.3 Model-specific observations

  4. Closed-source models

  5. Discussion, Acknowledgments, and References


A. Verification details

B. A short primer on UTF-8 encoding

C. Outputs for API-based verification

C Outputs for API-based verification

We use the following prompt for API-based testing of under-trained tokens.



Here, each string consists of the problematic token. Occasionally the token is given a prefix, both to help identify its source and to avoid leading spaces, because we noticed that models often fail to repeat tokens with leading spaces correctly for reasons unrelated to under-training. Many other prompt formats are also effective, but we found that this code-based approach avoids false positives more reliably.
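The string preparation described above can be sketched in Python. This is a minimal illustration under stated assumptions and is not the authors' code. The helper names, the single-character prefix `x`, and the pass criterion (the reply must contain the test string verbatim) are all hypothetical. The prompt template and the API call are omitted entirely. The sketch only shows how a visible prefix avoids asking a model to reproduce a leading space, and how a failed repetition would flag a candidate token.

```python
from __future__ import annotations

# Hypothetical prefix: the text only says strings are "occasionally prefixed".
PREFIX = "x"


def make_test_string(token: str, prefix: str = PREFIX) -> str:
    """Build the string to embed in the prompt for one candidate token.

    A token that starts with whitespace gets a visible prefix, so the model is
    not asked to reproduce a leading space (which models often get wrong for
    reasons unrelated to under-training).
    """
    if token[:1].isspace():
        return prefix + token
    return token


def is_repeated(test_string: str, reply: str) -> bool:
    """Assumed pass criterion: the reply contains the test string verbatim."""
    return test_string in reply


def flag_candidates(replies: dict[str, str]) -> list[str]:
    """Return the tokens whose model reply did not reproduce their test string."""
    return [
        token
        for token, reply in replies.items()
        if not is_repeated(make_test_string(token), reply)
    ]


# Made-up replies, only to exercise the helpers (not real API outputs).
example_replies = {
    " exampleToken": "x exampleToken",
    "anotherToken": "anotherTok",
}
print(flag_candidates(example_replies))  # ['anotherToken']
```

Only tokens that start with whitespace receive a prefix, which keeps each test focused on the token itself. A stricter check could also reject replies that add extra characters around the string. However, that would flag harmless formatting differences as failures.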


Figure 4 shows the results for Mistral, Anthropic, and OpenAI models.


(a) Mistral API prompting results.


(b) Claude API prompting results.


(c) GPT-3.5 API prompting results.


(d) GPT-4 API prompting results.


Figure 4: API prompting results.


Authors:

(1) Sander Land, Cohere ([email protected]);

(2) Max Bartolo, Cohere ([email protected]).


This paper is available on arXiv under the CC BY-SA 4.0 DEED license.



About Author

EScholar: Electronic Academic Papers for Scholars (@escholar)
We publish the best academic work (that's too often lost to peer reviews & the TA's desk) to the global tech community
