Table of Links

Abstract and Introduction
Related Works
2.1 Code LLMs
2.2 Quantization
2.3 Evaluation benchmarks for code LLMs and 2.4 Evaluation metrics
2.5 Low- and high-resource languages
Methodology
3.1 Run-time environment
3.2 Choice of LLMs
3.3 Choice of benchmarks
3.4 Evaluation procedure
3.5 Model parameters and 3.6 Source code and data
Evaluation
4.1 Pass@1 rates
4.2 Errors
4.3 Inference time
4.4 Lines of code and 4.5 Comparison with FP16 models
Discussion
Conclusions and References

4.4 Lines of code

As shown in Fig. 6, the number of lines of code generated by the models does not differ much across quantization levels. When generating incorrect solutions, CodeQwen and CodeGemma tended to be more verbose. Correct solutions for HumanEval require more lines of code than those for the other two benchmarks. Interestingly, correct solutions for MBPP require slightly more lines of code than those for MCEVAL while needing less inference time (Fig. 4). Overall, quantization has no effect on the number of lines of code generated. However, as depicted in Fig. 7, the time required to generate the same number of lines of code increases with higher-precision quantization. This holds for both correct and incorrect solutions. It indicates that the increase in inference time of higher-precision models is mainly due to a longer forward pass (computations at the layers) rather than longer output generation. In simpler terms, the higher-precision models spend more time 'thinking' before generating output. However, this additional thinking time does not translate into better performance when the 4-bit and 8-bit models are compared (Fig. 1).
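To make the lines-of-code and time-per-line comparison concrete, the sketch below shows one way such statistics could be derived from generation logs. It is a minimal illustration, not the authors' evaluation harness; the record fields (quantization, passed, inference_time, completion) and the rule for counting lines of code are assumptions made for this example.

```python
# Minimal sketch (not the authors' evaluation code) of deriving mean lines of
# code (LOC) and mean seconds per generated line, grouped by quantization
# level and correctness. Record fields are hypothetical.
from statistics import mean
from collections import defaultdict

def count_loc(completion: str) -> int:
    """Count non-empty, non-comment lines in a generated solution."""
    return sum(
        1
        for line in completion.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

def summarize(records):
    """Group records by (quantization, correctness) and report mean LOC
    and mean seconds spent per generated line."""
    groups = defaultdict(list)
    for r in records:
        key = (r["quantization"], "correct" if r["passed"] else "incorrect")
        loc = count_loc(r["completion"])
        if loc > 0:
            groups[key].append((loc, r["inference_time"] / loc))
    return {
        key: {
            "mean_loc": mean(loc for loc, _ in vals),
            "mean_sec_per_line": mean(spl for _, spl in vals),
        }
        for key, vals in groups.items()
    }

# Example usage with a single hypothetical log entry:
records = [
    {"quantization": "4-bit", "passed": True,
     "inference_time": 6.2, "completion": "def add(a, b):\n    return a + b\n"},
]
print(summarize(records))
```

Normalizing inference time by the number of generated lines, as above, is what separates "more time per line" (a slower forward pass) from "more lines" (more verbose output).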
4.5 Comparison with FP16 models

Instead of using quantized models, it may be better to use a non-quantized model with a smaller number of parameters. For this reason, we raised research question RQ4. We performed the same tests on DeepSeek Coder 1.3B Instruct, CodeGemma 2B, and StarCoder2 3B, all at half precision (FP16). Their storage requirements are 2.69 GB, 4.40 GB, and 6.06 GB respectively. When loaded into memory, they require 2.53 GB, 4.44 GB, and 5.79 GB respectively. These sizes roughly correspond to the sizes of the 2-bit, 4-bit, and 8-bit models. No low-parameter models were available for CodeLlama and CodeQwen. As Fig. 8 suggests, the low-parameter models at FP16 half precision performed roughly at the level of the 2-bit quantized models and considerably worse than the 4-bit quantized models.

Author:
(1) Enkhbold Nyamsuren, School of Computer Science and IT, University College Cork, Cork, Ireland, T12 XF62 (enyamsuren@ucc.ie).

This paper is available on arxiv under CC BY-SA 4.0 license.
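For readers who want to reproduce the memory-footprint comparison behind RQ4 (Section 4.5), the sketch below loads a small model at FP16 and a larger model in 4-bit precision and prints their in-memory sizes. It is a minimal illustration assuming Hugging Face Transformers with bitsandbytes quantization; the model IDs are stand-ins, and this is not necessarily the setup used in the paper (Section 3.1 describes the actual run-time environment).

```python
# Minimal sketch (not the paper's setup) comparing the in-memory footprint of
# a small FP16 model with a larger 4-bit quantized model. Model IDs and the
# use of bitsandbytes 4-bit loading are assumptions made for illustration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def footprint_gb(model) -> float:
    """Return the in-memory size of a loaded model in gigabytes."""
    return model.get_memory_footprint() / 1024**3

# Small model at half precision (FP16).
fp16_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-1.3b-instruct",  # assumed model ID
    torch_dtype=torch.float16,
)

# Larger model quantized to 4 bits via bitsandbytes.
quantized_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-coder-6.7b-instruct",  # assumed model ID
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

print(f"FP16 1.3B model:  {footprint_gb(fp16_model):.2f} GB")
print(f"4-bit 6.7B model: {footprint_gb(quantized_model):.2f} GB")
```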