KC-EMA Mechanism: Theoretical Analysis and Derivation for IIL

Written by instancing | Published 2025/11/07
Tech Story Tags: machine-learning | knowledge-consolidation | theoretical-analysis | instance-incremental-learning | teacher-student-model | model-ema | gradient-derivation | catastrophic-forgetting

TL;DR: This section provides the detailed theoretical derivation for the KC-EMA mechanism in Instance-Incremental Learning (IIL).

Abstract and 1 Introduction

  2. Related works

  3. Problem setting

  4. Methodology

    4.1. Decision boundary-aware distillation

    4.2. Knowledge consolidation

  5. Experimental results and 5.1. Experiment Setup

    5.2. Comparison with SOTA methods

    5.3. Ablation study

  6. Conclusion and future work and References

Supplementary Material

  1. Details of the theoretical analysis on the KC-EMA mechanism in IIL
  2. Algorithm overview
  3. Dataset details
  4. Implementation details
  5. Visualization of dusted input images
  6. More experimental results

7. Details of the theoretical analysis on the KC-EMA mechanism in IIL

In Sec. 4.3 of the manuscript, we theoretically analyze the feasibility of applying a model EMA-like mechanism in the IIL. Here, we give more details on the derivation of Eq. (7).

The starting point is the derivative of the old task(s) and of the new task with respect to the teacher model in the current IIL phase.
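In illustrative notation (not necessarily the manuscript's), with teacher parameters $\theta_T$ maintained as an exponential moving average of the student parameters $\theta_S$ under a decay rate $\alpha$, the gradient splits over the old and new tasks while the teacher follows the student:

$$
\frac{\partial \mathcal{L}}{\partial \theta_T}
= \frac{\partial \mathcal{L}_{\mathrm{old}}}{\partial \theta_T}
+ \frac{\partial \mathcal{L}_{\mathrm{new}}}{\partial \theta_T},
\qquad
\theta_T \leftarrow \alpha\,\theta_T + (1-\alpha)\,\theta_S .
$$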

Therefore, we obtain Eq. (7), from which many conclusions can be drawn, as presented in the manuscript. Notably, since we assume the student is fully trained on the new data, we set a freezing period during which we only train the student without applying KC-EMA. In the manuscript, we empirically set this freezing period to 10 epochs.
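As a concrete illustration of this schedule, here is a minimal PyTorch-style sketch, not the authors' implementation: the model, data, optimizer, and the names kc_ema_update, freeze_epochs, and alpha are placeholders, and only the 10-epoch freezing period is taken from the manuscript.

```python
import copy

import torch

def kc_ema_update(teacher, student, alpha=0.999):
    """Consolidate knowledge: teacher parameters track an EMA of the student's."""
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(alpha).add_(p_s, alpha=1.0 - alpha)

# Placeholder model and data; freeze_epochs = 10 follows the manuscript's
# empirical setting, everything else is illustrative.
student = torch.nn.Linear(16, 4)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is never trained directly

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
freeze_epochs = 10  # train the student alone before any KC-EMA update

for epoch in range(30):
    x = torch.randn(8, 16)              # stand-in batch of new-task data
    y = torch.randint(0, 4, (8,))
    loss = torch.nn.functional.cross_entropy(student(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch >= freeze_epochs:          # after the freezing period, consolidate
        kc_ema_update(teacher, student)
```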

Limitation. Our method may accumulate errors over a long sequence of consecutive IIL tasks. For example, in the i-th IIL task, the old model should account for the base task and all of the previous i − 1 IIL tasks, so the old tasks' derivative with respect to the teacher's parameters grows accordingly.
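Reusing the illustrative notation above, with $\mathcal{L}_{\mathrm{base}}$ the base-task loss and $\mathcal{L}_j$ the loss of the $j$-th earlier IIL task, this gradient expands into a growing sum, which is where small errors can compound:

$$
\frac{\partial \mathcal{L}_{\mathrm{old}}}{\partial \theta_T}
= \frac{\partial \mathcal{L}_{\mathrm{base}}}{\partial \theta_T}
+ \sum_{j=1}^{i-1} \frac{\partial \mathcal{L}_{j}}{\partial \theta_T}.
$$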

Authors:

(1) Qiang Nie, Hong Kong University of Science and Technology (Guangzhou);

(2) Weifu Fu, Tencent Youtu Lab;

(3) Yuhuan Lin, Tencent Youtu Lab;

(4) Jialin Li, Tencent Youtu Lab;

(5) Yifeng Zhou, Tencent Youtu Lab;

(6) Yong Liu, Tencent Youtu Lab;

(7) Chengjie Wang, Tencent Youtu Lab.


This paper is available on arxiv under CC BY-NC-ND 4.0 Deed (Attribution-Noncommercial-Noderivs 4.0 International) license.

