Unlocking Novel Class Discovery: Advances in NCD Algorithms and Hyperparameter Tuning
by @dataology

This article showcases advancements in Novel Class Discovery (NCD) algorithms like PBN and innovative hyperparameter tuning methods, enabling the successful resolution of NCD problems even in realistic scenarios without prior knowledge of novel classes.
(1) Troisemaine Colin, Department of Computer Science, IMT Atlantique, Brest, France, and Orange Labs, Lannion, France;

(2) Reiffers-Masson Alexandre, Department of Computer Science, IMT Atlantique, Brest, France;

(3) Gosselin Stephane, Orange Labs, Lannion, France;

(4) Lemaire Vincent, Orange Labs, Lannion, France;

(5) Vaton Sandrine, Department of Computer Science, IMT Atlantique, Brest, France.

Abstract and Intro

Related work


Hyperparameter optimization

Estimating the number of novel classes

Full training procedure





Appendix A: Additional result metrics

Appendix B: Hyperparameters

Appendix C: Cluster Validity Indices numerical results

Appendix D: NCD k-means centroids convergence study

8 Conclusion

In this article, we have shown that, in the NCD setting, unsupervised clustering algorithms can reliably improve their performance by exploiting knowledge of the known classes through simple modifications. We have also introduced a novel NCD algorithm called PBN, whose simplicity and small number of hyperparameters proved to be a decisive advantage under realistic conditions. In addition, we have proposed an adaptation of the k-fold cross-validation process that tunes the hyperparameters of NCD methods without depending on the labels of the novel classes. Finally, we have demonstrated that the number of novel classes can be accurately estimated within the latent space of PBN. Together, these last two contributions show that the NCD problem can be solved in realistic situations where no prior knowledge of the novel classes is available during training.
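
To illustrate the cross-validation idea summarized above, here is a minimal sketch. It assumes the folds are built over the known classes: in each fold, some known classes are hidden and treated as if they were novel, so that their labels can be used to score a hyperparameter setting. The function names `class_level_folds` and `select_hyperparameters`, as well as the `score_fn` callback, are hypothetical and are not taken from the authors' code.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def class_level_folds(
    known_classes: Sequence[Hashable], k: int, seed: int = 0
) -> List[Tuple[List[Hashable], List[Hashable]]]:
    """Split the known classes into k folds of (visible, hidden) classes.

    Assumption: hidden classes play the role of novel classes in a fold.
    """
    if not 2 <= k <= len(known_classes):
        raise ValueError("k must be between 2 and the number of known classes")
    shuffled = list(known_classes)
    random.Random(seed).shuffle(shuffled)
    folds = []
    for i in range(k):
        hidden = shuffled[i::k]  # every k-th class, starting at offset i
        visible = [c for c in shuffled if c not in hidden]
        folds.append((visible, hidden))
    return folds


def select_hyperparameters(
    candidates: Sequence[Dict],
    known_classes: Sequence[Hashable],
    score_fn: Callable[[Dict, List[Hashable], List[Hashable]], float],
    k: int = 3,
    seed: int = 0,
) -> Dict:
    """Return the candidate with the best mean score across class-level folds.

    score_fn is a hypothetical callback: it should train an NCD method with
    `visible` as known classes and `hidden` as stand-in novel classes, then
    return a clustering score computed on the hidden classes.
    """
    folds = class_level_folds(known_classes, k, seed)

    def mean_score(params: Dict) -> float:
        return sum(score_fn(params, vis, hid) for vis, hid in folds) / len(folds)

    return max(candidates, key=mean_score)
```

Because the hidden classes are drawn from the labeled known classes, the score never touches the labels of the actual novel classes. This is the property the paragraph above describes.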



Funding

Colin Troisemaine, Alexandre Reiffers-Masson, Stéphane Gosselin, Vincent Lemaire and Sandrine Vaton received funding from Orange SA.

Competing Interests

Colin Troisemaine, Stéphane Gosselin and Vincent Lemaire received research support from Orange SA. Alexandre Reiffers-Masson and Sandrine Vaton received research support from IMT Atlantique.

Ethics approval

Not applicable.

Consent to participate

All authors have read and approved the final manuscript.

Consent for publication

Not applicable.

Availability of data and materials

All data used in this study are publicly available online. The datasets were extracted directly from the repositories linked in the corresponding section.

Code availability

The code for the experiments is available at the following URL: PracticalNCD/ECMLPKDD2024.

Authors’ contributions

Colin Troisemaine, Alexandre Reiffers-Masson, Stéphane Gosselin, Vincent Lemaire and Sandrine Vaton contributed to the manuscript equally.


This paper is available on arXiv under a CC 4.0 license.