Authors:

(1) Asaduz Zaman, Dept. of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Australia (asaduzzaman@monash.edu);
(2) Vanessa Kellermann, Dept. of Environment and Genetics, School of Agriculture, Biomedicine, and Environment, La Trobe University, Australia (v.kellermann@latrobe.edu.au);
(3) Alan Dorin, Dept. of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Australia (alan.dorin@monash.edu).

Table of Links

Abstract and 1. Introduction
Related Works
Method
Results and Discussion
Conclusion and References

Abstract

This study introduces markerless retro-identification of animals, a novel concept and practical technique to identify past occurrences of organisms in archived data, that complements traditional forward-looking chronological reidentification methods in longitudinal behavioural research. Identification of a key individual among multiple subjects may occur late in an experiment if it reveals itself through interesting behaviour after a period of undifferentiated performance. Often, longitudinal studies also encounter subject attrition during experiments. Effort invested in training software models to recognise and track such individuals is wasted if they fail to complete the experiment.
Ideally, we would be able to select individuals who complete an experiment and/or differentiate themselves via interesting behaviour, prior to investing computational resources in training image classification software to recognise them. We propose retro-identification for model training to achieve this aim. This reduces manual annotation effort and computational resources by identifying subjects only after they differentiate themselves late in an experiment, or at its conclusion. Our study dataset comprises observations made of morphologically similar reed bees (Exoneura robusta) over five days. We evaluated model performance by training on data from the final day (day five), testing on the sequence of preceding days, and comparing results to the usual chronological evaluation from day one. Results indicate no significant accuracy difference between models. This underscores retro-identification’s value in improving resource efficiency in longitudinal animal studies.

1. Introduction

In longitudinal behavioural studies, tracking individual subjects over time, identifying them when they first appear, and again when they re-appear in subsequent observations, is critical for understanding behaviour [2]. Re-identification (re-id) of small, visually similar animals, such as honeybees, can be supported by physical markers or tags [2, 4, 14, 15]. However, these can alter subjects’ behaviour [7]. Markerless re-id potentially enables researchers to assess study subjects’ natural behaviours [16]. However, this is difficult for highly similar individuals, such as insects, and requires algorithms to be trained, often on hand-annotated images [21].

In experiments with insects, subject attrition through death or disappearance can be high during longitudinal studies over several days. This is especially true outside controlled lab settings, where the additional issue of morphological change through wear and tear may confound efforts to re-id an individual. If a subject is lost or visually altered during an experiment, resources invested in training image classification software to recognise it will potentially be wasted.
This inefficiency is worsened by the need to conduct experiments on multiple subjects in the expectation that few will survive to the end, and of those, even fewer will exhibit a particular behaviour of interest, such as learning a task, solving a puzzle or collecting a specific resource [8]. Hence, late identification of key subjects from a larger initial set is common in longitudinal behavioural insect studies. How can researchers avoid wasted manual image annotation and re-id model training on subjects that do not ultimately contribute useful data?

Here we propose and test retro-identification (retro-id) to tackle this issue. Rather than follow the convention of training models on initial (day one) data and attempting to follow individuals chronologically during an experiment, we propose it can sometimes be more useful to do the reverse. That is, sometimes we should train our algorithms on late-stage experimental image data of just the key (surviving or otherwise interesting) individuals, and then track these key individuals retrospectively through archived image data to explore their behaviour during the experiment. This focuses attention on annotation and model training for subjects critical to a study, rather than wasting resources on subjects that may not persist or exhibit relevant behaviour.

We hypothesise that a model trained on insect image data from day one and tested for its ability to re-id insects through to day N would exhibit the same performance as a model trained on day N data and tested to retro-id insects back to day one. We tested this by monitoring 15 individual reed bees over 5 days. These semi-social pollinators have high phenotypic similarity (Figure 1) and are naturally found close to one another, even sharing nests, making re-id ecologically valuable but challenging.
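The two evaluation directions compared in the hypothesis above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `split_by_protocol` and `accuracy_per_day`, the `(image, label)` sample format, and the `predict` callable standing in for a trained classifier are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample format: (image identifier, individual's label).
Sample = Tuple[str, str]


def split_by_protocol(days: List[int], retro: bool) -> Tuple[int, List[int]]:
    """Return (training day, ordered test days).

    Forward re-id trains on the first day and tests on later days in
    chronological order; retro-id trains on the last day and tests on
    earlier days in reverse chronological order.
    """
    ordered = sorted(days)
    if len(ordered) < 2:
        raise ValueError("need at least two days of observations")
    if retro:
        # Last day for training, then walk backwards through the archive.
        return ordered[-1], ordered[-2::-1]
    # First day for training, then follow the experiment forwards.
    return ordered[0], ordered[1:]


def accuracy_per_day(
    predict: Callable[[str], str],
    data: Dict[int, List[Sample]],
    test_days: List[int],
) -> Dict[int, float]:
    """Fraction of correctly identified samples for each test day."""
    results: Dict[int, float] = {}
    for day in test_days:
        samples = data.get(day, [])
        if not samples:
            continue  # skip days with no observations
        correct = sum(1 for image, label in samples if predict(image) == label)
        results[day] = correct / len(samples)
    return results
```

Under this sketch, the hypothesis amounts to expecting similar per-day accuracies from both directions when each is given a classifier trained on its own training day.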
We trained several transfer learning-based image classification models using data from day 1 and from day 5, evaluating their accuracy on the subsequent and preceding sequences of days, respectively. Below we review related work (Section 2), describe our data collection and model evaluation methods (Section 3), discuss results (Section 4) and conclude (Section 5).

This paper is available on arXiv under CC BY 4.0 DEED license.