This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Jakub DRÁPAL, Institute of State and Law of the Czech Academy of Sciences, Czechia, Institute of Criminal Law and Criminology, Leiden University, the Netherlands;
(2) Hannes WESTERMANN, Cyberjustice Laboratory, Université de Montréal, Canada;
(3) Jaromir SAVELKA, School of Computer Science, Carnegie Mellon University, USA.
Conclusions, Future Work and References
We proposed a novel LLM-powered framework supporting thematic analysis and evaluated its performance on an analysis of criminal court opinions focused on categories of theft in Czechia.
We found that the initial coding of the data was performed with reasonable quality (RQ1) and improved further when expert feedback was provided (RQ2). Zero-shot classification of the data (descriptions of facts) into themes (categories of theft) showed promising performance (RQ3) but would likely benefit from expert feedback, which we leave to future work.
The evaluation of the pipeline's end-to-end performance on discovering and predicting themes suggested that the proposed framework is viable (RQ4), while highlighting the importance of supervision by a subject matter expert.
Besides incremental improvements, future work should focus on extending support beyond phases 2 and 3 of thematic analysis and on validating the findings of this study in domains beyond court opinions and criminal law.
[1] Westermann H, Savelka J, Walker VR, Ashley KD, Benyekhlef K. Computer-Assisted Creation of Boolean Search Rules for Text Classification in the Legal Domain. In: JURIX; 2019. p. 123-32.
[2] Branting LK, Pfeifer C, Brown B, Ferro L, Aberdeen J, Weiss B, et al. Scalable and explainable legal prediction. Artificial Intelligence and Law. 2021;29:213-38.
[3] Westermann H, Savelka J, Walker VR, Ashley KD, Benyekhlef K. Sentence embeddings and high-speed similarity search for fast computer assisted annotation of legal documents. In: JURIX. vol. 334. IOS Press; 2020. p. 164.
[4] Salaun O, Gotti F, Langlais P, Benyekhlef K. Why Do Tenants Sue Their Landlords? Answers from a Topic Model. In: JURIX. vol. 362. IOS Press; 2022. p. 113.
[5] Braun V, Clarke V. Using thematic analysis in psychology. Qualitative research in psychology. 2006;3(2):77-101.
[6] Kitsuse JI, Cicourel AV. A note on the uses of official statistics. Soc Probs. 1963;11:131.
[7] Hörnle T. Moderate and non-arbitrary sentencing without guidelines: the German experience. Law & Contemp Probs. 2013;76:189.
[8] Lappi-Seppälä T, Tonry M, Frase R. Sentencing and punishment in Finland: The decline of the repressive ideal. TONRY, Michael Why punish. 2001:239-54.
[9] UNODC. International classification of crime for statistical purposes. Vienna: United Nations Office on Drugs and Crime; 2015.
[10] National Academies of Sciences, Engineering, and Medicine and others. Modernizing crime statistics: Report 1: Defining and classifying crime. National Academies Press; 2016.
[11] De Paoli S. Can Large Language Models emulate an inductive Thematic Analysis of semi-structured interviews? arXiv preprint arXiv:2305.13014. 2023.
[12] Gao J, Guo Y, Lim G, Zhan T, Zhang Z, Li TJJ, et al. CollabCoder: A GPT-Powered Workflow for Collaborative Qualitative Analysis. arXiv preprint arXiv:2304.07366. 2023.
[13] Gamieldien Y, Case JM, Katz A. Advancing Qualitative Analysis: An Exploration of the Potential of Generative AI and NLP in Thematic Coding. Available at SSRN 4487768. 2023.
[14] Santtila P, Ritvanen A, Mokros A. Predicting burglar characteristics from crime scene behaviour. International Journal of Police Science & Management. 2004;6(3):136-54.
[15] Higgs T, Carter AJ, Tully RJ, Browne KD. Sexual murder typologies: A systematic review. Aggression and violent behavior. 2017;35:1-12.
[16] Canter DV, Bennell C, Alison LJ, Reddy S. Differentiating sex offences: A behaviorally based thematic classification of stranger rapes. Behavioral Sciences & the Law. 2003;21(2):157-74.
[17] Gřivna T, Drápal J. Attacks on the confidentiality, integrity and availability of data and computer systems in the criminal case law of the Czech Republic. Digital Investigation. 2019;28:1-13.
[18] Ashley KD. Reasoning with cases and hypotheticals in HYPO. International journal of man-machine studies. 1991;34(6):753-96.
[19] Gray M, Savelka J, Oliver W, Ashley K. Toward Automatically Identifying Legally Relevant Factors. In: Legal Knowledge and Information Systems. IOS Press; 2022. p. 53-62.
[20] Westermann H, et al. Using factors to predict and analyze landlord-tenant decisions to increase access to justice. In: Proceedings of ICAIL ’19; 2019. p. 133-42.
[21] Savelka J. Unlocking practical applications in legal domain: Evaluation of GPT for zero-shot semantic annotation of legal texts. arXiv preprint arXiv:2305.04417. 2023.
[22] Savelka J, Ashley KD, Gray MA, Westermann H, Xu H. Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise? arXiv preprint arXiv:2306.13906. 2023.
[23] Savelka J, Ashley KD. The Unreasonable Effectiveness of Large Language Models in Zero-shot Semantic Annotation of Legal Texts. Frontiers in Artificial Intelligence. 2023;6:1279794.
[24] OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774. 2023.
[25] Jiang JA, et al. Supporting serendipity: Opportunities and challenges for Human-AI Collaboration in qualitative analysis. Proceedings of the ACM on HCI. 2021;5(CSCW1):1-23.