    The Data We Acquired From Using LLMs to Support Thematic Analysis

    by Writings, Papers and Blogs on Text Models, February 21st, 2024

    Too Long; Didn't Read

    In our experiments, we used a dataset of 785 facts descriptions from Czech court cases decided in 2017. Theft in a shop (29.0%) and breaking into another object (17.5%) are the most prevalent themes. We removed 49 cases from the dataset because they had been used in a pilot study or contained errors.

    This paper is available on arXiv under CC 4.0 license.

    Authors:

    (1) Jakub DRÁPAL, Institute of State and Law of the Czech Academy of Sciences, Czechia, Institute of Criminal Law and Criminology, Leiden University, the Netherlands;

    (2) Hannes WESTERMANN, Cyberjustice Laboratory, Université de Montréal, Canada;

    (3) Jaromir SAVELKA, School of Computer Science, Carnegie Mellon University, USA.

    Abstract & Introduction

    Related Work

    Dataset

    Proposed Framework

    Experimental Design

    Results and Discussion

    Conclusions, Future Work and References

    3. Dataset

    In our experiments, we used a dataset of 785 facts descriptions from Czech court cases decided in 2017. From the Prosecution Service, we received 834 cases in which an adult defendant was found guilty of theft. In Czechia, theft also includes burglary and pickpocketing.[2] We slightly over-represented the most serious offenses to ensure a sufficient number of such cases in the dataset.


    We removed 49 cases from the dataset because they had been used in a pilot study or contained errors, leaving 785 cases. We extracted the text describing the facts. Each extracted text was anonymized and, where necessary, shortened or partially rewritten.


    The resulting text snippets range from 73 to 29,695 characters in length (first quartile 447, median 782, third quartile 1,462 characters). Figure 2 shows an example (automated translation).
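
    The length summary above can be reproduced for any collection of snippets with a few lines of Python. The following is a minimal sketch, not the authors' code: the function name `length_summary` and the toy snippets are hypothetical, and the "inclusive" quartile method is an assumption, since the paper does not state which quartile definition was used.

```python
import statistics

def length_summary(snippets):
    """Summarize snippet lengths (in characters): min, Q1, median, Q3, max."""
    lengths = sorted(len(s) for s in snippets)
    # Quartile definition is an assumption; other methods give slightly different cut points.
    q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "min": lengths[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": lengths[-1],
    }

# Toy, made-up snippets of 10, 20, 30, 40 and 50 characters (not the real dataset).
toy_snippets = ["x" * n for n in (10, 20, 30, 40, 50)]
summary = length_summary(toy_snippets)
print(summary)
```

    Reporting quartiles alongside the minimum and maximum is useful here because the length distribution is heavily right-skewed: the longest snippet is far longer than the third quartile.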


    Figure 2. The categories from the theft types dataset (shown at the top) and their distribution (right). An example of case facts description from the theft from a car category is shown on the left.


    A group of three law students under the supervision of one of the authors of this paper manually conducted an unstructured variant of thematic analysis.[3] The group arrived at 14 high-level themes focused on modus operandi and target of committed thefts (Figure 2).


    For each facts description, two students independently chose a single theme according to specified rules.


    Disagreements were resolved by one of the students after a careful re-reading of the case. The distribution of the themes over the 785 facts descriptions included in the dataset is presented in Figure 2. Theft in a shop (29.0%) and breaking into another object (17.5%) are the most prevalent themes.
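
    The double-annotation workflow described above (two independent labels per description, adjudication of disagreements, then a theme distribution) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, the shortened theme labels, and the toy annotations are all hypothetical, and the `resolve` callback merely stands in for the student's manual re-reading of the case.

```python
from collections import Counter

def adjudicate(labels_a, labels_b, resolve):
    """Merge two annotators' labels; call `resolve` only where they disagree."""
    final, disagreements = [], 0
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            final.append(a)
        else:
            disagreements += 1
            # Stand-in for a human re-reading the case and picking the final theme.
            final.append(resolve(index, a, b))
    return final, disagreements

def theme_distribution(labels):
    """Return each theme's share of the dataset in percent, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {theme: round(100 * n / total, 1) for theme, n in counts.most_common()}

# Toy, made-up annotations for four descriptions (not the real data).
student_a = ["shop", "shop", "car", "other object"]
student_b = ["shop", "car", "car", "other object"]

# Hypothetical resolver that simply keeps the first student's label.
final_labels, n_disagreements = adjudicate(student_a, student_b, lambda i, a, b: a)
distribution = theme_distribution(final_labels)
print(n_disagreements, distribution)
```

    Keeping the disagreement count separate from the final labels makes it straightforward to report how often adjudication was needed, although the paper itself does not report that figure in this section.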

    [2] ICCS codes 0501 and 0502 except for 0502212 [9].


    [3] We did not rigorously adhere to the process described in [5].
