Taiwan Study Finds AI Can Assist Auditors with Big Data Sampling

Written by naivebayes | Published 2025/06/12

TL;DR: A Taiwanese study explores how AI-powered sampling can help auditors handle large datasets and reduce bias in financial audits.

Authors:

(1) Guang-Yih Sheu, Department of Innovative Application and Management/Accounting and Information System, Chang-Jung Christian University, Tainan, Taiwan; this author contributed equally to this work ([email protected]);

(2) Nai-Ru Liu, Department of Accounting and Information System, Chang-Jung Christian University, Tainan, Taiwan ([email protected]).

Editor's note: this is part 1 of 3 of a study exploring how AI-powered sampling can help auditors handle large datasets. Read the rest below.

Abstract: Taiwan's auditors have struggled to process excessive audit data, including drawing audit evidence from it. This study advances sampling techniques by integrating machine learning with sampling. The integration helps avoid sampling bias, preserve randomness and variability, and target riskier samples. We first classify data into classes using a Naive Bayes classifier. Next, a user-based, item-based, or hybrid approach is employed to draw audit evidence. The representativeness index is the primary metric for measuring how representative that evidence is. The user-based approach samples data symmetrically around the median of a class as audit evidence. It may be equivalent to a combination of monetary and variable sampling. The item-based approach is an asymmetric sampling based on posterior probabilities, which yields risky samples as audit evidence. It may be equivalent to a combination of non-statistical and monetary sampling. Auditors can hybridize the user-based and item-based approaches to balance representativeness and riskiness in selecting audit evidence. Three experiments show that sampling with machine learning integration has the benefits of drawing unbiased samples; handling complex patterns, correlations, and unstructured data; and improving efficiency in sampling big data. However, its limitations are the classification accuracy of the machine learning algorithm and the range of prior probabilities.

1. Introduction

Taiwan’s auditors have recently struggled to process excessive data, including drawing audit evidence from it. Audit evidence is the information that supports auditors’ findings or conclusions about those data. Auditors want assistance from emerging technologies, such as machine learning algorithms or software robots, in completing the sampling. The sampling workload pushes Taiwan’s small and medium accounting firms to seek more young auditors to assist accountants. Some firms even ask Taiwan’s universities to recommend excellent accounting students as potential employees.

This study develops a Naive Bayes classifier (e.g., [1]) as a sampling tool to help auditors generate audit evidence from a massive volume of data. For example, enterprises use enterprise resource planning or information management systems to manage accounting data, and these systems output a colossal amount of data each day. For economic reasons, auditing all of the data is almost impossible, so auditors rely on sampling methods to generate audit evidence. Sampling means that auditors audit less than 100% of the data, which introduces sampling risk: the likelihood that auditors’ conclusions based on samples differ from the conclusions they would reach from the entire data.

A previous study [2] suggested applying a classification algorithm to mitigate the sampling risk in choosing audit evidence. That study constructed a neural network to classify data into classes and generated audit evidence from each class. If the classification results are accurate, the corresponding audit evidence is representative.

However, auditors may have more sophisticated requirements when drawing audit evidence. For example, in a money laundering problem, financial accounts that receive frequent transactions are risky, because criminals may own such accounts to receive illicit money. An auditor would want to sample such risky financial accounts as audit evidence. We select a Naive Bayes classifier to meet those requirements since it provides the relationships between members in a class. Alternative classification algorithms cannot provide similar relationships.
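One way to read the "relationships between members in a class" is through the posterior probabilities that a Naive Bayes classifier assigns, which allow records to be ranked against each other. The sketch below applies Bayes' rule under the naive independence assumption. The class names, feature names, priors, and likelihoods are all invented for illustration and are not taken from the study.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed features) assuming features are
    conditionally independent given the class (the "naive" assumption)."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature in observed:
            # Multiply in P(feature | class) for every observed feature.
            score *= likelihoods[cls][feature]
        scores[cls] = score
    total = sum(scores.values())  # normalizing constant P(observed)
    return {cls: s / total for cls, s in scores.items()}


# Hypothetical numbers: 10% of accounts are assumed risky a priori.
priors = {"risky": 0.1, "normal": 0.9}
likelihoods = {
    "risky": {"frequent_transactions": 0.8, "new_account": 0.6},
    "normal": {"frequent_transactions": 0.2, "new_account": 0.3},
}

posterior = naive_bayes_posterior(
    priors, likelihoods, ["frequent_transactions", "new_account"]
)
print(posterior)
```

With these made-up inputs, the posterior for "risky" rises well above its 10% prior because both observed features are more likely under that class. Ranking accounts by such a posterior is what lets a sampler prefer riskier records.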

Many published studies (e.g., [3-5]) attempted to integrate machine learning with sampling; however, most of them were not concerned with auditing. Their goal was to develop sampling methods that improve the performance of machine learning algorithms in solving specific problems (e.g., [3]). Some studies (e.g., [4]) suggested sampling with machine learning in auditing; however, only a few researchers (e.g., [5]) have actually implemented machine learning-based sampling in auditing.

This study starts acquiring audit evidence by appending columns to the data to store the classification results of a Naive Bayes classifier, and then classifies the data into classes. Referring to existing sampling methods, we next implement a user-based, item-based, or hybrid approach to draw audit evidence. The representativeness index [6] is the primary metric for measuring whether audit evidence is representative. The user-based approach draws samples symmetrically around the median of a class. It may be equivalent to a combination of monetary and variable sampling methods [7]. The item-based approach is an asymmetric sampling based on posterior probabilities for detecting riskier samples. It may be equivalent to a combination of non-statistical and monetary sampling methods [7]. Auditors may hybridize the user-based and item-based approaches to balance representativeness and riskiness in selecting audit evidence.
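The user-based, item-based, and hybrid selections described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The record layout, the function names (user_based_sample, item_based_sample, hybrid_sample), the choice of the amount as the ordering variable, and all numbers are assumptions made for the example.

```python
# Hypothetical records of one class: (record_id, amount, posterior probability).
# In the study the posterior would come from the Naive Bayes classifier and be
# stored in appended columns; here the values are made up.
records = [
    ("r1", 120.0, 0.10),
    ("r2", 450.0, 0.35),
    ("r3", 300.0, 0.80),
    ("r4", 980.0, 0.95),
    ("r5", 610.0, 0.55),
    ("r6", 205.0, 0.20),
    ("r7", 770.0, 0.65),
]


def user_based_sample(rows, half_width):
    """Symmetric sampling: the middle row by amount plus `half_width` rows on
    each side. For an even number of rows the upper-middle row is used."""
    ordered = sorted(rows, key=lambda r: r[1])
    mid = len(ordered) // 2
    lo = max(0, mid - half_width)
    hi = min(len(ordered), mid + half_width + 1)
    return ordered[lo:hi]


def item_based_sample(rows, k):
    """Asymmetric sampling: the k rows with the highest posterior probability."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:k]


def hybrid_sample(rows, half_width, k):
    """Union of both selections, dropping duplicates and keeping order."""
    seen, out = set(), []
    for r in user_based_sample(rows, half_width) + item_based_sample(rows, k):
        if r[0] not in seen:
            seen.add(r[0])
            out.append(r)
    return out


print([r[0] for r in user_based_sample(records, 1)])
print([r[0] for r in item_based_sample(records, 2)])
print([r[0] for r in hybrid_sample(records, 1, 2)])
```

In this toy data the symmetric selection gathers records near the median amount, while the posterior-based selection pulls in the highest-risk records regardless of where they sit in the distribution. The hybrid simply merges the two, which is one simple way to trade representativeness against riskiness.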

The remainder of this study has five sections. Section 2 reviews studies relevant to this one. Section 3 shows an integration of a Naive Bayes classifier with sampling. Section 4 presents three experiments that test the method developed in Section 3. Section 5 discusses the experimental results. Based on the previous two sections, Section 6 presents this study’s conclusions.

This paper is available on arXiv under the Attribution-NonCommercial-ShareAlike 4.0 International license.


Published by HackerNoon on 2025/06/12