The 22nd Science and Technology Conference of the University of Transport and Communications
(Hội nghị Khoa học công nghệ lần thứ XXII, Trường Đại học Giao thông vận tải)
CONSTRUCTION AUDITING RISK DETECTION USING MACHINE
LEARNING APPROACHES
Cao Phuong Thao1*
1 University of Transport and Communications, No. 3 Cau Giay, Hanoi
* Corresponding author: Email: thaocp@utc.edu.vn
Abstract. The audit report plays a key role in determining the validity of the final
accounts when a construction project is completed. However, the quality of a report
depends heavily on the auditors themselves, whose differing skill sets and levels of
bias can lead to different assessments of the accounting risk level. This paper
presents a method that automatically detects auditing risk using machine learning.
The criteria used to assess auditing risk serve as inputs to the machine learning
algorithms, and the output is a ranking of the auditing risk as low, medium, or high.
The two proposed machine learning methods were tested on data from 80 construction
projects in Vietnam, and the results show that the method detects auditing risk with
high accuracy.
Keywords: auditing, audit risk detection, neural network, random forest, machine
learning.
1. INTRODUCTION
The purpose of an audit is to examine and verify the truthfulness of the financial
statements provided by the accountant, thereby providing the most accurate
information about the financial situation of the organization. The final product of an
audit is a report expressing the auditor's opinion about the truthfulness and fairness
of the financial statements produced by the accountants. To do this, the auditor
surveys the company or project management unit to see whether the internal control
system follows the process properly. The assessment of this audit risk depends heavily
on the subjective opinion of the auditor. Therefore, if we can build an automated risk
assessment system based on objective criteria, the assessment of risk could be carried
out more quickly and accurately.
Recently, artificial intelligence has been applied in many fields such as financial
services, image processing, medicine, natural language processing, text mining and
many others [1, 2, 3]. In [1], Bahrammirzaee (2010) reviewed three artificial
intelligence methods applied to financial markets. In another study, the same author
proposed a hybrid intelligent system for credit ranking using reasoning-transformational
models [2]. In this method, the expert system is considered the symbolic module and
the artificial neural network the non-symbolic module. In [3], Khashman (2010)
proposed a method that uses a neural network with the back-propagation learning
algorithm to evaluate credit risk. Information technology has also been employed in
auditing, with the systems categorized into five groups according to audit area: data
extraction and analysis, fraud detection, internal control evaluation, electronic
commerce control, and continuous monitoring [4, 5]. In [6], a neural network was
proposed to classify consumers into good and bad credit risk groups for a bank.
Recently, Cao et al. [7] presented a neural network method to detect audit risk at
three levels: low, medium and high.
Although many studies apply artificial intelligence to auditing and finance, these
methods focus on evaluating audit risk in companies, where statistical and historical
audit data are available. In this paper, we present a method of audit risk detection
for construction projects using two machine learning methods, a neural network and a
random forest. The difference between audits of construction projects and other audits
is that construction projects usually have a short execution time and the audit is
carried out when the project is finished. The criteria used to assess audit risk
provide the inputs to a multi-layer perceptron neural network as well as to a random
forest, and the output is one of three levels of audit risk: low, medium, or high. We
test the method using data from 80 construction projects in Vietnam. The experimental
results show the efficiency of the method. Section 2 describes the audit process and
how the neural network and the random forest are applied to audit risk detection.
Section 3 uses real data to illustrate the performance of the method. Finally,
Section 4 draws conclusions and discusses implications for future work.
2. MAIN CONTENT
2.1. Auditing risk assessment
Risk is a problem that arises in all fields, and each field has to develop its own
ways of assessing and handling it. In a construction project, auditing risk is
associated with important errors in the final project settlement report. The final
project settlement report is very important, as its job is to assess how comprehensive
and relevant the samples selected by the auditor are, how convincing the evidence
collected by the auditor is, whether the project complies with the law, at what point
the project is not in continuous operation, etc. To control this risk, the audit plan
must be appropriately established to detect fraud, risks, and potential problems and
also to ensure that the audit is completed on time. Moreover, the auditor must
consider and assess all kinds of risk in a construction project, including potential
risks, control risks and detection risks, based on a sample that gives sufficient
confidence. The audit risk assessment process is shown in Figure 1. This paper focuses
on control risk, which is the possibility that errors occur that have not been
prevented, detected or corrected by internal audit.
Fig 1. Audit risk detection process
Based on the description of the internal control system, the auditor assesses whether
the system is effective, what risks could occur, and how the enterprise could overcome
the risks at sensitive points. To assess the internal control system, several factors
need to be considered, such as the model, operating framework and ability of the
project management unit; financial management and accountancy; work related to policy
changes; existing findings from previous audits; errors in planning strategies;
weaknesses in management that lead to inadequate investment, slow progress,
outstanding investment costs, or failure to meet the objectives; and the environmental
impact caused by the project. An assessment of control risk checks the information on
the internal control system of the project management unit, such as the organizational
chart, staff levels, internal management documents, and internal audit work. The
auditor also needs to observe the activities of the unit and hold discussions with
managers and employees to understand the organizational characteristics, personnel
policies, and qualifications of the managers and employees. To do this, the auditor
designs a survey form with 46 criteria, divided into five groups: Ability and quality
of the project management unit (PMU) director, Risk management process,
Information/report, Control operations, and Evaluation and Monitor (E & M). The
details of the criteria are given in [7].
These criteria are quantified by coding them as values in the range from 0 to 1. These
values are the inputs of the neural network. The output of the neural network is the
audit risk, measured on three levels: low, medium and high.
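The coding step can be illustrated with a minimal sketch. The paper does not state the answer scale of the survey form, so the 1-to-5 ordinal scale, the function names `encode_answer` and `encode_project`, and the example answers below are all assumptions made for illustration only.

```python
def encode_answer(score, lowest=1, highest=5):
    """Map an ordinal survey score onto [0, 1] by min-max scaling.

    The 1..5 scale is an assumption; the source only says that the
    criteria are coded in the range from 0 to 1.
    """
    if not lowest <= score <= highest:
        raise ValueError(f"score {score} outside [{lowest}, {highest}]")
    return (score - lowest) / (highest - lowest)


def encode_project(answers):
    """Encode all survey answers of one project into one input row."""
    return [encode_answer(a) for a in answers]


# Hypothetical answers for one project (only 5 of the 46 criteria shown).
example_answers = [1, 3, 5, 2, 4]
example_row = encode_project(example_answers)
print(example_row)  # [0.0, 0.5, 1.0, 0.25, 0.75]
```

Stacking one such row per project gives the input matrix, with one column per criterion.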
2.2. Neural network
An Artificial Neural Network (ANN) is a computational model that simulates
biological neurons and their functions in the brain. Typically, an ANN has layers of
interconnected nodes. The nodes and their interconnections are similar to the network
of neurons in the brain. A basic ANN has multiple layers of nodes, specific connection
patterns and links between the layers, connection weights, and activation functions
for the nodes that convert weighted inputs to outputs. The learning process for the
network typically involves a cost function, and the objective is to optimize
(typically minimize) it. The weights are updated repeatedly during learning.
We treat audit risk detection as a classification problem. In this paper, we use a
three-layer multilayer perceptron (MLP) neural network with multiple outputs. Although
this classifier needs a fairly long training time, it processes and classifies data
quickly once trained [8]. Figure 2 presents an example of an MLP structure, which
consists of one input layer, one hidden layer and one output layer [9].
Fig. 2. Two layer feed-forward neural network
Let $x_i$, $i = 1, \ldots, d$, be the input values to the network. The first layer
forms $M$ linear combinations of these inputs to give the values $a_j^{(1)}$:

$$a_j^{(1)} = \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)}, \quad j = 1, \ldots, M \qquad (1)$$

where $w_{ji}^{(1)}$ are the elements of the weight matrix and $b_j^{(1)}$ are the bias
parameters associated with the hidden units. Each variable $a_j^{(1)}$ is associated
with one hidden unit and is transformed by the non-linear activation function of the
hidden layer. The outputs of the hidden units are then given by

$$z_j = \tanh(a_j^{(1)}), \quad j = 1, \ldots, M \qquad (2)$$

The $z_j$ are then combined with the weights and biases of the next layer to produce
the values $a_k^{(2)}$:

$$a_k^{(2)} = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + b_k^{(2)}, \quad k = 1, \ldots, c \qquad (3)$$

where $c$ is the number of outputs.
These values are then passed through the output layer to produce the output values
$y_k$, $k = 1, \ldots, c$. There are several forms of activation function. For
classification, we consider the logistic sigmoid activation function:

$$y_k = \frac{1}{1 + \exp(-a_k^{(2)})} \qquad (4)$$
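The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `forward`, the toy sizes (d = 2 inputs, M = 2 hidden units, c = 3 outputs) and all weight values are made up and are not taken from the paper.

```python
import math


def forward(x, w1, b1, w2, b2):
    """Forward pass of a two-layer MLP with tanh hidden units and sigmoid outputs."""
    # First-layer pre-activations: one linear combination of the inputs per hidden unit.
    a1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    # Hidden-unit outputs after the tanh non-linearity.
    z = [math.tanh(a) for a in a1]
    # Second-layer pre-activations: one linear combination of the z_j per output.
    a2 = [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(w2, b2)]
    # Logistic sigmoid squashes each output into (0, 1).
    y = [1.0 / (1.0 + math.exp(-a)) for a in a2]
    return z, y


# Toy example with made-up weights: d = 2, M = 2, c = 3.
x = [0.2, 0.8]
w1 = [[0.5, -0.3], [0.1, 0.7]]
b1 = [0.0, 0.1]
w2 = [[0.4, -0.6], [0.2, 0.2], [-0.5, 0.9]]
b2 = [0.0, 0.0, 0.0]
z, y = forward(x, w1, b1, w2, b2)
print(z, y)
```

With all weights and biases set to zero, every output equals 0.5, which is a quick sanity check of the sigmoid output layer.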
The network needs to be trained to model the data so that it makes the best
predictions for new input data. In this paper we use the back-propagation algorithm
[8]. Assume we have the target vector $t$ for input data $x$. The error of the
network, $E$, is defined as:

$$E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} (y_k^n - t_k^n)^2 \qquad (5)$$

where $y_k^n$ is the actual value of the $k$th output unit for the $n$th input pattern
and $t_k^n$ is the desired value of the $k$th output unit for the $n$th input pattern.
The derivatives of $E$ with respect to the second-layer weights are given in detail in
[7].
The difference between the calculated output and the desired output is
back-propagated to the previous layers, usually modified by the derivative of the
activation function, and the connection weights are normally adjusted using the Delta
Rule. This process continues through the previous layers until the input layer is
reached.
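As an illustration of the weight update, the sketch below applies one Delta Rule step to the output layer only, for a single input pattern, using a sum-of-squares error and sigmoid outputs. The hidden activations `z`, the target `t`, the weights and the learning rate `eta` are made-up values, and the helper names are hypothetical. The paper's own derivation is in [7], and this is not a reproduction of it.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def outputs(z, w2, b2):
    """Sigmoid outputs of the second layer for hidden activations z."""
    return [sigmoid(sum(w * zj for w, zj in zip(row, z)) + b)
            for row, b in zip(w2, b2)]


def sse(y, t):
    """Sum-of-squares error for one pattern."""
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def delta_rule_step(z, t, w2, b2, eta=0.5):
    """One gradient-descent update of the output-layer weights and biases."""
    y = outputs(z, w2, b2)
    # Output error scaled by the sigmoid derivative y * (1 - y).
    deltas = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    new_w2 = [[w - eta * d * zj for w, zj in zip(row, z)]
              for row, d in zip(w2, deltas)]
    new_b2 = [b - eta * d for b, d in zip(b2, deltas)]
    return new_w2, new_b2


# Made-up values: two hidden units, three outputs, target class "high".
z = [0.3, -0.6]
t = [0.0, 0.0, 1.0]
w2 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
b2 = [0.0, 0.0, 0.0]

error_before = sse(outputs(z, w2, b2), t)
w2_new, b2_new = delta_rule_step(z, t, w2, b2)
error_after = sse(outputs(z, w2_new, b2_new), t)
print(error_before, error_after)
```

In full back-propagation the same deltas are also propagated back through the hidden layer to update the first-layer weights; that part is omitted here to keep the sketch short.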
2.3. Random Forest
The Random Forest classifier is a set of decision trees, each built from a randomly
selected subset of the training set. It aggregates the votes from the different
decision trees to decide the final class of the test object. Figure 3 shows the
diagram of the Random Forest.
Fig. 3. The diagram of the Random Forest
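The two ingredients named here, random subsets of the training set and vote aggregation, can be sketched as follows. The function names `bootstrap_sample` and `majority_vote`, the seed, and the example votes are illustrative assumptions. A real random forest would also train a decision tree on each sample.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed, chosen arbitrarily
training_ids = list(range(10))    # stand-in for ten training projects
sample = bootstrap_sample(training_ids, rng)

# Hypothetical predictions of five trees for one test project.
votes = ["high", "medium", "high", "low", "high"]
print(majority_vote(votes))  # high
```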
Each individual tree in the random forest outputs a class prediction, and the class
with the most votes becomes the model's prediction. There are two phases in the RF
process: training and testing. During the training phase, each tree of the random
forest is built randomly using bagging. The bagging technique builds many bootstrap
samples $L_B$, which are replications of the initial learning set $L$ drawn with
replacement, so each $(X_k, Y_k)$ $(k = 1, 2, \ldots, n)$ may be repeated many times
in a bootstrap sample $L_B$. To build a decision tree of the random forest, $m$
features (variables) are chosen randomly from the $n$ features of the bootstrap sample
$L_B$ ($m < n$); the random forest algorithm then chooses the best split variable and
split function among the $m$ selected features and splits the node into two child
nodes [10]. Each internal node $t$ has a binary split $s_t$, which uses a variable
$X_k$ to divide the incoming data into two subsets corresponding to the two child
nodes $t_L$ and $t_R$. To make the best split, the algorithm needs to choose the split
$s_t$ that maximizes the impurity decrease [11]

$$\Delta i(s_t, t) = i(t) - p_L \, i(t_L) - p_R \, i(t_R) \qquad (6)$$

where $i(\cdot)$ is an impurity measure such as the Gini index or the Shannon entropy;
$N_t$, $N_{t_L}$ and $N_{t_R}$ are the numbers of samples at node $t$, at its left
child and at its right child, respectively; and $p_L = N_{t_L} / N_t$ and
$p_R = N_{t_R} / N_t$.
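The split criterion can be made concrete with a small sketch that uses the Gini index as the impurity measure: the impurity decrease of a split is the parent impurity minus the size-weighted impurities of the two children. The function names, the single feature, and the four labelled values below are made up for illustration and are not the paper's data.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurities of the two children."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


def best_threshold(values, labels):
    """Try each midpoint between consecutive distinct values of one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for (v0, _), (v1, _) in zip(pairs, pairs[1:]):
        if v0 == v1:
            continue
        threshold = (v0 + v1) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        decrease = impurity_decrease(labels, left, right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Made-up scores of one criterion for four projects and their risk labels.
values = [0.1, 0.2, 0.8, 0.9]
labels = ["low", "low", "high", "high"]
threshold, decrease = best_threshold(values, labels)
print(threshold, decrease)  # 0.5 0.5
```

In a random forest this search would run only over the m randomly selected features at each node.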
3. RESULTS
The data used in this paper were collected from 80 construction projects in Vietnam.
The survey form for the internal audit includes 46 criteria, grouped as described in
Table 1. From the data collected, we quantify these criteria to form a matrix with
scores from 0 to 1. In this data set, each row represents the data from one project
and each column represents one criterion. These criteria are fed to the input of the
neural network. The neural network is designed with three layers: two hidden layers
and an output layer. The number of nodes in each layer was selected by experiment: we
used twelve nodes for the first hidden layer, eight nodes for the second hidden layer
and three nodes for the output layer, corresponding to the three levels of audit risk
(low, medium and high, respectively). The activation function of each hidden layer was
the rectified linear unit (ReLU), and the sigmoid function was used for the output
layer.
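To give a sense of the size of this network, the sketch below counts its trainable parameters and shows one way to turn the three output activations into a risk level. The layer sizes come from the text, with the assumption of one input node per criterion. The order of the labels in `RISK_LEVELS` and the arg-max decoding rule are assumptions, as the paper does not spell them out.

```python
# Layer sizes: 46 inputs (one per criterion, assumed), 12 and 8 hidden nodes, 3 outputs.
LAYER_SIZES = [46, 12, 8, 3]
# Assumed order of the output nodes.
RISK_LEVELS = ["low", "medium", "high"]


def count_parameters(sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def decode(outputs):
    """Pick the risk level whose output node has the largest activation."""
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return RISK_LEVELS[best]


n_params = count_parameters(LAYER_SIZES)
print(n_params)                  # 695
print(decode([0.1, 0.7, 0.3]))   # medium
```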
Table 1. List of criteria for internal control
No.  Group                                                               Number of criteria
1    Ability and quality of the project management unit (PMU) director   28
2    Risk management process                                             2
3    Information, report                                                 6
4    Control operations                                                  4
5    Evaluation and Monitor (E & M)                                      6
The data set was divided into two parts, 70% for training and 30% for testing, for
both machine learning methods. The programs were written in Python using Keras with
the TensorFlow backend and GPU support, and were run on a computer with a Core i7
processor and 8 GB of RAM. The training result of the neural network is shown in
Figure 4.
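A 70/30 split of this kind can be sketched with the standard library alone. The paper does not say whether the projects were shuffled or stratified by risk level, so the shuffling, the seed and the function name `train_test_split` below are assumptions for illustration.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items and split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


project_ids = list(range(80))  # stand-in identifiers for the 80 projects
train_ids, test_ids = train_test_split(project_ids)
print(len(train_ids), len(test_ids))  # 56 24
```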
Figure 4 indicates that the values of the loss function on the testing set and the
training set are equal at the starting point. The loss on both sets tends to converge
(it declines, and the rate of decline slows down). However, as the number of
iterations increases, the loss on the training set becomes smaller than that on the
testing set, as the testing set contains less data than the training set. The
accuracy, in contrast, hardly changes on either set, which is due to the fact that the
data set is not large enough to cover all cases. Figure 4 shows that the accuracy on
the testing set is more than 90%, while the accuracy on the training set is about 97%.
The accuracy on the testing set is about 2% lower than on the training set because the
testing set contains less data than the training set. Therefore, the accuracy of this
model is about 95% to 96%.
Fig. 4. Performance of training and testing process
Fig. 5. Confusion matrices of the two machine learning methods (left: neural network,
right: Random Forest)
Prediction accuracy is evaluated on the testing set. We evaluate the accuracy of the
methods using the ground-truth notions of positive and negative detection. The
confusion matrices for the two methods, neural network and random forest, are shown in
Figure 5. The accuracy of a method is calculated as the percentage of correctly
classified samples out of the total number of samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false
negative.
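For a three-class problem, the same definition amounts to dividing the sum of the diagonal of the confusion matrix by the total number of samples. The sketch below shows this with a made-up matrix. The counts are purely illustrative and are not the values from Figure 5.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Made-up 3x3 matrix: rows are true low/medium/high, columns are predictions.
example_confusion = [
    [5, 1, 0],
    [0, 8, 2],
    [0, 1, 3],
]
print(accuracy(example_confusion))  # 0.8
```

For a two-class matrix laid out as [[TN, FP], [FN, TP]], the same function reduces to (TP + TN) / (TP + TN + FP + FN).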
Based on the confusion matrix of the neural network, we can see that 17 samples of
high risk, 9 samples of medium risk and 4 samples of low risk were classified
correctly. Similarly, for the random forest, 12 samples of high risk, 10 samples of
medium risk and no samples of low risk were classified correctly. The overall accuracy
is 94% for the neural network and 60% for the random forest.
4. CONCLUSION
This paper proposed two machine learning methods to detect audit risk in construction
projects. By quantifying the survey criteria, these variables can be used as inputs to
the neural network and the random forest to train a model that can be used to detect
the audit risk in any new project. The experimental results show the efficiency of the
neural network method. This method can be integrated into an information system to
detect audit risk quickly and to reduce the workload of auditors. It can also serve as
a framework for identifying risk in a comprehensive manner for construction projects.
REFERENCES
[1]. Bahrammirzaee A., A Comparative Survey of Artificial Intelligence Applications
in Finance: Artificial Neural Networks, Expert System and Hybrid Intelligent Systems,
Neural Computing & Applications, Vol. 19 No. 8, pp. 1165-1195, 2010.
[2]. Bahrammirzaee A., Ghatari A., Ahmadi P., and Madani K., Hybrid Credit
Ranking Intelligent System Using Expert System and Artificial Neural Networks,
Applied Intelligence, Vol. 34 No. 1, pp. 28-46, 2011.
[3]. Khashman A., Neural Networks for Credit Risk Evaluation: Investigation of
Different Neural Models and Learning Schemes, Expert Systems with Applications,
Vol. 37 No. 9, pp. 6233-6239, 2010.
[4]. Glover S. M. and Romney M. B., The Next Generation, Internal Auditor,
Vol. 55 (August), pp. 47-53, 1998.
[5]. Koskivaara E., Artificial Neural Networks in Auditing: State of the Art, TUCS
Technical Report No. 509, 2003.
[6]. Qeethara K. Al-Shayea and Ghaleb A. El-Refae, Evaluating Credit Risk Using
Artificial Neural Networks, Global Engineers & Technologist Review, Vol. 1 No. 1,
2011.
[7]. Cao P. T., Nguyen H. T., and Nguyen T. H., Construction Auditing Risk
Detection Using Neural Network, Science, Engineering & Education, Vol. 4 No. 1,
pp. 39-44, 2019.
[8]. Ripley B. D., Pattern Recognition and Neural Networks, Cambridge
University Press, UK, 1996.
[9]. Nabney I., Netlab: Algorithms for Pattern Recognition, Advances in Pattern
Recognition, Springer, 2004.
[10]. Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical
Learning, Springer, 2001. doi:10.1007/978-0-387-21606-5_7
[11]. Criminisi A., Shotton J., and Konukoglu E., Decision Forests: A Unified
Framework for Classification, Regression, Density Estimation, Manifold Learning and
Semi-Supervised Learning, Foundations and Trends in Computer Graphics and Vision,
Vol. 7, pp. 81-227, 2012.
