Construction auditing risk detection using machine learning approaches

The 22nd Science and Technology Conference, University of Transport and Communications
CONSTRUCTION AUDITING RISK DETECTION USING MACHINE 
LEARNING APPROACHES 
Cao Phuong Thao1* 
1 University of Transport and Communications, No. 3 Cau Giay, Hanoi 
* Corresponding author: Email: thaocp@utc.edu.vn 
Abstract. The audit report plays a key role in determining the validity of the final
accounting at the completion of any construction project. However, the quality of a
report depends heavily on the auditors themselves, whose differing skill sets and levels
of bias can lead to different assessments of the accounting risk level. This paper
presents a method that automatically detects auditing risk using machine learning
approaches. The criteria used to assess auditing risk serve as inputs to the machine
learning algorithms, and the output is a ranking of the auditing risk as low, medium
or high. The two proposed machine learning methods were tested on 80 construction
projects in Vietnam, and the results show that the method detects auditing risk with
high accuracy.
Keywords: auditing, audit risk detection, neural network, random forest, machine 
learning. 
1. INTRODUCTION 
The purpose of an audit is to examine and verify the truthfulness of the financial
statements provided by the accountant, thereby providing the most accurate
information about the financial situation of the organization. The final product of an
audit is a report expressing the auditor's opinion on the truthfulness and fairness of
the financial statements produced by the accountants. To do this, the auditor surveys
the company or project management unit to see whether the internal control system
follows the process properly. The assessment of this audit risk depends heavily on the
subjective opinion of the auditor. Therefore, an automated risk assessment system
based on objective criteria could make the assessment of risk quicker and more
accurate.
Recently, artificial intelligence has been applied in many fields such as financial
services, image processing, medicine, natural language processing, text mining and
many others [1, 2, 3]. In [1], Bahrammirzaee (2010) reviewed three artificial
intelligence methods applied to financial markets. In a later work, the same author
proposed a hybrid intelligent system for credit ranking using reasoning
transformational models [2]. In this method, the expert system is the symbolic module
and the artificial neural network is the non-symbolic module. In [3], Khashman (2010)
proposed a neural network method with the back-propagation learning algorithm to
evaluate credit risk. Information technology has also been employed in auditing, and
the resulting systems have been categorized into five groups according to audit area:
data extraction and analysis, fraud detection, internal control evaluation, electronic
commerce control, and continuous monitoring [4, 5]. In [6], a neural network was
proposed to classify bank customers into good and bad credit risk groups. Recently,
Cao et al. [7] presented a neural network method that detects audit risk at three
levels: low, medium and high.
Although many research works apply artificial intelligence to auditing and finance,
these methods focus on evaluating audit risk in companies, where statistical and
historical audit data are available. In this paper, we present a method of audit risk
detection for construction projects using two machine learning methods, a neural
network and a random forest. Audits of construction projects differ from other audits
in that construction projects usually have a short execution time and the audit is
carried out when the project is finished. The criteria used to assess audit risk provide
the inputs to a multilayer perceptron neural network and to a random forest, and the
output is one of three levels of audit risk: low, medium or high. We test the method
using data from 80 construction projects in Vietnam. The experimental results show
the efficiency of the method. Section 2 describes the audit process and how the neural
network and the random forest are applied to audit risk detection. Section 3 uses real
data to illustrate the performance of the method. Finally, Section 4 draws the
conclusions of the study and its implications for future work.
2. MAIN CONTENT 
2.1. Auditing risk assessment
Risk arises in all fields, and each field has to develop its own ways of assessing and
handling it. In construction projects, auditing risk is associated with significant
errors in the final project settlement report. The final project settlement report is
very important, as its job is to assess how comprehensive and relevant the samples
selected by the auditor are, how convincing the evidence collected by the auditor is,
whether the project complies with the law, at what point the project is not in
continuous operation, etc. To control this risk, the audit plan must be appropriately
established to detect fraud, risks and potential problems while ensuring that the audit
is completed on time. Moreover, the auditor must consider and assess all kinds of risk
in a construction project, including potential risks, control risks and detection risks
with a confidence sample. The audit risk assessment process is shown in Figure 1. This
paper focuses on control risk, which is the possibility that errors have occurred and
have not been prevented, detected or corrected by internal audit.
Fig. 1. Audit risk detection process
Based on the description of the internal control system, the auditor assesses whether
the system is effective, what risks could occur and how the enterprise could overcome
the risk at sensitive points. To assess the internal control system, several factors
need to be considered, such as the model, operating framework and ability of the
project management unit; financial management and accountancy; work related to policy
changes; existing findings from previous audits; errors in planning strategies;
weaknesses in management that lead to inadequate investment, slow progress,
outstanding investment costs or failure to meet the objectives; and the environmental
impact caused by the project. Assessing control risk means checking information about
the internal control system of the project management unit, such as the organizational
chart, staff levels, internal management documents and internal audit work. The
auditor also needs to observe the activity of the unit and talk with managers and
employees to understand the organizational characteristics, personnel policies and the
qualifications of the managers and employees. To do this, the auditor designs a survey
form with 46 criteria divided into five groups: Ability and quality of the project
management unit (PMU) director, Risk management process, Information/report, Control
operations, and Evaluation and Monitor (E & M). The details of the criteria are given
in [7].
These criteria are quantified by coding them in the range from 0 to 1. These values
are the inputs of the neural network. The outputs of the neural network are the audit
risk, measured on three levels: low, medium and high.
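The coding step can be illustrated with a minimal sketch. The paper does not state how survey answers are mapped to numbers, so the three-answer scale and the `ANSWER_SCORES` mapping below are purely hypothetical, as are the two example projects.

```python
# Hypothetical coding scheme: the paper only says scores lie in [0, 1].
ANSWER_SCORES = {"no": 0.0, "partly": 0.5, "yes": 1.0}


def encode_project(answers):
    """Map one project's survey answers to a list of scores in [0, 1]."""
    return [ANSWER_SCORES[answer] for answer in answers]


# Two made-up projects with four criteria each (the real survey has 46).
survey = [
    ["yes", "no", "partly", "yes"],
    ["no", "no", "yes", "partly"],
]

# One row per project, one column per criterion.
X = [encode_project(row) for row in survey]
print(X)
```

Any monotone mapping into [0, 1] would serve the same purpose; what matters for the classifiers is that every project is described by the same ordered list of criteria.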
2.2. Neural network
An Artificial Neural Network (ANN) is a computational model that simulates 
biological neurons and functions in the brain. Typically, an ANN has layers of 
interconnected nodes. The nodes and their inter-connections are similar to the network 
of neurons in the brain. Any basic ANN will always have multiple layers of nodes, 
specific connection patterns and links between the layers, connection weights and 
activation functions for the nodes that convert weighted inputs to outputs. The learning 
process for the network typically involves a cost function and the objective is to 
optimize the cost function (typically minimize the cost). The weights keep getting 
updated in the process of learning. 
For audit risk detection, we treat the task as a classification problem. In this
paper, we use a multilayer perceptron (MLP) neural network with a three-layer
structure and multiple outputs. Although this classifier requires a fairly long
training time, it processes and classifies data quickly once trained [8]. Figure 2
presents an example of an MLP structure, which consists of one input layer, one hidden
layer and one output layer [9].
Fig. 2. Two layer feed-forward neural network 
Let $x_i$, $i = 1, \dots, d$, be the input values to the network. The first layer
forms $M$ linear combinations of these inputs to give the values $a_j^{(1)}$:

$$a_j^{(1)} = \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j^{(1)}, \quad j = 1, \dots, M \qquad (1)$$

where $w_{ji}^{(1)}$ are the elements of the first-layer weight matrix and $b_j^{(1)}$
are the bias parameters associated with the hidden units. Each variable $a_j^{(1)}$ is
associated with one hidden unit and is transformed by the non-linear activation
function of the hidden layer. The outputs of the hidden units are then given by

$$z_j = \tanh(a_j^{(1)}), \quad j = 1, \dots, M \qquad (2)$$

The $z_j$ are then combined with the weights and biases of the next layer to produce
the values $a_k^{(2)}$:

$$a_k^{(2)} = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + b_k^{(2)}, \quad k = 1, \dots, c \qquad (3)$$

where $c$ is the number of outputs.
These values are then passed through the output layer to produce the output values
$y_k$, $k = 1, \dots, c$. There are several forms of activation function. For
classification, we consider the logistic sigmoid activation function:
$$y_k = \frac{1}{1 + \exp(-a_k^{(2)})} \qquad (4)$$
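The forward pass described above (a linear combination, a tanh hidden layer, a second linear combination and a logistic sigmoid output) can be sketched in plain Python. This is only an illustration: the function name `forward` and all weight values are assumptions chosen for the example, not values from the paper.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Forward pass of a two-layer MLP; returns hidden outputs z and outputs y."""
    # First layer: a_j = sum_i w_ji * x_i + b_j
    a1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Hidden outputs: z_j = tanh(a_j)
    z = [math.tanh(a) for a in a1]
    # Second layer: a_k = sum_j w_kj * z_j + b_k
    a2 = [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(W2, b2)]
    # Outputs: y_k = 1 / (1 + exp(-a_k))
    y = [1.0 / (1.0 + math.exp(-a)) for a in a2]
    return z, y


# Illustrative sizes: d = 2 inputs, M = 2 hidden units, c = 3 outputs.
x = [0.5, 1.0]
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
W2 = [[0.2, -0.1], [0.0, 0.5], [-0.3, 0.3]]
b2 = [0.0, 0.0, 0.1]

z, y = forward(x, W1, b1, W2, b2)
print(y)  # three values, each strictly between 0 and 1
```

Because each output passes through its own sigmoid, the three values do not have to sum to one; the predicted risk level would be taken as the output with the largest value.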
The network needs to be trained to model the data so that it makes the best
predictions for new input data. In this paper, we use the back-propagation algorithm
[8]. Given the target vector t for input data x, the error of the network, E, is
defined as:
$$E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} (y_k^n - t_k^n)^2 \qquad (5)$$
where $N$ is the number of input patterns, $y_k^n$ is the actual value of the $k$-th
output unit for the $n$-th input pattern and $t_k^n$ is the desired value of the
$k$-th output unit for the $n$-th input pattern.
The derivatives of $E$ with respect to the second-layer weights are given in detail in
[7].
The difference between the calculated output and the desired output is
back-propagated to the previous layers, usually modified by the derivative of the
activation function, and the connection weights are normally adjusted using the delta
rule. This process proceeds through the previous layers until the input layer is
reached.
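As a hedged illustration of the update just described, the sketch below performs gradient-descent (delta rule) steps for a single training pattern, assuming a sum-of-squares error, tanh hidden units and sigmoid outputs. The learning rate `lr`, the function names and all numeric values are assumptions for the example; the paper does not report them.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer followed by sigmoid outputs."""
    a1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    z = [math.tanh(a) for a in a1]
    a2 = [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(W2, b2)]
    y = [1.0 / (1.0 + math.exp(-a)) for a in a2]
    return z, y


def error(y, t):
    """Sum-of-squares error for one pattern."""
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def train_step(x, t, W1, b1, W2, b2, lr=0.5):
    """Update the weights in place for one pattern; return the error before the update."""
    z, y = forward(x, W1, b1, W2, b2)
    # Output deltas: output error times the sigmoid derivative y * (1 - y).
    d2 = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # Hidden deltas: back-propagated error times the tanh derivative 1 - z^2.
    d1 = [(1.0 - zj ** 2) * sum(W2[k][j] * d2[k] for k in range(len(d2)))
          for j, zj in enumerate(z)]
    # Delta rule: move every weight against its error gradient.
    for k, dk in enumerate(d2):
        for j, zj in enumerate(z):
            W2[k][j] -= lr * dk * zj
        b2[k] -= lr * dk
    for j, dj in enumerate(d1):
        for i, xi in enumerate(x):
            W1[j][i] -= lr * dj * xi
        b1[j] -= lr * dj
    return error(y, t)


# Illustrative pattern whose target is the first of three classes.
x, t = [0.5, 1.0], [1.0, 0.0, 0.0]
W1 = [[0.1, -0.2], [0.4, 0.3]]
b1 = [0.0, 0.1]
W2 = [[0.2, -0.1], [0.0, 0.5], [-0.3, 0.3]]
b2 = [0.0, 0.0, 0.1]

errors = [train_step(x, t, W1, b1, W2, b2) for _ in range(200)]
print(errors[0], errors[-1])  # the error shrinks as training proceeds
```

The hidden deltas are computed before any weight is changed, so that the back-propagated error uses the same second-layer weights as the forward pass.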
2.3. Random Forest
The Random Forest classifier is a set of decision trees, each built from a randomly
selected subset of the training set. It aggregates the votes of the different decision
trees to decide the final class of the test object. Figure 3 shows the diagram of the
Random Forest.
Fig. 3. The diagram of the Random Forest 
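The vote aggregation can be sketched in a few lines. The helper name `majority_vote` and the five tree predictions below are hypothetical and serve only to show the mechanism.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions of five trees for one project.
tree_votes = ["high", "medium", "high", "low", "high"]
print(majority_vote(tree_votes))  # high
```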
Each individual tree in the random forest outputs a class prediction, and the class
with the most votes becomes the model's prediction. The RF process has two phases:
training and testing. During the training phase, each tree of the random forest is
built randomly using bagging. The bagging technique builds many bootstrap samples LB,
which are replications of the initial learning set L drawn with
replacement, so each (Xk, Yk) (k = 1, 2, ..., n) may be repeated several times in a
bootstrap sample LB. To build a decision tree of the random forest, m features
(variables) are chosen randomly from the n features of each bootstrap sample LB
(m < n). The random forest algorithm then chooses the best split variable and split
function among the m selected features and splits the node into two child nodes [10].
Each internal node t has a binary split st, which applies variable Xk to the incoming
data and divides them into two subsets corresponding to the two child nodes tL and tR.
To make the best split, the algorithm chooses the split st that maximizes the impurity
decrease [11]

$$\Delta i(s_t, t) = i(t) - p_L \, i(t_L) - p_R \, i(t_R) \qquad (6)$$

where $i(t)$ is an impurity measure such as the Gini index or the Shannon entropy; Nt,
NtL and NtR are the numbers of samples at node t, at its left child and at its right
child, respectively; $p_L$ = NtL / Nt and $p_R$ = NtR / Nt.
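The three ingredients above (bootstrap sampling, a random feature subset and the impurity decrease of a split) can be sketched as follows, assuming the Gini index as the impurity measure. All function names and the toy labels are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so rows may repeat."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices out of n_features (m < n_features)."""
    return rng.sample(range(n_features), m)


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """i(t) - p_L * i(t_L) - p_R * i(t_R) for one candidate split."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


rng = random.Random(0)
rows = list(range(10))  # stand-in for the (Xk, Yk) pairs of a learning set
sample = bootstrap_sample(rows, rng)
features = random_feature_subset(n_features=46, m=6, rng=rng)

parent = ["low", "low", "high", "high"]
perfect = impurity_decrease(parent, ["low", "low"], ["high", "high"])
useless = impurity_decrease(parent, ["low", "high"], ["low", "high"])
print(perfect, useless)  # 0.5 0.0
```

A split that separates the classes completely removes all of the parent's impurity, while a split that leaves both children as mixed as the parent gains nothing, which is why the algorithm keeps the split with the largest decrease.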
3. RESULTS 
The data used in this paper were collected from 80 construction projects in Vietnam.
The survey form for the internal audit includes 46 criteria, as described in Table 1.
From the collected data, we quantify these criteria to form a matrix with scores from
0 to 1. In this data set, each row represents the data from one project and each
column represents one criterion. These criteria are fed to the input of the neural
network. The neural network is designed with three layers: two hidden layers and an
output layer. The number of nodes in each layer was selected by experiment: we used
twelve nodes for the first hidden layer, eight nodes for the second hidden layer and
three nodes for the output layer, corresponding to the three levels of audit risk
(low, medium and high, respectively). The activation function was the rectified linear
unit (ReLU) for each hidden layer and the sigmoid function for the output layer.
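To make the described architecture concrete, the following sketch builds a 46-12-8-3 network in plain Python with ReLU hidden layers and sigmoid outputs and counts its trainable parameters. It is not the paper's Keras program: the weights are random and untrained (so the printed risk label is meaningless), and the initialisation range and input vector are assumptions.

```python
import math
import random

# Layer sizes from the text: 46 inputs, 12 and 8 hidden nodes, 3 outputs.
LAYER_SIZES = [46, 12, 8, 3]
RISK_LEVELS = ["low", "medium", "high"]


def init_layers(sizes, rng):
    """Small random weights and zero biases for each pair of consecutive layers."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]


def relu(a):
    return max(0.0, a)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def predict(x, layers):
    """Propagate x through ReLU hidden layers and a sigmoid output layer."""
    activations = [relu] * (len(layers) - 1) + [sigmoid]
    for (W, b), act in zip(layers, activations):
        x = [act(sum(w * xi for w, xi in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x


rng = random.Random(0)
layers = init_layers(LAYER_SIZES, rng)

# Weights plus biases: 46*12+12 + 12*8+8 + 8*3+3 = 695.
n_params = sum(len(W) * len(W[0]) + len(b) for W, b in layers)

project = [rng.random() for _ in range(46)]  # made-up criterion scores in [0, 1)
scores = predict(project, layers)
risk = RISK_LEVELS[scores.index(max(scores))]
print(n_params, risk)
```

With only 80 projects available, a model of this size has far more parameters than training samples, which is worth keeping in mind when reading the accuracy figures.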
Table 1. List of criteria for internal control
No.  Group                                                              Criteria
1    Ability and quality of the project management unit (PMU) director  28
2    Risk management process                                            2
3    Information, report                                                6
4    Control operations                                                 4
5    Evaluation and Monitor (E & M)                                     6
The data set was divided into two parts, 70% for training and 30% for testing, for
both machine learning methods. The programs were written in Python using Keras with
the TensorFlow backend and GPU support, and were run on a computer with a Core i7
processor and 8 GB of RAM. The training result of the neural network is shown in
Figure 4.
Figure 4 indicates that the values of the loss function on the testing set and the
training set are equal at the starting point. The values of the loss function on both
sets tend to be
convergent (they decline, and the rate of decline also slows). However, as the number
of iterations increases, the value of the loss function on the training set becomes
smaller than that on the testing set, as the testing set contains less data than the
training set. The accuracy hardly differs between the two sets, which is due to the
fact that the data set is not large enough to cover all cases. Figure 4 shows that
the accuracy on the testing set is more than 90%, while the accuracy on the training
set is about 97%. The accuracy on the testing set is about 2% higher than on the
training set because the testing set contains less data than the training set.
Therefore, the accuracy of this model is about 95% to 96%.
Fig. 4. Performance of training and testing process 
Fig. 5. Confusion matrices of the two machine learning methods (left: neural network,
right: random forest)
Prediction accuracy is evaluated on the testing set. We evaluate the accuracy of
the methods using the ground-truth notion of positive and negative detections. The
confusion matrices for the two methods, neural network and random forest, are shown
in Figure 5. The accuracy of each method is calculated as the percentage of correctly
classified samples out of the total number of samples:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is true positive, TN is true negative, FP is false positive and FN is false
negative.
Based on the confusion matrix of the neural network, 17 samples of high risk, 9
samples of medium risk and 4 samples of low risk were classified correctly. Similarly,
for the random forest, 12 samples of high risk, 10 samples of medium risk and no
samples of low risk were classified correctly. The overall accuracy is 94% for the
neural network and 60% for the random forest.
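For a multi-class problem, the share of correctly classified samples is the sum of the diagonal of the confusion matrix divided by the sum of all its entries. The sketch below shows this calculation; the 3x3 matrix is entirely hypothetical and does not reproduce the matrices of Figure 5.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical matrix: rows are true low/medium/high, columns are predictions.
example = [
    [5, 1, 0],
    [0, 8, 2],
    [1, 0, 3],
]
print(accuracy(example))  # 0.8
```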
4. CONCLUSION 
This paper proposed two machine learning methods to detect audit risk in construction
projects. Once the survey criteria are quantified, they can be used as inputs to the
neural network and the random forest to train a model that can then detect the audit
risk in any new project. The experimental results show the efficiency of the neural
network method. This method can be built into an information system to detect audit
risk quickly and to reduce the workload of auditors. It can also serve as a framework
for identifying risk comprehensively in construction projects.
REFERENCES 
[1]. Bahrammirzaee A., A Comparative Survey of Artificial Intelligence Applications 
in Finance: Artificial Neural Networks, Expert System and Hybrid Intelligent Systems. 
Neural Computing & Applications, Vol. 19 No. 8, pp.1165-1195, 2010. 
[2]. Bahrammirzaee A., Ghatari A., Ahmadi P., and Madani K., Hybrid Credit 
Ranking Intelligent System Using Expert System and Artificial Neural Networks. 
Applied Intelligence, Vol. 34 No.1, pp. 28-46, 2011. 
[3]. Khashman A., Neural Networks for Credit Risk Evaluation: Investigation of
Different Neural Models and Learning Schemes. Expert Systems with Applications,
Vol. 37 No. 9, pp. 6233-6239, 2010.
[4]. Glover S. M., and Romney M. B., The Next Generation. Internal Auditor,
Vol. 55 (August), pp. 47-53, 1998.
[5]. Koskivaara E., Artificial Neural Networks in Auditing: State of the Art. TUCS
Technical Report No. 509, 2003.
[6]. Al-Shayea Q. K., and El-Refae G. A., Evaluating Credit Risk Using
Artificial Neural Networks. Global Engineers & Technologist Review, Vol. 1 No. 1,
2011.
[7]. Cao P. T., Nguyen H. T., and Nguyen T. H., Construction Auditing Risk Detection
Using Neural Network. Science, Engineering & Education, Vol. 4 No. 1, pp. 39-44,
2019.
[8]. Ripley B. D., Pattern Recognition and Neural Networks. Cambridge University
Press, UK, 1996.
[9]. Nabney I., Netlab: Algorithms for Pattern Recognition. Advances in Pattern
Recognition, Springer, 2004.
[10]. Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical
Learning. Springer, doi: 10.1007/978-0-387-21606-5_7, 2001.
[11]. Criminisi A., Shotton J., and Konukoglu E., Decision Forests: A Unified
Framework for Classification, Regression, Density Estimation, Manifold Learning and
Semi-Supervised Learning. Foundations and Trends in Computer Graphics and Vision,
Vol. 7, pp. 81-227, 2012.
