Comparison among Akaike Information Criterion, Bayesian Information Criterion and Vuong's Test in Model Selection: A Case Study of Violated Speed Regulation in Taiwan

In scientific research, it is valuable for the questions studied to be closely connected to real applications. When analyzing data in practice, several models can often be fitted to the same survey data, so a standard criterion is needed to choose the most efficient one. In this article, our primary interest is to compare and discuss model selection criteria and their applications. We describe the approaches and procedures of these methods and apply them to traffic violation data, looking for the most appropriate model among Poisson regression, zero-inflated Poisson regression and negative binomial regression to capture the relationship between the number of speed regulation violations and factors including distance covered, motorcycle engine and age of respondents, using AIC, BIC and Vuong's test. Based on the results on the training, validation and test data sets, we find that AIC and BIC perform more consistently and robustly in model selection than Vuong's test. We also discuss the advantages and disadvantages of these methods and suggest some potential directions for future research.


Introduction
Model selection is a crucial topic in statistics, economics and several other areas, and it has numerous practical applications. It has been studied both theoretically and practically by many statisticians and has attracted much attention over the last two decades, especially for regression and econometric models. Three of the most commonly used model selection criteria are the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and Vuong's test, which are compared and discussed in this paper. AIC was first proposed by Akaike [1] as a method to compare different models on a given outcome. BIC, proposed by Schwarz [20], is a criterion for model selection among a finite set of models. Vuong's test was proposed by Vuong [24] with the aim of selecting a single model regardless of its intended use. All three are among the most widespread criteria for choosing a model.

© 2019 Journal of Advanced Engineering and Computation (JAEC)
These criteria have been studied and used in numerous areas. AIC has been researched and applied extensively in the literature: Snipes et al. [19] employ AIC and present an example from wine ratings and prices; Taylor et al. [21] study indicators of hotel profitability with model selection using AIC; Charkhi et al. [4] investigate asymptotic post-selection inference for the AIC; Chang et al. [3] present Akaike information criterion-based conjunctive belief rule base learning for complex system modeling; etc.
BIC is also used extensively in the literature. For example, Neath et al. [16] discuss regression and time series model selection using variants of the Schwarz information criterion; Cavanaugh et al. [2] generalize the derivation of the BIC; Weakliem [27] gives a critique of the Bayesian information criterion for model selection; Neath et al. [15] present a Bayesian approach to the multiple comparisons problem; Neath et al. [17] review the background, derivation and applications of the BIC; Nguefack-Tsague et al. [23] focus on the Bayesian information criterion; etc.
Like AIC and BIC, Vuong's test [24] is widely used in the literature. For instance, Clarke [5] builds on Vuong's test to introduce a simple distribution-free test for non-nested model selection; Theobald [22] uses Vuong's test in a formal test of the theory of universal common ancestry; Lukusa et al. [13] use Vuong's test to evaluate whether the zero-inflated Poisson (ZIP) regression model is consistent with the real data; Dale et al. [6] perform model comparison using Vuong's test in the estimation of nested and zero-inflated ordered probit models; Schneider et al. [18] present model selection of nested and non-nested item response models using Vuong's test; etc.
Our main objective in this paper is to give researchers an overview of model selection criteria applied to traffic violation data. The rest of the paper is organized as follows. In Section 2, we present the approaches and procedures of the model selection criteria considered: the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and Vuong's test. In Section 3, these methods are applied to a real data set, which should help readers assess them. Some suggestions and potential directions for further research are given in Section 4. Finally, conclusions and remarks are given in Section 5.

Some Criteria for Model Selection
In this section, we present the approaches and procedures of widely used methods for choosing the most efficient model: the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and Vuong's test.

Akaike Information Criterion (AIC)
AIC was first proposed by Akaike [1] as a method to compare different models on a given outcome. The AIC for a candidate model is defined as follows:

$$\mathrm{AIC} = -2\,\ell(\hat{\theta} \mid y) + 2K, \tag{1}$$

where $K$ is the number of estimated parameters in the model, including the intercept, and $\ell(\hat{\theta} \mid y)$ is the log-likelihood of the estimated model at its maximum point. The rule of choice: the smaller the value of AIC, the better the model.

Bayesian Information Criterion (BIC)
BIC was first introduced by Schwarz [20]; it is also called the Schwarz criterion (SBC, SBIC) and is a criterion for model selection among a finite set of models. The BIC for a candidate model is defined as follows:

$$\mathrm{BIC} = -2\,\ell(\hat{\theta} \mid y) + K \ln(n), \tag{2}$$

where $n$ is the sample size, $K$ is the number of estimated parameters in the model, including the intercept, and $\ell(\hat{\theta} \mid y)$ is the log-likelihood of the estimated model at its maximum point. The rule of selection: the smaller the value of BIC, the better the model.

The procedure for applying AIC and BIC is as follows:
Step 1: Select candidate models that can be fitted to the data set.
Step 2: Estimate the unknown parameters of the models.
Step 3: Compute the values of AIC and BIC using formulas (1) and (2), respectively.
Step 4: Based on the rule of choice, decide on the most suitable model.
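The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names `aic` and `bic`, the log-likelihood values, the parameter counts and the sample size are all hypothetical placeholders, not results from this study.

```python
import math

def aic(loglik: float, k: int) -> float:
    """Formula (1): AIC = -2 * maximized log-likelihood + 2 * K."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik: float, k: int, n: int) -> float:
    """Formula (2): BIC = -2 * maximized log-likelihood + K * ln(n)."""
    return -2.0 * loglik + k * math.log(n)

# Steps 1-2: candidate models with their maximized log-likelihood and
# number of estimated parameters (hypothetical values for illustration).
candidates = {
    "ZIP": {"loglik": -500.0, "k": 8},
    "Poisson": {"loglik": -540.0, "k": 4},
    "NB": {"loglik": -508.0, "k": 5},
}
n = 1000  # hypothetical sample size

# Step 3: evaluate both criteria for every candidate.
scores = {
    name: (aic(m["loglik"], m["k"]), bic(m["loglik"], m["k"], n))
    for name, m in candidates.items()
}

# Step 4: the smallest value wins under each criterion.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

With these invented numbers the two criteria select different models, because BIC charges ln(n) per parameter while AIC charges 2; on real data they may well agree.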

Vuong's Test
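Vuong's test [24] is one of the most widely used criteria for choosing a model, and it is typically applied to data sets with no missing values. Let $f_1(Y \mid X, Z, W; \alpha_1)$ and $f_2(Y \mid X, Z, W; \alpha_2)$ be two non-nested probability models, and let $\hat{\alpha}_1$ and $\hat{\alpha}_2$ be consistent estimators of $\alpha_1$ and $\alpha_2$ under the models $f_1$ and $f_2$, respectively. The hypotheses are
• $H_0$: The two models are equally close to the true data.
• $H_1$: One of the two models is closer to the true data.
Vuong's test statistic is given as follows (see Mouatassim and Ezzahid [14]):

$$V = \frac{\sqrt{n}\,\bar{m}}{s_m}, \tag{3}$$

where

$$m_i = \ln\!\left(\frac{\hat{P}_1(y_i \mid x_i, z_i, w_i)}{\hat{P}_2(y_i \mid x_i, z_i, w_i)}\right), \qquad \bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i, \qquad s_m^2 = \frac{1}{n}\sum_{i=1}^{n} (m_i - \bar{m})^2 .$$

The detailed calculation of $V$ is provided in the Appendix. Note that:
• $\hat{P}_j(y_i \mid x_i, z_i, w_i)$ is the predicted probability of the observed count for case $i$ from model $j$, $j = 1, 2$.
• For the complete case, $V$ can be easily obtained from the package pscl in the R language (Zeileis et al. [28]).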
At the significance level $\alpha$, the decision rule is as follows:

$$\begin{cases} V > Q_{\alpha/2}: & \text{model } f_1 \text{ is preferred}, \\ V < -Q_{\alpha/2}: & \text{model } f_2 \text{ is preferred}, \\ |V| \le Q_{\alpha/2}: & \text{the two models are equivalent}, \end{cases}$$

where $Q_{\alpha/2}$ is the upper quantile of the standard normal distribution at the level $\alpha/2$. As with AIC and BIC, performing Vuong's test involves the following steps:
Step 1: Choose candidate models that can be fitted to the data set.
Step 2: Estimate the unknown coefficients of the models.
Step 3: Calculate $V$ using (3).
Step 4: Based on the rule of choice, select the most compatible model.

The ZIP model ($M_1$) is composed of two separate parts: the count model, with coefficients denoted by $\beta$, and the inflation model, with coefficients denoted by $\gamma$; see Equation (5.). As can be seen from Tab. 2, all estimated coefficients of the zero-inflated part are statistically significant at the 5% level, since all P-values are less than 0.05. In contrast, in the count model, Distance-covered ($X$) and Motorcycle-engine ($Z$) are not significant; only Age ($W$) is. Age affects the number of traffic violations through both parts: if $W$ increases while the other factors are held fixed, the expected number of violations decreases and the probability of not violating increases, since $\beta_3 = -0.23536 < 0$ and $\gamma_3 = 0.19547 > 0$, respectively.
For the Poisson regression model ($M_2$) and the negative binomial regression model ($M_3$), the estimated coefficients are also statistically significant, with very small P-values (≈ 0). The positive coefficients of $X$ and $Z$ imply that these two factors increase the incidence rate (see $\mu$ in (11) and (12)) of the number of traffic violations, while $W$ decreases it, as in the ZIP model; see Tabs. 3 and 4.
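The sign-based reading of the Age coefficients in the ZIP model can be made quantitative by exponentiating them. The sketch below assumes a log link for the count part and a logit link for the zero-inflation part, which is the usual ZIP parametrization but is an assumption here, since the model forms are given only in the Appendix. The variable names are illustrative.

```python
import math

beta_3 = -0.23536   # Age coefficient, count part of the ZIP model (from the text)
gamma_3 = 0.19547   # Age coefficient, zero-inflation part (from the text)

# Assumption: log link for the count part, so exp(beta_3) multiplies the
# expected count of the Poisson component per one-unit increase in W.
rate_ratio = math.exp(beta_3)

# Assumption: logit link for the inflation part, so exp(gamma_3) multiplies
# the odds of a structural zero (never violating) per one-unit increase in W.
odds_ratio = math.exp(gamma_3)

print(f"rate ratio: {rate_ratio:.3f}, odds ratio: {odds_ratio:.3f}")
```

A rate ratio below 1 and an odds ratio above 1 correspond to the two effects described above: fewer expected violations and a higher chance of not violating as $W$ grows.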
We now turn to the question of which model is better, based on the results reported in Tab. 5.

Regarding AIC and BIC, AIC is very common in econometrics, while BIC is more commonly used in sociology; see Weakliem [27]. Comparing formulas (1) and (2), BIC coincides with AIC when $\ln(n) = 2$; for larger $n$, BIC penalizes each additional parameter more heavily than AIC.
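To see the relationship between formulas (1) and (2) and Vuong's test, consider the following setting. Let $D$ be the observed (real) data. A number of possible models $M_k$ for $D$ are considered, each with a likelihood function $L(D \mid \theta_k; M_k)$, where $\theta_k$ is a vector of $p_k$ unknown parameters to be estimated. For simplicity, let $\ell(\theta_k) = \ln[L(D \mid \theta_k; M_k)]$ and let $\hat{\theta}_k$ be the maximum likelihood estimate (MLE) of $\theta_k$. The candidate models can be assessed through a sequence of comparisons between pairs of models, so it suffices to consider the models $M_1$ and $M_2$. The difference of the two AIC (resp. BIC) values obtained from the two models can be expressed as follows:

$$\Delta\mathrm{AIC} = \mathrm{AIC}(M_2) - \mathrm{AIC}(M_1) = 2\left[\ell(\hat{\theta}_1) - \ell(\hat{\theta}_2)\right] - 2(p_1 - p_2),$$

$$\Delta\mathrm{BIC} = \mathrm{BIC}(M_2) - \mathrm{BIC}(M_1) = 2\left[\ell(\hat{\theta}_1) - \ell(\hat{\theta}_2)\right] - (p_1 - p_2)\ln(n),$$

and Vuong's test statistic can be rewritten as:

$$V = \frac{\ell(\hat{\theta}_1) - \ell(\hat{\theta}_2)}{\sqrt{n\,h^2(\hat{\theta}_1, \hat{\theta}_2)}},$$

where $h^2(\hat{\theta}_1, \hat{\theta}_2)$ denotes the sample variance of the pointwise differences of the log-likelihoods $\ell(\hat{\theta}_1) - \ell(\hat{\theta}_2)$.
From this point of view, one may prefer the first model $M_1$ over the second model $M_2$ if $\Delta\mathrm{AIC}$, $\Delta\mathrm{BIC}$ and $V$ are positive.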
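The pairwise comparison can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function name `compare_models` is hypothetical, the differences are taken as "model 2 minus model 1" so that positive values favour $M_1$, the variance uses the divisor $n$, and the toy log-likelihood values are invented. With only four observations the normal approximation behind Vuong's test is of course not reliable; the data merely exercise the arithmetic.

```python
import math
from statistics import NormalDist

def compare_models(ll1, ll2, p1, p2, alpha=0.05):
    """Compare M1 against M2 from per-observation log-likelihoods at the MLE.

    ll1, ll2: pointwise log-likelihoods of each model (same observations).
    p1, p2:   numbers of estimated parameters.
    """
    n = len(ll1)
    m = [a - b for a, b in zip(ll1, ll2)]        # pointwise differences
    diff = sum(m)                                # l(theta1_hat) - l(theta2_hat)
    m_bar = diff / n
    h2 = sum((mi - m_bar) ** 2 for mi in m) / n  # variance with divisor n (assumption)
    if h2 == 0.0:
        raise ValueError("pointwise differences are constant; V is undefined")

    delta_aic = 2.0 * diff - 2.0 * (p1 - p2)          # AIC(M2) - AIC(M1)
    delta_bic = 2.0 * diff - (p1 - p2) * math.log(n)  # BIC(M2) - BIC(M1)
    v = diff / math.sqrt(n * h2)                      # Vuong statistic

    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)       # upper alpha/2 normal quantile
    if v > q:
        verdict = "M1 preferred"
    elif v < -q:
        verdict = "M2 preferred"
    else:
        verdict = "equivalent"
    return delta_aic, delta_bic, v, verdict

# Toy data (hypothetical).
ll1 = [-0.5, -1.0, -0.7, -0.9]
ll2 = [-0.9, -1.3, -1.2, -1.4]
delta_aic, delta_bic, v, verdict = compare_models(ll1, ll2, p1=3, p2=2)
print(delta_aic, delta_bic, v, verdict)
```

All three quantities share the term $\ell(\hat{\theta}_1) - \ell(\hat{\theta}_2)$; AIC and BIC subtract a complexity penalty from it, whereas Vuong's statistic scales it by its estimated variability.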
AIC is a very widespread criterion, and several scholars have studied and refined it. Some modified AIC statistics are listed below:
• The corrected AIC for sample size, denoted by AICc:

$$\mathrm{AICc} = \mathrm{AIC} + \frac{2p_k(p_k + 1)}{n - p_k - 1}.$$

• The AIC weight of the model $M_k$, defined by

$$\mathrm{AICw}(k) = \frac{\exp(-\Delta_k/2)}{\sum_{r=1}^{R} \exp(-\Delta_r/2)}, \qquad \Delta_k = \mathrm{AIC}(M_k) - \min_{1 \le r \le R} \mathrm{AIC}(M_r),$$

where $R$ is the number of candidate models. $\mathrm{AICw}(k)$ is the weight of evidence for the model $M_k$ relative to the other candidate models; the model with the highest AICw is considered the strongest.
• The evidence ratio of the model $M_k$, determined by

$$\mathrm{ER}(k) = \frac{\mathrm{AICw}_{best}}{\mathrm{AICw}(k)},$$

where $\mathrm{AICw}_{best}$ is the AIC weight of the best model. This ratio measures how decisive the evidence is: the model with the smallest ER is the most appropriate among the candidate models.
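The three modified statistics listed above can be computed as in the following sketch. The function names and the AIC values are hypothetical, and the formulas follow the standard definitions of the corrected AIC, Akaike weights and evidence ratios, which are assumed to match the ones intended here.

```python
import math

def aicc(aic_value: float, k: int, n: int) -> float:
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic_value + 2.0 * k * (k + 1) / (n - k - 1)

def akaike_weights(aic_values):
    """Normalized relative likelihoods exp(-delta / 2), delta = AIC - min AIC."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]

def evidence_ratios(weights):
    """Weight of the best model divided by the weight of each model."""
    w_best = max(weights)
    return [w_best / w for w in weights]

aic_values = [1016.0, 1088.0, 1026.0]   # hypothetical AIC values for R = 3 models
weights = akaike_weights(aic_values)
ratios = evidence_ratios(weights)
print(weights, ratios)
```

The best model always has an evidence ratio of exactly 1, and a larger ratio for another model means weaker support for it relative to the best one.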
Regarding applicability, Vuong's test, AIC and BIC are only applicable to complete data, i.e., data with no missing values. In many practical applications, however, some elements of the data set are missing. These traditional criteria may then no longer be suitable for selecting models, and removing all cases with missing elements could bias the inferences. It is therefore necessary to extend the above formulas so that they can deal with missing data. To the best of our knowledge, this problem has not yet been studied, and it is a potential direction for future research. Several methods for handling missing data are already widely used. Little [12] reviewed six methods for the missing data problem: complete-case (CC) analysis, available-case (AC) methods, least squares (LS) on imputed data, maximum likelihood (ML), Bayesian methods and multiple imputation (MI). Zhao and Lipsitz [29] proposed the inverse probability weighting (IPW) method. Wang et al. [26] developed a regression calibration (RC) method. Wang et al. [25] introduced the joint conditional likelihood (JCL) method. In addition, methods can be combined to provide a more robust tool. For instance, Han [8] presented multiply robust estimation in regression analysis with missing data, in which the IPW and MI methods are combined.
This issue extends naturally to regression models. In traditional regression models such as the logistic regression model, the zero-inflated binomial (ZIB) regression model and the zero-inflated Poisson (ZIP) regression model, the coefficients cannot be estimated directly if some covariates have missing values. Hence, new approaches are needed to estimate the parameters in this situation. For instance, Wang et al. [25] employed the joint conditional likelihood (JCL) estimator in logistic regression with missing covariate data. Hsieh et al. [9] extended the method of Wang et al. (2002) to introduce a semiparametric analysis of randomized response data with missing covariates in logistic regression. Lee et al. [11] also extended the method of Wang et al. (2002) to present a semiparametric estimation of the logistic regression model with missing covariates and outcome. Pho et al. [30] discussed three widely used approaches for handling missing data. Diallo et al. [7] introduced an IPW estimator of the parameters of a ZIB regression model with missing-at-random covariates. Lukusa et al. [13]

Tab. 1: Frequency of respondents (Re) in data set after deleting missing values.
First, the data are randomly split into three sets, namely training, validation and test, in the proportions 60%, 20% and 20%. That is, 60% of the whole data set is used to train the three models $M_i$, $i = 1, 2, 3$, with the results shown in Tabs. 2, 3 and 4, respectively. Next, the validation data, also randomly drawn as 20% of the full data, are used to select the most appropriate model, while the remaining test data are used to check accuracy when forecasting with those models. The criteria AIC, BIC and Vuong's test, together with the mean square error (MSE) and accuracy, are computed for each data set and each model for comparison.
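A random 60%-20%-20% partition of this kind can be sketched as below. The helper `split_indices`, the seed and the sample size of 1000 are hypothetical; the exact splitting routine used for the study is not described in the text.

```python
import random

def split_indices(n, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition range(n) into training, validation and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # reproducible shuffle
    n_train = int(round(fractions[0] * n))
    n_valid = int(round(fractions[1] * n))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]        # whatever remains goes to the test set
    return train, valid, test

train, valid, test = split_indices(1000)  # 1000 is a hypothetical sample size
print(len(train), len(valid), len(test))
```

Fixing the seed keeps the three subsets identical across runs, so that every candidate model is trained and evaluated on exactly the same cases.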
Zero-inflated Poisson (ZIP) regression model, zero-inflated binomial (ZIB) regression model, and zero-inflated negative binomial (ZINB) regression model could be more plausible candidates. The frequency of $Y$ is exhibited in Fig. 1 (Appendix). As can be observed from Tab. 1 and Fig. 1, the number of people violating speed regulations in Taiwan in 2007 is very small. Most values of $Y$ in the data set are zeros; such data are usually called zero-inflated count data. For this type of data, zero-inflated models may be more appropriate than other models. In this section, we investigate the following three models: the zero-inflated Poisson (ZIP) regression model, denoted by $M_1$; the Poisson regression model, denoted by $M_2$; and the negative binomial (NB) regression model, denoted by $M_3$. The forms of these models are briefly given in the Appendix. Our aim is to evaluate which model is most appropriate for modeling the relationship between the number of speed regulation violations ($Y$) and factors such as Distance-covered ($X$), Motorcycle-engine ($Z$) and the Age of respondents ($W$).
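To illustrate why a zero-inflated model can suit data dominated by zeros, the sketch below compares the probability of a zero under a plain Poisson distribution and under a ZIP distribution with the same mean parameter. It uses the standard ZIP mixture form, assumed to agree with the one in the Appendix; the values of `mu` and `pi` are hypothetical.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability of observing the count y."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y: int, mu: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(mu) count."""
    base = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + base if y == 0 else base

mu, pi = 0.5, 0.6   # hypothetical parameter values
p0_poisson = poisson_pmf(0, mu)
p0_zip = zip_pmf(0, mu, pi)
print(p0_poisson, p0_zip)
```

For any `pi` strictly between 0 and 1 the ZIP distribution assigns more mass to zero than the Poisson distribution with the same `mu`, which is the feature that lets it absorb excess zeros.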
Based on the results in Tabs. 5 and 6, the smallest AIC and BIC values on the validation data are 1013.404 and 1033.937, respectively, and both are produced by the model $M_1$. The same holds on the training and test data sets. Hence, the model $M_1$ (ZIP) is the most plausible model in comparison with the models $M_3$ and $M_2$. However, the Vuong's test results on the validation set (see Tab. 8) suggest that the model $M_1$ is preferable to the model $M_2$ but equivalent to the model $M_3$ (P-value = 0.1 > 0.05). This equivalence is also supported by the identical mean square error (MSE = 0.3488) and identical accuracy (90.42%) on the validation data; see Tabs. 10 and 11. On the test set, the model $M_1$ performs slightly better, with the smallest MSE (0.2811) and the greatest accuracy (90.60%), and Vuong's test gives a similar result. Our result is consistent with Lukusa et al. [13]. It also shows that the information criteria AIC and BIC are more robust than Vuong's test in model selection.
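The two predictive measures mentioned above can be computed as in the following sketch. The MSE is the usual average squared difference between observed and predicted counts. The accuracy definition used here (share of cases whose rounded prediction equals the observed count) is an assumption, since the text does not state how accuracy is defined; the data are hypothetical.

```python
def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted counts."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)

def accuracy(observed, predicted):
    """Share of cases whose rounded prediction equals the observed count
    (assumed definition, not taken from the paper)."""
    hits = sum(1 for y, p in zip(observed, predicted) if round(p) == y)
    return hits / len(observed)

observed = [0, 0, 1, 0, 2]               # hypothetical observed counts
predicted = [0.1, 0.4, 0.8, 0.2, 1.2]    # hypothetical fitted means
mse = mean_squared_error(observed, predicted)
acc = accuracy(observed, predicted)
print(mse, acc)
```

Two models can share the same MSE and accuracy on a given data set while differing in likelihood, which is one reason these measures are reported alongside AIC, BIC and Vuong's test rather than instead of them.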
We reviewed widespread methods for selecting the most efficient model: Vuong's test, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The approaches and procedures of these methods and their application to traffic violation data are provided step by step. Based on the results on the training, validation and test data sets, we find that AIC and BIC perform more consistently and robustly in model selection than Vuong's test in this case. In addition, some advantages and disadvantages of these methods have been discussed and compared. We also suggest some potential directions for future research.

Fig. 1: Frequency of violations of speed regulations in Taiwan 2007.
Tab. 2: Estimates of the model $M_1$ (ZIP model).
Tab. 3: Estimates of the model $M_2$ (Poisson model).
Tab. 4: Estimates of the model $M_3$ (NB model).