Akaike Information Criterion (AIC) in model selection

Data analysis often requires choosing among several candidate models that could fit the data. With noisy data, a more complex model fits the data better (smaller sum-of-squares, SS) than a less complex model. If SS alone were used to select the best-fitting model, we would conclude that a very complex model that fits every noise peak is the best. Model complexity therefore needs to be taken into account in model selection. The Akaike Information Criterion is a commonly used method for model comparison, and conclusions from other model selection methods are usually the same.

In the special case of sum-of-squares optimization, the basic AIC formula is expressed as:

${AIC} = {n} \times {ln{SS \over {n}}} + 2 \times k$

, where n is the number of observations (for example PET frames), k is the number of estimated parameters in the model (excluding fixed parameters), and SS is the sum-of-squares, Σe_i² (where e_i are the estimated residuals). Although the AIC formula appears to be very simple, its derivation is well founded on information theory, and the penalty term 2×k is not just an arbitrary value (Burnham and Anderson, 1998).
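The least-squares AIC formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `aic_ss` and the example numbers (25 frames, 4 fitted parameters, SS = 120) are hypothetical.

```python
import math


def aic_ss(ss, n, k):
    """AIC for a sum-of-squares fit: n * ln(SS / n) + 2 * k.

    ss: sum of squared residuals (must be > 0)
    n:  number of observations (for example PET frames)
    k:  number of estimated (non-fixed) parameters
    """
    if ss <= 0 or n <= 0:
        raise ValueError("ss and n must be positive")
    return n * math.log(ss / n) + 2 * k


# Hypothetical example values, chosen only for illustration.
print(aic_ss(ss=120.0, n=25, k=4))
```

The value on its own carries little meaning, because it depends on the units of the data; only differences between models fitted to the same data are interpretable.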

An information criterion is not a null hypothesis test: do not use terms like "(not) significant" or "rejected" when reporting results.
Information criteria must not be used to test whether one model is "significantly" better than another model.

When sample size n is small compared to the number of parameters (n/k < 40, that is, almost always in PET data analysis) the use of a second-order corrected AIC (AICc) is recommended (Burnham and Anderson, 1998):

${AICc} = AIC + {{ 2 \times k \times (k+1) } \over {n-k-1}}$
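As a sketch of the second-order correction, the following Python function adds the extra term to the least-squares AIC. The name `aicc_ss` and the example numbers are hypothetical; the guard on n - k - 1 is an assumption made here because the correction term is undefined when the denominator is zero or negative.

```python
import math


def aicc_ss(ss, n, k):
    """Second-order corrected AIC for a sum-of-squares fit.

    AICc = AIC + 2 * k * (k + 1) / (n - k - 1),
    with AIC = n * ln(SS / n) + 2 * k.
    """
    if ss <= 0 or n <= 0:
        raise ValueError("ss and n must be positive")
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = n * math.log(ss / n) + 2 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


# Hypothetical example: 25 frames and 4 fitted parameters give n/k = 6.25,
# well below 40, so the corrected form would be the recommended one.
print(aicc_ss(ss=120.0, n=25, k=4))
```

With these example numbers the correction term is 2 × 4 × 5 / 20 = 2, which is not negligible compared to typical AIC differences between models.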

SS is in the square of the units of the measured data. Therefore, the AIC is on a relative scale, and it is critical to compute and present the AIC differences (ΔAIC), instead of AIC or AICc values, over candidate models (Burnham and Anderson, 1998; Motulsky and Christopoulos, 2004). Define A to be a simpler model and B to be a more complicated model (k_A<k_B). The difference in AIC is:

${\Delta AIC} = n \times ln({{SS_B}\over{SS_A}}) + 2 \times (k_B - k_A)$

, and the difference in AICc is:

$\begin{eqnarray} {\Delta AICc} & = & n \times { \left( ln({SS_B\over n}) - ln({SS_A\over n}) \right) } + 2 \times (k_B - k_A) \\ & + & 2 \times \left({{k_B(k_B+1)}\over{n-k_B-1}} - {{k_A(k_A+1)}\over{n-k_A-1}}\right) \nonumber \end{eqnarray}$

The difference formulas for ΔAIC and ΔAICc given above can be used only after both models A and B have been fitted to the data. A more practical approach may be to calculate the AIC separately for each model fit, and later calculate the difference simply as:

${\Delta AIC} = AIC_B - AIC_A$

, since, based on the properties of the logarithm, the basic AIC formula can also be written as

$AIC = n \times ln(SS) - n \times ln(n) + 2 \times k$

and then

$\begin{eqnarray} {AIC_B - AIC_A} & = & n \times (ln(SS_B) - ln(SS_A)) + 2 \times k_B - 2 \times k_A \\ & = & n \times ln(SS_B/SS_A) + 2 \times (k_B-k_A) \nonumber \end{eqnarray}$
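The identity above can be checked numerically. The sketch below, with hypothetical SS values for a simpler model A and a more complex model B, computes ΔAIC both from separately calculated AICs and from the direct formula, and also shows that rescaling the data units (which multiplies both SS values by the same factor) leaves ΔAIC unchanged. All function and variable names are illustrative.

```python
import math


def aic_ss(ss, n, k):
    """AIC for a sum-of-squares fit: n * ln(SS / n) + 2 * k."""
    return n * math.log(ss / n) + 2 * k


def delta_aic_direct(ss_a, k_a, ss_b, k_b, n):
    """Direct difference: n * ln(SS_B / SS_A) + 2 * (k_B - k_A)."""
    return n * math.log(ss_b / ss_a) + 2 * (k_b - k_a)


# Hypothetical fits of two models to the same n = 25 data points.
n = 25
ss_a, k_a = 150.0, 2   # simpler model A
ss_b, k_b = 120.0, 4   # more complex model B

delta_separate = aic_ss(ss_b, n, k_b) - aic_ss(ss_a, n, k_a)
delta_direct = delta_aic_direct(ss_a, k_a, ss_b, k_b, n)

# Changing data units by a factor of 1000 scales SS by 1000**2.
unit_factor = 1000.0 ** 2
delta_rescaled = (aic_ss(ss_b * unit_factor, n, k_b)
                  - aic_ss(ss_a * unit_factor, n, k_a))

print(delta_separate, delta_direct, delta_rescaled)
```

The three printed values agree to within floating-point rounding, while the individual AIC values change substantially with the unit factor.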

ΔAIC or ΔAICc should be calculated relative to the smallest AIC or AICc, so that the best model has ΔAIC = 0 (Burnham and Anderson, 2004). Although the original AIC values may be very large compared to the differences, this does not make the differences unimportant; only the differences in AIC are interpretable as to the strength of evidence (Burnham and Anderson, 2004). The transformation exp(-ΔAIC/2) provides the likelihood of the model, given the data, relative to the best model (Akaike, 1981; Burnham and Anderson, 2004).
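A minimal sketch of this step, assuming the AIC (or AICc) values of all candidate models have already been computed from fits to the same data: subtract the smallest value and apply exp(-ΔAIC/2). The model labels and AICc values below are hypothetical.

```python
import math


def delta_and_relative_likelihood(aic_values):
    """Map model name -> (delta, exp(-delta / 2)).

    aic_values: dict of model name -> AIC (or AICc) value.
    Delta is taken relative to the smallest value, so the best
    model gets delta = 0 and relative likelihood 1.
    """
    best = min(aic_values.values())
    return {
        name: (value - best, math.exp(-(value - best) / 2.0))
        for name, value in aic_values.items()
    }


# Hypothetical AICc values for three candidate models.
candidates = {"model_1": 52.3, "model_2": 48.1, "model_3": 50.0}
for name, (delta, rel_lik) in delta_and_relative_likelihood(candidates).items():
    print(f"{name}: delta = {delta:.2f}, relative likelihood = {rel_lik:.3f}")
```

Reporting the differences and relative likelihoods for all candidate models, rather than only naming a winner, is in line with the recommendation to present AIC differences instead of raw values.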

Models can only be compared using information criteria when they have been fitted to exactly the same set of data with the same weights.

MSC

The model selection criterion (MSC) is a reciprocal modification of the Akaike information criterion, used in Scientist software (MicroMath, St. Louis, Missouri, USA). MSC is independent of the magnitude (scaling) of the data. A larger MSC means a better fit, and when comparing models, the most appropriate model for the data is the one with the largest MSC.

${MSC} = ln \left( \frac{ \sum\limits_{i=1}^n w_i [ C_{PET}(t_i) - {\bar{C}}_{PET} ]^2 }{ \sum\limits_{i=1}^n w_i [ C_{PET}(t_i) - C_{SIM}(t_i) ]^2 } \right) - \frac{2 k}{n}$

, where w_i are the weights for each data sample (i), C_SIM are the predicted (simulated, fitted) values, C_PET are the measured values, C̄_PET is the mean of the measured values, k is the number of estimated parameters, and n is the number of data samples.
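The MSC calculation can be sketched as follows. This is an illustration under stated assumptions, not the Scientist implementation: the penalty term is taken as twice the number of estimated parameters divided by the number of data samples, the mean of the measured values is the plain (unweighted) mean, and the function name `msc` and the example data are hypothetical.

```python
import math


def msc(measured, fitted, n_params, weights=None):
    """Model selection criterion for one model fit.

    measured: measured values, C_PET(t_i)
    fitted:   model-predicted values, C_SIM(t_i)
    n_params: number of estimated parameters
    weights:  optional sample weights w_i (default: all 1)
    """
    n = len(measured)
    if weights is None:
        weights = [1.0] * n
    # Assumption: unweighted mean of the measured values.
    mean = sum(measured) / n
    ss_total = sum(w * (y - mean) ** 2 for w, y in zip(weights, measured))
    ss_resid = sum(w * (y - f) ** 2
                   for w, y, f in zip(weights, measured, fitted))
    return math.log(ss_total / ss_resid) - 2.0 * n_params / n


# Hypothetical data: four samples and a two-parameter model fit.
y_meas = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.1, 3.9]
print(msc(y_meas, y_fit, n_params=2))

# Rescaling the data (here by a factor of 10) leaves MSC unchanged.
print(msc([10 * y for y in y_meas], [10 * f for f in y_fit], n_params=2))
```

Because the scaling factor cancels in the ratio of the two sums, the two printed values agree to within floating-point rounding, which illustrates why MSC does not depend on the magnitude of the data.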

Comparison of AIC to other tests

AIC has been reported to find the "true" model more reliably than, for example, the F-test (Glatting et al, 2007; Kletting and Glatting, 2009a). Compared to the F-test, AIC has the advantage of being suited to both nested and non-nested models. Whether the F-test tends to choose more complex or simpler models than AIC depends on the selected α value. Glatting and Kletting conclude that AIC is an effective and efficient approach.

Golla et al (2017) compared five model selection criteria (AIC, AICc, MSC, Schwarz Criterion, and F-test) on data from six PET tracers, and noted that all methods led to similar conclusions.

The Bayesian information criterion (BIC) is another method that is sometimes used. Despite its name, BIC is not related to information theory, and Burnham and Anderson (2004) recommend the use of AICc over BIC.

Literature

Akaike H. Likelihood of a model and information criteria. J Econometrics 1981; 16(1): 3-14. doi: 10.1016/0304-4076(81)90071-3.

Alves IL, García DV, Parente A, Doorduin J, Dierckx R, Marques da Silva AM, Koole M, Willemsen A, Boellaard R. Pharmacokinetic modeling of [¹¹C]flumazenil kinetics in the rat brain. EJNMMI Res. 2017; 7:17. doi: 10.1186/s13550-017-0265-4.

Bonate PL: Pharmacokinetic-Pharmacodynamic Modeling and Simulation, 2nd ed., Springer, 2011. doi: 10.1007/978-1-4419-9485-1.

Burnham KP and Anderson DR. Model Selection and Inference: A Practical Information-Theoretical Approach. Springer, 1998. ISBN 978-1-4757-2917-7. doi: 10.1007/978-1-4757-2917-7.

Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004; 33(2): 261-304. doi: 10.1177/0049124104268644.

Forster: Key Concepts in Model Selection: Performance and Generalizability. J Math Psychol. 44, 205-231, 2000. doi: 10.1006/jmps.1999.1284.

Glatting G, Kletting P, Reske SN, Hohl K, Ring C: Choosing the optimal fit function: Comparison of the Akaike information criterion and the F-test. Med Phys. 34(11): 4285-92, 2007. doi: 10.1118/1.2794176.

Golla SSV, Adriaanse SM, Yaqub M, Windhorst AD, Lammertsma AA, van Berckel BNM, Boellaard R. Model selection criteria for dynamic brain PET studies. EJNMMI Phys. 2017; 4: 30. doi: 10.1186/s40658-017-0197-0.

Kletting P, Glatting G: Model selection for time-activity curves: The corrected Akaike information criterion and the F-test. Z Med Phys. 19: 200-206, 2009a. doi: 10.1016/j.zemedi.2009.05.003.

Kletting P, Kull T, Reske SN, Glatting G: Comparing time activity curves using the Akaike information criterion. Phys Med Biol. 54: N501-N507, 2009b. doi: 10.1088/0031-9155/54/21/N01.

Motulsky H, Christopoulos A. 2004. Fitting Models to Biological Data Using Linear and Nonlinear Regression. Oxford University Press, NY. ISBN: 9780195171792.

Turkheimer, Hinz, Cunningham: On the undecidability among kinetic models: from model selection to model averaging. J Cereb Blood Flow Metab., 23:490-498, 2003. doi: 10.1097/01.WCB.0000050065.57184.BB.

Zhou Y, Aston JAD, Johansen AM. Bayesian model comparison for compartmental models with applications in positron emission tomography. J Appl Statistics 2013; 40(5): 993-1016. doi: 10.1080/02664763.2013.772569.

Tags: Modeling, Compartmental model, Fitting, Validation, Statistics

Updated at: 2021-02-23
Created at: 2010-09-07
Written by: Harri Merisaari, Jambor I, Lars Jødal, Vesa Oikonen
