# Akaike Information Criterion (*AIC*) in model selection

Data analysis often requires selecting among several candidate models that could
fit the data.
With noisy data, a more complex model gives a better fit (smaller
sum-of-squares, *SS*) than a less complex model. If *SS* alone were used to select the
model that best fits the data, we would conclude that a very complex model that fits
every noise peak is the best.
Therefore, model complexity needs to be taken into account in model selection.

The Akaike Information Criterion is a commonly used method for model comparison.
Golla et al (2017) compared five model
selection criteria (*AIC*, *AICc*, *MSC*,
Schwarz Criterion, and F-test) on data from six PET tracers, and noted that all methods
led to similar conclusions.

In the special case of sum-of-squares optimization, the basic *AIC*
formula is expressed as:

*AIC = n × ln(SS/n) + 2×k*   (Eq 1)

, where *n* is the number of observations (for example PET frames), *k* is the
number of estimated parameters in the model (excluding fixed parameters), and *SS* is
the sum-of-squares, *Σe_{i}^{2}* (where *e_{i}* are the estimated residuals).
Although the *AIC* formula appears to be very simple, its derivation is well founded
on information theory, and the penalty term *2×k* is not just an arbitrary value
(Burnham and Anderson, 1998).

When sample size *n* is small compared to the number of parameters (*n/k* < 40,
that is, almost always in PET data analysis) the use of a second-order corrected *AIC*
(*AICc*) is recommended (Burnham and Anderson, 1998):

*AICc = n × ln(SS/n) + 2×k + 2×k×(k+1)/(n-k-1)*   (Eq 2)

*SS* is in the square of the units of the measured data. Therefore, the *AIC* is
on a relative scale, and it is critical to compute and present the *AIC* differences
(*ΔAIC*), instead of *AIC* or *AICc* values, over candidate models
(Burnham and Anderson, 1998; Motulsky and Christopoulos, 2004).
Define A to be a simpler model and B to be a more complicated model
(*k_{A} < k_{B}*). The difference in *AIC* is:

*ΔAIC = AIC_{B} - AIC_{A} = n × ln(SS_{B}/SS_{A}) + 2×(k_{B} - k_{A})*   (Eq 3)

, and the difference in *AICc* is:

*ΔAICc = ΔAIC + 2×k_{B}×(k_{B}+1)/(n-k_{B}-1) - 2×k_{A}×(k_{A}+1)/(n-k_{A}-1)*   (Eq 4)

Equations (3) and (4) can be used only after both models A and B are
fitted to the data. A more practical approach may
be to calculate the *AIC*s separately for each model fit, and later calculate the difference
simply as:

*ΔAIC = AIC_{B} - AIC_{A}*   (Eq 5)

, since, based on the properties of the logarithm, Eq (1) can also be written as

*AIC = n × ln(SS) - n × ln(n) + 2×k*   (Eq 6)

and then

*ΔAIC = n × ln(SS_{B}) - n × ln(SS_{A}) + 2×(k_{B} - k_{A})*   (Eq 7)

*ΔAIC* or *ΔAICc* should be calculated relative to the smallest
*AIC* or *AICc*, so that the best model will have *ΔAIC* = 0
(Burnham and Anderson, 2004).
Although original *AIC* values may be very large compared to
the differences, that does not mean that the difference would not be important;
**only** the differences in *AIC* are interpretable as to the strength of
evidence (Burnham and Anderson, 2004).
The transformation *exp(-ΔAIC/2)* provides the likelihood of the model
(Akaike, 1981; Burnham and Anderson, 2004).
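As an illustration, the quantities above can be computed directly from the fit results; the sketch below assumes hypothetical least-squares fits of a simpler model A and a more complex model B to the same data (the *SS*, *k*, and *n* values are made up for the example).

```python
import math

def aic(ss, n, k):
    """Basic AIC for a sum-of-squares fit (Eq 1): n*ln(SS/n) + 2k."""
    return n * math.log(ss / n) + 2 * k

def aicc(ss, n, k):
    """Second-order corrected AIC (Eq 2); requires n > k + 1."""
    return aic(ss, n, k) + 2 * k * (k + 1) / (n - k - 1)

# Hypothetical fits of simpler model A (k=2) and more complex model B (k=4)
# to the same PET time-activity curve with n=20 frames.
n = 20
ss_a, k_a = 8.5, 2
ss_b, k_b = 6.0, 4

scores = {"A": aicc(ss_a, n, k_a), "B": aicc(ss_b, n, k_b)}
best = min(scores.values())
# Differences relative to the smallest AICc: the best model has dAICc = 0.
delta = {m: s - best for m, s in scores.items()}
# exp(-dAICc/2) gives the relative likelihood of each model.
likelihood = {m: math.exp(-d / 2) for m, d in delta.items()}
print(delta, likelihood)
```

Note that only the differences *ΔAICc* and the derived likelihoods are interpretable; the absolute *AICc* values depend on the units of the data.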

An information criterion is not a null hypothesis test: do not use terms like “(not) significant” or “rejected” when reporting results.

Based on information criteria, one cannot test whether one model is “significantly” better than another model.

### Comparison to F-test and *BIC*

*AIC* has been reported to find the “true” model more reliably than, for example, the
F-test (Glatting et al, 2007; Kletting and Glatting, 2009a). Compared to the F-test,
*AIC* has the advantage of being suited to both nested and non-nested models. Whether
the F-test tends to choose more complex or simpler models than *AIC* depends on the
selected α value. Glatting and Kletting conclude that *AIC* is an effective and
efficient approach. Another method, the Bayesian information criterion (*BIC*), is not
related to information theory, despite its name, and use of *AICc* over *BIC* is
recommended by Burnham and Anderson (2004).

Models can only be compared using information criteria when they have been fitted to exactly the same set of data with the same weights.

## MSC

Model selection criterion (*MSC*) is a reciprocal modification of the Akaike information
criterion, used in the *Scientist* software (MicroMath, St. Louis, Missouri, USA).
*MSC* is independent of the magnitude (scaling) of the data.
A larger *MSC* means a better fit, and when comparing models the most appropriate model
for the data is the one with the largest *MSC*:

*MSC = ln[ Σw_{i}×(C_{PET,i} - C̄_{PET})² / Σw_{i}×(C_{PET,i} - C_{SIM,i})² ] - 2×k/n*

, where *w_{i}* are the weights for each data sample (*i*), *C_{SIM}* are the
predicted (simulated, fitted) values, *C_{PET}* are the measured values, and
*C̄_{PET}* is the mean of the measured values.
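As a sketch of the definition above, *MSC* can be computed from measured values, fitted values, and weights of equal length; the variable names and data below are illustrative only.

```python
import math

def msc(measured, fitted, weights, k):
    """Model selection criterion (larger is better):
    ln[ sum w_i*(C_i - mean)^2 / sum w_i*(C_i - fit_i)^2 ] - 2*k/n."""
    n = len(measured)
    c_mean = sum(measured) / n
    # Weighted variation of the data around its mean.
    ss_total = sum(w * (c - c_mean) ** 2 for w, c in zip(weights, measured))
    # Weighted residual sum-of-squares of the fit.
    ss_resid = sum(w * (c - f) ** 2 for w, c, f in zip(weights, measured, fitted))
    return math.log(ss_total / ss_resid) - 2 * k / n

# Illustrative data: a fit that explains most of the variance.
measured = [0.0, 2.1, 3.9, 5.2, 6.0]
fitted   = [0.1, 2.0, 4.0, 5.0, 6.1]
weights  = [1.0] * 5
print(msc(measured, fitted, weights, k=2))
```

Because the ratio inside the logarithm compares explained to unexplained variation, scaling all measured and fitted values by a constant leaves *MSC* unchanged.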


## References

Akaike H. Likelihood of a model and information criteria.
*J Econometrics* 1981; 16(1): 3-14.

Alves IL, García DV, Parente A, Doorduin J, Dierckx R, Marques da Silva AM, Koole M, Willemsen A,
Boellaard R. Pharmacokinetic modeling of [^{11}C]flumazenil kinetics in the rat brain.
*EJNMMI Res.* 2017; 7:17.

Bonate PL: *Pharmacokinetic-Pharmacodynamic Modeling and Simulation*,
2nd ed., Springer, 2011.

Burnham KP and Anderson DR. *Model Selection and Inference: A Practical Information-Theoretical
Approach.* 1998, Springer-Verlag, NY.

Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection.
*Sociol Methods Res.* 2004; 33(2): 261-304.
doi: 10.1177/0049124104268644.

Forster MR. Key concepts in model selection: performance and generalizability.
*J Math Psychol.* 2000; 44: 205-231.
doi: 10.1006/jmps.1999.1284.

Glatting G, Kletting P, Reske SN, Hohl K, Ring C: Choosing the optimal fit function: Comparison
of the Akaike information criterion and the F-test. *Med Phys.* 34(11): 4285-92, 2007.
doi: 10.1118/1.2794176.

Golla SSV, Adriaanse SM, Yaqub M, Windhorst AD, Lammertsma AA, van Berckel BNM, Boellaard R.
Model selection criteria for dynamic brain PET studies. *EJNMMI Phys.* 2017; 4: 30.
doi: 10.1186/s40658-017-0197-0.

Kletting P, Glatting G: Model selection for time-activity curves: The corrected Akaike
information criterion and the F-test. *Z Med Phys.* 19: 200-206, 2009a.
doi: 10.1016/j.zemedi.2009.05.003.

Kletting P, Kull T, Reske SN, Glatting G: Comparing time activity curves using the Akaike
information criterion. *Phys Med Biol.* 54: N501-N507, 2009b.
doi: 10.1088/0031-9155/54/21/N01

Motulsky H, Christopoulos A. 2004. *Fitting Models to Biological Data Using Linear and
Nonlinear Regression.* Oxford University Press, NY.

Turkheimer FE, Hinz R, Cunningham VJ: On the undecidability among kinetic models: from model selection
to model averaging. *J Cereb Blood Flow Metab.* 23: 490-498, 2003. doi: 10.1097/01.WCB.0000050065.57184.BB.

Zhou Y, Aston JAD, Johansen AM. Bayesian model comparison for compartmental models with
applications in positron emission tomography. *J Appl Statistics* 2013; 40(5): 993-1016.
doi: 10.1080/02664763.2013.772569.

Tags: Modeling, Compartmental model, Fitting, Validation

Created at: 2010-09-07

Updated at: 2018-12-12

Written by: Harri Merisaari, Jambor I, Lars Jødal, Vesa Oikonen