When fitting a model to a dataset, one often uses the Chi-Square metric to assess the goodness of fit and to decide how plausible it is that the model generated the dataset (or rather, the noise-corrupted measurements of some quantity). The key assumption here is that the measurement errors follow a normal (Gaussian) distribution. The null hypothesis (H0) for such a case is defined as follows: "the given model [and measurement errors] explains the dataset". That is, the residuals of the fit are consistent with the measurement errors, and the minimum Chi-Square value, chisq_min, was drawn from a Chi-Square distribution with N - M degrees of freedom, where N = number of data points being fitted and M = number of parameters in the fit.

If the probability of obtaining a Chi-Square value > chisq_min under H0 is tiny (i.e., < some critical value Pcrit), H0 is rejected: we conclude it is unlikely the model could have generated the dataset, and the model is rejected. Alternatively, a rejection could also mean the measurement uncertainties were simply underestimated, which would inflate the achieved chisq_min value.

A frequentist would interpret the test as follows: accept the model as "correct", i.e., as the one having generated the data (= H0 above), then ask: if the dataset were repeatedly simulated from the model with errors drawn from a normal distribution, how often would the resulting fits yield a Chi-Square exceeding the observed chisq_min?
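As a concrete illustration, below is a minimal Python sketch (assuming numpy and scipy are available) that fits a straight line to simulated data and computes the probability of exceeding chisq_min under H0 with N - M degrees of freedom. The simulated data, the straight-line model, and the choice Pcrit = 0.05 are hypothetical, chosen only to demonstrate the procedure.

    import numpy as np
    from scipy import stats

    # Simulate noise-corrupted measurements of a (hypothetical) straight line.
    rng = np.random.default_rng(42)
    N = 50                                    # number of data points
    x = np.linspace(0.0, 10.0, N)
    sigma = 0.5 * np.ones(N)                  # assumed Gaussian measurement errors
    y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

    # Weighted least-squares fit of a straight line (M = 2 parameters).
    M = 2
    coeffs = np.polyfit(x, y, deg=1, w=1.0 / sigma)
    y_model = np.polyval(coeffs, x)

    # Minimum Chi-Square; under H0 it is drawn from a Chi-Square
    # distribution with N - M degrees of freedom.
    chisq_min = np.sum(((y - y_model) / sigma) ** 2)
    dof = N - M
    p_value = stats.chi2.sf(chisq_min, dof)   # P(Chi-Square > chisq_min | H0)

    Pcrit = 0.05                              # illustrative critical value
    print(f"chisq_min = {chisq_min:.2f}, dof = {dof}, p = {p_value:.3f}")
    if p_value < Pcrit:
        print("Reject H0: model and/or quoted errors unlikely to explain the data.")
    else:
        print("No grounds to reject H0 at this Pcrit.")

Here scipy.stats.chi2.sf is the survival function, i.e., the probability of exceeding chisq_min under H0; comparing it to Pcrit implements the rejection criterion described above.

-- F. Masci, 08/02/2025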