I've learned something new today! Do you have any links where this guidance is written or other models described in this way?
I wouldn't describe it as guidance, just the way people build statistical models, but I can give you another example. I'm already worried about how long this might be :-(
To be topical, let's use the example of how big a difference there is in the risk of dying from COVID-19 in the UK compared with other countries.
You start by framing a precise question that will answer the one posed, e.g. for someone alive on the 1st January 2020, what is the risk of them being dead on the 31st December 2020 from COVID-19?
You are interested in knowing how people have "responded" to all the things that might have affected their mortality during that defined period. But first you need to know how to measure the "thing" you are interested in, and in this case it's "is a person dead or alive". This gives us the "response variable" for the model, which we can call mortality. It goes on the left-hand side of the model and contains a value that can be either "yes" or "no".
On the right-hand side of the model, you add the "explanatory variables". These are explanatory variables because they explain the response recorded in the response variable.
This gives a model structure of:
Response variable = Explanatory variable 1 + Explanatory variable 2 + Explanatory variable 3 + ... + error
The error is the bit of what is happening to the response variable that the model can't explain. So these are things that are affecting mortality but that you haven't included in the model, normally things you simply don't know about. E.g. it may be that having red hair (possibly as a genetic indicator) is a good predictor of mortality, but you didn't think of it, so it's not included in the model and gets rolled up with all the other unknowns contained in the "error" term.
Deciding what explanatory variables are included in the model (and how they are measured) is the crucial starting point for a model.
But let's say we end up with a model in the form of:
Mortality = Country + age + sex + ethnic group + covid19 + underlying condition
So now you need to collect the data for these variables. For every country, you will need to get data for a sample of people, and for every person in the sample, you will need the information required by the model, e.g. their age on the 1st January 2020, their sex, etc. This data is then fed into the statistical software to run the model.
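To make the data requirement concrete, here is a minimal sketch of what one row of that dataset might look like. The class name, field names and example values are all invented for illustration; they don't come from any real dataset.

```python
from dataclasses import dataclass


@dataclass
class PersonRecord:
    """One sampled person. All field names here are hypothetical."""
    country: str
    age_on_1_jan_2020: int
    sex: str
    ethnic_group: str
    covid_positive: bool
    underlying_condition: bool
    dead_on_31_dec_2020: bool  # the response variable ("yes" or "no")


# A made-up row, purely to show the shape of the data the model needs.
example = PersonRecord(
    country="UK",
    age_on_1_jan_2020=65,
    sex="male",
    ethnic_group="group A",
    covid_positive=True,
    underlying_condition=False,
    dead_on_31_dec_2020=False,
)
print(example)
```

You would need one such record per sampled person, per country, before the software can estimate anything.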
The linear models that I am familiar with essentially look at the strength of the relationship between each explanatory variable and the response variable, and give you a number (the coefficient) that quantifies that relationship. In this case, you can add up the results from the explanatory variables to give a "probability of being dead".
So to make it simple I will assume that a probability of 100% means definitely being dead (percentages aren't probabilities, but it's easier to work with).
For the age variable in the model, you might get a coefficient of 0.1. This means that for every year older someone gets, their risk of being dead in that 12-month period increases by 0.1 percentage points.
For sex, you might have a coefficient of 3, meaning that men have a 3 percentage point greater risk of being dead than women.
If you had tested positive for COVID, that might increase your risk by 15 percentage points (coefficient of 15).
If you had an underlying condition, this might increase your risk by 35 percentage points (coefficient of 35).
Each country and each ethnic group would have its own coefficient generated by the model.
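As a rough illustration of where a coefficient comes from, here is a minimal sketch that fits a straight line to a single explanatory variable (age) by ordinary least squares. The six data points are invented toy values, and `fit_line` is a hypothetical helper; real statistical software fits all the explanatory variables at once, so treat this only as a picture of the idea.

```python
# Toy data, invented for illustration: age, and whether the person died (1) or not (0).
ages = [30, 40, 50, 60, 70, 80]
died = [0, 0, 0, 1, 0, 1]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one explanatory variable)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, died)

# The slope plays the role of the age coefficient: the change in the
# estimated risk for each extra year of age (times 100 to express it in points).
print(f"age coefficient: {slope * 100:.2f} percentage points per year")
```

The slope here is the number the text calls the coefficient for age; a positive value means the estimated risk rises with age in this toy sample.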
So if you want to work out the risk for a 65-year-old man from a specific country and ethnic group, without an underlying health condition, but who had tested positive for COVID, you would add up the results.
Mortality = Country + age + sex + ethnic group + covid19 + underlying condition, which becomes
risk of being dead = 2 (UK) + 0.1 x 65 (age) + 3 (sex) + 0.2 (ethnic group) + 15 (covid positive) + 0 (underlying condition)
So 2 + 6.5 + 3 + 0.2 + 15 + 0 = 26.7
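The same sum can be written as a few lines of Python, which makes it easy to swap in a different person. The coefficients are the illustrative numbers from the example above, not real estimates, and the dictionary keys are just labels I've made up.

```python
# Illustrative coefficients from the worked example (not real estimates).
contributions = {
    "country (UK)": 2,
    "age (0.1 per year x 65 years)": 0.1 * 65,
    "sex (male)": 3,
    "ethnic group": 0.2,
    "covid positive": 15,
    "underlying condition (none)": 0,
}

# The linear model simply adds the contributions together.
risk_of_being_dead = sum(contributions.values())

for name, value in contributions.items():
    print(f"{name:32s} {value:5.1f}")
print(f"{'risk of being dead (%)':32s} {risk_of_being_dead:5.1f}")
```

Changing the underlying condition entry from 0 to 35, for example, shows how much that single variable would move the total.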
This means there is about a 27% risk (roughly 30%) that a person selected randomly from the UK population with these characteristics will be dead at the end of the 12-month period, and roughly a 70% chance of him being alive.
The model gives confidence intervals around these numbers (measures of uncertainty about the results), so in practice you would end up with numbers that look more like a 15% to 45% chance of being dead and a 55% to 85% chance of being alive.
But you can also compare the coefficients of individual variables, e.g. the country variable. So if the coefficient for the UK was 10 and the coefficient for Italy was 5, you could conclude that the country-related risk of dying in the UK is double that in Italy, after you take into account the effect of age, sex, ethnic group, underlying condition and whether a person tested positive for COVID-19.
This is also an opportunity to misrepresent the stats. If you look at the example above of doubling from 5 to 10, and look at the scale of the other coefficients, then this is an important difference between countries.
However, if the difference went from 0.005 to 0.01, this would still be a doubling, but the effect is so small in either country that you could safely disregard the differences between countries, and conclude that the country you live in isn't that important in terms of living or dying from COVID-19.
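A quick sketch of why "double" on its own can mislead: both pairs of coefficients below have the same ratio, but very different absolute differences. The numbers are the hypothetical ones used above, and `compare` is just a helper name I've made up.

```python
def compare(coef_a, coef_b):
    """Return (ratio, absolute difference) for two country coefficients."""
    return coef_a / coef_b, coef_a - coef_b


# (UK, Italy) coefficient pairs from the two hypothetical scenarios.
scenarios = [(10, 5), (0.01, 0.005)]

results = []
for uk, italy in scenarios:
    ratio, difference = compare(uk, italy)
    results.append((ratio, difference))
    print(f"UK={uk}, Italy={italy}: ratio={ratio:.1f}x, "
          f"difference={difference:.3f} percentage points")
```

Reporting only the ratio would make the two scenarios look identical, which is why it helps to show the absolute difference alongside it.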
I hope that makes sense, and maybe answers your question, but get back to me if all I've done is confuse you.