SCHOOL OF PUBLIC HEALTH & COMMUNITY MEDICINE
MULTIPLE REGRESSION - Dr Surya Raj Niraula
Regression
Sir Francis Galton first coined the term ‘Regression’, i.e. ‘Regression toward the mean’.

Ỳ = a + bX

a is the y-axis intercept.
b is the slope or gradient, often referred to as the regression coefficient.

The above equation is an estimate of the following equation, which describes the population regression of y on x:

Y = ß0 + ß1X + Є

ß0 = y-axis intercept
ß1 = slope of the population regression line
Є = random scatter (error) around the line
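Y - Ỳ = the difference between the observed and the predicted value.
The least-squares method is the mathematical procedure that minimizes the sum of the squared errors, ∑(Y - Ỳ)².

ß0 and ß1 are estimated by the following equations:

Estimate of ß0 = a = ȳ - b x̄
Estimate of ß1 = b = ∑(x - x̄)(y - ȳ) / ∑(x - x̄)² = rxy · sy/sx

Here x̄ and ȳ are the sample means, rxy is the correlation coefficient between x and y, and sx and sy are the sample standard deviations.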
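The two estimating equations can be checked numerically. The following is a minimal Python sketch using only the standard library. The five (x, y) pairs are made-up illustrative values, not data from this lecture, and the function names `fit_line` and `correlation` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def fit_line(x, y):
    """Least-squares estimates (a, b) for the fitted line y_hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # estimate of the slope
    a = y_bar - b * x_bar    # estimate of the intercept
    return a, b

def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values (assumption, not lecture data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
r = correlation(x, y)
b_from_r = r * stdev(y) / stdev(x)   # second form of the slope: r * sy / sx

print("intercept a =", round(a, 4))
print("slope b     =", round(b, 4))
print("r * sy / sx =", round(b_from_r, 4))
```

For these values both forms of the slope agree (0.6, with intercept 2.2, up to floating-point rounding), which illustrates that the sum-of-products formula and the r · sy/sx formula are the same estimate.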
Multiple regression
In laboratory experiments
Or your goal may be to find an
equation that best predicts blood pressure from those three
variables.
The multiple regression model and its assumptions

Y = ß0 + ß1X1 + ß2X2 + ß3X3 + ß4X4 + . . . + random scatter
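As a rough illustration of how the coefficients of such a model can be estimated, the sketch below fits Y = ß0 + ß1X1 + ß2X2 + ß3X3 by least squares, assuming X1 is age, X2 is weight and X3 is sex as in the blood-pressure example. It solves the normal equations in plain Python so that it is self-contained; in practice a statistics package would normally be used. The eight subjects and their blood-pressure values are invented for illustration, the 0/1 coding of sex is an arbitrary assumption, and all function names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]       # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Predicted Y for one subject under the fitted model."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Invented illustrative data: (age in years, weight in pounds, sex coded 0/1)
subjects = [
    (25, 150, 0), (32, 130, 1), (41, 180, 0), (48, 160, 1),
    (53, 200, 0), (60, 145, 1), (36, 170, 1), (67, 190, 0),
]
blood_pressure = [118, 112, 130, 125, 141, 131, 120, 148]

coefs = fit_multiple(subjects, blood_pressure)
# Residuals play the role of the "random scatter" term for each subject
residuals = [bp - predict(coefs, s) for s, bp in zip(subjects, blood_pressure)]

print("b0, b1, b2, b3 =", [round(c, 3) for c in coefs])
print("sum of residuals =", round(sum(residuals), 6))
```

A useful sanity check on any least-squares fit with an intercept is that the residuals sum to zero (up to rounding) and are uncorrelated with each predictor; the second print statement shows the first of these.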
On average, blood pressure increases (or decreases) a certain amount (the best-fit value of ß1) for every year of age. This amount is the same for men and women of all ages and all weights.

On average, blood pressure increases (or decreases) a certain amount per pound (the best-fit value of ß2). This amount is the same for men and women of all ages and all weights.

On average, blood pressure differs by a certain amount between men and women (the best-fit value of ß3). This amount is the same for people of all ages and weights.

In mathematical terms, the model is linear
and allows for no interaction. Linear
means that holding other variables constant, the graph of blood
pressure vs. age (or vs. weight) is a straight line. No interaction
means that the slope of the blood pressure vs. age line is the
same for all weights and for men and women.
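
What "linear" and "no interaction" mean can be seen by evaluating an additive model directly. In the short Python sketch below, the coefficient values are hypothetical numbers chosen only for illustration (they are not estimates from any data), and the function names are assumptions.

```python
def predicted_bp(age, weight, male, b0=60.0, b1=0.5, b2=0.25, b3=4.0):
    """Additive (no-interaction) model; the default coefficients are hypothetical."""
    return b0 + b1 * age + b2 * weight + b3 * male

def age_slope(age, weight, male):
    """Change in predicted blood pressure for one extra year of age."""
    return predicted_bp(age + 1, weight, male) - predicted_bp(age, weight, male)

# Same one-year step evaluated for very different subjects
slope_young_light_woman = age_slope(age=30, weight=120, male=0)
slope_old_heavy_man = age_slope(age=70, weight=220, male=1)

print(slope_young_light_woman, slope_old_heavy_man)
```

Both calls return the same value, b1, whatever the starting age (linearity) and whatever the weight or sex (no interaction). A model in which the age slope changed with weight or sex would need an extra term, which this model does not include.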
Additionally, the multiple regression procedure
makes assumptions about the random scatter. It assumes that
the scatter is Gaussian, and that the standard deviation of
the scatter is the same for all values of X and Y. Furthermore,
the model assumes that the scatter for each subject should be
random, and should not be influenced by the deviation of other
subjects.