Tag Archives: Regression Analysis

【Disaster Research:Excel】Regression Analysis

Step-by-Step Guide for Excel Regression Analysis

1. Prepare Your Dataset in Excel

Dataset Overview:
Create an Excel file (e.g., DisasterData.xlsx) with the following columns and sample data:

Objective:
Use the earthquake magnitude (independent variable) to predict economic loss (dependent variable: MN USD) through simple linear regression.

2. Enable the Analysis ToolPak

Excel’s Data Analysis ToolPak is required for regression analysis. If it’s not already enabled:

  1. Click on File > Options.
  2. Select Add-Ins.
  3. In the Manage box at the bottom, select Excel Add-ins and click Go.
  4. Check Analysis ToolPak and click OK.

3. Visualize the Data

Before running the regression, it’s helpful to visualize the relationship:

  1. Select the columns Earthquake_Magnitude and Economic_Loss (excluding the header if desired).
  2. Go to the Insert tab.
  3. Choose Scatter from the Charts group and select the basic scatter plot.

This chart helps you see if there’s a linear trend between the two variables.

4. Conduct the Regression Analysis

  1. Go to the Data tab and click Data Analysis (in the Analysis group).
  2. In the Data Analysis dialog, select Regression and click OK.
  3. Input Y Range:
    • Select the range for the dependent variable (Economic_Loss). For example, if Economic_Loss is in column C from row 2 to row 16, enter C2:C16.
  4. Input X Range:
    • Select the range for the independent variable (Earthquake_Magnitude). For example, B2:B16.
  5. If your data has headers, check the Labels box.
  6. Choose an Output Range where you want the results to appear (or select a New Worksheet Ply).
  7. Click OK.

5. Interpret the Regression Output

Excel will generate a regression output that includes several key pieces of information:

  • Coefficients:
    • Intercept: The expected value of Economic_Loss when Earthquake_Magnitude is zero.
    • X Variable 1 (Slope): The change in Economic_Loss for each one-unit increase in Earthquake_Magnitude.
  • R-squared:
    • Indicates how much of the variance in Economic_Loss is explained by Earthquake_Magnitude. A value closer to 1 indicates a better fit.
  • p-Value:
    • Helps determine the statistical significance of the model. A low p-value (typically less than 0.05) suggests that the relationship is significant.

6. Use the Regression Model for Predictions

Once you have the coefficient and intercept from the output, you can create a prediction formula:

Y=(slope)X+(intercept) Y:Economic_Loss X:Earthquake_Magnitude

<Interpretation for beginners>

  1. Regression Statistics

Multiple R (Correlation Coefficient):

  • Value: 0.964775
  • Meaning: This measures how strongly two variables (in this case, Earthquake_Magnitude and Economic_Loss) are related. A value close to 1 indicates a very strong linear relationship.

R Square (Coefficient of Determination):

  • Value: 0.930791
  • Meaning: About 93% of the variation in the Economic_Loss can be explained by the Earthquake_Magnitude. This is considered a high value, indicating the model fits the data well.

Adjusted R Square:

  • Value: 0.925467
  • Meaning: This is a slightly adjusted version of R Square that takes into account the number of explanatory variables and the sample size. Because this is a simple linear regression with only one explanatory variable, the Adjusted R Square is still very close to the R Square value.

Standard Error:

  • Value: 9.409751
  • Meaning: On average, the model’s predictions of Economic_Loss deviate from the actual observed values by about 9.41 units (likely millions of dollars if your data is in that unit). The lower this number, the more precise the model’s predictions tend to be.

Observations:

  • Value: 15
  • Meaning: The total number of data points (earthquake events) used in the analysis.
  1. ANOVA (Analysis of Variance) Table

The ANOVA table helps you see how much of the total variation in Economic_Loss is explained by the regression (model) versus how much is left unexplained (residual).

df (Degrees of Freedom):

  • Regression df: 1 (one explanatory variable)
  • Residual df: 13 (the remainder)
  • Total df: 14 (because 15 data points minus 1)

SS (Sum of Squares):

  • Regression SS: 15480.54
  • Residual SS: 1151.064
  • Total SS: 16631.6
  • Meaning: The total SS (16631.6) is split between the portion explained by the model (15480.54) and the unexplained portion (1151.064). Since the regression SS is much larger than the residual SS, the model explains most of the variation.

MS (Mean Square):

  • Regression MS: 15480.54 (because the Regression SS is divided by 1, the df for regression)
  • Residual MS: 88.5432 (because 1151.064 is divided by 13)

F and Significance F (p-value for the overall model):

  • F: 174.835
  • Significance F: 6.44E-07 (which is 0.000000644)
  • Meaning: A very low p-value indicates that the overall regression model is statistically significant. In other words, Earthquake_Magnitude has a statistically significant effect on Economic_Loss.
  1. Coefficients Table

This table provides information about the intercept and the slope of your regression line.

Intercept (Coefficient):

  • Value: -152.261
  • Standard Error: 14.936
  • t Stat: -10.193
  • P-value: 1.47E-07
  • Lower 95%: -184.529
  • Upper 95%: -119.994
  • Meaning:
      • The intercept is the predicted Economic_Loss when Earthquake_Magnitude is 0. Mathematically, it’s part of the best-fit line. Although a negative intercept doesn’t make real-world sense for something like “loss” (you can’t have negative loss), it’s a valid outcome in a simple linear model.
      • The very small p-value (< 0.05) indicates the intercept is statistically different from zero.

Earthquake_Magnitude (Coefficient):

  • Value: 28.6965
  • Standard Error: 2.123497
  • t Stat: 13.513
  • P-value: 6.44E-07
  • Lower 95%: 23.865
  • Upper 95%: 33.528
  • Meaning:
      • For each 1-unit increase in Earthquake_Magnitude, the model predicts an increase of about 28.70 in Economic_Loss (again, presumably in millions of dollars).
      • The very small p-value (< 0.05) shows that Earthquake_Magnitude is a statistically significant predictor of Economic_Loss.
      • The 95% confidence interval (23.865 to 33.528) means we are 95% confident the true slope lies between these values.
  1. Putting It All Together

Regression Equation:

Predicted Economic Loss=−152.261+(28.6965×Earthquake Magnitude)

  1. Interpretation of R Square (0.930791):
    • About 93% of the variation in Economic_Loss is explained by the variation in Earthquake_Magnitude. This suggests a strong linear relationship.
  2. Model Significance (Significance F and p-values):
    • The overall model is highly significant (p < 0.001).
    • Earthquake_Magnitude is a very strong predictor (p < 0.001).
  3. Practical Meaning:
    • As the earthquake magnitude increases, expected economic losses rise substantially. Even though the intercept is negative (which is not realistic in a real-world scenario), the main takeaway is the slope: larger earthquakes lead to significantly higher losses.
  4. Model Limitations:
  • This is a simple linear model using only one predictor (Earthquake_Magnitude). Real-world economic loss is influenced by many factors (e.g., population density, building codes, depth of the quake, location, etc.).
  • The model’s negative intercept highlights that while it fits the data well for the range of magnitudes observed, it may not be meaningful for magnitudes far outside that range.
  1. Final Tips for Beginners
  • Always Plot Your Data: A scatter plot of Earthquake_Magnitude vs. Economic_Loss can confirm if a linear trend is reasonable.
  • Check Residuals: Look at how well the model performs across all data points. If there’s a clear pattern in the residuals, the linear model might not be appropriate.
  • Real-World Context: Negative intercepts can appear in purely statistical models but might not have a direct real-world meaning. Always interpret results carefully.
  • Add More Variables: If you suspect other factors affect Economic_Loss, consider multiple regression in the future to improve your model’s accuracy.

In summary, these regression results show a strong linear relationship between Earthquake_Magnitude and Economic_Loss. The model explains about 93% of the variation in economic losses, and both the intercept and the slope are statistically significant. However, as with any statistical model, interpret the results with caution and consider real-world factors that may affect the outcome beyond just magnitude.