What is Stepwise Regression Analysis?
Stepwise regression is a method for selecting a subset of predictors (features) to include in a regression model. It is especially helpful when you are working with many features and must choose the most significant ones for predicting the target variable.
Stepwise regression tries to improve model performance either by removing extraneous variables, which reduces model complexity and helps prevent overfitting, or by introducing variables that contribute meaningfully to the model.
Types of Stepwise Regression
There are three main types of stepwise regression:
- Forward selection
- Backward elimination
- Bidirectional elimination (stepwise)

Each type gradually adds or removes predictors from the model based on a chosen criterion, typically the p-value from a statistical test (such as the F-test or the t-test).
Forward Stepwise Regression
Forward selection begins with an empty model (no predictors) and gradually introduces the most significant predictors. At each step, the predictor that improves the model the most (based on an evaluation criterion, such as minimizing the AIC or maximizing R-squared) is added to the model.
Steps for Forward Selection:
- Begin with no predictors.
- Add the predictor that improves the model the most. This is commonly done by comparing the p-values of the candidate predictors to see which one most enhances the model's performance.
- Repeat the process, incorporating the most significant predictor at each step.
- Stop when no further predictors significantly enhance the model (as determined by a p-value cutoff or other criterion).
Backward Stepwise Regression
Backward elimination operates in the opposite direction. It begins with a complete model (containing all predictors) and removes the least significant predictors one at a time.
Steps for Backward Elimination:
- Begin with the complete model, including all predictors.
- Remove the least significant predictor, determined by its p-value or contribution to the model.
- Repeat the procedure, refitting the model and removing the least significant predictor at each step.
- Stop when all of the remaining predictors are statistically significant.
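To make these steps concrete, here is a minimal sketch of backward elimination in Python using statsmodels, assuming X is a pandas DataFrame of features and y is the target; the 0.05 threshold and the function name are illustrative choices:

import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    features = list(X.columns)
    while features:
        # Refit the model on the current feature set
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")  # ignore the intercept
        worst = pvalues.idxmax()               # least significant predictor
        if pvalues[worst] > threshold:
            features.remove(worst)             # drop it and refit
        else:
            break  # all remaining predictors are significant
    return features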
Bidirectional Stepwise Regression
Bidirectional elimination, often simply called stepwise regression, combines forward selection with backward elimination. It allows predictors to be added or removed at each step, depending on their significance.
Steps for Bidirectional Elimination:
- Begin with a subset of predictors, which can be either empty (as in forward selection) or full (in backward elimination).
- At each step, try both adding and removing predictors:
- Add the most significant predictor (as in forward selection).
- Remove the least significant predictor (as in backward elimination).
- Continue this process until no predictor can be added or removed without degrading the model.
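As a rough sketch, bidirectional selection can be implemented by alternating a forward and a backward step, again assuming X is a pandas DataFrame and y is the target. The entry threshold is made stricter than the exit threshold here, a common safeguard against repeatedly adding and dropping the same predictor (both thresholds are illustrative):

import statsmodels.api as sm
import pandas as pd

def stepwise_selection(X, y, enter=0.05, leave=0.10):
    selected = []
    while True:
        changed = False
        # Forward step: try adding the most significant remaining predictor
        remaining = [c for c in X.columns if c not in selected]
        if remaining:
            pvals = pd.Series({
                f: sm.OLS(y, sm.add_constant(X[selected + [f]])).fit().pvalues[f]
                for f in remaining})
            if pvals.min() < enter:
                selected.append(pvals.idxmin())
                changed = True
        # Backward step: drop the least significant selected predictor
        if selected:
            pvals = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop("const")
            if pvals.max() > leave:
                selected.remove(pvals.idxmax())
                changed = True
        if not changed:
            break  # no addition or removal changes the model
    return selected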
How Does Stepwise Regression Work?
Stepwise regression is a method for selecting the best set of predictors (features) for a regression model. The main idea is to automatically and progressively decide which variables to add to or remove from the model. Each candidate predictor is assessed using statistical criteria such as p-values or the AIC, so that only the most relevant predictors end up in the final model.
Here is a full explanation of how stepwise regression works:
1. Starting Point:
The algorithm starts with either:
- No predictors (forward selection): you begin with an empty model and gradually add predictors.
- All predictors (backward elimination): you begin with a full model that includes every available predictor.
- A combination of the two (bidirectional or stepwise): predictors can be added or removed at each step.
2. Evaluate Candidates:
The key to stepwise regression is evaluating each candidate predictor against a statistical criterion. Common criteria are:
- P-value: measures the significance of each predictor. Predictors with high p-values (above a predefined threshold, usually 0.05) are candidates for elimination.
- AIC (Akaike Information Criterion): Measures the goodness of fit while penalizing for the number of parameters. A lower AIC suggests a better model.
- BIC (Bayesian Information Criterion): Like AIC, but with a larger penalty for complexity (more parameters).
- Adjusted R-squared: assesses the model's ability to explain variability in the target variable while accounting for the number of predictors.
Depending on the type of stepwise regression:
Forward Selection:
Start with an empty model and add predictors one at a time. For each candidate predictor, assess how much it improves the model based on a metric such as the p-value or AIC, then add the predictor that yields the largest improvement.
Backward Elimination:
Start with a model containing all the predictors. Remove the predictor with the highest p-value, that is, the least significant one. Refit the model after every elimination, and keep removing predictors one by one until only the most important ones remain.
Bidirectional elimination (Stepwise):
This technique alternates between adding and removing predictors. After a predictor is added, the model is re-evaluated; if any predictor has become insignificant, it is removed. This continues until no further improvement is possible.
3. Refit the Model After Each Step:
After adding or removing a predictor, the model is refitted and its performance is assessed. If adding or removing a predictor improves the model based on the chosen criterion, the step is accepted, and the procedure proceeds.
4. Stopping Criteria:
The process repeats until no more predictors can be added or removed to improve the model's performance. In forward selection, this means no remaining candidate significantly improves the model; in backward elimination, it means all remaining predictors are statistically significant.
Model Evaluation Criteria
Stepwise regression assesses model improvements at each step using a selection criterion. Common criteria include:
- AIC (Akaike Information Criterion): Determines the relative quality of a model by balancing fit and complexity. A lower AIC score suggests a better model.
- BIC (Bayesian Information Criterion): It is similar to AIC but has a higher penalty for adding predictors.
- P-value: indicates the statistical significance of individual coefficients.
- Adjusted R-squared: measures the proportion of variance explained by the model, adjusted for the number of predictors.
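To illustrate how these criteria behave, the following sketch fits two candidate models on synthetic data, where the third feature is pure noise; dropping it will typically lower the AIC and BIC while leaving adjusted R-squared roughly unchanged (the data and variable names are made up for this example):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)  # third column is noise

full = sm.OLS(y, sm.add_constant(X)).fit()            # all three predictors
reduced = sm.OLS(y, sm.add_constant(X[:, :2])).fit()  # noise column dropped

for name, model in [("full", full), ("reduced", reduced)]:
    print(name, "AIC:", round(model.aic, 1),
          "BIC:", round(model.bic, 1),
          "adj R^2:", round(model.rsquared_adj, 3))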
Advantages of Stepwise Regression
- Variable Selection: Assists in selecting the most significant predictors while rejecting unimportant ones, hence reducing model complexity.
- Improved Interpretability: A model with fewer predictors is easier to understand.
- Prevention of Overfitting: By removing less relevant variables, stepwise regression can help reduce overfitting.
- Time-saving: For a large number of predictors, stepwise regression can be used to automatically choose features.
Disadvantages of Stepwise Regression
- Overfitting: If not properly regulated, stepwise approaches can result in overfitting, particularly if the sample size is small in comparison to the number of predictors.
- Instability: The results may be sensitive to minor changes in the data, and various training sets may provide distinct subsets of features.
- Multicollinearity Issues: Stepwise regression fails to handle multicollinearity well, resulting in unstable coefficient estimates.
- Does Not Always Find the Best Model: The approach is heuristic, and its greedy feature selection may cause it to overlook the “best” model.
- Bias in Model Selection: Because it is based on p-values, stepwise regression may introduce bias if relevant factors are removed prematurely or unimportant predictors are maintained.
Best Practices
- Cross-Validation: To avoid overfitting, validate the selected feature set with cross-validation (see the sketch after this list).
- Understand the Domain: Stepwise regression is only one tool. Understanding the data and the domain is essential before relying solely on statistical selection.
- Examine Several Models: Compare stepwise regression with alternatives such as Lasso or Ridge regression; Lasso also performs feature selection, and both handle multicollinearity better than stepwise selection.
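As a sketch of the cross-validation check mentioned above, scikit-learn's cross_val_score can verify that a stepwise-selected feature set generalizes out of sample. This assumes X is a pandas DataFrame, y the target, and selected_features the list returned by a stepwise procedure such as the forward_selection function shown later in this article:

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Score the selected feature subset with 5-fold cross-validation
scores = cross_val_score(LinearRegression(), X[selected_features], y,
                         cv=5, scoring="r2")
print("Mean cross-validated R^2:", scores.mean().round(3))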
Implementation of Stepwise Regression
Most statistical software and machine learning libraries support stepwise regression. In Python, basic feature selection can be done with statsmodels or scikit-learn, but for stepwise regression you would typically need to implement the iterative process yourself or rely on specialized libraries like mlxtend.
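As an alternative to hand-rolled loops, scikit-learn also ships a SequentialFeatureSelector (in sklearn.feature_selection) that performs forward or backward selection driven by cross-validated scores rather than p-values. A minimal sketch, where X, y, and the number of features to keep are assumed:

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=3,  # illustrative choice
                                direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features

Scoring by cross-validated performance sidesteps the p-value pitfalls discussed above, at the cost of more computation.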
Stepwise Regression Python Code
Here is a basic implementation of forward selection in Python using statsmodels:
import statsmodels.api as sm
import pandas as pd

# Assuming 'X' is your feature matrix (a pandas DataFrame) and 'y' is the target variable
def forward_selection(X, y):
    initial_features = X.columns
    selected_features = []
    while True:
        remaining_features = list(set(initial_features) - set(selected_features))
        if not remaining_features:  # stop once every feature has been selected
            break
        new_pval = pd.Series(index=remaining_features, dtype=float)
        for feature in remaining_features:
            # Fit a model with the already-selected features plus the candidate
            model = sm.OLS(y, sm.add_constant(X[selected_features + [feature]])).fit()
            new_pval[feature] = model.pvalues[feature]
        min_pval_feature = new_pval.idxmin()
        min_pval = new_pval[min_pval_feature]
        if min_pval < 0.05:  # only keep features with p-value < 0.05
            selected_features.append(min_pval_feature)
        else:
            break
    return selected_features

# Example usage
selected_features = forward_selection(X, y)
print(f"Selected features: {selected_features}")
Conclusion
Stepwise regression is a practical way to identify the important predictors in a regression model. By removing extraneous variables, it simplifies complex models and can improve their performance. It does, however, carry risks such as overfitting and unstable feature selection. The final model should be validated with cross-validation, and alternative approaches to feature selection and model building, such as regularization (Lasso or Ridge regression) and ensemble techniques (like Random Forests), are worth considering.