Financial Inclusion in East Africa

Predicting Financial Inclusion in East Africa Using Machine Learning

Project mission

The mission of the "Financial Inclusion in Africa" project is to leverage machine learning to predict and enhance financial inclusion across Kenya, Rwanda, Tanzania, and Uganda. By identifying key factors that influence bank account ownership and usage, this project aims to provide actionable insights for policymakers, financial institutions, and development organizations.

The goal is to support efforts in designing targeted interventions, policies, and strategies that can effectively address financial exclusion, promote economic empowerment, and foster sustainable development in these regions.

Components and Steps

Exploratory Data Analysis

Country Representation

The raw data shows a strong association between owning a mobile phone and having a bank account.
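One way to check an association like this is a row-normalized cross-tabulation. The sketch below uses a small toy table, not the project data, and the column names `cellphone_access` and `bank_account` are assumptions made for illustration.

```python
import pandas as pd

# Toy data for illustration only; the column names are assumed, not taken from the project
df = pd.DataFrame({
    "cellphone_access": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
    "bank_account":     ["Yes", "Yes", "No",  "No",  "No", "No", "No", "Yes"],
})

# Share of respondents with and without a bank account within each cell phone group
ownership_rate = pd.crosstab(
    df["cellphone_access"], df["bank_account"], normalize="index"
)
print(ownership_rate)
```

In this toy table, half of the phone owners have a bank account compared with a quarter of non-owners. A gap of that kind in the real data is what the chart summarizes.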

STEPS

Data Preparation and Import: Load and clean the dataset.
Initial Exploration: Understand data structure and key characteristics.
Target Variable Analysis: Investigate bank account ownership distribution.
Country Representation: Analyze respondent distribution across countries.
UniqueID Analysis: Check whether the values in the 'uniqueid' column are unique and whether the column carries any useful information.
Age Distribution: Study respondents' age distribution.
Household Size: Compare average household sizes by country.
Household Head Relationship: Analyze respondents' relationships with household heads.
Cell Phone Access: Examine cell phone ownership.
Education Level: Analyze education levels.
Job Type: Explore job type distribution.
Correlation Analysis: Identify feature correlations.
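Several of the steps above can be sketched with a few pandas calls. This is a minimal illustration on a toy table, not the project's notebook. Apart from `uniqueid`, every column name here is an assumption.

```python
import pandas as pd

# Toy stand-in for the survey data; all column names except 'uniqueid' are assumed
df = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Rwanda", "Tanzania", "Uganda", "Uganda"],
    "uniqueid": ["uniqueid_1", "uniqueid_2", "uniqueid_1",
                 "uniqueid_1", "uniqueid_1", "uniqueid_2"],
    "bank_account": ["Yes", "No", "No", "No", "No", "Yes"],
    "household_size": [3, 5, 4, 6, 2, 4],
    "age_of_respondent": [24, 41, 35, 58, 19, 30],
})

# Target variable analysis: share of each class
target_share = df["bank_account"].value_counts(normalize=True)

# Country representation: number of respondents per country
country_counts = df["country"].value_counts()

# UniqueID analysis: is the ID unique on its own, or only within a country?
id_is_unique = df["uniqueid"].is_unique
id_unique_per_country = not df.duplicated(subset=["country", "uniqueid"]).any()

# Household size: average per country
mean_household = df.groupby("country")["household_size"].mean()

# Correlation analysis on the numeric columns
corr = df[["household_size", "age_of_respondent"]].corr()

print(target_share, country_counts, mean_household, corr, sep="\n\n")
```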

Model Building

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def wrangle(file_path):
    """Custom function used to import and prepare the data for model building."""
    df = pd.read_csv(file_path)

    # Drop columns which are not useful in the model
    df = df.drop(columns=['year', 'uniqueid'])

    # Identify the categorical (non-numeric) columns that need encoding
    categorical_columns = [
        x for x in df.columns if not pd.api.types.is_numeric_dtype(df[x])
    ]
    print(categorical_columns)
    print(f"Our dataframe has {len(categorical_columns)} categorical columns")

    # Label-encode only the categorical columns; numeric columns keep their values
    label_encoder = LabelEncoder()
    for column in categorical_columns:
        df[column] = label_encoder.fit_transform(df[column])

    return df
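To make the encoding step concrete, here is a small sketch of what `LabelEncoder` does to a single categorical column. The toy frame and the column names `location_type` and `age_of_respondent` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame; the column names are assumed for illustration
toy = pd.DataFrame({
    "location_type": ["Rural", "Urban", "Rural"],
    "age_of_respondent": [24, 70, 26],
})

encoder = LabelEncoder()
# Classes are sorted alphabetically, then mapped to 0, 1, ...
toy["location_type"] = encoder.fit_transform(toy["location_type"])

print(toy["location_type"].tolist())  # [0, 1, 0]
print(list(encoder.classes_))         # ['Rural', 'Urban']
```

The integer codes imply an ordering that the categories do not really have. Tree-based models such as random forests tend to tolerate this, whereas linear models usually call for one-hot encoding instead.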

STEPS

Data Wrangling and Preprocessing: Prepare the data for modeling, including handling missing values and encoding categorical variables.
Feature Matrix and Target Vector Splitting: Separate the data into feature matrix and target vector.
Handling Imbalance: Address class imbalance through techniques like oversampling.
Baseline Model: Develop a baseline machine learning model for initial predictions.
Model Iteration and Hyperparameter Tuning: Improve the model using pipelines and hyperparameter tuning.
Performance Evaluation: Evaluate model performance using metrics such as accuracy, precision, recall, and F1 score.
Results Communication: Visualize the results with confusion matrices and feature importance plots.
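The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, the column names and the parameter grid are hypothetical, and random oversampling is done with scikit-learn's `resample` so that the sketch depends on scikit-learn alone.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.utils import resample

# Synthetic, already-encoded data standing in for the wrangled survey (assumption)
rng = np.random.default_rng(42)
n = 600
X = pd.DataFrame({
    "education_level": rng.integers(0, 6, n),
    "job_type": rng.integers(0, 10, n),
    "age_of_respondent": rng.integers(16, 90, n),
})
# Imbalanced binary target loosely tied to education, so there is signal to learn
y = pd.Series((X["education_level"] + rng.normal(0, 1, n) > 4.5).astype(int),
              name="bank_account")

# Feature matrix / target vector split, stratified to keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Baseline: accuracy of always predicting the majority class
baseline_acc = y_train.value_counts(normalize=True).max()

# Handling imbalance: randomly oversample the minority class, training set only
train = pd.concat([X_train, y_train], axis=1)
majority_label = y_train.value_counts().idxmax()
majority = train[train["bank_account"] == majority_label]
minority = train[train["bank_account"] != majority_label]
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=42)
train_bal = pd.concat([majority, minority_up])
X_train_over = train_bal.drop(columns="bank_account")
y_train_over = train_bal["bank_account"]

# Model iteration: pipeline plus a small, hypothetical hyperparameter grid
pipeline = Pipeline([("model", RandomForestClassifier(random_state=42))])
search = GridSearchCV(
    pipeline,
    param_grid={"model__n_estimators": [25, 50], "model__max_depth": [3, 6]},
    cv=3,
)
search.fit(X_train_over, y_train_over)

# Performance evaluation on the untouched test set
y_pred = search.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"baseline={baseline_acc:.4f} test_accuracy={test_acc:.4f}")
print("precision:", precision_score(y_test, y_pred, zero_division=0))
print("recall:", recall_score(y_test, y_pred, zero_division=0))
print("f1:", f1_score(y_test, y_pred, zero_division=0))
print(cm)
```

Oversampling before cross-validation lets duplicated rows appear in both the training and validation folds, so the cross-validated scores in this sketch are optimistic. The held-out test set is unaffected because it is never resampled.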

Results

Model Evaluation

The random forest model achieved a test accuracy of 89.12%. In other words, on new, unseen data it correctly predicted whether an individual has a bank account 89.12% of the time.

For context, the baseline accuracy, which is the accuracy of a naive approach such as always predicting the majority class, was 85.81%. The random forest therefore improves on the baseline by 3.31 percentage points.
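The comparison can be worked through in a few lines. The two accuracy figures are the ones reported above. The toy label series is an assumption that only illustrates how a majority-class baseline is computed.

```python
import pandas as pd

# Toy labels (illustrative only): 6 of 7 respondents have no bank account
y_toy = pd.Series(["No"] * 6 + ["Yes"])
toy_baseline = y_toy.value_counts(normalize=True).max()  # share of the majority class

# Figures reported in the text
test_accuracy = 0.8912
baseline_accuracy = 0.8581

# Absolute gain in percentage points
improvement_pp = (test_accuracy - baseline_accuracy) * 100

# Share of the baseline's errors that the model removes
error_reduction = (test_accuracy - baseline_accuracy) / (1 - baseline_accuracy)

print(f"toy baseline: {toy_baseline:.4f}")
print(f"improvement: {improvement_pp:.2f} percentage points")
print(f"relative error reduction: {error_reduction:.1%}")
```

Seen this way, the model removes roughly 23% of the errors a majority-class predictor would make. With classes this imbalanced, precision and recall on the minority class are worth reading alongside accuracy.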

Feature Importances

The feature importances highlight the significance of different features in predicting the likelihood of financial inclusion. This information is crucial for stakeholders seeking to understand the driving factors behind financial inclusion and make informed decisions.

Here is a brief explanation of the three most important features:

Education Level: The education level of individuals has the highest importance in predicting financial inclusion. This suggests that higher education may positively influence an individual's likelihood of having a bank account.
Job Type: The type of job held by individuals is the second most important factor. Certain job types may offer more financial stability or access to banking services, leading to higher rates of financial inclusion.
Age of Respondent: The age of respondents also plays a significant role. Younger individuals may be more likely to adopt banking services than older individuals, although feature importances alone do not show the direction of an effect.
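A ranking like this can be read from a fitted random forest's `feature_importances_` attribute. The sketch below uses synthetic data in which the target depends only on education, so the ordering it prints is a property of the toy data and not a reproduction of the project's results. The column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic, already-encoded features (assumed names, illustrative values)
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "education_level": rng.integers(0, 6, n),
    "job_type": rng.integers(0, 10, n),
    "age_of_respondent": rng.integers(16, 90, n),
})
# In this toy setup the target is driven by education alone
y = (X["education_level"] >= 4).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances: non-negative and summing to 1
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

These importances measure how much each feature contributes to the trees' splits. They are never negative, so they cannot show whether a higher value raises or lowers the predicted likelihood of having a bank account.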

Tools and Technologies

Programming Language: Python
Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
Data Processing: Data cleaning, preprocessing, and transformation
Modeling Techniques: RandomForestClassifier(), oversampling for imbalance handling, pipeline creation, hyperparameter tuning
Evaluation Metrics: Accuracy, precision, recall, F1 score, confusion matrix, feature importance
