Exploratory Analysis of a Loan Dataset Using Python

Ramesh Banjade
2 min read · May 21, 2021

Data is a valuable asset for any organization, as it contains crucial information related to the business and its customers. However, processing large volumes of data manually can be time-consuming and impractical. With the increasing volume of data generated each day, it is necessary to find efficient ways to analyze and visualize this data.

Here we discuss some important Python libraries that help analyze and visualize data effectively and efficiently.

Import the required Python libraries for analysis and visualization

NumPy is a numerical library used for mathematical calculations. Pandas reads and writes data files, and its DataFrames make data manipulation easy. Matplotlib is a plotting library: it displays data in graphical form.
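
The import step itself is not shown in the text above. A minimal sketch, assuming the conventional aliases (`np`, `pd`, `plt`, `sns`) and that seaborn, which is used later for plotting, is installed:

```python
# Conventional aliases; the rest of the article refers to pandas as `pd`.
import numpy as np               # numerical calculations
import pandas as pd              # reading, writing, and manipulating tabular data
import matplotlib.pyplot as plt  # base plotting library
import seaborn as sns            # statistical plots built on top of matplotlib

# Print the installed versions, which helps when reproducing the analysis.
print("numpy", np.__version__)
print("pandas", pd.__version__)
print("seaborn", sns.__version__)
```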

Load the dataset using the code below.

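# Assumes pandas has been imported as pd (import pandas as pd)
loan_df = pd.read_csv('data file name')  # replace with the path to the loan CSV file
loan_df.info()  # prints column names, dtypes, and non-null counts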

Data preparation:

Data preparation is the process of cleaning the data so that it is suitable for building a model. There are several techniques for pre-processing the data:

-Drop features that are irrelevant to the goal
-Drop the rows or columns that consist mostly of null values
-Identify the missing values and handle them
-Fill the remaining missing values using the mean, median, or mode
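Code:
# Drop a column that is not relevant to the analysis
loan_df = loan_df.drop(columns='column name')
# Fill missing credit scores with the column mean
mean = loan_df['Credit Score'].mean()
loan_df['Credit Score'] = loan_df['Credit Score'].fillna(mean)
# Note: the median or mode can be used instead, depending on the dataset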
Identify the count of missing values in each column, then plot the figure using the seaborn library.
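
The counting and plotting step could be sketched as follows. The small DataFrame stands in for `loan_df` and its values are made up for illustration; `Annual Income` and `Term` are hypothetical column names, and only `Credit Score` comes from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for loan_df (hypothetical columns and values).
loan_df = pd.DataFrame({
    "Credit Score": [710.0, np.nan, 690.0, np.nan],
    "Annual Income": [52000.0, 61000.0, np.nan, 48000.0],
    "Term": ["Short Term", "Long Term", "Short Term", "Short Term"],
})

# Count the missing values per column, largest first.
missing_counts = loan_df.isnull().sum().sort_values(ascending=False)
print(missing_counts)

# Turn the Series into a two-column DataFrame so seaborn can plot it by name.
missing_df = missing_counts.rename_axis("column").reset_index(name="missing")

fig, ax = plt.subplots(figsize=(6, 3))
sns.barplot(data=missing_df, x="column", y="missing", ax=ax)
ax.set_ylabel("Missing values")
ax.set_title("Missing values per column")
fig.tight_layout()
```

In a notebook, `plt.show()` would display the figure. Columns with a large count are candidates for dropping, and the rest for imputation.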

Univariate function

The univariate function plots graphs based on its parameters.
df : DataFrame name
col : column name
vartype : variable type, either continuous or categorical
Continuous (0) : distribution, violin, and box plots are plotted.
Categorical (1) : a count plot is plotted.
hue : only applicable to categorical analysis.

Define the univariate function
Plot the loan status using the univariate function

Similarly, we can plot all the other attributes.

Convert the target string values into binary. To build a prediction model, the string labels of the target must be changed into binary values.
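
One way to do this conversion is an explicit mapping. This is a sketch under assumptions: the target column is taken to be named `Loan Status`, and the labels `Fully Paid` and `Charged Off` are hypothetical values chosen for illustration.

```python
import pandas as pd

# Made-up sample; the real labels depend on the dataset.
loan_df = pd.DataFrame(
    {"Loan Status": ["Fully Paid", "Charged Off", "Fully Paid"]}
)

# Explicit mapping: 0 = loan paid, 1 = loan not paid.
status_to_binary = {"Fully Paid": 0, "Charged Off": 1}
loan_df["Loan Status"] = loan_df["Loan Status"].map(status_to_binary)

print(loan_df["Loan Status"].tolist())
```

Any label missing from the mapping becomes NaN under `map`, so checking `loan_df["Loan Status"].isnull().sum()` afterwards is a quick way to catch unexpected values.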

We can use the count, group by, and sum functions in Python, much as in SQL, to find the number of records in each sub-category of an attribute.
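
As an illustration, the pandas equivalent of a SQL `GROUP BY` with `COUNT` and `SUM` might look like this. The `Term` column and the data are made up, and `Loan Status` is assumed to be already binary (1 = not paid).

```python
import pandas as pd

# Made-up sample with a binary target.
loan_df = pd.DataFrame({
    "Term": ["Short Term", "Long Term", "Short Term", "Long Term", "Short Term"],
    "Loan Status": [0, 1, 0, 0, 1],
})

# Roughly: SELECT Term, COUNT(*), SUM("Loan Status") FROM loans GROUP BY Term
summary = loan_df.groupby("Term")["Loan Status"].agg(["count", "sum"])
summary = summary.rename(columns={"count": "loans", "sum": "unpaid"})
print(summary)
```

Because the target is 0 or 1, its sum per group is the number of unpaid loans in that group.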

Total paid and unpaid loans in a pie chart
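
A pie chart of this kind could be drawn with Matplotlib as sketched below. The column name `Loan Status` and its labels are assumptions, and the data is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for loan_df.
loan_df = pd.DataFrame(
    {"Loan Status": ["Fully Paid", "Fully Paid", "Charged Off", "Fully Paid"]}
)

# Number of loans in each status category.
status_counts = loan_df["Loan Status"].value_counts()

fig, ax = plt.subplots(figsize=(5, 5))
ax.pie(
    status_counts.values,
    labels=list(status_counts.index),
    autopct="%1.1f%%",  # show each slice's share as a percentage
    startangle=90,
)
ax.set_title("Paid vs. unpaid loans")
fig.tight_layout()
```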

We can also build a loan defaulter prediction model using different supervised learning algorithms (Naïve Bayes, Decision Tree, Random Forest, etc.).
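
As a rough starting point, a decision tree classifier could be trained with scikit-learn as sketched below. scikit-learn is an assumption, since the article names no library. The feature `Annual Income` is hypothetical, and the tiny dataset is made up, so the resulting score says nothing about real loan data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up sample: two numeric features and a binary target (1 = defaulted).
loan_df = pd.DataFrame({
    "Credit Score": [720, 650, 700, 600, 710, 640, 690, 610, 730, 620, 705, 630],
    "Annual Income": [52, 31, 48, 28, 55, 30, 47, 27, 60, 29, 50, 33],  # thousands
    "Loan Status": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

X = loan_df[["Credit Score", "Annual Income"]]
y = loan_df["Loan Status"]

# Hold out a quarter of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # fraction of correct predictions
print("Test accuracy:", accuracy)
```

Replacing `DecisionTreeClassifier` with `RandomForestClassifier` (from `sklearn.ensemble`) or `GaussianNB` (from `sklearn.naive_bayes`) follows the same `fit` and `predict` pattern.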
