Credit risk is one of the major financial challenges in the banking system. Yet many lenders have been slow to fully utilise the predictive power of digitised risk models. This is despite a recent report from McKinsey showing that machine learning may reduce credit losses by up to 10 per cent, with over half of risk managers expecting credit decision times to fall by 25 to 50 per cent.
Why? Auditability. Traditional scorecards often make it easier to explain how a customer was scored to the customer and regulators alike.
Traditional models tend to focus on borrowers’ financials, categorising customers based on demographics, payment history, and macroeconomic considerations. This makes it easier for financial institutions to show clear relationships between consumer behaviour and credit score.
However, the way consumers are spending, saving, and borrowing money is changing and so, too, is technology.
Using machine learning for credit analysis
With machine learning, banks and financial institutions are increasingly able to implement more science and less guesswork. Major financial institutions have been using AI to detect and prevent fraudulent transactions for several years.
For example, in 2017 JPMorgan Chase introduced COiN, a contract intelligence platform that, using machine learning, can review 12,000 annual commercial credit agreements in seconds. It would take staff around 360,000 hours per year to analyse the same amount.
AI-based scoring models combine customers’ credit history and the power of big data, using a wider range of sources to improve credit decisions and often yielding better insights than a human analyst. Banks can analyse larger volumes of data – both financial and non-financial – by continuously running different combinations of variables and learning from that data to predict variable interactions.
Building a credit risk classification model
A recent Proof of Concept (PoC) showed that running AI-based scoring models on Intel® Xeon® processors with Intel® Performance Libraries can help banks boost machine-learning and data analytics performance.
Using Intel-optimized performance libraries on the Intel® Xeon® Gold 6128 processor helped machine-learning applications make predictions faster when run against a German credit dataset of over 1,000 credit loan applicants.
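The article does not show the PoC's exact software stack, but one current way to route scikit-learn workloads onto Intel-optimized kernels is the scikit-learn-intelex extension. The sketch below is illustrative only (the package choice and fallback logic are assumptions, not the PoC's code):

```python
# Hedged sketch: enabling Intel-accelerated scikit-learn via the
# scikit-learn-intelex package (an assumption; the PoC's actual stack
# is described in the linked case study, not here).
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # re-routes supported sklearn estimators to oneDAL
    accelerated = True
except ImportError:
    accelerated = False  # stock scikit-learn still works unchanged

# scikit-learn must be imported *after* patching for the patch to apply
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(f"training accuracy: {clf.score(X, y):.3f}, accelerated={accelerated}")
```

Because the patch only swaps the backend implementation, the surrounding modelling code stays identical whether or not the extension is installed.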
The major steps involved in designing the PoC solution are shown in Figure 1.
Dataset analysis: This is the initial exploration of the data, including numerical and categorical variable analysis
Pre-processing: Data pre-processing transforms the data before feeding it to the algorithm. In this case, it involves converting categorical variables to numerical form using techniques such as one-hot and label encoding
Feature selection: In this step, the goal is to remove irrelevant features, which can increase run time and add noise to the model. This can be done using feature-importance rankings from a Random Forest or XGBoost model
Data split: The data is then split into train and test sets for further analysis
Model building: Machine-learning models are selected for training
Prediction: During this stage, the trained model predicts the output for a given input based on its learning
Evaluation: In order to measure performance, various evaluation metrics are available such as accuracy, precision, and recall
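The steps above can be sketched end-to-end with scikit-learn. The snippet below uses a small synthetic dataset standing in for the German credit data, and the column names and target rule are hypothetical, not taken from the PoC:

```python
# Illustrative pipeline: pre-processing -> feature selection ->
# data split -> model building -> prediction -> evaluation.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the German credit dataset (hypothetical columns)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "duration_months": rng.integers(6, 72, n),
    "credit_amount": rng.integers(250, 20000, n),
    "age": rng.integers(19, 75, n),
    "housing": rng.choice(["own", "rent", "free"], n),        # categorical
    "purpose": rng.choice(["car", "radio/tv", "education"], n),
})
# Hypothetical target: 1 = good credit risk, 0 = bad credit risk
y = (df["credit_amount"] < 10000).astype(int).to_numpy()

# Pre-processing: one-hot encode the categorical variables
X = pd.get_dummies(df, columns=["housing", "purpose"])

# Feature selection: keep features a random forest ranks as important
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0))
X_sel = selector.fit_transform(X, y)

# Data split: train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.3, random_state=0)

# Model building and prediction
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation: accuracy, precision, and recall
print(f"accuracy:  {accuracy_score(y_test, y_pred):.3f}")
print(f"precision: {precision_score(y_test, y_pred):.3f}")
print(f"recall:    {recall_score(y_test, y_pred):.3f}")
```

Swapping in the real dataset only changes the loading and encoding steps; the selection, split, training, and evaluation stages stay the same.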
For more information, read the full case study to learn how using Intel® Distribution for Python* and Intel® Performance Libraries can help boost machine-learning and data analytics performance for credit risk analysis.
Develop the future of AI
Join fellow developers and explore more free software tools, libraries, SDKs, and code samples at the Intel® AI Academy. Receive free access to the Intel® AI DevCloud and take advantage of the award-winning Intel® Performance Libraries to optimize code and shorten development time.