Understanding traditional credit scoring
Traditional credit scoring looks at an individual’s past financial behavior to decide whether they’re a responsible borrower. Lenders use this information to make decisions about loans, credit cards, and more. They check payment history, how much the person owes, and how long they’ve used credit. The higher the score, the better the interest rates and the more likely the person is to get approved for loans.
If you've done your financial homework (paid bills on time, kept your debt low, and built an established credit history), you earn a high credit score, and banks and lenders are more likely to trust you with loans and give you better deals, like lower interest rates. So, being smart with money helps people unlock financial opportunities. But there’s a catch.
Though effective, classic credit scoring has its limitations and challenges.
- It captures only part of what shows that an individual manages money well. Paying rent or utility bills on time, for example, often goes unrecorded.
- If this person is new to managing money, it's like not having a school record yet. That can make it challenging to prove they’re responsible.
- Sometimes it can be unfair. Let's say your friend had a tough time in the past (maybe got sick or lost a job), and their credit score dropped. Even if they’re doing better now, those poor scores can stick around for a long time, making it tough to get loans or credit cards.
Traditional credit scoring is reliable and beneficial for banks. But it doesn't always see the whole financial picture of an individual, limiting access to financial services and, consequently, a better life for many people. That's where fintechs step in. Using newer methods, like AI-powered credit scoring, they help make things more fair and accurate.
Introduction to AI-driven credit scoring
Luckily, there's a way out for those whose credit history is far from satisfactory. The modern approach to assessing an individual’s creditworthiness uses artificial intelligence (AI) and machine learning (ML) technologies, which help lenders find more reasons to trust potential clients who got a “no” from banks.
How AI and ML are transforming the credit assessment process
AI-based credit scoring considers a broader range of data sources. Digital footprints work fine, too, and they can be more flattering than credit history, income, and existing debts. To analyze this data and predict an individual’s future financial behavior, AI-based credit scoring employs complex algorithms. They analyze large sets of historical data, which allows AI to identify patterns and correlations linked to the person’s ability to pay. This way, lenders get more nuanced insights to make more informed decisions.
For instance, Karat Financial offers credit cards and accounting for online creators. Banks often lack trust for this category of applicants, so the company developed a niche offering for them, including custom credit score building and reward systems, bookkeeping, financial reports, and exclusive access to events like Coachella or Supernova.
Benefits and challenges of AI in credit scoring
Let’s start with AI’s pros.
Benefits
- For clients, the main advantage of AI driving their credit scoring is that it opens the door to assessing the creditworthiness of individuals who may not have a traditional credit history. Usually, they leave other kinds of data footprints that can be evaluated, such as online transactions or mobile app usage.
- This accessibility results from two other advantages. The first is increased accuracy: AI identifies complex patterns and relationships within data and can detect subtle correlations and trends that traditional credit scoring models might miss.
- The second is that AI can reduce the human biases that may be present in traditional credit scoring models.
- AI can process credit applications and generate credit scores much faster than manual underwriting. This speed is crucial for providing timely decisions to applicants and can improve the overall efficiency of lending operations.
- AI-powered credit scoring models can handle a large volume of credit applications simultaneously, which makes them suitable for financial institutions with many applicants.
- AI can segment borrowers into more granular risk categories, allowing lenders to tailor their offerings and interest rates to different risk profiles.
- For lenders, one of the crucial benefits is better fraudulent activity detection. By analyzing transaction data and identifying suspicious patterns, AI helps lenders protect themselves from fraudulent loan applications, reducing financial losses.
On top of that, credit scoring algorithms can adapt and improve over time as they are exposed to more data. This means that AI-powered credit scoring models can continually refine their accuracy, learning from historical data and adapting to changing economic conditions and consumer behavior.
Challenges
- Gathering and processing extensive personal data for credit scoring can raise privacy concerns. So ensure data security and compliance with data protection regulations (e.g., GDPR, CCPA), financial regulations, and anti-discrimination laws.
- AI models can absorb biases from historical data. If the data contains biased or discriminatory patterns, AI-driven credit scoring can perpetuate these biases, leading to unfair lending decisions.
- The decision-making processes of many algorithms can be challenging to interpret. Thus, they are also hard to explain to customers or regulators.
- While AI can utilize alternative data sources, not everyone has a rich digital footprint. This scarcity of alternative data can create disparities in credit scoring, as individuals with less digital presence may not benefit from the expanded data sources.
- AI models may be too dependent on the training data. They may not perform well under extreme economic conditions or other unknown scenarios.
- The cost of implementing AI-driven credit scoring systems can be steep. Besides, your team might require specialized data science and machine learning expertise. For smaller companies and startups, that might be challenging.
- The accuracy and reliability of the results will depend on the data you use. Poor-quality data can lead to inaccurate credit assessments.
Common ML algorithms to use
AI-based credit scoring algorithms fall into several model categories, depending on how they learn and the data they use: supervised learning models, unsupervised learning models, and hybrid models (ensemble methods stand out in this list, but I’ll explain that a bit later).
- Supervised learning models feast on data that has clear answers, like a history of who paid back loans and who didn't. Then, they use this knowledge to predict whether new people will repay loans well.
One example of a supervised algorithm is the Decision Tree. In credit scoring, it works by creating a tree-like structure to make credit decisions. The structure starts with a question at the root node, like: "Is the applicant's credit score above a certain threshold?" Based on the answer, it follows branches down the tree, asking more questions until it reaches a final decision, such as "Approve" or "Deny" for a loan application. Decision Trees are effective for credit scoring because they provide a transparent way to assess creditworthiness based on a series of logical criteria.
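To make the root-to-leaf walk concrete, here is a minimal hand-written sketch of such a tree. The `Applicant` fields, the 700 score threshold, and the 40,000 income threshold are all made up for illustration; a real tree learns its questions and cut-offs from historical data.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    credit_score: int      # e.g. 300-850
    annual_income: float   # in the lender's currency
    employed: bool


def decide(applicant: Applicant) -> str:
    """Walk a tiny hand-written decision tree from the root to a leaf."""
    # Root node: is the credit score above a threshold? (700 is a made-up value)
    if applicant.credit_score >= 700:
        return "Approve"
    # Second question, asked only for applicants below the threshold
    if applicant.employed:
        # Third question: is the income high enough? (40_000 is a made-up value)
        if applicant.annual_income >= 40_000:
            return "Approve"
        return "Deny"
    return "Deny"


print(decide(Applicant(credit_score=720, annual_income=30_000, employed=False)))
print(decide(Applicant(credit_score=640, annual_income=55_000, employed=True)))
print(decide(Applicant(credit_score=640, annual_income=25_000, employed=True)))
```

Every decision can be traced back to the exact questions that produced it, which is the transparency the paragraph above refers to.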
- Unsupervised learning models look at data without knowing the answers, much like detectives. They might group people together for credit scoring based on their spending or online habits. These groups can help us guess how good someone is at repaying loans, even if we don't have all the answers.
Isolation Forest is a common unsupervised algorithm, alongside K-means, hierarchical clustering, DBSCAN, and Kohonen's self-organizing maps. In credit scoring, it can detect anomalies or outliers in credit data. It grows its own “forest” of trees, where each tree isolates individual data points with a series of random splits. Anomalies, which are easier to isolate than normal points, typically end up with shorter paths in the trees. By measuring the average path length for each data point, the algorithm can identify transactions that stand out as potential credit risks.
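The path-length idea can be sketched in a few lines. This is a simplified, one-dimensional illustration, not the full algorithm: a production Isolation Forest also subsamples the data and splits on randomly chosen features. The transaction amounts, the function names, and the parameters are assumptions of mine. A point that sits far from the rest needs fewer random splits to end up alone.

```python
import random


def path_length(value, sample, rng, depth=0, max_depth=10):
    """Count the random splits needed to isolate `value` within `sample`."""
    if len(sample) <= 1 or depth >= max_depth:
        return depth
    low, high = min(sample), max(sample)
    if low == high:
        return depth
    split = rng.uniform(low, high)
    # Keep only the side of the split that contains the value we are tracking
    if value < split:
        side = [x for x in sample if x < split]
    else:
        side = [x for x in sample if x >= split]
    return path_length(value, side, rng, depth + 1, max_depth)


def average_path_length(value, data, n_trees=200, seed=0):
    """Average the path length over many random 'trees'."""
    rng = random.Random(seed)
    return sum(path_length(value, data, rng) for _ in range(n_trees)) / n_trees


# Hypothetical transaction amounts: 40 ordinary ones and a single extreme one
amounts = [50 + i for i in range(40)] + [5000]

typical = average_path_length(70, amounts)
outlier = average_path_length(5000, amounts)
print(f"typical amount: {typical:.2f} splits, extreme amount: {outlier:.2f} splits")
```

The extreme amount gets the shorter average path, so it is the one flagged for a closer look.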
- Hybrid learning models are something of a blend of the two. They use unsupervised learning to find hidden patterns in data and supervised learning to make smart guesses. These hybrid models work well, providing balanced results since they combine the best of both types, ensuring they understand credit risk from every angle.
Ensemble learning and neural networks
There are several ways to combine the algorithms into working models, and some algorithms can be versatile, fitting both supervised and unsupervised models.
Traditional ensemble learning is often mixed up with hybrid models, but there’s a difference. In traditional ensemble learning, you apply distinct or similar algorithms to different or identical datasets. Random Forest, for example, draws random bootstrap samples from the dataset and builds a separate Decision Tree on each. You can also train several different models on the same full dataset. Either way, the models work independently to produce predictions, and a voting system, whether hard or soft, combines those individual predictions into the final one.
In contrast, the models inside a hybrid pass their outputs to one another in one direction, so each stage builds on the previous one to form a single efficient and precise model. The key distinction is that ensemble members work autonomously and vote on a decision, whereas hybrid components collaborate to forecast a single outcome with no voting involved.
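Here is a small sketch of that difference. The probabilities, the segment rule, and the cut-offs are invented for illustration, and the two "hybrid" functions are stand-ins for real clustering and supervised models. Hard voting counts labels, soft voting averages probabilities, and the hybrid simply feeds one stage's output into the next.

```python
from collections import Counter
from statistics import mean


def hard_vote(labels):
    """Ensemble, hard voting: the most common predicted label wins."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(repay_probabilities, threshold=0.5):
    """Ensemble, soft voting: average the probabilities, then threshold."""
    return "repay" if mean(repay_probabilities) >= threshold else "default"


def spending_segment(monthly_spend):
    """Hybrid, stage 1: stand-in for a clustering model that returns a segment id."""
    return 0 if monthly_spend < 2_000 else 1


def hybrid_predict(monthly_spend, credit_score):
    """Hybrid, stage 2: a supervised-style rule that consumes the stage 1 output."""
    cutoff = 650 if spending_segment(monthly_spend) == 0 else 700
    return "repay" if credit_score >= cutoff else "default"


# Hypothetical repayment probabilities from three independent models
probabilities = [0.45, 0.48, 0.90]
labels = ["repay" if p >= 0.5 else "default" for p in probabilities]

print(hard_vote(labels))          # two of the three labels are "default"
print(soft_vote(probabilities))   # the mean probability is 0.61
print(hybrid_predict(monthly_spend=2_500, credit_score=680))
```

Note that hard and soft voting can disagree on the same applicant: one very confident model can tip the average even when it is outvoted on labels.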
Also, there are neural networks, which you can utilize in supervised and unsupervised learning models. Neural networks are a class of algorithms used in ML and AI inspired by the structure and function of the human brain. Every network has layers of interconnected neurons that analyze factors like credit history, income, and other relevant information to make creditworthiness predictions for new applicants.
Neural networks are highly complex and can automatically extract relevant features (characteristics of data) from raw data. Thus, they eliminate the need for manual feature engineering in some cases.
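As a rough sketch of what "layers of interconnected neurons" means in practice, here is a forward pass through a tiny network with one hidden layer. The weights, biases, and the three scaled inputs are made up; a real network learns its weights from historical data and is usually built with a dedicated library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with ReLU, then a sigmoid output between 0 and 1."""
    hidden = [
        relu(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Inputs scaled to roughly 0..1: [credit score, income, years of credit history]
applicant = [0.8, 0.5, 0.3]

# Made-up weights; training would normally set these
hidden_weights = [[1.5, 0.5, 0.2], [-1.0, 0.3, 0.8]]
hidden_biases = [-0.5, 0.1]
output_weights = [2.0, -1.0]
output_bias = -0.5

score = forward(applicant, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"predicted repayment probability: {score:.3f}")
```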
Feature engineering in AI-based credit scoring
Since I mentioned features, it is time to explain what they are and how they underpin the AI decision-making process. I’ll use the Decision Tree algorithm as an example.
Features are the inputs behind the questions a decision tree asks. They are the characteristics of the data that help the tree make predictions. For example, if we're predicting whether a fruit is an apple, features could be the fruit's color, size, and shape. In credit scoring, they would be the applicant’s age and gender (where regulations permit using them), payment records for a particular time period, and so on: clues that make sense and are likely to affect someone's ability to repay a loan.
Features can be the following:
- Numerical (age, income);
- Categorical (gender - male, female, other);
- Text features - words or phrases (can be transformed into features by methods like TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings);
- Image features (pixel values, edges, colors, or shapes);
- Temporal features (date and time).
How features are made
Let’s walk through this process step by step.
- Choosing the right features. Feature engineering in credit scoring is about picking the most useful clues (features) to help the computer make predictions. For example, people with higher credit scores are usually better at repaying loans. So, we choose "credit score" as an essential feature.
- Feature transformation. Sometimes, features need a little makeover to work better. For instance, age is a number, but it might be more helpful to divide people into groups like "young adults," "middle-aged," and "seniors." This transformation simplifies the decision-making process.
- Handling categorical data. In credit scoring, you’ll often have features that are categories. So, it’s necessary to specify the values for the category. For instance, the values for the category "marital status" will be “single,” “married,” and “divorced.” To help AI understand these nuances, you must encode these categories as numbers, for example single=1, married=2, and divorced=3. Because such codes imply an order that marital status doesn't have, one-hot encoding (a separate 0/1 column per value) is often the safer choice.
- Dealing with missing data. If essential data is missing, like an individual’s income, it should be filled in (imputed). For instance, in place of the missing income, we can use the average income of similar people in the dataset.
- Splitting data with features. Based on the feature values you specified, decision trees divide people into groups, forming branches of nodes. If a node cannot be split further, it’s called a leaf node. To decide which feature to split on and where to split, decision trees calculate "information gain" or "Gini impurity." This way, the tree splits data to make each group as pure (or correctly sorted) as possible. For example, if we're trying to predict loan repayment, we want one group to be mostly people who repay loans and another group to be mostly people who don't.
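The splitting step above can be illustrated with Gini impurity. In this minimal sketch, the loan history and the function names are hypothetical. Each candidate credit score threshold is scored by the weighted impurity of the two groups it creates, and the lowest value wins.

```python
def gini(labels):
    """Gini impurity of a group: 0.0 means every label is the same."""
    if not labels:
        return 0.0
    share_repaid = sum(labels) / len(labels)
    return 1.0 - share_repaid ** 2 - (1.0 - share_repaid) ** 2


def weighted_gini(rows, threshold):
    """Impurity after splitting rows of (credit_score, repaid) at a threshold."""
    left = [repaid for score, repaid in rows if score < threshold]
    right = [repaid for score, repaid in rows if score >= threshold]
    total = len(rows)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


# Hypothetical history: (credit score, 1 if the loan was repaid else 0)
rows = [(580, 0), (610, 0), (640, 0), (660, 1), (700, 1), (720, 0), (760, 1), (800, 1)]

candidates = sorted({score for score, _ in rows})
best = min(candidates, key=lambda t: weighted_gini(rows, t))
print(best, round(weighted_gini(rows, best), 3))
```

On this toy data the best threshold is 660: everyone below it defaulted, and four of the five applicants at or above it repaid, so both groups are about as pure as a single split can make them.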
Example
Let's say you’re building an AI model for credit scoring. Your features might include:
- Credit score: A number from 300 to 850.
- Annual income
- Employment status (encoded as employed=1, unemployed=0).
You can transform these features by:
- Grouping credit scores: Instead of using exact scores, you can group them into ranges like "low," "medium," and "high."
- Binning income: You can divide income into groups like "low income," "middle income," and "high income."
- Encoding employment: Turn "employed" into 1 and "unemployed" into 0.
These transformations will help the decision tree make better decisions. Selecting relevant features and preprocessing them appropriately ensures effective machine learning models. The quality and relevance of features can significantly impact the model's performance.
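Here is one way the three transformations could look in code, together with the mean imputation mentioned in the step-by-step list. The cut-offs, field names, and sample applicants are my own illustrative choices, not industry standards.

```python
from statistics import mean


def score_band(credit_score):
    """Group a 300-850 credit score into a range (cut-offs are illustrative)."""
    if credit_score < 580:
        return "low"
    if credit_score < 700:
        return "medium"
    return "high"


def income_band(income):
    """Bin annual income (cut-offs are illustrative)."""
    if income < 30_000:
        return "low income"
    if income < 90_000:
        return "middle income"
    return "high income"


applicants = [
    {"credit_score": 560, "annual_income": 24_000, "employment": "unemployed"},
    {"credit_score": 690, "annual_income": None, "employment": "employed"},
    {"credit_score": 780, "annual_income": 96_000, "employment": "employed"},
]

# Mean imputation: fill a missing income with the average of the known incomes
known = [a["annual_income"] for a in applicants if a["annual_income"] is not None]
fallback_income = mean(known)

prepared = []
for a in applicants:
    income = a["annual_income"] if a["annual_income"] is not None else fallback_income
    prepared.append({
        "score_band": score_band(a["credit_score"]),
        "income_band": income_band(income),
        "employed": 1 if a["employment"] == "employed" else 0,
    })

for row in prepared:
    print(row)
```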
Model training and validation
Model training and validation is a complex and lengthy process, but the more attention and effort you invest, the more accurate your results and the happier your customers will be.
Let’s add the last puzzle pieces to the picture of the AI-powered credit scoring process we’ve been assembling here. The steps you need to follow look like this:
- Gather historical data about borrowers, including their credit histories, incomes, and other relevant information.
- Clean and prepare the data by handling missing values, encoding categorical variables, and normalizing numerical features.
- Choose the most relevant features (like credit score or income) and transform them if necessary to improve model performance.
- Pick an appropriate machine learning model, like a decision tree or neural network, based on the problem and data.
- Train the chosen model using the prepared data. The model learns patterns from the historical data to make credit decisions.
- Validate the model by splitting the data into training and validation sets. The model is trained on the training set and evaluated on the validation set to ensure it's not just memorizing the data but can generalize to new, unseen data.
- Adjust the model's settings (hyperparameters) to optimize its performance on the validation set.
- Use evaluation metrics like accuracy, precision, recall, and F1 score to assess how well the model predicts creditworthiness. These metrics help measure the model's effectiveness.
- Once the model performs well on the validation set, it's tested on a separate, unseen test dataset to ensure it works accurately in real-world scenarios.
- If the model passes testing, then congrats! It can be deployed to make credit decisions for new applicants.
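The split-train-validate-test part of these steps can be sketched as follows. Everything here is a simplification under my own assumptions: the data is synthetic, the "model" is a single credit score cut-off, and hyperparameter tuning is left out.

```python
import random


def split(rows, seed=42, train=0.6, validation=0.2):
    """Shuffle once, then cut into training, validation, and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    a = int(len(shuffled) * train)
    b = a + int(len(shuffled) * validation)
    return shuffled[:a], shuffled[a:b], shuffled[b:]


def fit_threshold(rows):
    """'Training': pick the score cut-off with the fewest errors on the rows."""
    def errors(cutoff):
        return sum((score >= cutoff) != repaid for score, repaid in rows)
    return min(sorted({score for score, _ in rows}), key=errors)


def accuracy(rows, cutoff):
    """Share of rows where 'score >= cutoff' matches the actual outcome."""
    return sum((score >= cutoff) == repaid for score, repaid in rows) / len(rows)


# Synthetic history: higher scores repay more often (purely illustrative)
rng = random.Random(0)
history = []
for _ in range(500):
    score = rng.randint(300, 850)
    repaid = rng.random() < (score - 300) / 550
    history.append((score, repaid))

train_rows, validation_rows, test_rows = split(history)
cutoff = fit_threshold(train_rows)
print("cut-off learned on training data:", cutoff)
print("validation accuracy:", round(accuracy(validation_rows, cutoff), 3))
print("test accuracy:", round(accuracy(test_rows, cutoff), 3))
```

The cut-off is chosen using the training rows only, so the validation and test figures show how it holds up on data it has never seen.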
Model validation ensures that the AI model doesn't just memorize the data but can make accurate predictions on new cases. Evaluation metrics help measure how well the model is doing. For credit scoring, accuracy is essential, but it isn't enough on its own. Precision and recall depend on which outcome you treat as the positive class: precision for approvals tells you what share of approved loans were actually repaid, while recall for risky loans tells you what share of the loans that went bad the model flagged in advance. Together they help balance the decision-making process and reduce financial risks.
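To show how these metrics are computed, here is a short sketch that treats "default" as the positive class (an assumption on my side; you could equally treat "repay" as positive). The ten loan outcomes are hypothetical.

```python
def confusion_counts(actual, predicted, positive="default"):
    """Count true/false positives and negatives for the chosen positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def metrics(actual, predicted, positive="default"):
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(actual)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical outcomes for ten loans
actual = ["repay"] * 7 + ["default"] * 3
predicted = ["repay"] * 6 + ["default"] + ["default"] * 2 + ["repay"]

print(metrics(actual, predicted))
```

Here the model is right on 8 of 10 loans, yet it misses one of the three actual defaults, which is exactly the kind of gap that accuracy alone would hide.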
One more thing: you need to ensure the model’s transparency and fairness. The model's decisions have to be explainable and understandable. Fairness means that the model doesn't discriminate against any group based on factors like race or gender, not only for ethical reasons but also to comply with anti-discrimination laws. This helps build trust with borrowers and regulators.
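One simple fairness check, sometimes called the demographic parity difference, compares approval rates across groups. The sketch below uses invented group labels and decisions. It is only one of many possible checks, and passing it doesn't by itself prove a model is fair or legally compliant.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Share of approved applications per group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical model decisions: (group label, approved?)
decisions = (
    [("group_a", True)] * 60 + [("group_a", False)] * 40
    + [("group_b", True)] * 45 + [("group_b", False)] * 55
)

rates = approval_rates(decisions)
gap = max(rates.values()) - min(rates.values())
print(rates, f"gap: {gap:.2f}")
```

A large gap doesn't settle the question, but it tells you where to start looking.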
Use cases & top companies providing AI-based credit scoring
AI has had a transformative effect on the financial industry. Let’s briefly look at how different kinds of financial organizations use AI in credit scoring.
Banks. Not so long ago, they relied on traditional methods and needed a team of experts to decide who got a loan and who didn’t. Now, complex AI models are replacing human specialists. As a result, AI companies specializing in financial products are mushrooming, building and training new models for banks, which can now focus more on personalizing their customers’ experience.
Fintechs. In 2022, 19% of Americans still had no credit score. A situation like that can make it extremely hard to get access to credit and start saving. Moreover, it magnifies the risk of being trapped in a vicious circle with high-fee financial service providers.
But things are changing for the better as more and more companies tackle credit accessibility by adopting AI-based credit scoring. Apart from Karat Financial, which I mentioned above, many companies are working in this direction in the U.S. alone. I’d like to draw your attention to the following three.
Upstart
NASDAQ-traded Upstart (UPST) reached the top of the AI-powered fintechs cohort. It connects millions of customers to 100 financial institutions where they receive the ultimate credit experience, secured by Upstart’s AI models and cloud applications. The company gives a hand with personal loans, automotive retail and refinance loans, and small-dollar “relief” loans.
Since 2012, Upstart’s technology has helped more borrowers of various backgrounds get approved at lower rates: over 80% of applicants get instant approvals without any paperwork.
As for recent achievements, the company has grown its number of lending partners tenfold since its IPO and is now eyeing a $4 trillion market opportunity. With mostly positive customer reviews, Upstart enjoys a 4.9 rating on Trustpilot.
ZestFinance
ZestFinance has become the go-to shop for leading lenders worldwide. Claimed to be “the only solution for explainable AI in credit,” Zest Automated Machine Learning (ZAML™) automates credit risk analysis. So lenders can rest assured about their safety and focus on delivering fair and transparent credit to everyone.
For the curious: ZestFinance’s founder, Douglas Merrill, used to work as CIO at Google. Now, using Google-like math, the company leverages machine learning and data science to perfect credit decisions. Boosting repayment rates for its customers, ZestFinance is one of the fastest-growing U.S. financial technology startups.
Coming up: ScreeCred
One year into the industry, this Houston-based startup aims to improve credit access for marginalized communities. Yet to launch, ScreeCred will help community members build credit by tracking their auto insurance payments. Currently, the company has opened a waitlist for everyone eager to be among the first users.
Summing up on AI in credit scoring
The revolution is happening already, pulling banks and fintechs into the whirlwind of improvements AI makes to credit scoring every year. Its accuracy and efficiency have helped millions receive better credit opportunities. However, it’s not without its challenges and limitations, being dependent on the data we feed it with and our computational capabilities.
- When tackling AI model training and validation, addressing data, bias, transparency, and compliance issues is essential to ensure that AI-driven credit scoring is fair, accurate, and beneficial for all stakeholders.
- You’ll need to stay rigorous at every step, be it data collection, preprocessing, model selection, training, validation, or evaluation.
- Remember to ensure transparency and fairness to build trust and avoid discrimination in AI-driven credit decisions.
Stay tuned for more materials on AI in Fintech. In the meantime, feel free to contact me on LinkedIn to exchange views on the topic.