Paragon Business Solutions

Paragon's Modeller: Three Decades of Excellence in Credit Scoring Models 

(third article in a series of four) 

Well-designed tools make things easier 

Paragon's Modeller has evolved over three decades, continuously improved and refined into a trusted and valued tool for credit risk analysts and modellers. With a focus on simplicity and ease of use, Modeller embodies a commitment to excellence, with best-in-class processes and methodologies to deliver best-in-class predictive models.  

Tools that go hand-in-hand with your internal processes, standards and methodologies are invaluable. They not only save time but also instil best practices and ensure compliance with both your internal audit standards and external regulatory requirements. Modeller, from its inception, has been crafted as such a tool – one that understands the intricacies of credit risk modelling and aligns seamlessly with the operational needs of credit risk modelling professionals.  

"We started development 30 years ago, and we haven't finished yet." This encapsulates the spirit of Modeller's journey – a continuous commitment to improvement and adaptability. It's a powerful tool that has evolved with the industry since its launch in 1992, keeping pace with the changing dynamics of credit risk analytics.  In previous articles, we explored critical aspects of selecting a model development tool through the "Top 10 Questions to Ask" and delved into the transformative influence of process standardization in "The Power of Process Standardization". These topics set the stage for a deeper exploration into the credit risk domain expertise embedded in Modeller over the past 30 years. 

 
The evolution of Modeller 

1990s – Pioneering beginnings 

Modeller's history isn't merely a timeline of features; it's a story of evolving domain expertise. In the 1990s the first version of the tool was developed: a pioneering, purpose-built tool for the development of credit risk scorecards, incorporating grouping (binning), WoE Logistic Regression, and Reject Inference for best-in-class model development.  Up until the launch of Modeller (or DSS, as it was first known), the methodology and approach for developing credit risk scorecards was guarded by a few.  We not only pioneered the development of a scorecard-building tool, we also pioneered the path for the democratisation of scorecard development.    

Proficiency in logistic regression models has always been at the core of Modeller's excellence. Together with the user-friendly "Auto Grouping" binning tool (ensuring that variable classification – a cornerstone of model building – is both accurate and efficient) and the Reject Inference module, Modeller streamlined the model development process, making it easy to understand and saving valuable time for credit professionals.  The Auto Grouper provides a solid starting point for visualising WoE trends and running quick models. Users can then easily review and adjust bins to align with expected trends, adding a layer of business sense to variable analysis.   
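
To make the arithmetic concrete, here is a minimal sketch of the Weight of Evidence and Information Value calculation for one binned characteristic. The bins and counts are invented, and Modeller's Auto Grouper of course does far more than this.

```python
import pandas as pd
import numpy as np

# Hypothetical binned characteristic: counts of goods and bads per bin.
bins = pd.DataFrame({
    "bin":   ["<25", "25-34", "35-44", "45+"],
    "goods": [1200, 2500, 3100, 1900],
    "bads":  [260, 310, 240, 90],
})

# Distribution of goods and bads across the bins.
dist_good = bins["goods"] / bins["goods"].sum()
dist_bad = bins["bads"] / bins["bads"].sum()

# Weight of Evidence per bin and total Information Value.
bins["woe"] = np.log(dist_good / dist_bad)
iv = ((dist_good - dist_bad) * bins["woe"]).sum()

print(bins[["bin", "woe"]])
print(f"Information Value: {iv:.3f}")
```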

For advanced application models requiring Reject Inference, Modeller's built-in tool is unparalleled. Users can choose how to infer their rejected and NTU populations, using a combination of Known Good/Bad and Accept/Reject models, or a simpler constant-value approach.  Reject Inference not only contributes to a more robust final model but also reduces bias by incorporating data from populations without actual or reliable performance outcomes. 
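
As an illustration of the idea, the sketch below implements one common reject inference recipe – fuzzy augmentation via a Known Good/Bad model – on synthetic data. This is not necessarily the exact procedure Modeller uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: X_acc/y_acc are accepted applicants with known
# good (0) / bad (1) outcomes; X_rej are rejects with no outcome.
rng = np.random.default_rng(0)
X_acc = rng.normal(size=(5000, 6))
y_acc = (rng.random(5000) < 0.1).astype(int)
X_rej = rng.normal(loc=0.3, size=(1500, 6))

# Known Good/Bad model built on accepts only.
kgb = LogisticRegression(max_iter=1000).fit(X_acc, y_acc)

# Fuzzy augmentation: each reject enters twice, once as good and once
# as bad, weighted by the KGB model's predicted probabilities.
p_bad = kgb.predict_proba(X_rej)[:, 1]
X_all = np.vstack([X_acc, X_rej, X_rej])
y_all = np.concatenate([y_acc, np.zeros(len(X_rej)), np.ones(len(X_rej))])
w_all = np.concatenate([np.ones(len(X_acc)), 1 - p_bad, p_bad])

# Final model refitted on the combined, weighted population.
final = LogisticRegression(max_iter=1000).fit(X_all, y_all, sample_weight=w_all)
```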

To the present day, WoE logistic regression remains widely used for credit risk model development.  It is easy to explain, robust, caters naturally for outliers through binning, and delivers powerful models that are easy to understand, implement, monitor and recalibrate when required.  

2000s – Choice and control 

In the 2000s functionality continued to be added, including multi-stage modelling, the innovative Field Reducer, and the Expert Scorecard Builder. These gave modellers greater choice and control over how their models are developed, depending on the coverage, volume and accuracy of the modelling data. 

Field Reducer in Modeller automates the removal of fields with low discriminatory power or high correlation, ensuring subsequent models are accurate and resistant to overfitting – a powerful, user-favourite tool. It was first introduced to manage large numbers of credit bureau data variables, when the number of candidate scorecard variables grew from tens to hundreds.   
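
A simplified sketch of the principle behind field reduction is shown below: drop fields with low Information Value, then drop the weaker member of each highly correlated pair. The thresholds and function name are illustrative, not Modeller's actual algorithm.

```python
import pandas as pd

def reduce_fields(df, iv, iv_min=0.02, corr_max=0.8):
    """Simplified field reduction: drop weak fields, then drop the weaker
    member of each highly correlated pair (hypothetical thresholds)."""
    # Keep only fields whose Information Value clears the threshold.
    kept = [f for f in df.columns if iv.get(f, 0.0) >= iv_min]
    # Walk fields from strongest to weakest, dropping any field too
    # correlated with one already selected.
    kept.sort(key=lambda f: iv[f], reverse=True)
    corr = df[kept].corr().abs()
    selected = []
    for f in kept:
        if all(corr.loc[f, s] < corr_max for s in selected):
            selected.append(f)
    return selected

# Example usage: selected = reduce_fields(df_train, iv_by_field)
```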

The use of bureau data also drove the need for multi-stage models.  Being able to introduce bureau data into a model build as a second stage meant that lenders could exercise greater control over when to call for (expensive) bureau data.  As more alternative data sources have become available, the value and importance of multi-stage models has increased further.  
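
One simple way to stage such a build, sketched below on synthetic data, is to fit an application-data model first and then let the bureau fields refine its score in a second regression. Modeller's multi-stage functionality is richer than this.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: X_app holds application fields (available for all),
# X_bur holds bureau fields (called for only where justified).
rng = np.random.default_rng(1)
n = 4000
X_app = rng.normal(size=(n, 5))
X_bur = rng.normal(size=(n, 8))
y = (rng.random(n) < 0.08).astype(int)

# Stage 1: application-data-only model.
stage1 = LogisticRegression(max_iter=1000).fit(X_app, y)
score1 = stage1.decision_function(X_app)  # stage-1 log-odds

# Stage 2: bureau fields refine the stage-1 score by entering a
# second regression alongside it.
X_stage2 = np.column_stack([score1, X_bur])
stage2 = LogisticRegression(max_iter=1000).fit(X_stage2, y)
```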

Expert Scorecard Builder was introduced to enable model builds with insufficient defaulting accounts, or no defaults at all.  Expert modelling techniques are linked to the available customer data; this takes account of the correlations between the predictor characteristics and eases the path to a full statistical model by applying Bayesian update techniques.  Low volume modelling is the link between expert models and full statistical approaches: Paragon Low Volume Models enable the estimation of default rates within univariate characteristics while retaining the characteristic relationships required to restrain model overfitting. 

2010s – More algorithms, greater efficiencies 

The 2010s marked a leap with the introduction of decision trees (CHAID and CART), other modelling techniques such as Survival Analysis and Elastic Net modelling, along with project templates. Modeller evolved from a tool into a comprehensive solution, meeting the demands of a rapidly advancing industry.   

Chi-Squared Automatic Interaction Detection (CHAID) is a tree-building technique which picks a set of predictors in sequence to optimally predict the dependent variable. Decision trees can be used to identify interactions within predictive data, as these will produce a non-symmetrical tree. Used in this way, they can inform suggestions for population splits in segmented scorecard building. 
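
The sketch below illustrates the idea with scikit-learn, which implements CART rather than CHAID: a shallow tree is grown and printed, and branches that split on different fields hint at interactions and candidate segmentations.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data; the principle of reading interactions off a
# non-symmetrical tree is the same for CART and CHAID.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=100,
                              random_state=0)
tree.fit(X, y)

# A tree that splits on different fields down different branches hints
# at an interaction, and so at a candidate scorecard segmentation.
print(export_text(tree, feature_names=[f"x{i}" for i in range(6)]))
```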

Elastic Net logistic regression is a regularised regression method that linearly combines Lasso and Ridge regression techniques. Lasso and Ridge regression help to reduce model over-fitting by limiting the values of the parameters being estimated with L1 and L2 penalty functions.  The result is a regression method that incorporates variable selection with the capacity to select groups of correlated variables. As a by-product of Elastic Net, stepwise characteristic selection is no longer required and a very useful graphical indication of characteristic power is available. 
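
In scikit-learn terms, an Elastic Net logistic regression looks like the following sketch (synthetic data; the l1_ratio and C values are arbitrary).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=40,
                           n_informative=8, random_state=0)

# Elastic Net logistic regression: l1_ratio blends the Lasso (L1) and
# Ridge (L2) penalties; C controls overall regularisation strength.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.1, max_iter=5000)
enet.fit(X, y)

# L1 shrinkage drives uninformative coefficients to exactly zero,
# performing variable selection as a by-product of the fit.
print((enet.coef_ != 0).sum(), "of", X.shape[1], "variables retained")
```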

As survival analysis became an increasingly popular method of evaluating risk within the credit-scoring industry, it was introduced into Modeller in the form of Cox Proportional Hazards modelling. Cox PH models predict the probability of an event occurring (e.g. an account going bad) within a specific time interval.  The fundamental difference between survival analysis and the more traditional logistic regression methods is the inclusion of 'survival times', or a 'time to event'. Using Good/Bad modelling as an example: in logistic regression, the only information required is whether an account has become bad during some predefined observation period [0, T] (T = 12 months, for example). The logistic model produced gives the probability of an account becoming bad within this interval. In survival analysis, the user also needs to know the time at which the event occurs. The Cox PH model enables the user to calculate the probability of an account going bad within any time interval [0, t] with t < T (i.e. the probability of going bad within 3 months, 9 months, etc.), provided the user specifies the baseline hazard function at time t. Survival analysis has been particularly relevant to Basel and IFRS 9 models. 
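
A minimal sketch using the lifelines library shows the shape of such a model: fit a Cox PH model on durations and event flags, then read off the probability of staying good at chosen horizons. All column names and data here are hypothetical.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical account-level data: 'duration' is months on book until
# going bad or being censored; 'bad' flags whether the event occurred.
rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "utilisation": rng.random(n),
    "num_delinq": rng.poisson(0.5, n),
    "duration": rng.exponential(24, n).clip(1, 36),
    "bad": (rng.random(n) < 0.15).astype(int),
})

# Cox Proportional Hazards: predictors scale a baseline hazard function.
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="bad")

# Survival curve per account: P(still good at time t); one minus this
# is the probability of going bad within [0, t].
surv = cph.predict_survival_function(df.iloc[:5], times=[3, 9, 12])
print(surv)
```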

Modeller's reporting capabilities are extensive. Standard built-in reports and customisable canvases empower users to develop their own reports, ensuring consistency and neatness in documentation. The flexibility to build standard validation processes directly into Modeller further reduces documentation time, allowing credit professionals to focus on refining their models.  

Project templates, including Behavioural/Application templates and options to Include/Exclude Field Reducer, NTU Inference, and Reject Inference, provide a structured framework. The ability to add common variables to your generated characteristics library and import them into your current project, along with generating additional characteristics and automatic cross-characteristic generation 'on-the-fly', demonstrates Modeller's commitment to driving efficiencies. 

2020s – Machine learning and open-source integration 

In the 2020s, Modeller embraced machine learning with tree ensemble models, introduced SHAP analysis for explainable ML, and integrated an Open Source Python Node. These advances highlight Modeller's adaptability and readiness to incorporate the latest techniques into its framework. 

Modeller has embraced machine learning with the introduction of Random Forests and XGBoost algorithms.  XGBoost is an implementation of gradient boosted decision trees. Boosting is an ensemble technique where new models are added to correct the errors made by existing ones: models are added sequentially, each predicting the residuals of the previous models, until no further improvement can be made. The term gradient boosting comes from the gradient descent algorithm used to minimise the loss when adding new models.  Random Forest is another tree ensemble technique, where bagging and randomness are introduced in the development phase to build many individual decision trees which are then aggregated into a powerful predictive model that tends not to overfit.  Tree ensemble models are by their nature hard to fully understand and explain simply by looking at the model or code.  This is why explainability reports such as variable importance and SHAP analysis are also needed; these were introduced in Modeller alongside the machine learning algorithms. 
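
The sketch below shows the general pattern – a gradient boosted tree classifier fitted with the xgboost library and explained with SHAP values – on synthetic data. It illustrates the techniques, not Modeller's internal implementation.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=4000, n_features=12, random_state=0)

# Gradient boosted trees: each new tree corrects the residual errors of
# the ensemble built so far.
model = xgb.XGBClassifier(n_estimators=200, max_depth=3,
                          learning_rate=0.1, eval_metric="logloss")
model.fit(X, y)

# SHAP values attribute each prediction to individual features, giving
# the explainability that the raw ensemble lacks.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])
print(np.abs(shap_values).mean(axis=0))  # mean |SHAP| as importance
```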

What sets Modeller apart is its commitment to staying on the cutting edge, evidenced by the latest addition of an Open Source Python Node. This integration allows users to bring Python seamlessly into the tool, bridging the gap between traditional modelling and the evolving landscape of machine learning. The Open Source node lets users run their own Python scripts from within Modeller itself, using the grouped data as input. Once the script has run, a range of model outputs (e.g. P(good)) can be fed back into the dataset and used within Modeller for reporting and validation. 
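
The node's actual interface is product-specific, but a script of roughly this shape conveys the flow: grouped data in, a user model run, and P(good) fed back for reporting. Everything below, including the column names, is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in for the grouped (WoE-coded) data the node would supply;
# all column names here are invented.
rng = np.random.default_rng(0)
df_grouped = pd.DataFrame(rng.normal(size=(1000, 4)),
                          columns=["age_woe", "income_woe",
                                   "util_woe", "delinq_woe"])
df_grouped["good"] = (rng.random(1000) < 0.9).astype(int)

# Any user-supplied Python model can run at this point.
X = df_grouped.drop(columns=["good"])
model = LogisticRegression(max_iter=1000).fit(X, df_grouped["good"])

# Feed P(good) back as a new column for reporting and validation
# (good is coded as 1, so the class-1 probability is P(good)).
df_grouped["p_good"] = model.predict_proba(X)[:, 1]
```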

Efficiency, control and governance (audit) are at the core of Modeller's design. The user-friendly binning tool, feature reduction capabilities, and the ability to produce quick models within minutes all contribute to an efficient model development process. Modeller understands that time is valuable: every action aligns with best practices, and every feature is crafted with the goal of making the life of a credit risk modeller easier.  All user clicks and steps in the model building process are logged, stored and retrievable within a complete audit trail. 

The software has evolved into an intuitive, interactive platform that puts the user in the driver's seat. With a sleek interface and powerful features, credit professionals can navigate the complexities of predictive modelling with ease.  Whether first introduced many years ago or more recently, all functionality adds value to the modelling process through a combination of control, choice and efficiency. 
 
Conclusion
 

From its pioneering beginnings 30 years ago to the present day, Modeller has evolved in tandem with the needs of credit risk professionals, embracing simplicity, transparency and domain knowledge. Its endurance in the changing landscape of credit risk analytics is testament to its commitment to best-in-class work. As credit professionals navigate the complexities, Modeller remains a trusted ally, quietly engineering credit risk models with efficiency and precision.

You can find out more about Modeller and see some screenshots here. To arrange a demo of Modeller, please contact us by emailing info@credit-scoring.co.uk.