
Train-Test Clusters in DORA

Learn how DORA uses spatial cross-validation to train and evaluate its predictive model.

Overview

In this article, you will learn how DORA organizes your Learning Points into train-test clusters to train and validate a predictive model. Understanding this process helps you interpret your Prediction Accuracy results with confidence and know what the model has been tested on.


What Are Train-Test Clusters?

When you click Run Prediction in Step 4: Select Input Features, DORA automatically groups your Learning Points into spatial clusters, which are sets of geographically nearby points. These clusters are used to separate training data from testing data.
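As a rough illustration of what "grouping nearby points into spatial clusters" means, the sketch below clusters synthetic point coordinates with k-means. The clustering algorithm, cluster count, and data are assumptions for illustration only; DORA's internal clustering method is not documented here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# 200 synthetic Learning Point locations (x, y in metres)
coords = rng.uniform(0, 10_000, size=(200, 2))

# Group points into 5 spatial clusters based purely on location
cluster_ids = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)

print(np.unique(cluster_ids))  # → [0 1 2 3 4]
```

Each point now carries a cluster label, so geographically close points share the same label and can be held out together.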

DORA applies a method called spatial cross-validation.

Rather than training on some points and testing on others chosen at random, it rotates through each cluster one at a time. Each cluster takes a turn as the test set while the model trains on the remaining clusters. This process repeats until every cluster has been held out for testing.
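The rotation described above is known as leave-one-group-out cross-validation, and can be sketched as follows. The model, features, and group labels are synthetic stand-ins, not DORA's internal code; the point is that each cluster is held out exactly once and never overlaps its own training set.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))         # input features (synthetic)
y = rng.integers(0, 2, size=120)      # binary target (synthetic)
groups = np.repeat(np.arange(4), 30)  # 4 spatial clusters of 30 points each

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
    # The held-out cluster never appears in the training clusters
    assert set(groups[test_idx]).isdisjoint(set(groups[train_idx]))

print(len(scores))  # one score per cluster → 4
```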

You can view train-test clusters on the Prediction Map from the 3D Layers List, on their own or overlaid with other data such as Learning Points.

Train Test Clusters on a Prediction Map


Why DORA Uses This Method

Many machine learning models use a random split; for example, 70% of points for training and 30% for testing. DORA takes a different approach because random splits are poorly suited to geospatial data.

Mineralization is spatially correlated. Points that are geographically close tend to share similar geological characteristics. If training and test points are chosen at random, they often end up as neighbours. The model can then be evaluated on patterns it has effectively already seen nearby, which can produce results that look strong but do not hold up on genuinely unexplored ground.

By using spatial cross-validation, DORA ensures that the test set is always geographically independent from the training set. This gives a more realistic measure of how the model will perform when predicting in new, unsampled areas, which is the core purpose of prospectivity mapping.


How Performance Is Measured

The performance metrics shown in the Prediction Accuracy and Performance Breakdown outputs reflect how the model performed across geographically independent test clusters. This includes accuracy, precision, recall, and F1 scores.

Because the test clusters are spatially distinct, these metrics represent the model's ability to generalise across different parts of your Area of Interest (AOI), not just areas it has already seen.
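A minimal sketch of how such metrics can be computed across held-out clusters is shown below, using synthetic data. Pooling predictions from every fold before scoring is one common aggregation choice and an assumption here, not necessarily how DORA combines its folds.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 3))         # input features (synthetic)
y = rng.integers(0, 2, size=120)      # binary target (synthetic)
groups = np.repeat(np.arange(4), 30)  # 4 spatial clusters

# Collect predictions from every geographically independent test cluster
y_true, y_pred = [], []
for tr, te in LeaveOneGroupOut().split(X, y, groups):
    model = LogisticRegression().fit(X[tr], y[tr])
    y_true.extend(y[te])
    y_pred.extend(model.predict(X[te]))

# Score on the pooled held-out predictions only
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)
```

Because every prediction scored here came from a cluster the model never trained on, the resulting numbers estimate generalisation to unseen ground rather than memorisation of nearby points.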


Still Have Questions?

Reach out to your dedicated DORA contact or email support@VRIFY.com for more information.
