Overview

In the Build Predictive Models step of a DORA experiment, a Random Forest classifier is used to predict the likelihood of mineralization within your Area of Interest (AOI). This ensemble method combines many decision trees, each trained on a subset of the data, to improve prediction accuracy and robustness. Advanced settings let you adjust the parameters that control how the forest is trained, so you can balance overfitting against underfitting.

Below is a breakdown of each advanced feature setting:

Minimum Number Simulations

The Number of Simulations is the number of times the Random Forest model is run. Repeating the run introduces variance into the prediction.

Each run subsamples the training points and input features differently, so each run produces slightly different outputs. Building this variance into the model helps produce more generalized predictions. The default is 10 runs but can be increased to 100 to explore variability further. Note that more runs require longer processing times.
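
As an illustration of how repeated runs could be summarized, here is a minimal Python sketch. It is not DORA's actual implementation. It assumes each run yields one predicted probability per map cell; the function name summarize_runs and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def summarize_runs(runs):
    """Return (means, spreads) per cell across simulation runs.

    `runs` is a list of runs; each run is a list of predicted
    probabilities, one per map cell (a hypothetical layout).
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_cells = len(runs[0])
    if any(len(run) != n_cells for run in runs):
        raise ValueError("all runs must cover the same cells")
    # zip(*runs) regroups the values so that each tuple holds one
    # cell's prediction from every run.
    per_cell = list(zip(*runs))
    means = [mean(cell) for cell in per_cell]
    spreads = [pstdev(cell) for cell in per_cell]
    return means, spreads


# Three illustrative runs over four cells (made-up numbers).
runs = [
    [0.90, 0.10, 0.50, 0.30],
    [0.80, 0.20, 0.70, 0.30],
    [0.85, 0.15, 0.30, 0.30],
]
means, spreads = summarize_runs(runs)
```

Cells with a large spread are those where the runs disagree. Adding runs makes the mean and spread estimates more stable, at the cost of processing time.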

Negative Balance

The Negative Balance sets the sampling ratio of negative to positive classes in the training data.

In Random Forests, an unbalanced dataset can cause the model to favour the majority class. The Negative Balance setting allows for subsampling of the negative class to prevent overfitting to it, which is especially relevant in geology, where non-mineralized samples often outnumber mineralized ones.

The default ratio is 1.3. For example, with 1,000 positive points (mineralized samples), the negative class is subsampled to 1,300 negative points (unmineralized samples). Adjusting this ratio helps the model predict both classes accurately, even when positive samples are scarce.

As a side note, it can be useful to include more negative points and slightly favour the accuracy of negative class predictions. This gives more confidence that areas not highlighted by the VPS score are truly unmineralized.
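
The arithmetic behind the default ratio can be sketched in a few lines of Python. This is an illustrative sketch only: the function name subsample_negatives, the fixed seed, and the integer stand-ins for sample points are assumptions, not part of DORA.

```python
import random


def subsample_negatives(positives, negatives, ratio=1.3, seed=0):
    """Randomly keep about `ratio` negative points per positive point."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    target = round(len(positives) * ratio)
    # The sample cannot be larger than the pool of negatives.
    target = min(target, len(negatives))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(negatives, target)


positives = list(range(1000))        # stand-ins for mineralized points
negatives = list(range(1000, 6000))  # stand-ins for unmineralized points
kept = subsample_negatives(positives, negatives)
print(len(kept))  # 1300
```

Raising the ratio keeps more negative points, which corresponds to favouring the negative class as described in the side note.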

Tree Depth

Tree Depth is the maximum number of successive splits (levels) allowed in each decision tree.

Tree depth determines how complex the decision trees can become. A deeper tree can capture more detail but risks overfitting, where the model is highly accurate on training data but less effective on new data. The default is 24, matching the 24 embedding dimensions used as input. Reducing the depth helps prevent overfitting and improves how well the model's predictions generalize.
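
To show why depth controls complexity, the sketch below computes the upper bound on the number of final nodes (leaves) in a tree. It assumes binary splits, and the training set size of 2,300 is a hypothetical figure. The function names are illustrative.

```python
def max_leaves(depth):
    """Upper bound on leaves in a binary tree with `depth` levels of splits."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 2 ** depth


def reachable_leaves(depth, n_training_points, min_samples_per_leaf=1):
    """Leaves are capped by the depth and by the available training data."""
    if min_samples_per_leaf < 1:
        raise ValueError("min_samples_per_leaf must be at least 1")
    return min(max_leaves(depth), n_training_points // min_samples_per_leaf)


print(max_leaves(24))               # 16777216
print(reachable_leaves(24, 2300))   # 2300
print(reachable_leaves(8, 2300))    # 256
```

At depth 24 the depth limit is far above what 2,300 training points can fill, so a tree could in principle isolate every point. A shallower limit such as 8 forces points to share leaves, which is what makes the tree more general.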

Number of Trees

The Number of Trees is the number of decision trees in the model’s forest.

A higher number of trees generally boosts predictive accuracy. However, if this number surpasses the available training points, the model risks overfitting. The typical default is 300, balancing accuracy and efficiency. Adjusting the number of trees allows for fine-tuning based on available data points and will be specific to your AOI and target commodity.
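
One common way a forest combines its trees is to take the fraction of trees that vote for the positive class. The sketch below illustrates that idea with made-up votes. It is an assumption that this matches how DORA aggregates trees; some implementations average per-tree probabilities instead of counting hard votes.

```python
def forest_probability(tree_votes):
    """Fraction of trees voting 'mineralized' (1) for a single location."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    return sum(tree_votes) / len(tree_votes)


# 300 hypothetical trees: 210 vote mineralized (1), 90 vote unmineralized (0).
votes = [1] * 210 + [0] * 90
print(forest_probability(votes))  # 0.7
```

With more trees, this fraction can take finer-grained values and is less sensitive to any single tree.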

Minimum Split

The Minimum Split is the minimum number of samples a node must contain before it can be split into two branches.

This parameter controls when a decision tree in the forest splits a node into two branches or sets the node as a final class. Higher minimum splits reduce the risk of overfitting by ensuring that nodes are only split when there is enough data to justify it. This leads to broader, more general splits, improving the forest’s ability to generalize. The typical range is between 20 and 30 for balanced fitting.

Minimum Samples

The Minimum Samples is the minimum number of samples required in a final node (leaf) of a decision tree, where the classification is made.

This parameter sets how many samples a node must have to form a class prediction. In Random Forests, low values may lead to overfitting, as the model can create highly specific rules for the data. High values, on the other hand, can result in underfitting. Setting this value carefully ensures that the individual trees in the forest contribute effectively to the overall model's performance.
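
The way Minimum Split and Minimum Samples work together can be sketched as a single check on a candidate split. This is an illustrative sketch, not DORA's code. The function name may_split and the default of 5 for min_samples are hypothetical, while the min_split default of 20 is taken from the typical range mentioned above. The same two rules exist in scikit-learn under the names min_samples_split and min_samples_leaf; this article does not state which library DORA uses.

```python
def may_split(n_node, n_left, n_right, min_split=20, min_samples=5):
    """Return True if a node may be split into the two proposed branches."""
    if n_left + n_right != n_node:
        raise ValueError("branch sizes must add up to the node size")
    # Minimum Split: the node itself needs enough samples to be split.
    if n_node < min_split:
        return False
    # Minimum Samples: each resulting branch must keep enough samples.
    return n_left >= min_samples and n_right >= min_samples


print(may_split(18, 9, 9))    # False: node is below the minimum split
print(may_split(40, 37, 3))   # False: one branch would be too small
print(may_split(40, 25, 15))  # True
```

When the check fails, the node is not split and becomes a final node that assigns a class.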

Predict Depth

The Predict Depth option determines whether the output includes the depth of the prediction.

If the Predict Depth option is unchecked, the model produces only two-dimensional predictions. Keeping it checked includes depth in the prediction, adding a further layer to the output.

Still have questions?

Reach out to your dedicated DORA contact or email Support@VRIFY.com for more information.
