
Advanced Features for Building Predictive Models

Learn about Advanced Feature settings in the ‘Build Predictive Model’ step of a DORA experiment.


Overview

In the Build Predictive Models step of a DORA experiment, a Random Forest classifier is used to predict the likelihood of mineralization within your Area of Interest (AOI). This ensemble method leverages multiple decision trees trained on subsets of data to improve prediction accuracy and robustness. Advanced settings allow you to adjust parameters that affect how the forest is trained, balancing between overfitting and underfitting to optimize results.
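
DORA configures these settings in the app, so there is no code to copy, but it can help to see how they map onto a familiar open-source Random Forest. The sketch below uses scikit-learn's RandomForestClassifier as an analogue; the parameter names, the placeholder values, and the synthetic training data are assumptions for illustration only, not DORA's actual implementation.

    # Illustrative analogue only -- not DORA's actual code.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in for labelled AOI samples: 24 features to mirror the 24 embedding dimensions.
    X, y = make_classification(n_samples=500, n_features=24, n_informative=10, random_state=0)

    model = RandomForestClassifier(
        n_estimators=300,       # Number of Trees
        max_depth=24,           # Tree Depth
        min_samples_split=25,   # Minimum Split (typical range 20-30)
        min_samples_leaf=10,    # Minimum Samples (hypothetical value)
        random_state=0,
    )
    model.fit(X, y)
    print(model.predict_proba(X[:5]))  # class likelihoods for the first few samples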

Below is a breakdown of each advanced feature setting:


Tree Depth

Tree Depth is the maximum number of sequential splits (levels) in each decision tree, from the root to its deepest leaf.

Tree depth controls how complex the decision trees become. A deeper tree can capture more detailed patterns but risks overfitting, making the model highly accurate on training data yet less effective on new data. The default is 24, matching the 24 embedding dimensions used as model inputs. Increasing the depth can improve accuracy on the training data, while reducing it helps prevent overfitting and improves the generalization of the model’s predictions.
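
To make the trade-off concrete, the sketch below (the same hypothetical scikit-learn analogue and synthetic data as in the Overview, not DORA output) sweeps the maximum depth and compares training accuracy against held-out accuracy:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=24, n_informative=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (4, 12, 24):
        rf = RandomForestClassifier(n_estimators=300, max_depth=depth,
                                    random_state=0).fit(X_tr, y_tr)
        # A widening gap between training and held-out accuracy is the classic sign of overfitting.
        print(depth, rf.score(X_tr, y_tr), rf.score(X_te, y_te))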


Number of Trees

The Number of Trees is the number of decision trees in the model’s forest.

A higher number of trees generally boosts predictive accuracy. However, if this number surpasses the available training points, the model risks overfitting. The typical default is 300, balancing accuracy and efficiency. Adjusting the number of trees allows for fine-tuning based on available data points, and the best value will be specific to your AOI and target commodity.
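
As a rough illustration of the accuracy/efficiency balance, the hypothetical scikit-learn sketch below varies the forest size and reports the out-of-bag score, an internal estimate of how well the forest generalizes:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=24, n_informative=10, random_state=0)

    for n_trees in (50, 150, 300, 600):
        rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                    random_state=0).fit(X, y)
        # Accuracy gains usually flatten out past a few hundred trees, while training time keeps growing.
        print(n_trees, round(rf.oob_score_, 3))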


Minimum Split

The Minimum Split is the number of samples needed to split a node of your tree into two branches.

This parameter controls when a decision tree in the forest splits a node into two branches or sets the node as a final class. Higher minimum splits reduce the risk of overfitting by ensuring that nodes are only split when there is enough data to justify it. This leads to broader, more general splits, improving the forest’s ability to generalize. The typical range is between 20 and 30 for balanced fitting.
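
One way to see the effect, again using the hypothetical scikit-learn analogue rather than DORA itself, is to compare how large individual trees grow under a permissive versus a conservative minimum split:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=24, n_informative=10, random_state=0)

    for min_split in (2, 25):
        rf = RandomForestClassifier(n_estimators=300, min_samples_split=min_split,
                                    random_state=0).fit(X, y)
        # Higher minimum splits produce smaller, more general trees.
        print(min_split, rf.estimators_[0].tree_.node_count)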


Minimum Samples

The Minimum Samples setting is the number of samples needed at the end of a decision tree (a leaf) for a final classification.

This parameter sets how many samples a node must have to form a class prediction. In Random Forests, low values may lead to overfitting, as the model can create highly specific rules for the data. High values, on the other hand, can result in underfitting. Setting this value carefully ensures that the individual trees in the forest contribute effectively to the overall model's performance.
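
In most Random Forest implementations this corresponds to the minimum leaf size. The hypothetical scikit-learn sketch below shows how raising it reduces the number of leaves, i.e. the number of distinct rules each tree can express:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=24, n_informative=10, random_state=0)

    for min_leaf in (1, 10, 50):
        rf = RandomForestClassifier(n_estimators=300, min_samples_leaf=min_leaf,
                                    random_state=0).fit(X, y)
        # Tiny leaves memorize the training data; very large leaves blur real patterns.
        print(min_leaf, rf.estimators_[0].get_n_leaves())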


Predict Depth

The Predict Depth setting determines whether the output includes the depth of each prediction.

If the Predict Depth option is unchecked, the model will produce only two-dimensional predictions. Keeping it checked includes depth in the prediction, adding an additional layer to the output.
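
Conceptually, the toggle decides whether predictions cover a 2D map of the AOI or a 3D volume that also has a depth axis. The snippet below is a generic illustration of that difference using NumPy grids; the coordinate names, extents, and spacing are assumptions, not DORA's output format.

    import numpy as np

    # Hypothetical AOI extents (map coordinates and depth in metres).
    x = np.linspace(0, 1000, 51)
    y = np.linspace(0, 1000, 51)
    depth = np.linspace(0, 500, 11)

    surface_grid = np.stack(np.meshgrid(x, y), axis=-1)        # unchecked: (x, y) locations only
    volume_grid = np.stack(np.meshgrid(x, y, depth), axis=-1)  # checked: (x, y, depth) locations
    print(surface_grid.shape, volume_grid.shape)               # (51, 51, 2) vs (51, 51, 11, 3)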


Still have questions?

Reach out to your dedicated DORA contact or email Support@VRIFY.com for more information.
