Extremely randomized trees (ERT)

What is ERT?

  • Extremely randomized trees: a tree-based ensemble method for supervised classification and regression problems.
  • It essentially consists of strongly randomizing both the attribute and the cut-point choice when splitting a tree node.
  • The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter.
  • Besides accuracy, the main strength of the resulting algorithm is computational efficiency.





Extra-Trees algorithm

  • The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the classical top-down procedure.
  • Its two main differences with other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow the trees.
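The node-splitting procedure can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation; all function and parameter names are my own): K attributes are drawn at random, each gets a single uniformly random cut-point, and the best-scoring of these K candidate splits is kept. A node with fewer than nmin samples becomes a leaf.

```python
import random

def draw_random_split(samples, attribute):
    """Pick a cut-point uniformly at random between the attribute's
    minimum and maximum value among the samples in the node."""
    values = [x[attribute] for x, _ in samples]
    return attribute, random.uniform(min(values), max(values))

def split_node(samples, n_attributes, K, nmin, score):
    """Return the best of K totally random candidate splits,
    or None when the node should become a leaf."""
    if len(samples) < nmin:
        return None  # too few samples to split
    attributes = random.sample(range(n_attributes), K)
    candidates = [draw_random_split(samples, a) for a in attributes]
    return max(candidates, key=lambda split: score(samples, split))
```

Note that `samples` is always the full learning sample reaching the node, never a bootstrap replica of it.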



For classification, candidate splits are rated by the normalized information gain

  Score(s, S) = 2 I<s,c>(S) / (Hs(S) + Hc(S)),

where Hc(S) is the (log) entropy of the classification in S, Hs(S) is the split entropy (also called split information by Quinlan), and I<s,c>(S) is the mutual information of the split outcome and the classification.
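This score, 2 I<s,c>(S) / (Hs(S) + Hc(S)), can be computed directly from the three entropies, using I<s,c>(S) = Hs(S) + Hc(S) − H(s,c)(S). A small sketch (function names are my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def split_score(split_outcomes, classes):
    """Normalized information gain of a split:
    2 * I(split; class) / (H_split + H_class)."""
    h_c = entropy(classes)          # classification entropy Hc(S)
    h_s = entropy(split_outcomes)   # split entropy Hs(S)
    h_joint = entropy(list(zip(split_outcomes, classes)))
    mutual_info = h_s + h_c - h_joint
    return 2.0 * mutual_info / (h_s + h_c)
```

The score is 1 for a split that determines the class perfectly and 0 for a split that is independent of the class.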


Bias/variance analysis

  • From the bias-variance point of view, the rationale behind the Extra-Trees method is that the explicit randomization of the cut-point and attribute combined with ensemble averaging should be able to reduce variance more strongly than the weaker randomization schemes used by other methods.
  • The use of the full original learning sample rather than bootstrap replicas is motivated by the aim of minimizing bias.
  • Randomization increases the bias and variance of the individual trees, but it also weakens the dependence of their structure on the output values of the learning sample.
  • The part of the variance due to randomization can be canceled out by averaging over a sufficiently large ensemble of trees.
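The last point is easy to see numerically. In the toy simulation below (all numbers are invented for illustration), each "tree" is an unbiased predictor whose randomization adds Gaussian noise; averaging M = 50 of them shrinks the randomization variance by roughly a factor of M:

```python
import random
import statistics

random.seed(42)
TRUE_VALUE = 10.0

def randomized_prediction():
    # stand-in for one extremely randomized tree:
    # unbiased, but with variance due to the randomization
    return TRUE_VALUE + random.gauss(0.0, 2.0)

def ensemble_prediction(M):
    # average the predictions of M independently randomized trees
    return sum(randomized_prediction() for _ in range(M)) / M

singles = [randomized_prediction() for _ in range(2000)]
ensembles = [ensemble_prediction(50) for _ in range(2000)]
print(statistics.variance(singles))    # close to sigma^2 = 4
print(statistics.variance(ensembles))  # close to sigma^2 / 50
```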

On the effect of parameters

  • K determines the strength of the attribute selection process,
  • nmin the strength of averaging output noise, and
  • M the strength of the variance reduction of the ensemble model aggregation.

Attribute selection strength K

  • For a given problem, the smaller K is, the stronger the randomization of the trees and the weaker the dependence of their structure on the output values of the learning sample.
  • For classification problems, the default value of K = √n is shown as a vertical line on the graphs; for regression the default corresponds to the highest possible value of K (i.e. K = n).

Smoothing strength nmin

  • nmin is the minimum number of samples required for splitting a node.
  • Larger values of nmin lead to smaller trees, higher bias and smaller variance, and its optimal value hence depends in principle on the level of output noise in the dataset.
  • Although possibly slightly suboptimal, the default values of nmin = 2 (classification) and nmin = 5 (regression) appear to be robust choices over a broad range of typical conditions.

Averaging strength M

  • The parameter M denotes the number of trees in the ensemble.
  • It is well known that for randomization methods the prediction error is a monotonically decreasing function of M (see, e.g., Breiman, 2001). So in principle, the higher the value of M, the better from the accuracy point of view.
  • The choice of an appropriate value of M will thus essentially depend on the resulting compromise between computational requirements and accuracy.
  • Default value: M = 100
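Putting the pieces together, here is a minimal pure-Python sketch of an Extra-Trees classifier (an illustration under simplifying assumptions, not the paper's reference implementation: it scores candidate splits by weighted Gini impurity rather than the normalized information gain, aggregates trees by majority vote, and all names are my own). Each of the M trees is grown on the whole learning sample, with K random candidate splits per node and nmin as the stopping threshold:

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(data, K, nmin):
    """Grow one extremely randomized tree on the full sample
    (data is a list of (feature_tuple, label) pairs)."""
    labels = [y for _, y in data]
    usable = [a for a in range(len(data[0][0]))
              if len({x[a] for x, _ in data}) > 1]
    if len(data) < nmin or len(set(labels)) == 1 or not usable:
        return ('leaf', Counter(labels).most_common(1)[0][0])
    best = None
    for a in random.sample(usable, min(K, len(usable))):
        values = [x[a] for x, _ in data]
        cut = random.uniform(min(values), max(values))  # fully random cut-point
        left = [(x, y) for x, y in data if x[a] < cut]
        right = [(x, y) for x, y in data if x[a] >= cut]
        if not left or not right:
            continue  # degenerate cut-point, skip this candidate
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(data)
        if best is None or score < best[0]:
            best = (score, a, cut, left, right)
    if best is None:
        return ('leaf', Counter(labels).most_common(1)[0][0])
    _, a, cut, left, right = best
    return ('split', a, cut,
            build_tree(left, K, nmin), build_tree(right, K, nmin))

def predict(tree, x):
    while tree[0] == 'split':
        _, a, cut, left, right = tree
        tree = left if x[a] < cut else right
    return tree[1]

def ensemble_predict(trees, x):
    return Counter(predict(t, x) for t in trees).most_common(1)[0][0]

random.seed(1)
# toy, linearly separable 2-D classification problem
data = ([((random.random(), random.random()), 0) for _ in range(30)] +
        [((2.0 + random.random(), random.random()), 1) for _ in range(30)])
trees = [build_tree(data, K=1, nmin=2) for _ in range(25)]  # M = 25 trees
accuracy = sum(ensemble_predict(trees, x) == y for x, y in data) / len(data)
print(accuracy)
```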


