Plenty of material on incremental learning with LightGBM can be found, but most of it only covers how to use the interfaces at the code level (and a fair amount of it is shoddy content copied back and forth between sites), while material on the underlying principles and implementation is much harder to find.

On StackExchange, I found the following answer:

  1. LightGBM will add more trees if we update it through continued training (e.g. through BoosterUpdateOneIter). Assuming we use refit we will be using existing tree structures to update the output of the leaves based on the new data. It is faster than re-training from scratch, since we do not have to re-discover the optimal tree structures. Nevertheless, please note that almost certainly it will have worse performance (on the combined old and new data) than doing a full retrain from scratch on them.
  2. Any online learning algorithm will be designed to adapt to changes. That said, LightGBM's performance will depend on the training parameters we will use and how we will validate our predictions (e.g. how much we care to disregard previous data points). Assuming we properly train our booster, without having a relevant baseline (e.g. a ridge regression trained in an incremental manner) it does not make sense to say "LightGBM is good (or bad)" for dealing with concept drift.

In fact, as far as model updating is concerned, LightGBM provides two approaches:

  1. Fit the residuals on the incremental data with new trees.
  2. Refit the existing trees (refit).

The two approaches are described separately below.

Method 1

Looking at this spot in the source code, we can see that when the predictor is created, the status of init_model is checked: if an initial model exists, the Booster is loaded from it; otherwise a newly created one is used.

The subsequent training process is identical to normal training.

Method 2

Looking at this spot in the source code, refit updates the model by using the new data to update the existing Booster. The refit logic in the source calls into the C API; the details of the refit process can be found here.

It is worth noting that refit preserves the existing tree structures and only refits the tree parameters (the leaf outputs). This saves training time, but does not guarantee better results.



Last modification: April 21, 2022