
When building a decision tree, we need to balance accuracy against complexity.

  • Why do we need this bias-variance tradeoff? Why do we need pruning?

The number of splits (#splits) reflects the tree's complexity: the bigger the tree, the more complex it is, and the higher its accuracy on the training set.

If we let the tree grow without restraint, the result is overfitting (over-learning): the tree performs well on the training set, but it may perform badly on future data.
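As a minimal sketch of this, using the `kyphosis` dataset that ships with rpart (chosen here purely for illustration), an essentially unrestricted tree can be grown by disabling the stopping rules:

```r
library(rpart)

# Disable the stopping rules: cp = 0 accepts every split, and
# minsplit = 2 lets nodes be split down to single observations.
big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0, minsplit = 2))

# Training error shrinks with every split ("rel error" column),
# but the cross-validated error ("xerror") stops improving and
# eventually rises again -- the signature of overfitting:
printcp(big)
```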

  • What is cp used for?

The main role of this parameter is to save computing time by pruning off splits that are obviously not worthwhile.

cp (the complexity parameter) is used to control this bias-variance tradeoff. A well-chosen cp leads to a good balance.

cp is used to judge whether a split is allowed, that is, whether it should be pruned away.

  • What is cp?

Any split that does not decrease the overall lack of fit by a factor of cp is not attempted. For instance, with anova splitting, this means that the overall R-squared must increase by cp at each step. 

Essentially, the user informs the program that any split which does not improve the fit by cp will likely be pruned off by cross-validation, and that hence the program need not pursue it.

In other words: when the change in classification accuracy gained by adding a node is less than cp times the change in tree complexity, that node must be pruned off.
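In rpart this pruning rule is applied with `prune()`. A brief sketch, again using the bundled `kyphosis` data as an assumed example:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Keep only the subtree in which every split improves
# the overall fit by at least a factor of 0.05:
pruned <- prune(fit, cp = 0.05)
```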

How is the change in tree accuracy measured? By 10-fold cross-validation on the training set (the xval parameter of rpart.control, the number of cross-validations, defaults to 10).
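A common workflow, sketched here under the same `kyphosis`-data assumption, is to grow the tree with a small cp, inspect the cross-validation results in the cp table, and prune back to the cp value with the smallest cross-validated error:

```r
library(rpart)

set.seed(1)  # xerror depends on the random 10-fold CV partition
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0.001, xval = 10))

# cptable columns: CP, nsplit, rel error (training), xerror (CV), xstd.
# Pick the CP row that minimizes the cross-validated error:
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```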

Therefore, if we set this parameter to a small value, we get a big tree.
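This effect is easy to see by comparing node counts; the following is an illustrative sketch on the bundled `kyphosis` data:

```r
library(rpart)

small_cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  control = rpart.control(cp = 0.001))
large_cp <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  control = rpart.control(cp = 0.1))

# Each row of $frame is one node of the tree,
# so the smaller cp yields the larger tree:
nrow(small_cp$frame)
nrow(large_cp$frame)
```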

Please credit when reposting: article reposted from www.051e.com. Original address: http://www.051e.com/it/855894.html