1.4. Support Vector Machines
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.
The advantages of support vector machines are:
Effective in high-dimensional spaces.
Still effective in cases where the number of dimensions is greater than the number of samples.
Uses a subset of training points (called support vectors) in the decision function, so it is also memory efficient.
Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
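As a minimal sketch of the last point, the snippet below fits one SVC with a built-in kernel name and another with a Python callable as the kernel. The toy data and the helper name linear_kernel are illustrative assumptions, not part of the original text; the callable is assumed to receive two sample arrays and return their Gram matrix.

```python
import numpy as np
from sklearn.svm import SVC


def linear_kernel(X, Y):
    """Hypothetical custom kernel: the plain dot product between samples."""
    return np.dot(X, Y.T)


# Illustrative toy data: two classes lying along the diagonal.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])
X_new = np.array([[0.2, 0.1], [2.9, 3.1]])

# A common kernel selected by name.
builtin = SVC(kernel="rbf").fit(X, y)
# A custom kernel passed as a callable.
custom = SVC(kernel=linear_kernel).fit(X, y)

builtin_pred = builtin.predict(X_new)
custom_pred = custom.predict(X_new)
print(builtin_pred, custom_pred)
```

Passing a callable trades speed for flexibility, since the kernel is evaluated in Python rather than inside the underlying solver.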
The disadvantages of support vector machines include:
If the number of features is much greater than the number of samples, the method is likely to give poor performance.
SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).
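The probability point can be illustrated with a short sketch. It assumes a scikit-learn version in which SVC accepts probability=True and then exposes predict_proba; the synthetic two-cluster data is made up for illustration only.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative synthetic data: two Gaussian clusters of 20 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2.0, rng.randn(20, 2) + 2.0])
y = np.array([0] * 20 + [1] * 20)

# probability=True triggers the extra internal cross-validation at fit time,
# which is what makes probability estimates expensive.
clf = SVC(probability=True, random_state=0).fit(X, y)

scores = clf.decision_function(X[:3])  # raw signed scores, always available
proba = clf.predict_proba(X[:3])       # one column per class in clf.classes_
print(scores)
print(proba)
```

When only a ranking or a hard label is needed, decision_function or predict avoids the additional fitting cost.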
The support vector machines in scikit-learn support both dense (numpy.ndarray and anything convertible to it by numpy.asarray) and sparse (any scipy.sparse) sample vectors as input. However, to use an SVM to make predictions for sparse data, it must have been fit on such data. For optimal performance, use C-ordered numpy.ndarray (dense) or scipy.sparse.csr_matrix (sparse) with dtype=float64.
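A minimal sketch of the two input types follows, using made-up toy data. Each model predicts on the same kind of input it was fit on, in line with the requirement above.

```python
import numpy as np
from scipy import sparse
from sklearn.svm import SVC

# Dense input: C-ordered float64 array (illustrative toy data).
X_dense = np.ascontiguousarray(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], dtype=np.float64
)
y = np.array([0, 0, 1, 1])

# Sparse input: the same samples stored as a CSR matrix.
X_sparse = sparse.csr_matrix(X_dense)

dense_clf = SVC(kernel="linear").fit(X_dense, y)
sparse_clf = SVC(kernel="linear").fit(X_sparse, y)

X_new = np.array([[0.5, 0.5], [2.5, 2.5]])
dense_pred = dense_clf.predict(X_new)
# The sparse-fitted model is given sparse data at prediction time.
sparse_pred = sparse_clf.predict(sparse.csr_matrix(X_new))
print(dense_pred, sparse_pred)
```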
SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a dataset.
SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and have different mathematical formulations (see section Mathematical formulation). LinearSVC, on the other hand, is another implementation of Support Vector Classification for the case of a linear kernel. Note that LinearSVC does not accept the keyword kernel, as the kernel is assumed to be linear. It also lacks some of the members of SVC and NuSVC, like support_.
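The differences between the three classes can be checked with a small sketch. The three-class toy data is an illustrative assumption; the snippet only inspects attributes and constructor behaviour described in the text.

```python
import numpy as np
from sklearn.svm import SVC, NuSVC, LinearSVC

# Illustrative three-class toy data (not from the original text).
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],  # class 0
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],  # class 1
    [0.0, 5.0], [0.0, 6.0], [1.0, 5.0],  # class 2
])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

svc = SVC(kernel="linear").fit(X, y)
nu_svc = NuSVC(kernel="linear").fit(X, y)
linear_svc = LinearSVC().fit(X, y)

# SVC and NuSVC expose the indices of their support vectors.
svc_has_support = hasattr(svc, "support_")
nu_svc_has_support = hasattr(nu_svc, "support_")
# LinearSVC has no such member.
linear_svc_has_support = hasattr(linear_svc, "support_")

# LinearSVC does not accept the keyword kernel.
try:
    LinearSVC(kernel="linear")
    kernel_rejected = False
except TypeError:
    kernel_rejected = True

print(svc_has_support, nu_svc_has_support, linear_svc_has_support, kernel_rejected)
```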
Like other classifiers, SVC, NuSVC and LinearSVC take two arrays as input: an array X of size [n_samples, n_features] holding the training samples, and an array y of class labels (strings or integers) of size [n_samples]: