For a concrete example, see the Jupyter notebook:

E:\cgx硬盘\★Python and AI\（cgx★★）scikit learn 学习笔记\sklearn_cgx\GridSearchCV和RandomizedSearchCV参数搜索\GridSearchCV_and_Pipeline.ipynb

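Case 1: each Pipeline step searches one or more parameters of a single method

This case is simple and intuitive. The Pipeline defines every step explicitly, and my_param_grid then states, for those steps, which parameters to search, over what ranges, and in how many rounds.

In the example below, the Pipeline defines step 1, reduce_dim (a user-chosen string), as PCA(), and step 2, classify (also a user-chosen string), as LinearSVC().

my_param_grid1 then states that the parameters to search are reduce_dim__n_components and classify__C, which are the n_components parameter of PCA() and the C parameter of LinearSVC() respectively.
my_param_grid2 is essentially the same as my_param_grid1; it only adds a second round of search.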

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Define the Pipeline (each step is the method we actually search: PCA() and LinearSVC())
pipe = Pipeline([('reduce_dim', PCA(iterated_power=7)),
                 ('classify', LinearSVC(dual=False, max_iter=10000))])

# For each Pipeline step, set the parameter ranges to search
my_param_grid1 = {'reduce_dim__n_components': [2, 4, 8],
                  'classify__C': [1, 10, 100, 1000]}  # dict (single round)

my_param_grid2 = [{'reduce_dim__n_components': [2, 4, 8],
                   'classify__C': [1, 10, 100, 1000]},
                  {'reduce_dim__n_components': [13, 20, 26],
                   'reduce_dim__n_oversamples': [5, 10],
                   'classify__C': [50, 100]}]  # list of dicts (multiple rounds)
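To show how such a grid is actually consumed, here is a minimal sketch that passes a Case 1 pipeline to GridSearchCV. The dataset (a 600-sample subset of load_digits), cv=3 and the shortened grid are assumptions made only to keep the run fast; they are not taken from the notebook.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Assumption: a small slice of the digits data, chosen only for speed.
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]

pipe = Pipeline([('reduce_dim', PCA(iterated_power=7)),
                 ('classify', LinearSVC(dual=False, max_iter=10000))])

# Two rounds: each dict is expanded into its own full grid.
param_grid = [{'reduce_dim__n_components': [2, 4, 8],
               'classify__C': [1, 10]},
              {'reduce_dim__n_components': [13, 20],
               'classify__C': [50, 100]}]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

# 3*2 candidates from the first dict plus 2*2 from the second.
n_candidates = len(search.cv_results_['params'])
print(n_candidates)
print(search.best_params_)
print(round(search.best_score_, 3))
```

The rounds are not searched sequentially in any adaptive sense: the candidates of all dicts are simply pooled and scored with the same cross-validation.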

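Case 2: each Pipeline step searches one or more parameters of several methods

Unlike Case 1, Case 2 lets each Pipeline step search over several methods. The step defined in the Pipeline then only serves as a placeholder, while my_param_grid sets the candidate methods for each step together with the parameters to search, their ranges, and the number of rounds.

In the example below, step 1 of the Pipeline, reduce_dim, can be 'passthrough', PCA(iterated_power=7), [PCA(iterated_power=7), NMF()], and so on. As noted above, it can simply be treated as a placeholder, because the grid replaces it before fitting. Step 2, classify, uses LinearSVC(); a list of several methods could be used here too (but since it is the final estimator, it cannot be 'passthrough', otherwise an error is raised).

my_param_grid1 then states the following:
For the reduce_dim step, the methods actually searched are the three in [PCA(iterated_power=10), NMF(), FastICA()], and the parameter searched for all three is n_components (written reduce_dim__n_components). Note that a searched parameter must exist on every one of these methods, otherwise an error is raised.
my_param_grid2 is essentially the same as my_param_grid1, except that the reduce_dim step also searches the k parameter of SelectKBest(chi2). PCA() and NMF() have no such parameter, so it has to be searched in a separate dict. In that second dict, the classify step also searches the C parameter of two methods, [SVC(), LinearSVC()].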

from sklearn.decomposition import PCA, NMF, FastICA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# Define the Pipeline. The reduce_dim entry only holds a place here: 'passthrough',
# PCA(iterated_power=7) or [PCA(iterated_power=7), NMF()] would all do.
pipe = Pipeline([('reduce_dim', 'passthrough'),
                 ('classify', LinearSVC(dual=False, max_iter=10000))])

# For each Pipeline step, set the methods actually searched and their parameters, ranges and rounds.
my_param_grid1 = {'reduce_dim': [PCA(iterated_power=10), NMF(), FastICA()],
                  'reduce_dim__n_components': [2, 4, 8],
                  'classify__C': [1, 10, 100, 1000]}  # dict

my_param_grid2 = [{'reduce_dim': [PCA(iterated_power=10), NMF(), FastICA()],
                   'reduce_dim__n_components': [2, 4, 8],
                   'classify__C': [1, 10, 100, 1000]},
                  {'reduce_dim': [SelectKBest(chi2)],
                   'reduce_dim__k': [2, 4, 8],
                   'classify': [SVC(), LinearSVC()],
                   'classify__C': [1, 10, 100, 1000]}]  # list of dicts
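The following sketch runs a reduced Case 2 grid end to end, to illustrate that the step itself is one of the searched values. The digits subset, cv=3, max_iter=500 for NMF and the trimmed value lists are assumptions for speed. Digits is used because NMF and chi2 both need non-negative features.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, NMF
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# Assumption: a small slice of the digits data (pixel values are non-negative).
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]

# 'passthrough' is only a placeholder; every dict below overrides reduce_dim.
pipe = Pipeline([('reduce_dim', 'passthrough'),
                 ('classify', LinearSVC(dual=False, max_iter=10000))])

param_grid = [
    # Methods that share the n_components parameter go in one dict.
    {'reduce_dim': [PCA(iterated_power=7), NMF(max_iter=500)],
     'reduce_dim__n_components': [2, 8],
     'classify__C': [1, 10]},
    # SelectKBest uses k instead, so it needs its own dict.
    {'reduce_dim': [SelectKBest(chi2)],
     'reduce_dim__k': [2, 8],
     'classify': [SVC(), LinearSVC(dual=False, max_iter=10000)],
     'classify__C': [1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

n_candidates = len(search.cv_results_['params'])
best_reducer = type(search.best_estimator_.named_steps['reduce_dim']).__name__
print(n_candidates, best_reducer)
print(search.best_params_)
```

Grouping methods by the parameters they share is what keeps each dict valid: a key such as reduce_dim__k is only ever applied to a method that has a k parameter.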

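Notes

Every dict key in my_param_grid that sets a parameter is made up of three parts. Taking 'reduce_dim__n_components' as an example:
(1) Part one, the step name (the 'key') used in the pipe object: reduce_dim
(2) Part two, two underscores: __
(3) Part three, the name of an input parameter of the method assigned to reduce_dim (PCA here): n_components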
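One way to check which keys are valid is to ask the pipeline itself. This small sketch, using a pipeline with default PCA() and LinearSVC() steps as an assumed example, lists the nested parameter names and shows that a name the step does not have is rejected.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([('reduce_dim', PCA()), ('classify', LinearSVC())])

# Every key accepted in a parameter grid is a key of get_params().
valid_keys = sorted(pipe.get_params().keys())
nested_keys = [k for k in valid_keys if '__' in k]
print(nested_keys)

# <step name>__<parameter name> is routed to the matching step.
pipe.set_params(reduce_dim__n_components=4, classify__C=10)
print(pipe.named_steps['reduce_dim'].n_components)

# PCA has no parameter k, so this key is rejected.
error_raised = False
try:
    pipe.set_params(reduce_dim__k=3)
except ValueError as exc:
    error_raised = True
    print(exc)
```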

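If there are not many hyperparameters to tune and the pipeline runs quickly, use GridSearchCV;
for larger search spaces and slow-to-train models, use HalvingGridSearchCV;
for very large search spaces and slow-to-train models, use HalvingRandomSearchCV.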
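As a rough sketch of the second option, HalvingGridSearchCV accepts the same pipeline and the same grid keys. It is still an experimental scikit-learn feature, so it must be enabled explicitly before import. The dataset subset, cv=3, factor=3 and random_state=0 are assumptions chosen for illustration.

```python
# The experimental import must come before importing HalvingGridSearchCV.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Assumption: a small slice of the digits data, chosen only for speed.
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]

pipe = Pipeline([('reduce_dim', PCA(iterated_power=7)),
                 ('classify', LinearSVC(dual=False, max_iter=10000))])

param_grid = {'reduce_dim__n_components': [2, 4, 8],
              'classify__C': [1, 10]}

# Each iteration keeps roughly the best 1/factor of the candidates
# and re-evaluates them on more samples.
search = HalvingGridSearchCV(pipe, param_grid, cv=3, factor=3, random_state=0)
search.fit(X, y)

print(search.n_candidates_)  # candidates evaluated at each iteration
print(search.best_params_)
```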

GridSearchCV() combined with Pipeline
