ML - Nonlinear Regression: Logistic Regression

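1. Probability:

1.1 Definition. Probability (P): a measure of how likely an event is to occur

1.2 Range: 0 <= P <= 1

1.3 Ways to compute it:

1.3.1 From personal belief

1.3.2 From historical data

1.3.3 From simulated data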
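Computing a probability from simulated data can be sketched as follows. This is only a minimal illustration: the helper `estimate_probability` and the dice event are hypothetical names chosen for the example, not part of the original notes.

```python
import random


def estimate_probability(event, trials=100_000, seed=0):
    """Estimate P(event) as the fraction of simulated trials in which it occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials


def two_dice_sum_is_seven(rng):
    # Hypothetical event: two fair dice sum to 7 (the exact probability is 6/36)
    return rng.randint(1, 6) + rng.randint(1, 6) == 7


p_hat = estimate_probability(two_dice_sum_is_seven)
print("estimated P(sum == 7):", p_hat)
```

The estimate should land close to 1/6 and always stays inside the range 0 <= P <= 1.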

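1.4 Conditional probability: the probability of A given that B has occurred

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0$$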
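A small worked example may help here. The sketch below assumes the standard definition P(A|B) = P(A and B) / P(B) and uses two fair dice as a made-up sample space; the events A and B are illustrative choices only.

```python
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of two fair dice
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]


def prob(event):
    """P(event) when every outcome is equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def sum_is_eight(o):    # event A (hypothetical)
    return o[0] + o[1] == 8


def first_is_even(o):   # event B (hypothetical)
    return o[0] % 2 == 0


p_b = prob(first_is_even)
p_a_and_b = prob(lambda o: sum_is_eight(o) and first_is_even(o))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)
print("P(A|B) =", p_a_given_b)
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared against a hand calculation.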



2. Logistic Regression

2.1 Example


h(x) > 0.5

h(x) > 0.2

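2.2 Basic model

The test data is X(x0, x1, x2, ..., xn)

The parameters to learn are: Θ(θ0, θ1, θ2, ..., θn)

$$Z = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

Vector form:

$$Z = \Theta^T X$$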
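To see that the vector form is just a compact way of writing the weighted sum, here is a short sketch. The numbers in `theta` and `x` are made up for illustration, and x0 = 1 is assumed to be the bias feature, as in the script further down.

```python
import numpy as np

theta = np.array([0.5, -1.0, 2.0])  # hypothetical (θ0, θ1, θ2)
x = np.array([1.0, 3.0, 4.0])       # hypothetical (x0, x1, x2), with x0 = 1

# Term-by-term sum: θ0*x0 + θ1*x1 + θ2*x2
z_sum = sum(t * xi for t, xi in zip(theta, x))

# Vector form: the inner product of Θ and X
z_dot = np.dot(theta, x)

print(z_sum, z_dot)
```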



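To handle binary (two-valued) data, introduce the sigmoid function, which smooths the curve:

$$g(Z) = \frac{1}{1 + e^{-Z}}$$

Prediction function:

$$h_\Theta(X) = g(\Theta^T X) = \frac{1}{1 + e^{-\Theta^T X}}$$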
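The sigmoid and the prediction function can be sketched in a few lines of NumPy. This assumes the usual definition g(z) = 1 / (1 + e^(-z)); the function names and the sample values of `theta` and `x` are illustrative, not taken from the original post.

```python
import numpy as np


def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(x, theta):
    """h(x) = g(theta^T x), evaluated for every row of x."""
    return sigmoid(np.dot(x, theta))


theta = np.array([-4.0, 1.0])  # hypothetical parameters (θ0, θ1)
x = np.array([[1.0, 2.0],      # first column is the bias feature x0 = 1
              [1.0, 4.0],
              [1.0, 6.0]])
h = predict_proba(x, theta)
print(h)
```

The middle row has theta^T x = 0, so its prediction sits exactly at 0.5; rows on either side are pushed smoothly toward 0 or 1.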



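Expressed as probabilities:
Positive example (y=1):

$$h_\Theta(X) = P(y = 1 \mid X; \Theta)$$

Negative example (y=0):

$$1 - h_\Theta(X) = P(y = 0 \mid X; \Theta)$$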
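As a rough illustration of the probabilistic reading, the sketch below takes some made-up values of h(x), treats them as P(y=1), derives P(y=0) as the complement, and turns them into class labels with the h(x) > 0.5 threshold from section 2.1. The array `h` is hypothetical.

```python
import numpy as np

h = np.array([0.1, 0.5, 0.9])  # hypothetical outputs of the prediction function

p_positive = h        # P(y = 1 | x)
p_negative = 1.0 - h  # P(y = 0 | x)

# Predict the positive class only when h(x) is strictly above the threshold
threshold = 0.5
labels = (h > threshold).astype(int)

print(p_positive, p_negative, labels)
```

Changing `threshold` (for example to 0.2) trades false negatives for false positives without retraining the model.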

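2.3 Cost function
Linear regression:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \qquad h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$$

Here (x^{(i)}, y^{(i)}) is the i-th of m training examples.

Find the θ0, θ1 that minimize the expression above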

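Logistic regression:

Cost function:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\Theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\Theta(x^{(i)})\right) \right]$$

Goal: find the θ0, θ1 that minimize the expression above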
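The logistic cost can be written out in NumPy as below. This is a sketch that assumes the standard cross-entropy form, averaged over the m examples; the function names, the `eps` clipping guard, and the tiny data set are additions for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_cost(x, y, theta, eps=1e-12):
    """Average cross-entropy cost over all examples."""
    h = sigmoid(np.dot(x, theta))
    h = np.clip(h, eps, 1.0 - eps)  # assumption: clip so log() never sees 0
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))


# Hypothetical data: bias column plus one feature, labels switch from 0 to 1
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

cost_zero = logistic_cost(x, y, np.zeros(2))            # every h(x) is 0.5
cost_good = logistic_cost(x, y, np.array([-3.0, 2.0]))  # boundary at x1 = 1.5
print(cost_zero, cost_good)
```

With all parameters at zero every prediction is 0.5, so the cost equals log 2; parameters that separate the two classes give a lower cost.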
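
2.4 Solution: gradient descent

Update rule:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\Theta) = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j = 0, 1, \ldots, n$$

  • α is the learning rate
    Update all θ simultaneously
    Repeat the update until convergence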
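For the logistic case, the update loop might look like the sketch below. It assumes the averaged cross-entropy cost, whose gradient is X^T (h - y) / m; the function name, the toy data, and the values of `alpha` and `num_iterations` are illustrative choices, not taken from the original post.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_gradient_descent(x, y, theta, alpha, num_iterations):
    """Batch gradient descent on the logistic (cross-entropy) cost."""
    m = x.shape[0]  # number of examples
    for _ in range(num_iterations):
        h = sigmoid(np.dot(x, theta))      # predictions for all examples
        gradient = np.dot(x.T, h - y) / m  # average gradient per example
        theta = theta - alpha * gradient   # updates every θ_j at the same time
    return theta


# Hypothetical data: bias column plus one feature
x = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 1.5],
              [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = logistic_gradient_descent(x, y, np.zeros(2), alpha=0.1, num_iterations=5000)
predictions = (sigmoid(np.dot(x, theta)) > 0.5).astype(int)
print("theta:", theta)
print("predictions:", predictions)
```

Because the whole `theta` vector is replaced in one assignment, all components are updated simultaneously, which is what the update rule requires.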

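  • Python implementation (note that the script below applies this update loop to the squared-error cost of linear regression, fitting a straight line to generated data; there is no sigmoid step):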

import numpy as np
import random


# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y


# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
print("x:")
print(x)
print("y:")
print(y)

m, n = np.shape(x)
print("m:")
print(m)
print("n:")
print(n)

numIterations = 100000  # number of training iterations
alpha = 0.0005  # learning rate
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print("theta:")
print(theta)

Output:

x:
[[ 1.  0.]
 [ 1.  1.]
 [ 1.  2.]
 [ 1.  3.]
 [ 1.  4.]
 [ 1.  5.]
 [ 1.  6.]
 [ 1.  7.]
 [ 1.  8.]
 [ 1.  9.]
 [ 1. 10.]
 [ 1. 11.]
 [ 1. 12.]
 [ 1. 13.]
 [ 1. 14.]
 [ 1. 15.]
 [ 1. 16.]
 [ 1. 17.]
 [ 1. 18.]
 [ 1. 19.]
 [ 1. 20.]
 [ 1. 21.]
 [ 1. 22.]
 [ 1. 23.]
 [ 1. 24.]
 [ 1. 25.]
 [ 1. 26.]
 [ 1. 27.]
 [ 1. 28.]
 [ 1. 29.]
 [ 1. 30.]
 [ 1. 31.]
 [ 1. 32.]
 [ 1. 33.]
 [ 1. 34.]
 [ 1. 35.]
 [ 1. 36.]
 [ 1. 37.]
 [ 1. 38.]
 [ 1. 39.]
 [ 1. 40.]
 [ 1. 41.]
 [ 1. 42.]
 [ 1. 43.]
 [ 1. 44.]
 [ 1. 45.]
 [ 1. 46.]
 [ 1. 47.]
 [ 1. 48.]
 [ 1. 49.]
 [ 1. 50.]
 [ 1. 51.]
 [ 1. 52.]
 [ 1. 53.]
 [ 1. 54.]
 [ 1. 55.]
 [ 1. 56.]
 [ 1. 57.]
 [ 1. 58.]
 [ 1. 59.]
 [ 1. 60.]
 [ 1. 61.]
 [ 1. 62.]
 [ 1. 63.]
 [ 1. 64.]
 [ 1. 65.]
 [ 1. 66.]
 [ 1. 67.]
 [ 1. 68.]
 [ 1. 69.]
 [ 1. 70.]
 [ 1. 71.]
 [ 1. 72.]
 [ 1. 73.]
 [ 1. 74.]
 [ 1. 75.]
 [ 1. 76.]
 [ 1. 77.]
 [ 1. 78.]
 [ 1. 79.]
 [ 1. 80.]
 [ 1. 81.]
 [ 1. 82.]
 [ 1. 83.]
 [ 1. 84.]
 [ 1. 85.]
 [ 1. 86.]
 [ 1. 87.]
 [ 1. 88.]
 [ 1. 89.]
 [ 1. 90.]
 [ 1. 91.]
 [ 1. 92.]
 [ 1. 93.]
 [ 1. 94.]
 [ 1. 95.]
 [ 1. 96.]
 [ 1. 97.]
 [ 1. 98.]
 [ 1. 99.]]
y:
[ 26.27815269  32.66768058  28.22594145  30.77125223  29.22859695
  38.02617578  38.77723704  38.75941693  37.51914005  34.70311263
  39.38349805  36.82172645  37.53424558  43.89335788  39.86619043
  42.77143872  46.97544428  50.24971924  45.30721118  44.55195142
  51.70691022  46.56863106  52.32805153  52.84954093  54.55242641
  52.14422122  56.27667761  56.98691298  53.56176317  63.44462043
  60.08578544  65.41098273  65.92701345  64.24412903  67.53920778
  63.35080039  64.43398594  63.34590094  63.11265328  67.07000322
  69.69430602  67.07964006  71.26126237  72.33061819  78.99023496
  78.62644886  75.33387876  74.23899871  80.06708854  81.03063236
  82.09372834  76.65280126  86.38648144  86.39932245  79.56509259
  86.62380336  88.00737772  91.95667651  86.30124993  91.39647352
  87.791776    88.80877001  96.1679461   94.8139934   98.44559598
  98.55320134  99.85464471  96.56094905  97.31222944  94.19160055
  98.14492827  99.80317251 101.65405055 102.29465893 100.20862392
 106.37400148 108.33447212 110.41768632 105.49789886 104.64961868
 109.37812661 107.69358766 109.10927721 109.93432977 109.13875359
 116.8197377  111.26240862 120.88567915 117.93786525 122.85693307
 120.26210017 116.99993199 124.74461618 124.99292528 122.0402555
 123.9124033  122.28379028 127.65976993 132.21417455 125.08727085]

m:
100
n:
2

theta:
[29.86975704  1.00423275]