Logistic Regression

라부송 2019. 3. 31. 18:50

- Unlike linear regression, whose output is unbounded, logistic regression is used when the output has to fall between 0 and 1 (for example, a binary 0/1 label).

- The graph of the hypothesis is an S-shaped curve built from an exponential, H(X) = 1 / (1 + e^-(XW + b)), and is called the sigmoid.

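- Applying gradient descent to the squared-error cost, as in the linear case, is not appropriate: because of the sigmoid, the cost curve is no longer a clean bowl-shaped parabola but a bumpy one, so gradient descent can settle in a local minimum that differs from the global minimum.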

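- A modified cost function is therefore required: the cross-entropy (log loss) form, cost = -mean(Y*log(H) + (1-Y)*log(1-H)), which is the one used in the code below.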
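The points above can be checked numerically. The following is a minimal sketch, not part of the original post: it uses only the standard library, a single made-up example (x = 1, y = 0) and one weight, and the helper names `squared_error`, `cross_entropy` and `curvature` are hypothetical. It only illustrates that the squared-error cost stops being convex once the sigmoid is involved, while the cross-entropy cost stays convex; it does not reproduce the plot from the lecture.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(w, x=1.0, y=0.0):
    """Squared-error cost for one example with hypothesis sigmoid(w * x)."""
    return (sigmoid(w * x) - y) ** 2

def cross_entropy(w, x=1.0, y=0.0):
    """Cross-entropy (log loss) cost for the same single example."""
    h = sigmoid(w * x)
    return -(y * math.log(h) + (1.0 - y) * math.log(1.0 - h))

def curvature(f, w, eps=1e-3):
    """Central finite-difference estimate of the second derivative of f at w."""
    return (f(w + eps) - 2.0 * f(w) + f(w - eps)) / eps ** 2

# Positive curvature means the cost bends upward (bowl-like) at that weight;
# negative curvature means it bends downward, which a convex cost never does.
for w in (-2.0, 3.0):
    print(w, curvature(squared_error, w), curvature(cross_entropy, w))
```

The squared-error curvature changes sign between the two weights, whereas the cross-entropy curvature stays positive, which is why gradient descent is better behaved on the latter.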

# Written against the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
import tensorflow as tf

x_data=[[1,2],[2,3],[3,1],[4,3],[5,3],[6,2]]
y_data=[[0],[0],[0],[1],[1],[1]]

X=tf.placeholder(tf.float32, shape=[None,2])
Y=tf.placeholder(tf.float32, shape=[None,1])

W=tf.Variable(tf.random_normal([2,1]), name='weight')
b=tf.Variable(tf.random_normal([1]), name='bias')

# X is [None,2] and W is [2,1], so the product must be X*W (shape [None,1]).
hypothesis=tf.sigmoid(tf.matmul(X,W)+b)
cost=-tf.reduce_mean(Y*tf.log(hypothesis)+(1-Y)*tf.log(1-hypothesis))
train=tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)

predicted=tf.cast(hypothesis>0.5, dtype=tf.float32)
accuracy=tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))

with tf.Session() as sess :
    sess.run(tf.global_variables_initializer())
    
    for step in range(10001) :
        cost_val, _ = sess.run([cost,train], feed_dict={X:x_data, Y:y_data})
        if step % 200 == 0:
            print(step, cost_val)
            
    h,c,a=sess.run([hypothesis, predicted, accuracy], feed_dict={X:x_data, Y:y_data})
    print("\nhypothesis: ", h, "\nPredicted (Y) : ", c, "\nAccuracy: ", a)
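For readers who cannot run the TensorFlow version, the same procedure can be sketched in plain Python. This is an assumption-laden illustration rather than the post's code: it uses only the standard library, starts from zero weights instead of random ones so the run is repeatable, and the helper names `predict_proba`, `cost` and `train` are hypothetical. The data, learning rate and step count are taken from the listing above.

```python
import math

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Hypothesis sigmoid(x . w + b) for one row x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cost(w, b):
    """Mean cross-entropy over the whole data set."""
    total = 0.0
    for x, y in zip(x_data, y_data):
        h = predict_proba(w, b, x)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(x_data)

def train(steps=10001, learning_rate=0.01):
    # Zero initialisation is an assumption; the post draws random weights.
    w, b = [0.0, 0.0], 0.0
    n = len(x_data)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(x_data, y_data):
            err = predict_proba(w, b, x) - y  # gradient of the cost w.r.t. the logit
            grad_w = [g + err * xi / n for g, xi in zip(grad_w, x)]
            grad_b += err / n
        # Plain gradient descent step on the mean cross-entropy.
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

w, b = train()
predicted = [1.0 if predict_proba(w, b, x) > 0.5 else 0.0 for x in x_data]
accuracy = sum(p == y for p, y in zip(predicted, y_data)) / len(y_data)
print("cost:", cost(w, b), "predicted:", predicted, "accuracy:", accuracy)
```

The gradient used here, (H - Y) times the input, is what the optimizer derives automatically from the cross-entropy cost in the TensorFlow listing.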

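I could not actually run this because of a setup problem unrelated to the code itself.

Still, this logistic regression algorithm can be applied to all kinds of real data files to produce practical predictions

(e.g. predicting diabetes)

When I have time, I should grab a suitable CSV data file from Kaggle and try it out.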
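A loading step for such a CSV might look roughly like the sketch below. It is only an illustration under stated assumptions: every column is numeric, the last column is the 0/1 label, and the function name `load_xy`, the column names and the sample values are all made up for the example. The result has the same nested-list shape as `x_data` and `y_data` in the listing above, so the shape of the `X` placeholder would have to match the number of feature columns.

```python
import csv
import io

def load_xy(csv_file, has_header=True):
    """Split CSV rows into features (all but the last column) and a label (last column)."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row
    x_data, y_data = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]
        x_data.append(values[:-1])
        y_data.append([values[-1]])
    return x_data, y_data

# Made-up sample standing in for a real file. With a file on disk you would
# pass open("some_file.csv", newline="") instead (hypothetical file name).
sample = io.StringIO("f1,f2,label\n1.0,2.0,0\n4.0,3.0,1\n")
x_data, y_data = load_xy(sample)
print(x_data, y_data)
```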