Implementing MF (Matrix Factorization) with Keras


Load the rating data and split it into train/test sets

import pandas as pd
import numpy as np
from sklearn.utils import shuffle

# Load the MovieLens 100k ratings file (tab-separated)
r_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', names=r_cols, sep='\t', encoding='latin-1')
ratings = ratings[['user_id', 'movie_id', 'rating']].astype(int)  # drop the timestamp column

# Shuffle and split 75% train / 25% test
TRAIN_SIZE = 0.75
ratings = shuffle(ratings)
cutoff = int(TRAIN_SIZE * len(ratings))
ratings_train = ratings.iloc[:cutoff]
ratings_test = ratings.iloc[cutoff:]
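
A quick sanity check (not in the original post) that the split and the raw IDs look as expected; the IDs are later used directly as embedding indices:

# MovieLens 100k: 100,000 ratings from 943 users on 1,682 movies
print(len(ratings_train), len(ratings_test))           # 75,000 / 25,000 split
print(ratings.user_id.max(), ratings.movie_id.max())   # 943, 1682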

Load the Keras packages

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dot, Add, Flatten
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import SGD, Adamax

Set the Matrix Factorization parameters

K = 200                             # number of latent factors
mu = ratings_train.rating.mean()    # overall average rating
M = ratings.user_id.max() + 1       # number of users (raw IDs are used as embedding indices)
N = ratings.movie_id.max() + 1      # number of movies

# RMSE, used below both as the loss and as a metric
def RMSE(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))
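
The model defined below is biased matrix factorization: the predicted rating is the overall mean mu plus a user bias, an item bias, and the dot product of K-dimensional user and item latent vectors. A minimal sketch of that formula, using hypothetical numpy arrays P, Q, bu, bi (illustrative names, not taken from the code above):

# Sketch of the prediction rule the Keras model learns.
# P: (M, K) user factors, Q: (N, K) item factors, bu: (M,), bi: (N,) -- hypothetical arrays
def mf_predict(P, Q, bu, bi, mu, u, i):
    return mu + bu[u] + bi[i] + P[u] @ Q[i]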

Define the model

user = Input(shape=(1, ))                                         # user ID input
item = Input(shape=(1, ))                                         # movie ID input
P_embedding = Embedding(M, K, embeddings_regularizer=l2())(user)  # user latent factors, (batch, 1, K)
Q_embedding = Embedding(N, K, embeddings_regularizer=l2())(item)  # item latent factors, (batch, 1, K)
user_bias = Embedding(M, 1, embeddings_regularizer=l2())(user)    # per-user bias, (batch, 1, 1)
item_bias = Embedding(N, 1, embeddings_regularizer=l2())(item)    # per-item bias, (batch, 1, 1)
R = layers.dot([P_embedding, Q_embedding], axes=2)                # latent-factor dot product, (batch, 1, 1)
R = layers.add([R, user_bias, item_bias])                         # add both bias terms
R = Flatten()(R)                                                  # (batch, 1)
model = Model(inputs=[user, item], outputs=R)
model.compile(
  loss=RMSE,
  optimizer=SGD(),
  # optimizer=Adamax(),
  metrics=[RMSE]
)
model.summary()


Train the neural network

result = model.fit(
  x=[ratings_train.user_id.values, ratings_train.movie_id.values],
  y=ratings_train.rating.values - mu,  # training targets: ratings with the overall mean removed
  epochs=60,
  batch_size=256,                      # number of samples per gradient update
  validation_data=(                    # test set used to track accuracy during training
    [ratings_test.user_id.values, ratings_test.movie_id.values],
    ratings_test.rating.values - mu
  )
)

Epoch 59/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1156 - RMSE: 1.0977 - val_loss: 1.1097 - val_RMSE: 1.0921
Epoch 60/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1151 - RMSE: 1.0976 - val_loss: 1.1092 - val_RMSE: 1.0921
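
After training, the learned factor matrices and biases can be read back out of the Embedding layers. A sketch (not in the original post), assuming the layer creation order used in the model definition above:

# Recover the learned MF parameters from the trained model
emb_layers = [l for l in model.layers if isinstance(l, Embedding)]
P = emb_layers[0].get_weights()[0]    # (M, K) user latent factors
Q = emb_layers[1].get_weights()[0]    # (N, K) item latent factors
bu = emb_layers[2].get_weights()[0]   # (M, 1) user biases
bi = emb_layers[3].get_weights()[0]   # (N, 1) item biases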

Rather than expressing the addition and subtraction of mu inside the model itself, mu is subtracted from y when fitting and then added back uniformly to the predictions:

y_pred = model.predict([user_ids, movie_ids]) + mu
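
Alternatively (not how this post does it), mu could be folded into the graph with a Lambda layer so that predict() returns ratings on the original scale; fit() would then take the raw ratings as y. A sketch under that assumption:

# Sketch: add mu back inside the model instead of outside
from tensorflow.keras.layers import Lambda

R_plus_mu = Lambda(lambda x: x + mu)(R)                   # shift predictions by the overall mean
model_mu = Model(inputs=[user, item], outputs=R_plus_mu)
# model_mu.fit(..., y=ratings_train.rating.values, ...)   # raw ratings, no -mu offset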

import matplotlib.pyplot as plt

# Plot train vs. test RMSE over the epochs
plt.plot(result.history['RMSE'], label="Train RMSE")
plt.plot(result.history['val_RMSE'], label="Test RMSE")
plt.xlabel('epoch')
plt.ylabel('RMSE')
plt.legend()
plt.show()

# Compare actual and predicted ratings for the first six test samples
user_ids = ratings_test.user_id.values[0:6]
movie_ids = ratings_test.movie_id.values[0:6]
predictions = model.predict([user_ids, movie_ids]) + mu
print("Actuals: \n", ratings_test[0:6])
print()
print("Predictions: \n", predictions)

Actuals:
        user_id  movie_id  rating
79619      886        20       2
85309      893      1012       3
52069      707      1007       4
99577      590       676       4
78216      896        27       1
58896      758       529       4

Predictions:
 [[3.510834 ]
 [3.5274408]
 [3.532464 ]
 [3.5137413]
 [3.4406579]
 [3.5852284]]
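
Beyond spot-checking a few rows, the overall test RMSE on the original rating scale can be computed with the variables already defined above (a short sketch, not part of the original output):

# Overall test RMSE after adding mu back to the predictions
test_pred = model.predict([ratings_test.user_id.values,
                           ratings_test.movie_id.values]).flatten() + mu
test_rmse = np.sqrt(np.mean((ratings_test.rating.values - test_pred) ** 2))
print('Test RMSE:', test_rmse)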
