Implementing Matrix Factorization (MF) with Keras

2022. 5. 15. 10:48ㆍAI/Deep learning

Loading the rating data and splitting it into train and test sets

import pandas as pd
import numpy as np
from sklearn.utils import shuffle

r_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', names=r_cols, sep='\t', encoding='latin-1')
ratings = ratings[['user_id', 'movie_id', 'rating']].astype(int)

TRAIN_SIZE = 0.75
ratings = shuffle(ratings)
cutoff = int(TRAIN_SIZE * len(ratings))
ratings_train = ratings.iloc[:cutoff]
ratings_test = ratings.iloc[cutoff:]

Loading the Keras packages

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dot, Add, Flatten
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import SGD, Adamax

Setting the Matrix Factorization parameters

K = 200                             # Number of Latent factors
mu = ratings_train.rating.mean()    # overall average
M = ratings.user_id.max() + 1       # Number of users
N = ratings.movie_id.max() + 1      # Number of movies

def RMSE(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))
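
As a quick sanity check on the formula, the same RMSE can be written in plain NumPy and worked through by hand. This is only a sketch: the `rmse` helper and the sample ratings below are made up for illustration and are not part of the model code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, the same formula as the TensorFlow RMSE above."""
    # ravel() flattens both inputs so that an (n,) array and an (n, 1) array
    # are compared element by element instead of being broadcast to (n, n).
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical ratings and predictions, for illustration only.
actual = [4, 2, 5, 3]
predicted = [3, 3, 5, 1]

# Errors are 1, -1, 0, 2 -> squares 1, 1, 0, 4 -> mean 1.5 -> sqrt(1.5)
print(rmse(actual, predicted))
```

The flattening matters later on, because `model.predict` returns an array of shape (n, 1) while the rating column has shape (n,).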

Defining the model

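user = Input(shape=(1,))    # user id
item = Input(shape=(1,))    # movie id
# Latent factor embeddings, shape (batch, 1, K)
P_embedding = Embedding(M, K, embeddings_regularizer=l2())(user)
Q_embedding = Embedding(N, K, embeddings_regularizer=l2())(item)
# Per-user and per-item bias terms, shape (batch, 1, 1)
user_bias = Embedding(M, 1, embeddings_regularizer=l2())(user)
item_bias = Embedding(N, 1, embeddings_regularizer=l2())(item)

# Dot product of the user and item factors, plus both biases
R = layers.dot([P_embedding, Q_embedding], axes=2)
R = layers.add([R, user_bias, item_bias])
R = Flatten()(R)    # (batch, 1, 1) -> (batch, 1)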

model = Model(inputs=[user, item], outputs=R)

model.compile(
  loss=RMSE,
  optimizer=SGD(),
  #optimizer=Adamax(),
  metrics=[RMSE]
)

model.summary()
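
What the network computes for each (user, movie) pair is the dot product of the two latent vectors plus the two bias terms. The NumPy sketch below mirrors that forward pass on toy data. The sizes, the random weights, and the `predict_centered` helper are assumptions made for illustration; they are not the trained Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical), much smaller than the values used above.
M, N, K = 5, 7, 3
P = rng.normal(size=(M, K))       # user latent factors, one row per user id
Q = rng.normal(size=(N, K))       # item latent factors, one row per movie id
user_bias = rng.normal(size=M)    # one bias per user id
item_bias = rng.normal(size=N)    # one bias per movie id

def predict_centered(user_ids, movie_ids):
    """Mean-centered prediction p_u . q_i + b_u + b_i for each (user, movie) pair."""
    user_ids = np.asarray(user_ids)
    movie_ids = np.asarray(movie_ids)
    # Row lookup plays the role of the Embedding layers.
    dot = np.sum(P[user_ids] * Q[movie_ids], axis=1)
    return dot + user_bias[user_ids] + item_bias[movie_ids]

preds = predict_centered([1, 4], [2, 6])
print(preds.shape)  # (2,)
```

The lookup also shows why M and N are set to the maximum id plus one: an id is used directly as a row index, so the table needs a row for the largest id. Assuming the ids start at 1, row 0 is simply never used.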

Training the neural network

result = model.fit(
  x=[ratings_train.user_id.values, ratings_train.movie_id.values],
  y=ratings_train.rating.values - mu,  # training targets (mean-centered ratings)
  epochs=60,
  batch_size=256,  # number of samples per gradient update
  validation_data=(  # test set used to measure accuracy
    [ratings_test.user_id.values, ratings_test.movie_id.values],
    ratings_test.rating.values - mu
  )
)

…

Epoch 59/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1156 - RMSE: 1.0977 - val_loss: 1.1097 - val_RMSE: 1.0921
Epoch 60/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1151 - RMSE: 1.0976 - val_loss: 1.1092 - val_RMSE: 1.0921

To avoid building the addition and subtraction of mu into the model itself, mu is subtracted from y when calling fit, and mu is added back to all predicted values at once.

y_pred = model.predict([user_ids, movie_ids]) + mu
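
The round trip of subtracting mu before training and adding it back after prediction can be seen on a few numbers. This is a minimal sketch with made-up ratings, and `centered_preds` is a hypothetical stand-in for what `model.predict` would return.

```python
import numpy as np

# Hypothetical training ratings, for illustration only.
train_ratings = np.array([5, 3, 4, 2, 1], dtype=float)
mu = train_ratings.mean()           # overall average, 3.0 here

# What would be passed as y to fit(): ratings centered around zero.
targets = train_ratings - mu

# Stand-in for model.predict(...): centered outputs of shape (n, 1).
centered_preds = np.array([[0.5], [-1.0]])

# Adding mu back puts the predictions on the original rating scale.
restored = centered_preds + mu
print(restored.ravel())  # 3.5 and 2.0
```

Because the targets average to zero, the embeddings and biases only have to model deviations from the overall mean.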

import matplotlib.pyplot as plt

# Plot train and test RMSE per epoch
plt.plot(result.history['RMSE'], label="Train RMSE")
plt.plot(result.history['val_RMSE'], label="Test RMSE")
plt.xlabel('epoch')
plt.ylabel('RMSE')
plt.legend()
plt.show()

user_ids = ratings_test.user_id.values[0:6]
movie_ids = ratings_test.movie_id.values[0:6]
predictions = model.predict([user_ids, movie_ids]) + mu
print("Actuals: \n", ratings_test[0:6])
print()
print("Predictions: \n", predictions)

Actuals:
        user_id  movie_id  rating
79619      886        20       2
85309      893      1012       3
52069      707      1007       4
99577      590       676       4
78216      896        27       1
58896      758       529       4

Predictions:
 [[3.510834 ]
 [3.5274408]
 [3.532464 ]
 [3.5137413]
 [3.4406579]
 [3.5852284]]
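
To put the six predictions side by side with the actual ratings, the per-row errors and the RMSE over just these rows can be computed directly. The numbers below are copied from the output above; the variable names are only for illustration, and six rows are far too few to say anything about the model as a whole.

```python
import numpy as np

# Actual ratings and predictions for the six test rows shown above.
actual = np.array([2, 3, 4, 4, 1, 4], dtype=float)
predicted = np.array([[3.510834], [3.5274408], [3.532464],
                      [3.5137413], [3.4406579], [3.5852284]])

# predict() returns shape (n, 1); flatten it before comparing with the ratings.
errors = actual - predicted.ravel()
rmse_six = float(np.sqrt(np.mean(errors ** 2)))

print(np.round(errors, 3))
print(round(rmse_six, 4))
```

All six predictions sit between about 3.44 and 3.59, so the largest errors come from the rows whose actual ratings are 1 and 2.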
