2022. 5. 15. 10:48ㆍAI/Deep learning
Building the rating matrix from the rating data
import pandas as pd
import numpy as np
from sklearn.utils import shuffle
r_cols = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_csv('u.data', names=r_cols, sep='\t', encoding='latin-1')
ratings = ratings[['user_id', 'movie_id', 'rating']].astype(int)
TRAIN_SIZE = 0.75
ratings = shuffle(ratings)
cutoff = int(TRAIN_SIZE * len(ratings))
ratings_train = ratings.iloc[:cutoff]
ratings_test = ratings.iloc[cutoff:]
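Because `shuffle` is called without a seed, the split above differs on every run. The following is a minimal sketch of a reproducible variant, assuming only NumPy; `split_indices` and its `seed` parameter are hypothetical names introduced for illustration and are not part of the original code.

```python
import numpy as np

def split_indices(n_rows, train_size=0.75, seed=0):
    """Return (train_idx, test_idx) drawn from a seeded random permutation."""
    rng = np.random.default_rng(seed)   # seeded generator -> same split on every run
    perm = rng.permutation(n_rows)      # shuffled row positions 0..n_rows-1
    cutoff = int(train_size * n_rows)   # same cutoff rule as the code above
    return perm[:cutoff], perm[cutoff:]

train_idx, test_idx = split_indices(100_000)
print(len(train_idx), len(test_idx))
```

With a DataFrame, the indices could be applied as `ratings.iloc[train_idx]` and `ratings.iloc[test_idx]`. Passing `random_state` to `sklearn.utils.shuffle` is another way to get a repeatable shuffle.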
Loading the Keras packages
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Dot, Add, Flatten
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import SGD, Adamax
Setting the matrix factorization parameters
K = 200 # Number of Latent factors
mu = ratings_train.rating.mean() # overall average
M = ratings.user_id.max() + 1 # Number of users
N = ratings.movie_id.max() + 1 # Number of movies
def RMSE(y_true, y_pred):
    # Root mean squared error, used as both the loss and the metric
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))
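To make the `RMSE` function concrete, here is a minimal NumPy sketch of the same formula on a tiny hand-checkable input. The name `rmse_np` is introduced here for illustration only.

```python
import numpy as np

def rmse_np(y_true, y_pred):
    """Root mean squared error, mirroring the TensorFlow RMSE above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors of 3 and 4 -> squared errors 9 and 16 -> mean 12.5 -> sqrt(12.5)
print(rmse_np([0, 0], [3, 4]))
```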
Defining the model
# Inputs: one integer id per sample
user = Input(shape=(1, ))
item = Input(shape=(1, ))
# Latent factor embeddings, each of shape (batch, 1, K)
P_embedding = Embedding(M, K, embeddings_regularizer=l2())(user)
Q_embedding = Embedding(N, K, embeddings_regularizer=l2())(item)
# Per-user and per-item bias terms, each of shape (batch, 1, 1)
user_bias = Embedding(M, 1, embeddings_regularizer=l2())(user)
item_bias = Embedding(N, 1, embeddings_regularizer=l2())(item)
# Dot product of the latent vectors plus both biases
R = layers.dot([P_embedding, Q_embedding], axes=2)
R = layers.add([R, user_bias, item_bias])
R = Flatten()(R)  # (batch, 1, 1) -> (batch, 1)
model = Model(inputs=[user, item], outputs=R)
model.compile(
    loss=RMSE,
    optimizer=SGD(),
    #optimizer=Adamax(),
    metrics=[RMSE]
)
model.summary()
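The network above amounts to a biased matrix factorization: the predicted (mean-centered) rating is the dot product of a user vector and an item vector, plus a user bias and an item bias. Below is a minimal NumPy sketch of that forward pass. The parameter values and the helper name `predict_centered` are made up for illustration; in the Keras model these are learned embedding weights.

```python
import numpy as np

# Hypothetical parameters: 2 users, 3 items, K = 2 latent factors
P = np.array([[1.0, 0.0], [0.0, 2.0]])              # user factors, shape (M, K)
Q = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])  # item factors, shape (N, K)
user_bias = np.array([0.1, -0.2])                   # one bias per user
item_bias = np.array([0.0, 0.5, -0.5])              # one bias per item

def predict_centered(user_ids, item_ids):
    """Row-wise dot product of user and item factors, plus both biases."""
    dot = np.sum(P[user_ids] * Q[item_ids], axis=1)
    return dot + user_bias[user_ids] + item_bias[item_ids]

# Pairs (user 0, item 1) and (user 1, item 2)
pred = predict_centered(np.array([0, 1]), np.array([1, 2]))
print(pred)
```

For the first pair the dot product is 2.0 and the biases add 0.1 + 0.5; for the second the dot product is 6.0 and the biases add -0.2 - 0.5. Adding `mu` to these values would give the final rating estimates.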
Training the neural network
result = model.fit(
    x=[ratings_train.user_id.values, ratings_train.movie_id.values],
    y=ratings_train.rating.values - mu,  # training targets, centered by subtracting mu
    epochs=60,
    batch_size=256,  # number of samples per gradient update
    validation_data=(  # test set used to measure accuracy after each epoch
        [ratings_test.user_id.values, ratings_test.movie_id.values],
        ratings_test.rating.values - mu
    )
)
…
Epoch 59/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1156 - RMSE: 1.0977 - val_loss: 1.1097 - val_RMSE: 1.0921
Epoch 60/60
293/293 [==============================] - 2s 7ms/step - loss: 1.1151 - RMSE: 1.0976 - val_loss: 1.1092 - val_RMSE: 1.0921
To avoid building the addition and subtraction of mu into the model itself,
mu is subtracted from y at fit time and added back to every prediction:
y_pred = model.predict([user_ids, movie_ids]) + mu
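The centering step can be illustrated without the model. This is a minimal sketch on toy numbers, where `model_output` is a stand-in array (an assumption for illustration) rather than the output of the Keras model above.

```python
import numpy as np

train_ratings = np.array([4, 2, 5, 3], dtype=float)  # toy training targets
mu = train_ratings.mean()                            # overall average of the toy data

y_centered = train_ratings - mu   # what the model is fit on
model_output = np.zeros(4)        # stand-in for a model that outputs 0 everywhere
y_pred = model_output + mu        # add mu back at prediction time

print(y_centered)
print(y_pred)
```

The centered targets average to zero, so a model whose outputs stay near zero predicts close to `mu` for every user and movie pair.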
import matplotlib.pyplot as plt
plt.plot(result.history['RMSE'], label="Train RMSE")
plt.plot(result.history['val_RMSE'], label="Test RMSE")
plt.xlabel('epoch')
plt.ylabel('RMSE')
plt.legend()
plt.show()
user_ids = ratings_test.user_id.values[0:6]
movie_ids = ratings_test.movie_id.values[0:6]
predictions = model.predict([user_ids, movie_ids]) + mu
print("Actuals: \n", ratings_test[0:6])
print()
print("Predictions: \n", predictions)
Actuals:
user_id movie_id rating
79619 886 20 2
85309 893 1012 3
52069 707 1007 4
99577 590 676 4
78216 896 27 1
58896 758 529 4
Predictions:
[[3.510834 ]
[3.5274408]
[3.532464 ]
[3.5137413]
[3.4406579]
[3.5852284]]
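Note that `model.predict` returns an array of shape `(6, 1)` here, while the actual ratings form a 1-D array of length 6. Subtracting the two directly would broadcast to a `(6, 6)` matrix, so the predictions should be flattened before computing an error. Here is a minimal sketch using the six values printed above, copied by hand; `sample_rmse` is a name introduced for illustration.

```python
import numpy as np

# Values copied from the printed output above
actuals = np.array([2, 3, 4, 4, 1, 4], dtype=float)
predictions = np.array([[3.510834], [3.5274408], [3.532464],
                        [3.5137413], [3.4406579], [3.5852284]])

flat = predictions.ravel()  # (6, 1) -> (6,)
sample_rmse = float(np.sqrt(np.mean((actuals - flat) ** 2)))
print(round(sample_rmse, 3))

# Without flattening, broadcasting silently produces a 6 x 6 matrix
print((actuals - predictions).shape)
```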