Titanic data analysis

2022. 3. 19. 10:29 AI/Big data

data analysis

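import pandas as pd
import seaborn as sb  # load_dataset comes from seaborn, so it must be imported as sb
t = sb.load_dataset('titanic')  # returns the Titanic data as a pandas DataFrame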
t

     survived  pclass     sex   age  sibsp  parch     fare  embarked   class    who  adult_male  deck  embark_town  alive  alone
0           0       3    male  22.0      1      0   7.2500         S   Third    man        True   NaN  Southampton     no  False
1           1       1  female  38.0      1      0  71.2833         C   First  woman       False     C    Cherbourg    yes  False
2           1       3  female  26.0      0      0   7.9250         S   Third  woman       False   NaN  Southampton    yes   True
3           1       1  female  35.0      1      0  53.1000         S   First  woman       False     C  Southampton    yes  False
4           0       3    male  35.0      0      0   8.0500         S   Third    man        True   NaN  Southampton     no   True
...       ...     ...     ...   ...    ...    ...      ...       ...     ...    ...         ...   ...          ...    ...    ...
886         0       2    male  27.0      0      0  13.0000         S  Second    man        True   NaN  Southampton     no   True
887         1       1  female  19.0      0      0  30.0000         S   First  woman       False     B  Southampton    yes   True
888         0       3  female   NaN      1      2  23.4500         S   Third  woman       False   NaN  Southampton     no  False
889         1       1    male  26.0      0      0  30.0000         C   First    man        True     C    Cherbourg    yes   True
890         0       3    male  32.0      0      0   7.7500         Q   Third    man        True   NaN   Queenstown     no   True

891 rows × 15 columns

t['survived']
            0      0
            1      1
            2      1
            3      1
            4      0
                  ..
            886    0
            887    1
            888    0
            889    1
            890    0
            Name: survived, Length: 891, dtype: int64

count

t['survived'].value_counts()
0    549
1    342
Name: survived, dtype: int64

normalize

Normalize the counts into proportions between 0 and 1.

t['survived'].value_counts(normalize=True)
  0    0.616162
  1    0.383838
  Name: survived, dtype: float64
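
As a rough illustration of what normalize=True does, the sketch below compares it with dividing the raw counts by the number of rows. The five-row Series is hypothetical toy data made up for this example, not the Titanic dataset.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset): 1 = survived, 0 = did not
s = pd.Series([0, 0, 0, 1, 1], name="survived")

counts = s.value_counts()                     # raw count of each value
proportions = s.value_counts(normalize=True)  # each count as a share of the total

# Dividing the raw counts by the number of rows should give the same shares
manual = counts / len(s)

print(proportions)
print(manual)
```

Both results are indexed by the distinct values (0 and 1), so they can be compared label by label.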

Conditional expressions

Filter the rows to male passengers, then print the normalized value_counts of survived.

t[t['sex'] == 'male']['survived'].value_counts(normalize=True)
  0    0.811092
  1    0.188908
  Name: survived, dtype: float64

The survival rate for men is about 19%.

t[t['sex'] == 'female']['survived'].value_counts(normalize=True)
    1    0.742038
    0    0.257962
    Name: survived, dtype: float64

The survival rate for women is about 74%.
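
Because survived is coded as 0/1, its mean within each group is the survival rate, so a groupby can produce both rates in one call instead of filtering once per sex. This is only a sketch of that alternative on a hypothetical six-row DataFrame (toy values, not the Titanic data); the column names sex and survived follow the dataset above.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset)
df = pd.DataFrame({
    "sex": ["male", "male", "male", "male", "female", "female"],
    "survived": [0, 0, 0, 1, 1, 0],
})

# Filtering approach used above: boolean mask, then normalized counts
male_rate = df[df["sex"] == "male"]["survived"].value_counts(normalize=True).loc[1]

# groupby approach: the mean of a 0/1 column is the share of 1s in each group
rate_by_sex = df.groupby("sex")["survived"].mean()

print(male_rate)
print(rate_by_sex)
```

On the real data the same call would be t.groupby('sex')['survived'].mean().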

Viewing records for a subset of columns

Show only the survived, sex, age, and class columns.

t[['survived', 'sex', 'age', 'class']]
         survived     sex   age   class
    0           0    male  22.0   Third
    1           1  female  38.0   First
    2           1  female  26.0   Third
    3           1  female  35.0   First
    4           0    male  35.0   Third
    ..        ...     ...   ...     ...
    886         0    male  27.0  Second
    887         1  female  19.0   First
    888         0  female   NaN   Third
    889         1    male  26.0   First
    890         0    male  32.0   Third
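
Column selection can also be combined with a row condition. The sketch below uses a hypothetical four-row DataFrame (toy values, not the Titanic data) to show that a list of column names returns a DataFrame, a single name returns a Series, and .loc can apply a row filter and a column list together.

```python
import pandas as pd

# Hypothetical toy data (not the Titanic dataset)
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "sex": ["male", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0],
    "class": ["Third", "First", "Third", "Third"],
})

one_col = df["survived"]           # single name: pandas Series
two_cols = df[["survived", "sex"]] # list of names: pandas DataFrame

# .loc takes a row condition and a column list in one step
women = df.loc[df["sex"] == "female", ["survived", "age", "class"]]

print(type(one_col).__name__, type(two_cols).__name__)
print(women)
```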