2022. 3. 19. 10:29ㆍAI/Big data
data analysis
import pandas as pd
import seaborn as sb
t = sb.load_dataset('titanic')
t
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
886 0 2 male 27.0 0 0 13.0000 S Second man True NaN Southampton no True
887 1 1 female 19.0 0 0 30.0000 S First woman False B Southampton yes True
888 0 3 female NaN 1 2 23.4500 S Third woman False NaN Southampton no False
889 1 1 male 26.0 0 0 30.0000 C First man True C Cherbourg yes True
890 0 3 male 32.0 0 0 7.7500 Q Third man True NaN Queenstown no True
891 rows × 15 columns
t['survived']
0 0
1 1
2 1
3 1
4 0
..
886 0
887 1
888 0
889 1
890 0
Name: survived, Length: 891, dtype: int64
count
t['survived'].value_counts()
0 549
1 342
Name: survived, dtype: int64
normalize
Passing normalize=True converts the counts into proportions between 0 and 1 (they sum to 1).
t['survived'].value_counts(normalize=True)
0 0.616162
1 0.383838
Name: survived, dtype: float64
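To make the meaning of normalize=True concrete, here is a minimal sketch that checks it against dividing the raw counts by their total. It uses only the first five survived values shown in the table above instead of the full dataset, so the proportions differ from the real ones; the variable names are illustrative.

```python
import pandas as pd

# First five 'survived' values from the table above (a tiny sample, not the full dataset).
survived = pd.Series([0, 1, 1, 1, 0], name="survived")

counts = survived.value_counts()                     # raw counts per value
manual = counts / counts.sum()                       # proportions computed by hand
normalized = survived.value_counts(normalize=True)   # same thing in one call

print(normalized)
```

Both approaches give the same proportions, so normalize=True is just a shortcut for dividing each count by the number of non-missing rows.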
Conditional expressions
Filter the rows to male passengers, then print the normalized value_counts of survived.
t[t['sex'] == 'male']['survived'].value_counts(normalize=True)
0 0.811092
1 0.188908
Name: survived, dtype: float64
The survival rate for men is about 19%.
t[t['sex'] == 'female']['survived'].value_counts(normalize=True)
1 0.742038
0 0.257962
Name: survived, dtype: float64
The survival rate for women is about 74%.
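The two filters above can also be written as a single groupby. The sketch below is one possible alternative, not the approach used in this post. Because survived is a 0/1 column, its mean is the survival rate. The sample frame holds only the first five rows from the table above, so its rates do not match the full dataset.

```python
import pandas as pd

# First five rows of the table above, limited to the two columns needed here.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "sex": ["male", "female", "female", "female", "male"],
})

# Boolean-mask filtering, as in the post: keep only the male rows.
male_rate = sample[sample["sex"] == "male"]["survived"].mean()

# Same idea for every group at once: the mean of a 0/1 column is the share of 1s.
rate_by_sex = sample.groupby("sex")["survived"].mean()

print(rate_by_sex)
```

On the full dataset, t.groupby('sex')['survived'].mean() should give the same rates as the two value_counts(normalize=True) calls above.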
Viewing records for a subset of columns
Show only the survived, sex, age, and class columns.
t[['survived', 'sex', 'age', 'class']]
survived sex age class
0 0 male 22.0 Third
1 1 female 38.0 First
2 1 female 26.0 Third
3 1 female 35.0 First
4 0 male 35.0 Third
.. ... ... ... ...
886 0 male 27.0 Second
887 1 female 19.0 First
888 0 female NaN Third
889 1 male 26.0 First
890 0 male 32.0 Third
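Column selection and a row condition can be combined in one step with .loc. The following is a minimal sketch under the assumption that you want the survived and age columns for first-class women; it again uses only the first five rows shown above, and the variable names are illustrative.

```python
import pandas as pd

# First five rows of the table above, restricted to a few columns.
sample = pd.DataFrame({
    "survived": [0, 1, 1, 1, 0],
    "sex": ["male", "female", "female", "female", "male"],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
    "class": ["Third", "First", "Third", "First", "Third"],
})

# A list of column names inside [] returns a DataFrame with just those columns.
subset = sample[["survived", "sex", "age", "class"]]

# .loc takes a row condition and a column list together.
# Each condition needs its own parentheses when combined with &.
mask = (sample["sex"] == "female") & (sample["class"] == "First")
first_class_women = sample.loc[mask, ["survived", "age"]]

print(first_class_women)
```

Note the double brackets in the first selection: a single string such as sample['age'] returns a Series, while a list of names returns a DataFrame.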