The PISA (Programme for International Student Assessment) dataset comes from a survey of students' skills and knowledge as they approach the end of compulsory education. It is not a conventional school test. Rather than examining how well students have learned the school curriculum, it looks at how well prepared they are for life beyond school.
Around 510,000 students in 65 economies took part in the PISA 2012 assessment of reading, mathematics, and science, representing about 28 million 15-year-olds globally.
I selected features related to family status, work ethic, teacher support, students' views of their school, and attitudes to failure.
- Feature engineering
- Data Visualization
- Univariate Exploration
- Bivariate Exploration
- Multivariate Exploration
- etc.
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Jupyter
- etc.
During the exploration phase, I was already choosing which student attributes to analyze: age, gender, family members at home, country, and the work status of the father and mother. I also included student attitudes such as perceived teacher support, work ethic, feeling lonely, awkward, or like an outsider, and satisfaction with school.
I then analyzed each feature on its own, looking at its shape, its distribution, and how useful it would be for further analysis. Gender, family members at home, and parents' work status stood out as the most interesting, so I decided to compare them against students' behaviors. Many of these columns also had a high share of missing values, so I handled them carefully during feature engineering.
I was also interested in siblings: does having siblings, as opposed to having none at all, affect a student's work ethic?
Then I examined relationships among multiple variables, such as parents' work status in single-parent households versus households where both parents are present, and gender versus student behaviors, which was the most exciting part of the analysis.
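The feature selection described above can be sketched as a fixed column list applied to the raw frame. This is a minimal sketch, assuming hypothetical human-readable column names (`mother_at_home`, `work_ethic`, and so on); the actual PISA 2012 file uses coded variable names, so a rename step would come first in practice.

```python
import pandas as pd

# Hypothetical, human-readable column names standing in for the PISA 2012
# variables named above; the real codebook uses coded names.
FEATURES = [
    "country", "age", "gender",
    "mother_at_home", "father_at_home",
    "mother_work", "father_work",
    "teacher_support", "work_ethic",
    "feels_lonely", "feels_awkward", "feels_outsider",
    "satisfied_at_school",
]


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns chosen for the exploration, in a fixed order."""
    missing = [col for col in FEATURES if col not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df[FEATURES].copy()


# Toy frame: every selected column plus one column that should be dropped.
toy = pd.DataFrame({col: [None] for col in FEATURES + ["unused_score"]})
students = select_features(toy)
print(students.columns.tolist())
```

Failing loudly on a missing column keeps a typo in the list from silently shrinking the feature set.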
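One way to look at a single feature while keeping missing values in view is a per-column null report plus `value_counts(dropna=False)`. This is a sketch on a small hypothetical frame, not the project's data. It also shows dropping rows only for the columns a given comparison needs, one possible reading of "being careful" with nulls, rather than a global `dropna()` that could discard most of the sample.

```python
import numpy as np
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.Series:
    """Share of missing values per column, highest first."""
    return df.isna().mean().sort_values(ascending=False)


# Hypothetical toy data; column names are illustrative only.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female", None],
    "mother_at_home": ["Yes", None, None, "Yes"],
    "work_ethic": [0.5, np.nan, 1.2, -0.3],
})

report = null_report(toy)
print(report)

# Univariate look at one categorical feature, keeping NaN as its own bucket.
print(toy["gender"].value_counts(dropna=False))

# Drop rows only where the columns needed for this comparison are missing.
subset = toy.dropna(subset=["gender", "work_ethic"])
print(len(subset))
```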
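The sibling question can be framed as a derived yes/no feature followed by a group comparison. The sketch below assumes hypothetical `brothers_at_home` and `sisters_at_home` answers and a numeric `work_ethic` score; a student counts as having siblings if either answer is "Yes", and is left as missing only when both answers are missing. That rule is an illustrative choice, not necessarily the one used in the project.

```python
import pandas as pd

# Hypothetical toy data; column names and encodings are assumptions.
toy = pd.DataFrame({
    "brothers_at_home": ["Yes", "No", "No", None, "No"],
    "sisters_at_home": ["No", "Yes", "No", None, "No"],
    "work_ethic": [0.4, 0.8, -0.2, 0.1, 0.0],
})


def add_has_siblings(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 'has_siblings' Yes/No column; NaN when both answers are missing."""
    out = df.copy()
    answers = out[["brothers_at_home", "sisters_at_home"]]
    any_yes = answers.eq("Yes").any(axis=1)
    all_missing = answers.isna().all(axis=1)
    out["has_siblings"] = any_yes.map({True: "Yes", False: "No"}).where(~all_missing)
    return out


with_sibs = add_has_siblings(toy)

# groupby drops NaN keys by default, so unknown rows are excluded here.
summary = with_sibs.groupby("has_siblings")["work_ethic"].agg(["mean", "count"])
print(summary)
```

Reporting the count next to the mean makes it easy to spot groups that are too small to compare.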
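The multivariate comparisons above can be sketched by first deriving a household label (both parents, single parent, no parents) and then cross-tabulating it against a parent's work status, alongside a gender-by-behavior mean. All column names, category labels, and the 0/1 encoding of `feels_outsider` are hypothetical assumptions for illustration.

```python
import pandas as pd


def family_structure(mother, father):
    """Label a household from hypothetical mother/father-at-home answers."""
    if pd.isna(mother) or pd.isna(father):
        return None  # keep unknown households out of the comparison
    if mother == "Yes" and father == "Yes":
        return "Both parents"
    if mother == "Yes" or father == "Yes":
        return "Single parent"
    return "No parents"


# Hypothetical toy data; feels_outsider is 1 if the student agrees.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female"],
    "mother_at_home": ["Yes", "Yes", "No", "No", None],
    "father_at_home": ["Yes", "No", "Yes", "No", "Yes"],
    "mother_work": ["Full-time", "Part-time", "Not working", "Full-time", "Full-time"],
    "feels_outsider": [0, 1, 1, 1, 0],
})
toy["family"] = [
    family_structure(m, f) for m, f in zip(toy["mother_at_home"], toy["father_at_home"])
]

# Share of each mother work status within each household type (rows sum to 1).
known = toy.dropna(subset=["family"])
table = pd.crosstab(known["family"], known["mother_work"], normalize="index")
print(table)

# Gender vs. one behavior: share of students who feel like an outsider.
behaviors = toy.groupby("gender")["feels_outsider"].mean()
print(behaviors)
```

Normalizing by row puts household types of very different sizes on the same scale.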
- Students without parents
- Students without one of the parents
- Effect of teacher support on students