42 School is rapidly gaining global recognition as a prestigious educational institution, particularly in Korea, where the competition rate is 44 to 1. However, the month-long testing period poses a significant challenge: the nature of its crucial activities is kept secret, which makes it hard for prospective students to plan their preparation.
There are 50 campuses of 42 School, and the education system originated at 42 Ecole, the first campus. As 42 schools become increasingly global, cracks will inevitably appear in their operating policies and educational standards. It is unclear whether students at the other 42 campuses are being evaluated according to the standards of the original 42 Ecole. Our team was launched to address this situation using Business Analytics techniques.
3. Compare the selection criteria of 42 Seoul and Ecole 42 to identify the differences in educational priority and operating policies
We completed the data collection process by following these five steps. We created raw data through API calls, merged the crawled data, and deleted unnecessary columns.
By calling /v2/campus/:campus_id/users, we separately collected raw data for all users of the Seoul 42 campus and the Ecole 42 campus, whose campus_ids are 29 and 1, respectively.
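A minimal sketch of how such a request loop could be organised. The endpoint path and campus IDs come from the text; the base URL, the `page[number]` and `page[size]` parameters, and the `fetch_page` callable are assumptions for illustration, and authentication is left out entirely.

```python
from typing import Callable, Dict, List

BASE_URL = "https://api.intra.42.fr"    # assumed base URL
CAMPUS_IDS = {"seoul": 29, "ecole": 1}  # campus IDs as stated in the text


def campus_users_url(campus_id: int) -> str:
    """Build the endpoint URL for all users of one campus."""
    return f"{BASE_URL}/v2/campus/{campus_id}/users"


def collect_all_users(
    campus_id: int,
    fetch_page: Callable[[str, Dict[str, int]], List[dict]],
    page_size: int = 100,
) -> List[dict]:
    """Request successive pages until an empty page comes back.

    `fetch_page(url, params)` is a hypothetical helper that performs the
    authenticated GET request and returns the decoded JSON list.
    """
    users: List[dict] = []
    page = 1
    while True:
        batch = fetch_page(
            campus_users_url(campus_id),
            {"page[number]": page, "page[size]": page_size},
        )
        if not batch:
            break
        users.extend(batch)
        page += 1
    return users
```

Passing the HTTP call in as `fetch_page` keeps the paging logic independent of whichever HTTP client and token handling are actually used.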
To get the raw data, we found the campus IDs of the Seoul and Ecole campuses and retrieved the data through API requests. This is what user raw data looks like.
By calling /v2/users/:user_id/scale_teams/as_corrector and /v2/users/:user_id/scale_teams/as_corrected, we obtained JSON data listing the evaluation events in which a user participated as the corrector and as the correction recipient.
To add the feedback a user received and the feedback that user gave to others as independent variables, we called /v2/users/:user_id/scale_teams/as_corrector and /v2/users/:user_id/scale_teams/as_corrected and counted the number of items in each JSON response.
This is a sample of the as_corrected data structure. By counting the items in each response, we determined how many evaluations each user gave (as corrector) and how much feedback they received (as corrected).
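The counting step can be sketched as follows, assuming each response body is a JSON array with one element per evaluation event. The function name and the output keys `feedback_given` and `feedback_received` are hypothetical, not the column names used in the project.

```python
import json
from typing import Dict


def count_evaluations(as_corrector_json: str, as_corrected_json: str) -> Dict[str, int]:
    """Count the evaluation events on each side for one user.

    Each argument is the raw JSON text of one response, assumed to be
    an array with one element per evaluation event.
    """
    given = json.loads(as_corrector_json)
    received = json.loads(as_corrected_json)
    return {
        "feedback_given": len(given),        # user acted as corrector
        "feedback_received": len(received),  # user was corrected
    }
```

If a response is spread over several pages, the counts from every page would have to be added together before they are stored.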
3. Level, Group Assignments, Penalty, Highest C Piscine, Final Exam Score: Crawling with Python Code
In 42 School, each user has a personal page that shows statistical information about that user. We collected this data by crawling each user's page.
Level: Overall progress, earned through assignments and midterm exams
Group Assignments: Optional group assignments
Penalty: Number of times a user was caught cheating; each time a user is caught, 42 points are deducted from the assignment score
Highest C Piscine: The highest level of assignment completed among the C-language assignments (0 to 13)
Final Exam Score: The score on the final exam
The files created through API calls and crawling include CSV files and a plain text file recording assignment and exam scores. From the plain text file, we extracted the Highest C Piscine, the Final Exam Score, and the Number of Group Assignments. We then summed these scores and divided the sum by a fixed value to derive a level that closely resembles the actual level, and created a CSV file from this data.
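The level estimate described above amounts to a sum divided by a constant. A minimal sketch, in which the function name is hypothetical and the divisor is left as a parameter because the text does not state its value:

```python
def approximate_level(
    highest_c_piscine: float,
    final_exam_score: float,
    group_assignments: float,
    divisor: float,
) -> float:
    """Sum the three extracted values and scale the total by `divisor`.

    The divisor actually used in the project is not given in the text,
    so the caller must supply it.
    """
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    return (highest_c_piscine + final_exam_score + group_assignments) / divisor
```

For example, `approximate_level(10, 60, 2, divisor=12.0)` adds up to 72 and returns 6.0; the divisor 12.0 here is purely illustrative.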
Subsequently, we inner-joined the CSV files on the 'id' and 'login' columns. All dummy data was filtered out based on level and generation.
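An inner join on 'id' and 'login' keeps only the users that appear in both files. The sketch below uses only the standard library and assumes each (id, login) pair occurs at most once in the right-hand file; the helper names are hypothetical, and the project may well have used a data-frame library instead.

```python
import csv
from typing import Dict, Iterable, List, Tuple

JOIN_KEYS = ("id", "login")  # join columns named in the text


def read_rows(path: str) -> List[dict]:
    """Load one CSV file as a list of dictionaries keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def inner_join(left: Iterable[dict], right: Iterable[dict]) -> List[dict]:
    """Keep only rows whose (id, login) pair is present in both inputs."""
    index: Dict[Tuple[str, ...], dict] = {
        tuple(row[key] for key in JOIN_KEYS): row for row in right
    }
    joined: List[dict] = []
    for row in left:
        match = index.get(tuple(row[key] for key in JOIN_KEYS))
        if match is not None:
            # Merge the columns of both rows into one record.
            joined.append({**row, **match})
    return joined
```

Joining more than two files is a matter of applying `inner_join` repeatedly, feeding each result in as the next left-hand input.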
This is what the final data looks like. Since participants who score under 42 automatically fail, we can see that almost ¼ of the students did not pass the final exam.
This data set is the result of removing all data points with a Final Exam score below 42. It shows that even among participants who scored 42 or more on the Final Exam, many fail the final selection, and their number is almost equal to the number who pass.
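The cutoff filter can be sketched as below. The threshold of 42 comes from the text, while the function name and the column name `final_exam_score` are assumptions.

```python
from typing import List, Tuple

PASS_MARK = 42  # cutoff stated in the text


def split_by_exam_cutoff(
    rows: List[dict], score_key: str = "final_exam_score"
) -> Tuple[List[dict], float]:
    """Drop rows below the cutoff and report the share that was dropped."""
    kept = [row for row in rows if float(row[score_key]) >= PASS_MARK]
    total = len(rows)
    dropped_share = (total - len(kept)) / total if total else 0.0
    return kept, dropped_share
```

The returned share is the fraction of participants removed by the cutoff, which is the quantity the text puts at almost ¼.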
In conclusion, the original name of our project was 'what is important'. We thought there must be a reason why 42 Academy emphasizes the learning process and peer learning to La Piscine students over getting the problems right. We therefore believed that this test would contain elements more important than assignment and exam scores. The outcome of the project indicates that peer evaluation matters more than anything else. Through this project, we were able to demonstrate quantitatively that enjoying knowledge through mutual learning and teaching, rather than the amount of individual study, is the core value that 42 Academy emphasizes.
More importantly, by comparing and analyzing data from globally distributed 42 schools, we were able to identify differences in the data and prioritize important factors through modeling. This not only guides participants to learn in the right direction by showing them their probability of passing, but also serves as an important quality assurance tool for the global 42 School management team. Using this model and collecting and analyzing data from more campuses will increase the likelihood of running a program that guarantees consistent quality.
In the data extraction part, it was disappointing that we couldn't extract data from other campuses because crawling took so long. We also regret not being able to extract detailed data such as campus popularity polls. Additionally, France passed a considerably larger number of participants than Korea. Therefore, despite sampling, the data imbalance made it difficult to find a well-performing model.
- Jeongmin Oh (jeongmino1207@gmail.com), Github Id: jeongmino
- Kangmin Kim (rkdals0203@gmail.com), Github Id: rkdals0203
- Aleksandra Kaniewska (@gmail.com), Github Id: alekann009
- Eonseon Park (pocva6243@gmail.com), Github Id: eonpark