# Introduction.py
import pandas as pd
import plotly.express as px
import streamlit as st
st.set_page_config(
    page_title="Streamlit dashboard: Which DS online course to take first?",
    page_icon="📚",
    layout="centered",
    initial_sidebar_state="expanded",
)
st.title("Which DS online course to take first?")
st.markdown(
"""
[Source Code](https://github.com/antonAce/data-science-courses)
[Kaggle notebook](https://www.kaggle.com/antonkozyriev/which-ds-online-course-to-take-first)
### Context
Online educational platforms now offer a vast number of courses.
For self-taught beginners in Data Science, it can be hard to choose a course to start with.
This EDA aims to answer common questions that arise when choosing a course, such as **"Does a paid course provide higher quality than a free one?"** or **"Which platform is the most suitable for beginners?"**,
and to discover which online platform provides the highest educational quality.
### Acknowledgements
Data was collected via web scraping from popular online platforms:
[Coursera](https://www.coursera.org), [Stepik](https://stepik.org),
[Udemy](https://www.udemy.com), [edX](https://www.edx.org),
[Pluralsight](https://www.pluralsight.com), [Alison](https://alison.com),
[FutureLearn](https://www.futurelearn.com), and [Skillshare](https://www.skillshare.com).
Only courses related to the "Data Science" topic were queried from each platform.
"""
)
st.subheader("Data preview")
st.markdown("Structure of the collected data.")
dataframe = pd.read_csv(st.secrets.dataframe.path, index_col=0)
if st.checkbox("Display head only", value=True):
    st.dataframe(dataframe.head())
else:
    st.dataframe(dataframe)
st.subheader("Missing values")
st.markdown(
"""
An overview of how many non-missing records are available in each column. Platforms with a *small set* of courses
usually **do not provide** additional information about a course, such as the number of enrolled students or lecture durations.
"""
)
valid_columns = (
    dataframe.count().to_frame(name="valid_records").sort_values("valid_records")
)
valid_columns.reset_index(inplace=True)
valid_columns = valid_columns.rename(
    columns={"index": "Column", "valid_records": "Valid records count"}
)
fig = px.bar(
    valid_columns,
    x="Valid records count",
    y="Column",
    color="Column",
    color_discrete_sequence=px.colors.diverging.Spectral,
    orientation="h",
)
fig.update_layout(showlegend=False)
st.plotly_chart(fig, use_container_width=True)
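
# A minimal sketch, not part of the original dashboard, of how the
# "valid records" table in the "Missing values" section is derived:
# `DataFrame.count()` returns the number of non-null cells per column.
# The helper name `count_valid_records` and the toy frame are hypothetical
# and exist only for illustration.

```python
import pandas as pd


def count_valid_records(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its non-null record count, ascending."""
    counts = frame.count().to_frame(name="valid_records").sort_values("valid_records")
    # The unnamed index becomes a column called "index" after reset_index().
    return counts.reset_index().rename(
        columns={"index": "Column", "valid_records": "Valid records count"}
    )


# Toy data (assumed): three courses, with some fields missing.
toy_frame = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "enrolled": [100.0, None, None],
        "duration": [1.5, 2.0, None],
    }
)
toy_counts = count_valid_records(toy_frame)
```

# Wrapping the computation in a function avoids the in-place `reset_index`
# and makes the step easy to check on small inputs.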