Mz-scripter/Data-Analytics

Latest commit

a7c9341 · Oct 18, 2024


Data Science Learning Experience with Pandas

Day 1

I gained experience in handling missing data (NaN values), importing data from CSV files, removing unnecessary data, selecting specific columns, identifying maximum values within datasets, and locating corresponding row indices.
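The Day 1 steps can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's own code: the CSV contents and column names are hypothetical, and the data is read from an in-memory string so the example runs without a file.

```python
import io

import pandas as pd

# Hypothetical CSV contents; read_csv also accepts a file path.
csv_text = """name,team,points,assists
Ana,Red,31,4
Ben,Blue,,7
Cara,Red,27,9
Dev,Blue,35,
"""

# Empty fields are read in as NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column.
missing_per_column = df.isna().sum()

# Remove rows containing any NaN, then drop a column the analysis does not need.
clean = df.dropna().drop(columns=["team"])

# Select specific columns with a list of names.
points = clean[["name", "points"]]

# The maximum value, and the index label of the row where it occurs.
max_points = points["points"].max()
best_row = points["points"].idxmax()
best_name = points.loc[best_row, "name"]
print(max_points, best_row, best_name)
```

Dev's 35 points are discarded together with his row, because `dropna()` removes a row if any of its values is missing; it is worth checking `missing_per_column` before choosing that strategy.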

Day 2

I learnt how to use pandas and Matplotlib to turn raw data into informative visualizations, and used them to chart the most widely used programming languages from 2008 to 2024.
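Before plotting one line per language, long-format data (one row per date and language) usually has to be reshaped into one column per language. The sketch below assumes hypothetical column names `DATE`, `TAG` and `POSTS` and made-up counts; it shows the pandas reshaping and smoothing only.

```python
import pandas as pd

# Hypothetical long-format data: one row per (month, language) with a post count.
long_df = pd.DataFrame({
    "DATE": ["2008-07-01", "2008-07-01", "2008-08-01", "2008-08-01", "2008-09-01"],
    "TAG": ["python", "java", "python", "java", "python"],
    "POSTS": [10, 30, 20, 25, 40],
})
long_df["DATE"] = pd.to_datetime(long_df["DATE"])

# pivot() gives each language its own column, indexed by date.
wide = long_df.pivot(index="DATE", columns="TAG", values="POSTS")

# A month with no posts for a language appears as NaN; treat it as zero.
wide = wide.fillna(0)

# A rolling mean over 2 months smooths month-to-month noise before plotting.
smoothed = wide.rolling(window=2).mean()
print(smoothed)
```

The wide table can then be drawn with one line per language, for example with `smoothed.plot()`; a larger `window` gives a smoother trend at the cost of hiding short-term changes.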

Day 3

I learnt how to:

  • use HTML and Markdown in notebooks
  • combine the groupby() and count() functions to aggregate data
  • use the value_counts() function
  • slice DataFrames using the square bracket notation
  • use the agg() function to run an operation on a particular column
  • rename DataFrame columns with rename()
  • create a line chart with two separate axes to visualize data with different scales
  • create scatter plots and bar charts in Matplotlib
  • work with tables in a relational database by using primary and foreign keys
  • merge() DataFrames along a particular column.
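Several of the Day 3 operations fit into one small example. The two tables below are hypothetical, modelled loosely on a relational schema where `themes.id` is a primary key and `sets.theme_id` is a foreign key referencing it; the names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical tables: themes.id is the primary key, sets.theme_id the foreign key.
themes = pd.DataFrame({"id": [1, 2], "name": ["Space", "Castle"]})
sets = pd.DataFrame({
    "set_num": ["a1", "a2", "a3", "a4"],
    "year": [1999, 1999, 2000, 2000],
    "theme_id": [1, 1, 2, 1],
    "num_parts": [100, 250, 80, 40],
})

# groupby() + count(): number of sets per theme.
sets_per_theme = sets.groupby("theme_id").count()["set_num"]

# value_counts() gives the same tally directly, sorted by frequency.
theme_counts = sets["theme_id"].value_counts()

# agg() runs a chosen operation on a particular column for each group.
parts_by_year = sets.groupby("year").agg({"num_parts": "mean"})

# rename() gives the aggregated column a clearer name.
parts_by_year = parts_by_year.rename(columns={"num_parts": "avg_parts"})

# Square-bracket slicing on a DataFrame selects rows by position (0 and 1 here).
first_two = sets[:2]

# merge() joins the counts to the themes table along the shared key column.
counts = theme_counts.rename_axis("id").reset_index(name="set_count")
merged = pd.merge(counts, themes, on="id")
print(merged)
```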

Day 4

I learnt how to:

  • use .describe() to get a quick statistical summary of your data, such as the mean, maximum and minimum values
  • use .resample() to change the periodicity of time-series data so that it is comparable with another series
  • work with matplotlib.dates locators to better style a time axis on a chart
  • find the number of NaN values with .isna().values.sum()
  • change the resolution of a chart using the figure's dpi
  • create dashed ('--') and dash-dot ('-.') lines using the linestyle parameter
  • use different kinds of markers (e.g., 'o' or '^') on charts.
  • fine-tune the styling of Matplotlib charts by using limits, labels, linewidth and colours
  • use .grid() to help visually identify seasonality in a time series.
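The pandas side of Day 4 (`.describe()`, counting NaN values and `.resample()`) can be sketched as follows. The readings are hypothetical, and the `"MS"` (month start) alias is chosen here because it works across pandas versions; the plotting options from the list are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular readings with one missing value.
readings = pd.DataFrame(
    {"price": [10.0, 12.0, np.nan, 14.0, 20.0, 22.0]},
    index=pd.to_datetime([
        "2024-01-10", "2024-01-20", "2024-01-30",
        "2024-02-05", "2024-02-15", "2024-02-25",
    ]),
)

# Total number of NaN values in the whole DataFrame.
nan_total = readings.isna().values.sum()

# Count, mean, standard deviation, min, quartiles and max of each column.
summary = readings.describe()

# Change the periodicity to monthly ("MS" = month start), averaging each month.
# mean() skips the NaN, so January is the average of 10.0 and 12.0.
monthly = readings.resample("MS").mean()
```

A monthly series produced this way can be placed on the same time axis as another monthly series without the two sampling rates getting in each other's way.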

Day 5

I learnt how to:

  • pull a random sample from a DataFrame using .sample()
  • find and remove duplicate entries with .duplicated() and .drop_duplicates()
  • convert string and object data types into numbers with pd.to_numeric()
  • use Plotly to generate pie, donut and bar charts as well as box and scatter plots
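The Day 5 cleaning steps are sketched below on a hypothetical app dataset whose numbers are stored as strings with `$` and `,` characters. The column names and values are assumptions; the Plotly charts are not shown.

```python
import pandas as pd

# Hypothetical app data: numbers stored as strings, and one duplicated row.
apps = pd.DataFrame({
    "app": ["Notes", "Chess", "Chess", "Maps"],
    "price": ["$0", "$2.99", "$2.99", "$1,000.00"],
    "installs": ["1,000", "500", "500", "10"],
})

# A reproducible random sample of two rows for a quick look at the data.
preview = apps.sample(n=2, random_state=42)

# duplicated() flags rows that repeat an earlier row.
dup_mask = apps.duplicated()

# drop_duplicates() removes them; copy() gives an independent frame to modify.
deduped = apps.drop_duplicates().copy()

# Strip "$" and "," and then convert the cleaned strings to numbers.
for col in ["price", "installs"]:
    cleaned = deduped[col].str.replace(r"[$,]", "", regex=True)
    deduped[col] = pd.to_numeric(cleaned)
```

Passing `random_state` makes `.sample()` return the same rows on every run, which keeps notebook output stable.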

Day 6

I learnt how to:

  • create arrays with np.array()
  • generate arrays using .arange(), .random() and .linspace()
  • analyse the shape and dimensions of an ndarray
  • slice and subset an ndarray based on its indices
  • perform linear-algebra operations such as scalar arithmetic and matrix multiplication
  • use NumPy's broadcasting to make ndarrays of different shapes compatible
  • manipulate images in the form of ndarrays
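The Day 6 NumPy topics can be seen together in one short script. The small arrays and the 4×6 "image" are made up for illustration; a real image loaded from a file would have the same (height, width, 3) layout.

```python
import numpy as np

# Creating arrays directly and with generator functions.
a = np.array([[1, 2], [3, 4]])
evens = np.arange(0, 10, 2)          # [0 2 4 6 8]
points = np.linspace(0, 1, num=5)    # 5 evenly spaced values from 0 to 1 inclusive
noise = np.random.random((2, 3))     # uniform floats in [0, 1)

# Shape and number of dimensions.
print(a.shape, a.ndim)               # (2, 2) 2

# Slicing by index: the second row, and the first column.
row = a[1, :]
col = a[:, 0]

# Scalar operations apply element-wise; @ is matrix multiplication.
scaled = a * 10
product = a @ np.array([[1, 0], [0, 1]])

# Broadcasting: a (2, 2) array plus a (2,) vector adds the vector to each row.
shifted = a + np.array([100, 200])

# An RGB image is a (height, width, 3) array; flipping it is just slicing.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :3] = [255, 0, 0]           # left half red
flipped = image[:, ::-1]             # mirror horizontally
```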

Day 7

I learnt how to:

  • use nested loops to remove unwanted characters from multiple columns
  • create bubble charts using the Seaborn library
  • filter a pandas DataFrame on multiple conditions using both .loc[] and .query()
  • style Seaborn charts using the pre-built styles and by modifying Matplotlib parameters
  • use floor division to convert years to decades
  • use Seaborn to superimpose a linear regression over our data
  • run regressions with scikit-learn and calculate the coefficients
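The data-preparation items from Day 7 (nested cleaning loops, multi-condition filters and floor division into decades) are sketched below on hypothetical film data. The column names and values are assumptions; the Seaborn and scikit-learn steps are not shown.

```python
import pandas as pd

# Hypothetical film data with currency strings.
films = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [1987, 1994, 2003, 2011],
    "budget": ["$1,000,000", "$250,000", "$5,000,000", "$0"],
    "gross": ["$3,000,000", "$100,000", "$9,500,000", "$0"],
})

# Nested loops: the outer loop runs over columns, the inner over characters to strip.
chars_to_remove = [",", "$"]
for col in ["budget", "gross"]:
    for char in chars_to_remove:
        films[col] = films[col].astype(str).str.replace(char, "", regex=False)
    films[col] = pd.to_numeric(films[col])

# Filtering on multiple conditions with .loc and a combined boolean mask ...
flops_loc = films.loc[(films["budget"] > 0) & (films["gross"] < films["budget"])]
# ... and the equivalent .query() expression.
flops_query = films.query("budget > 0 and gross < budget")

# Floor division maps a year to the start of its decade: 1987 // 10 * 10 == 1980.
films["decade"] = films["year"] // 10 * 10
```

Each condition in the `.loc` mask needs its own parentheses because `&` binds more tightly than `>` and `<`; `.query()` avoids that by parsing `and` as a string expression.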

Day 8

I learnt how to:

  • create a choropleth map to display data geographically
  • create bar charts showing different segments of the data with Plotly
  • create sunburst charts with Plotly
  • use Seaborn's .lmplot() and show best-fit lines across multiple categories using the row, hue, and lowess parameters
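Segmented bar charts and sunburst charts are usually built from counts per category combination. The sketch below shows only that aggregation step in pandas, using hypothetical prize records; it does not call Plotly, so it runs without a plotting library.

```python
import pandas as pd

# Hypothetical prize records.
prizes = pd.DataFrame({
    "country": ["France", "France", "Japan", "Japan", "Japan"],
    "category": ["Physics", "Peace", "Physics", "Physics", "Chemistry"],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Count prizes per (country, category) segment, largest segments first.
segments = (
    prizes.groupby(["country", "category"], as_index=False)
    .agg(prizes=("sex", "count"))
    .sort_values("prizes", ascending=False)
)
print(segments)
```

With Plotly installed, a frame shaped like `segments` could be passed to `plotly.express.bar(segments, x="country", y="prizes", color="category")`, and the raw records to `plotly.express.sunburst(prizes, path=["country", "category"])`, which counts rows itself when no `values` column is given.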

Day 9

I learnt how to:

  • use histograms to visualise distributions
  • superimpose histograms on top of each other even when the data series have different lengths
  • smooth out kinks in a histogram and visualise a distribution with a Kernel Density Estimate (KDE)
  • improve a KDE by specifying boundaries on the estimates
  • use scipy and test for statistical significance by looking at p-values
  • highlight different parts of a time series chart in Matplotlib
  • add and configure a legend in Matplotlib
  • use NumPy's .where() function to process elements depending on a condition
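Two of the Day 9 ideas can be illustrated with NumPy alone: normalising histograms so that series of different lengths can be superimposed, and `np.where` for element-wise choices. The rates below are hypothetical, and the KDE and p-value steps are not shown.

```python
import numpy as np

# Hypothetical monthly rates for two periods of different lengths.
before = np.array([0.11, 0.12, 0.08, 0.13, 0.14, 0.09])
after = np.array([0.02, 0.03, 0.01])

# Shared bin edges let the two histograms be superimposed on one axis.
bins = np.linspace(0.0, 0.15, num=4)  # edges at 0.0, 0.05, 0.10, 0.15

# density=True scales each histogram so its total area is 1, which keeps
# series with different numbers of observations comparable.
before_hist, _ = np.histogram(before, bins=bins, density=True)
after_hist, _ = np.histogram(after, bins=bins, density=True)

# np.where chooses a value element-wise depending on a condition.
labels = np.where(before > 0.11, "high", "normal")
```

To draw the overlay, the same `bins` and `density=True` can be passed to Matplotlib's `plt.hist()` for each series, with `alpha` below 1 so the overlapping bars stay visible.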

Day 10

I learnt how to:

  • quickly spot relationships in a dataset using Seaborn's .pairplot()
  • split the data into training and testing datasets to better evaluate a model's performance
  • run a multivariable regression
  • evaluate the regression based on the signs of its coefficients
  • analyse and look for patterns in a model's residuals
  • improve a regression model with a log transformation of the data
  • specify your own values for various features and use your model to make a prediction
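The Day 10 workflow is sketched below end to end on synthetic data. It deliberately swaps in NumPy's `np.linalg.lstsq` and a manual shuffle for scikit-learn's `LinearRegression` and `train_test_split`, so it runs with NumPy alone; the features `rooms` and `distance` and the generating coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: price depends on two features, with multiplicative noise.
n = 200
rooms = rng.uniform(3, 8, size=n)
distance = rng.uniform(1, 10, size=n)
price = np.exp(1.0 + 0.3 * rooms - 0.05 * distance + rng.normal(0, 0.1, size=n))

# Log-transform the skewed target so its relationship with the features is linear.
y = np.log(price)
X = np.column_stack([np.ones(n), rooms, distance])  # column of 1s = intercept

# Shuffle the row indices, then hold out 20% of the rows as a test set.
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]

# Multivariable least-squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
intercept, b_rooms, b_distance = coef

# Residuals on the held-out rows: actual minus predicted log price.
residuals = y[test] - X[test] @ coef

# Predict for your own feature values, then undo the log transformation.
my_features = np.array([1.0, 6.0, 2.5])  # intercept term, rooms, distance
predicted_price = np.exp(my_features @ coef)
```

Here the signs of `b_rooms` (positive) and `b_distance` (negative) are the kind of check the list describes, and residuals on the test rows that scatter around zero without a visible pattern suggest the log-linear model is adequate.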

About

My experience learning data science
