2.3 Exploratory data analysis

Slides

Notes

Pandas attributes and methods:

  • df[col].unique() - returns an array (not a list) of the unique values in the series
  • df[col].nunique() - returns the number of unique values in the series
  • df.isnull().sum() - returns the number of null values in each column of the dataframe
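
As a minimal sketch of the three calls above, the snippet below uses a tiny made-up dataframe; the column names `make` and `msrp` and all values are assumptions for illustration, not the course dataset.

```python
import pandas as pd

# Hypothetical toy data; the real dataset is not reproduced here.
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", None],
    "msrp": [46135, 40650, 36350, 29450],
})

# Unique values of one column; the missing value is kept as its own entry.
unique_makes = df["make"].unique()

# Number of distinct values; missing values are not counted by default.
n_unique_makes = df["make"].nunique()

# One count of missing values per column, returned as a Series.
nulls_per_column = df.isnull().sum()

print(unique_makes)
print(n_unique_makes)
print(nulls_per_column)
```

Note that `unique()` keeps the missing value while `nunique()` skips it, so the two can disagree by one on columns that contain nulls.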

Matplotlib and seaborn methods:

  • %matplotlib inline - ensures that plots are displayed inline in Jupyter notebook cells
  • sns.histplot() - show the histogram of a series

Numpy methods:

  • np.log1p() - adds one to each value and then applies the natural logarithm, i.e. computes log(1 + x), so zeros are handled safely.

Long-tailed distributions tend to confuse ML models, so the recommendation is to transform the target variable so that its distribution is closer to normal whenever possible, for example with a log transformation.
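
To illustrate the log transformation, here is a hedged sketch on a few made-up target values. The array `prices` is an assumption, and `np.expm1()` is not mentioned in the notes; it is included only as the standard NumPy inverse of `np.log1p()` for mapping values back to the original scale.

```python
import numpy as np

# Made-up target values spanning several orders of magnitude.
prices = np.array([0, 1, 10, 1000, 100000], dtype=float)

# log1p computes log(1 + x), so a value of 0 maps to 0 instead of -inf.
log_prices = np.log1p(prices)

# expm1 computes exp(x) - 1 and undoes the transformation.
restored = np.expm1(log_prices)

print(log_prices)
print(restored)
```

The transformed values are compressed into a much narrower range, which is what pulls in the long tail.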

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.

Navigation