Matt Harrison - An Introduction to Pandas 2, Polars, and DuckDB | PyData Global 2023 #207

Open
lexarrow opened this issue Mar 8, 2024 · 0 comments

lexarrow commented Mar 8, 2024

  • Finished timestamps for this video: https://www.youtube.com/watch?v=vy8VrhaYR2M
  • Title: Matt Harrison - An Introduction to Pandas 2, Polars, and DuckDB | PyData Global 2023
  • Timestamps:
    00:00 - General introduction
    03:33 - About Matt
    06:50 - Pandas 2 introduction
    10:08 - Pandas 2 main feature no. 1: pyarrow as the dtype backend instead of numpy
    12:28 - Pandas 2 main feature no. 2: copy-on-write
    13:07 - Start of Pandas 2 with pyarrow example in Jupyter Notebook
    15:16 - Dealing with columns for which pyarrow did not detect dtype by default
    18:28 - Presenting the actions on the dataset implemented with numpy
    19:11 - Inefficiencies of the .apply function in pandas
    20:40 - Presenting the actions on the dataset implemented with a vectorized function
    21:38 - Processing time benchmark between the .apply and the vectorized solutions
    24:09 - Audience question: Are there any backwards compatibility issues between Pandas 2 and Pandas 1?
    26:55 - Audience question: Are there any reasons not to use pyarrow?
    27:40 - Audience question: How can I easily migrate to Polars or handle the missing index?
    29:06 - Polars introduction
    36:04 - Start of Polars example in Jupyter Notebook
    36:11 - Audience question: Can Polars run in a distributed way?
    36:34 - Polars example with the eager implementation
    38:30 - Polars eager example - convert column dtypes to dates where auto-detection didn't work
    40:18 - Polars eager example - implementation of the Pandas numpy .apply in Polars
    42:40 - Polars eager example - processing time benchmark
    43:14 - Considerations of Pandas vs Polars speed
    45:16 - Polars example with the lazy implementation
    47:20 - Answer to the question: Can Polars run in a distributed way?
    48:50 - Audience question: Is there an advantage to using Polars over pyspark?
    52:33 - Audience question: Is there an advantage to using Polars over Daft?
    53:50 - Introduction to DuckDB in the context of dataframes and tabular data
    55:48 - DuckDB background and main features
    58:08 - Start of DuckDB example in Jupyter Notebook using SQL
    58:56 - DuckDB how to load data
    1:01:30 - Audience question: What is a median-sized dataset?
    1:02:20 - DuckDB complicated query example
    1:03:07 - DuckDB Arrow integration
    1:04:48 - Audience question: Where can I get a copy of the temp bill file?
    1:05:26 - Main conclusions and aspects related to switching from Pandas to Polars
    1:09:21 - Audience comment: The Pandas pyarrow integration is incomplete (re: the .dt accessor)
    1:11:10 - Audience question: How do you deal with reading variables as strings in DuckDB?
    1:12:16 - Audience question: What tool do you recommend to start learning as a beginner?
    1:12:32 - Presentation of Tabular Tools (API & Scale) chart
    1:16:12 - Answer to the question: What tool do you recommend to start learning as a beginner?
    1:16:43 - Audience question: Will 'Effective Pandas 2' book have the same datasets as 'Effective Pandas' original edition?
    1:18:06 - Audience question about mass renaming variables
    1:19:54 - Which of the presented tools should you use?
    1:21:48 - Matt contact details and areas of expertise
  • Resources: