Title: Matt Harrison - An Introduction to Pandas 2, Polars, and DuckDB | PyData Global 2023
Timestamps:
00:00 - General introduction
03:33 - About Matt
06:50 - Pandas 2 introduction
10:08 - Presentation of Pandas 2 main feature no. 1: using pyarrow as the dtype backend instead of numpy (sketch below)
12:28 - Presentation of Pandas 2 main feature no. 2: copy-on-write
13:07 - Start of Pandas 2 with pyarrow example in Jupyter Notebook
15:16 - Dealing with columns for which pyarrow did not detect the dtype by default
18:28 - Presenting the dataset transformations implemented with numpy
19:11 - Inefficiencies of the .apply function in pandas (sketch below)
20:40 - Presenting the dataset transformations implemented with a vectorized function
21:38 - Processing-time benchmark of the .apply solution versus the vectorized one
24:09 - Audience question: Are there any backwards compatibility issues between Pandas 2 and Pandas 1?
26:55 - Audience question: Are there any reasons not to use pyarrow?
27:40 - Audience question: How can I easily migrate to Polars or handle the missing index?
29:06 - Polars introduction
36:04 - Start of Polars example in Jupyter Notebook
36:11 - Audience question: Can Polars run in a distributed way?
36:34 - Polars example using the eager API
38:30 - Polars eager example - converting column dtypes to dates where auto-detection didn't work
40:18 - Polars eager example - reimplementing the pandas numpy/.apply logic in Polars
42:40 - Polars eager example - processing time benchmark
43:14 - Considerations on Pandas vs Polars speed
45:16 - Polars example using the lazy API (sketch below)
47:20 - Answer to the question: Can Polars run in a distributed way?
48:50 - Audience question: Is there an advantage to using Polars over pyspark?
52:33 - Audience question: Is there an advantage to using Polars over Daft?
53:50 - Introduction to DuckDB in the context of dataframes and tabular data
55:48 - DuckDB background and main features
58:08 - Start of DuckDB example in Jupyter Notebook using SQL
58:56 - DuckDB - how to load data (sketch below)
1:01:30 - Audience question: What is a medium-sized dataset?
1:02:20 - DuckDB complex query example
1:03:07 - DuckDB Arrow integration
1:04:48 - Audience question: Where can I get a copy of the temp bill file?
1:05:26 - Main conclusions and considerations when switching from Pandas to Polars
1:09:21 - Audience comment: the pandas pyarrow integration is incomplete (re: the .dt accessor)
1:11:10 - Audience question: How do you deal with reading variables as strings in DuckDB?
1:12:16 - Audience question: What tool do you recommend to start learning as a beginner?
1:12:32 - Presentation of the Tabular Tools (API & Scale) chart
1:16:12 - Answer to the question: What tool do you recommend to start learning as a beginner?
1:16:43 - Audience question: Will 'Effective Pandas 2' book have the same datasets as 'Effective Pandas' original edition?
1:18:06 - Audience question about mass renaming variables
1:19:54 - Which of the presented tools should you use?
1:21:48 - Matt's contact details and areas of expertise
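
A few hedged sketches illustrating the techniques mentioned in the timestamps above. File names, column names, and the toy data below are placeholders, not the dataset used in the talk.

Pandas 2 pyarrow dtype backend (10:08 and 13:07) - a minimal sketch of opting into Arrow-backed dtypes when reading a CSV; "trees.csv" is a hypothetical file:

```python
import pandas as pd

df = pd.read_csv(
    "trees.csv",               # placeholder file name
    engine="pyarrow",          # parse the CSV with pyarrow
    dtype_backend="pyarrow",   # back the resulting columns with Arrow arrays
)
print(df.dtypes)  # e.g. string[pyarrow], int64[pyarrow], double[pyarrow]
```

.apply vs a vectorized function (19:11 to 21:38) - a generic illustration of why row-by-row .apply loses to vectorized operations; the transformation actually benchmarked in the talk is not reproduced here:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(0).random(1_000_000))

# Row-by-row .apply: every element passes through a Python-level function call.
slow = s.apply(lambda x: x * 2 + 1)

# Vectorized equivalent: one call into optimized array code,
# typically orders of magnitude faster on a large Series.
fast = s * 2 + 1

assert np.allclose(slow, fast)
```

Polars eager vs lazy APIs (36:34 and 45:16) - the same aggregation written both ways; "species" and "height" are assumed column names (recent Polars spells the method group_by, older releases used groupby):

```python
import polars as pl

# Eager API: each step executes immediately.
eager = (
    pl.read_csv("trees.csv")
    .group_by("species")
    .agg(pl.col("height").mean())
)

# Lazy API: build a query plan, let the optimizer prune and reorder it,
# then execute everything with .collect().
lazy = (
    pl.scan_csv("trees.csv")
    .group_by("species")
    .agg(pl.col("height").mean())
    .collect()
)
```

DuckDB over a DataFrame (58:56 and 1:03:07) - a sketch of querying an in-memory pandas DataFrame with SQL and pulling the result back out; again, the table and columns are placeholders:

```python
import duckdb
import pandas as pd

df = pd.read_csv("trees.csv")  # placeholder DataFrame

# DuckDB can query the DataFrame by its Python variable name.
rel = duckdb.sql(
    """
    SELECT species, AVG(height) AS avg_height
    FROM df
    GROUP BY species
    ORDER BY avg_height DESC
    """
)
rel.df()     # materialize as a pandas DataFrame
rel.arrow()  # or as an Arrow table, matching the Arrow integration shown in the talk
```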