| 1 | +# PDEP-15: Do not require PyArrow as a required dependency (for pandas 3.0) |
| 2 | + |
| 3 | +- Created: 8 May 2024 |
| 4 | +- Status: Under Discussion |
| 5 | +- Discussion: [#58623](https://github.com/pandas-dev/pandas/pull/58623) |
| 6 | + [#52711](https://github.com/pandas-dev/pandas/pull/52711) |
| 7 | + [#52509](https://github.com/pandas-dev/pandas/issues/52509) |
| 8 | + [#54466](https://github.com/pandas-dev/pandas/issues/54466) |
| 9 | +- Author: [Thomas Li](https://github.com/lithomas1) |
| 10 | +- Revision: 1 |
| 11 | + |
| 12 | +## Abstract |
| 13 | + |
| 14 | +This PDEP supersedes PDEP-10, which stipulated that PyArrow should become a required dependency |
| 15 | +for pandas 3.0. After reviewing the feedback posted |
| 16 | +on issue [#54466](https://github.com/pandas-dev/pandas/issues/54466), we, the members of |
| 17 | +the core team, have decided against moving forward with PDEP-10 for pandas 3.0. |
| 18 | + |
| 19 | +The primary reasons for rejecting PDEP-10 are twofold: |
| 20 | + |
| 21 | +1) Requiring pyarrow as a dependency causes installation problems. |
| 22 | + - Pyarrow does not fit, or fits only with difficulty, in space-constrained environments |
| 23 | +such as AWS Lambda and WASM, because a compiled wheel is around 40 MB |
| 24 | +(which is larger than pandas' own wheels). |
| 25 | + - Installation of pyarrow is not possible on some platforms. We provide support for some |
| 26 | +less widely used platforms such as Alpine Linux (and there is third party support for pandas in |
| 27 | +Pyodide, a WASM distribution of Python), and pyarrow does not provide wheels for either of them. |
| 28 | + |
| 29 | + While both of these reasons are mentioned in the drawbacks section of PDEP-10, at the time that PDEP |
| 30 | +was written we underestimated the impact they would have on users and downstream developers. |
| 31 | + |
| 32 | +2) Many of the benefits presented in PDEP-10 can be realized even with pyarrow as an optional dependency. |
| 33 | + |
| 34 | + For example, as detailed in PDEP-14, it is possible to create a new string data type with the same semantics |
| 35 | + as our current default object string data type, but that allows users to experience faster performance and memory savings |
| 36 | + compared to the object strings (if pyarrow is installed). |
| 37 | + |
| 38 | +While we've decided not to move forward with requiring pyarrow in pandas 3.0, the rejection of PDEP-10 |
| 39 | +does not mean that we are abandoning pyarrow support and integration in pandas. We, as the core team, still believe |
| 40 | +that adopting support for pyarrow arrays and data types in more of pandas will lead to greater interoperability with the |
| 41 | +ecosystem and better performance for users. Furthermore, many of the drawbacks, such as the large installation size of pyarrow |
| 42 | +and the lack of support for certain platforms, can be solved, and potential solutions have already been proposed, which may allow us |
| 43 | +to revisit this decision in the future. |
| 44 | + |
| 45 | +However, at this point in time, it is clear that we are not ready to require pyarrow |
| 46 | +as a dependency in pandas. |