From 7ed9ea39ada5aec280785b93769294963d02b508 Mon Sep 17 00:00:00 2001
From: Luis Antonio Obis Aparicio
Date: Wed, 20 Dec 2023 10:14:11 -0500
Subject: [PATCH 1/4] working on fsspec docs

---
 docs-sphinx/basic.rst | 71 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/docs-sphinx/basic.rst b/docs-sphinx/basic.rst
index 339d4a5e0..049fabfa7 100644
--- a/docs-sphinx/basic.rst
+++ b/docs-sphinx/basic.rst
@@ -1251,3 +1251,74 @@ In addition, each TBranch of the TTree can have a different compression setting:
     {'x': None, 'ny': None, 'y': ZLIB(4)}
 
 Changes to the compression setting only affect TBaskets written after the change (with :ref:`uproot.writing.writable.WritableTree.extend`; see above).
+
+Using fsspec for reading and writing files
+--------------------------
+
+Since version `5.2.0 `_, uproot supports reading and writing files using `fsspec `_.
+This allows you to read and write files from a variety of sources, including cloud storage, HTTP, and more.
+
+Usage of fsspec as a source is the default behaviour since 5.2.0, but the user is able to manually specify the source by passing a `uproot.source.chunk.Source` class to the `handler` argument of different uproot methods, such as `uproot.open`, `uproot.iterate`, `uproot.concatenate`, etc.
+
+In general the user should not need to worry about the source, as uproot will automatically choose the best source for the given path.
+
+In some cases it may provide a performance benefit to manually specify the source, for example when opening a file from a local path, specifying `handler=uproot.source.file.MemmapSource` (instead of the default `handler=uproot.source.fsspec.FSSpecSource`) may reduce the time to open the file at the cost of using more memory.
+
+Any fsspec protocol should work for reading, while only the protocols supporting writing will work for writing.
+
+fsspec is a dependency of uproot, but in order to use some protocols, the user may need to install additional dependencies.
+For example, in order to open S3 files, the user needs to have `s3fs `_ installed.
+When attempting to open a file with a protocol that is not supported, uproot will raise an exception with a helpful message pointing towards the missing dependency.
+
+For some protocols, such as `s3` or `ssh`, fsspec may need additional options, such as credentials. These can be directly passed as keyword arguments to the uproot function, and will be passed to fsspec.
+
+reading
+~~~~~~~
+
+Opening a file via S3:
+
+.. code-block:: python
+
+    >>> with uproot.open("s3://pivarski-princeton/pythia_ppZee_run17emb.picoDst.root:PicoDst",
+    >>> anon=True) as f:
+    >>> ...
+
+In this case, the `anon=True` option is required by `s3fs `_ to open the file (if aws credentials are not set).
+
+Opening a file via SSH:
+
+In order to open a file over SSH, `paramiko `_ needs to be installed (technically any other library that implements the protocol for fsspec would work, such as `sshfs `_ for ssh).
+
+Some parameters can be directly passed in the url scheme, such as ssh user and host:
+
+.. code-block:: python
+
+    >>> with uproot.open("ssh://user@host:port/file.root") as f:
+    >>> ...
+
+globbing
+~~~~~~~~
+
+Some protocols support glob expressions, which can be used in the same way they are used in the local filesystem.
+
+Opening multiple files via globbing over XROOTD:
+
+.. code-block:: python
+
+    >>> iterator = uproot.iterate("root://host.domain.com/path/to/files/*.root")
+
+Not all protocols that support reading support globbing, for example, http does not support globbing and will return an empty list of files instead.
+
+This feature comes directly as a consequence of the fsspec integration, so requests for globbing support should be directed to fsspec or the specific protocol implementation (it may not be technically possible for some protocols).
+
+writing
+~~~~~~~
+
+The same syntax used for writing uproot files can be used for writing files over different protocols via fsspec.
+Just specify the protocol in the path (`ssh://...`) and any necessary options as keyword arguments.
+If the protocol does not support writing, a `NotImplementedError` will be raised.
+
+local cache
+~~~~~~~~~~~
+
+fsspec supports caching files locally, which can be useful for repeated access to the same file. It can also be used for remote writing files, to avoid writing to the remote file until the file is closed. Additional information is available `in the fsspec docs `_.

From c3b044e0691ce5b8e5ec67e9f3a8bc9877c35b90 Mon Sep 17 00:00:00 2001
From: Luis Antonio Obis Aparicio
Date: Mon, 22 Jan 2024 09:04:15 +0100
Subject: [PATCH 2/4] update python version badge

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f80fa318c..ab09575a6 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 [![PyPI version](https://badge.fury.io/py/uproot.svg)](https://pypi.org/project/uproot)
 [![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/uproot)](https://github.com/conda-forge/uproot-feedstock)
-[![Python 3.7‒3.11](https://img.shields.io/badge/python-3.7%E2%80%923.11-blue)](https://www.python.org)
+[![Python 3.8‒3.12](https://img.shields.io/badge/python-3.8%E2%80%923.12-blue)](https://www.python.org)
 [![BSD-3 Clause License](https://img.shields.io/badge/license-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
 [![Continuous integration tests](https://github.com/scikit-hep/uproot5/actions/workflows/build-test.yml/badge.svg)](https://github.com/scikit-hep/uproot5/actions)
 

From 15058b091de2a3dc3c44ce0b1dfb2e1bf3a52908 Mon Sep 17 00:00:00 2001
From: Luis Antonio Obis Aparicio
Date: Mon, 22 Jan 2024 09:23:32 +0100
Subject: [PATCH 3/4] expand fsspec docs

---
 docs-sphinx/basic.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs-sphinx/basic.rst b/docs-sphinx/basic.rst
index 049fabfa7..c8a9a1dac 100644
--- a/docs-sphinx/basic.rst
+++ b/docs-sphinx/basic.rst
@@ -1272,6 +1272,8 @@ When attempting to open a file with a protocol that is not supported, uproot wil
 
 For some protocols, such as `s3` or `ssh`, fsspec may need additional options, such as credentials. These can be directly passed as keyword arguments to the uproot function, and will be passed to fsspec.
 
+Keep in mind that there might be different libraries that implement a given fsspec backend. This might lead to errors when using uproot. For example, the fsspec ssh tests assume `paramiko `_ is installed, but another library such as `sshfs `_ might be present instead which also adds ssh support but might behave differently.
+
 reading
 ~~~~~~~
 
@@ -1322,3 +1324,19 @@ local cache
 ~~~~~~~~~~~
 
 fsspec supports caching files locally, which can be useful for repeated access to the same file. It can also be used for remote writing files, to avoid writing to the remote file until the file is closed. Additional information is available `in the fsspec docs `_.
+
+For example, the following code will download the whole file to a local cache directory:
+
+.. code-block:: python
+
+    >>> with uproot.open("simplecache::http://host:port/file.root") as f:
+    >>> ...
+
+This improves read speed at the cost of waiting for the whole file to download and the increase in disk usage.
+
+The following fsspec option can be used to specify the cache directory:
+
+.. code-block:: python
+
+    >>> with uproot.open("simplecache::http://host:port/file.root", simplecache={"cache_storage": cache_path}) as f:
+    >>> ...
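
The chained URL in the caching examples (`simplecache::http://host:port/file.root`) can be read as layers separated by `::`, with each keyword option addressed to a layer by its protocol name, as in `simplecache={"cache_storage": cache_path}`. The following is a minimal sketch of that reading using only plain string handling. `split_chain` and `options_for` are hypothetical helpers written for illustration; they are not part of uproot or fsspec, whose real URL handling covers more cases.

```python
def split_chain(url):
    """Split a chained URL into its layers, outermost layer first."""
    return url.split("::")


def options_for(url, **kwargs):
    """Pair each layer with the keyword options addressed to its protocol.

    Assumption for this sketch: options are looked up by the protocol
    name that precedes "://" (or by the whole layer when no "://" is present).
    """
    paired = []
    for layer in split_chain(url):
        protocol = layer.split("://", 1)[0]
        paired.append((layer, kwargs.get(protocol, {})))
    return paired
```

Under these assumptions, the `simplecache` layer receives the `cache_storage` option while the `http` layer receives none, which mirrors how the keyword argument in the documentation example is named after the caching protocol rather than the transport.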
From 63430517a4b2c0d62b2cdca4f6fb868ea24b0895 Mon Sep 17 00:00:00 2001
From: Luis Antonio Obis Aparicio
Date: Mon, 22 Jan 2024 09:35:46 +0100
Subject: [PATCH 4/4] capitalize

---
 docs-sphinx/basic.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs-sphinx/basic.rst b/docs-sphinx/basic.rst
index c8a9a1dac..74a5fc9dd 100644
--- a/docs-sphinx/basic.rst
+++ b/docs-sphinx/basic.rst
@@ -1274,7 +1274,7 @@ For some protocols, such as `s3` or `ssh`, fsspec may need additional options, s
 
 Keep in mind that there might be different libraries that implement a given fsspec backend. This might lead to errors when using uproot. For example, the fsspec ssh tests assume `paramiko `_ is installed, but another library such as `sshfs `_ might be present instead which also adds ssh support but might behave differently.
 
-reading
+Reading
 ~~~~~~~
 
 Opening a file via S3:
@@ -1298,7 +1298,7 @@ Some parameters can be directly passed in the url scheme, such as ssh user and h
     >>> with uproot.open("ssh://user@host:port/file.root") as f:
     >>> ...
 
-globbing
+File globbing
 ~~~~~~~~
 
 Some protocols support glob expressions, which can be used in the same way they are used in the local filesystem.
@@ -1313,14 +1313,14 @@ Not all protocols that support reading support globbing, for example, http does
 
 This feature comes directly as a consequence of the fsspec integration, so requests for globbing support should be directed to fsspec or the specific protocol implementation (it may not be technically possible for some protocols).
 
-writing
+Writing
 ~~~~~~~
 
 The same syntax used for writing uproot files can be used for writing files over different protocols via fsspec.
 Just specify the protocol in the path (`ssh://...`) and any necessary options as keyword arguments.
 If the protocol does not support writing, a `NotImplementedError` will be raised.
 
-local cache
+Local cache
 ~~~~~~~~~~~
 
 fsspec supports caching files locally, which can be useful for repeated access to the same file. It can also be used for remote writing files, to avoid writing to the remote file until the file is closed. Additional information is available `in the fsspec docs `_.
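
The idea of not writing to the remote file until the file is closed can be pictured with a small standard-library sketch: all writes go to a local scratch file, and the destination is only touched once the writer exits cleanly. This is an illustration of the general pattern, not fsspec's implementation. `deferred_write` and `demo` are hypothetical names, and a local copy stands in for the remote upload.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def deferred_write(destination):
    """Yield a local scratch file; copy it to `destination` only on clean exit."""
    fd, scratch_path = tempfile.mkstemp(suffix=".part")
    os.close(fd)
    try:
        with open(scratch_path, "wb") as scratch:
            yield scratch
        # Reached only if the body raised no exception: publish the result.
        # In a real remote setup this copy would be the upload step.
        shutil.copyfile(scratch_path, destination)
    finally:
        # The scratch file is removed whether or not the write succeeded.
        os.remove(scratch_path)


def demo():
    """Return (destination_existed_during_write, final_bytes)."""
    with tempfile.TemporaryDirectory() as workdir:
        destination = os.path.join(workdir, "output.bin")
        with deferred_write(destination) as scratch:
            scratch.write(b"payload")
            existed_during_write = os.path.exists(destination)
        with open(destination, "rb") as result:
            return existed_during_write, result.read()
```

The trade-off is the same one the caching text describes for reads: the destination never sees a partially written file, at the cost of local disk space and a transfer that happens all at once on close.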