Functionality enhancements to address lazy loading of chunked data, variable length strings, and other minor bug fixes #68
Open
bnlawrence wants to merge 121 commits into jjhelmus:master from NCAS-CMS:wacasoft
Changes from all commits (121 commits)
02fca54
Using s3 to get at some real data for testing
df3669a
Getting the address as well as size into the index
16c0e81
With timer
c464be8
Not working yet. Don't reckon I have the arguments to OrthogonalIndex…
afaa4f5
A few more notes in the code so I can come back to it anon.
18bc37c
Woops. Need this.
4b0ac08
First working lazy read (only reads chunks needed for selection)
5356aa0
Woops, didn't commit the real oil
9fe2394
Should now support filtering chunks in the partial chunk loading. H…
dafb3c9
Some additional documentation
53e4ebe
Seems to work, prior to re-integration
9ac0bbd
Moved chunk support into standard API
a88a150
removing playing code
89aafe3
Merge branch 'jjhelmus:master' into issue6
bnlawrence 96dc178
Fixes bug which stops the selection read from actually occurring and …
eb44c15
Hack to avoid reference datatypes in chunk by chunk selections.
51f7cca
Remove obsolete function
1f61d6c
Support for third party access to contiguous data address and size. A…
e6217b5
First cut, fails references and classic, even with new stuff turned off?
67c93e0
This version appears to now support failing over from a memory map to…
a08ee20
First cut, no tests yet
dc00503
Improvements
9ffb5b2
With some failing tests
223a931
Fixed one test
3a256ab
All tests for new functionality pass, but I've broken something old
32d83dd
Now passing all tests
f5f89c5
Checking coverage of get_chunk_info_by_coord(method)
2c8f59c
Missing docstring
013ce62
Cleaning up
400c798
Merge pull request #5 from bnlawrence/h5pyapi
bnlawrence c21ee63
Ok, these were the pull request fixes that I thought I'd merged
f87b9d1
Merge branch 'h5pyapi'
3c3f6d6
Adding Datatype and check_enum_dtype in a minimal manner - closes #8
9994598
Basic support for elements of h5netcdf and what it expects to be able…
c12b5b3
Test support for graceful enum failure
c80ed92
Committing to the dtype returned as a numpy dtype, and the extra h5t …
04bbef6
Test for reference_list
552c463
(New reference list test still broken) H5D has been disconnected from…
e32a1b0
Interim commit so we have something to point to in a discussion aroun…
2d88101
Transition to H5D cached backend is complete, though we still have th…
2678022
Removed obsolete DatasetDataObjects
0078956
Expose package version in code, and separate testing requirements out…
503cb45
Attempt to get b-tree logging in h5d
ac96f46
Cleared a few bugs and misunderstandings which arose from working wit…
b586db0
Continue to use open file in h5d, closes #18
7f17cc8
Test for true bytes-io testing (needed for h5netcdf test compatibility)
fd13670
Deals with filename issues (closes #19) (and deals with another ioby…
32ad75d
Addressing, I think, upstream issue 53, and includes a test case I sh…
48b7b9a
Fix location of files so tests run properly from parent directory.
2233395
Well, I think this is a fix for #23, and it's so complex I'm committi…
1e2c424
Cleaned up issue23 fixes, all tests pass
6da5fda
Test localisation, and a new test for laziness outside a context manager
59e8667
Changes to support out of context variable access as described in #24
34a684a
Removing the pseudo chunking stuff that snuck into the last commit
64827c4
catching up to the main trunk in h5netcdf
20693b9
Starting to sketch out the pseudo chunking
4126e2b
threadsafe data access
davidhassell 8a7e1dc
merge from h5netcdf
davidhassell 87a1980
add deps for mock s3 test
valeriupredoi c7058b6
add mock s3 test
valeriupredoi df81faf
posix & s3
davidhassell 53ff9df
tidy
davidhassell ee0995b
tidy up
davidhassell 43a8e9c
tidy up
davidhassell 7462033
add test reports to gitignore for now
valeriupredoi 0c8ffc5
add conftest
valeriupredoi 8cc2363
minimize conftest
valeriupredoi 3086211
make use of conftest and add minimal test for mock s3 fs
valeriupredoi ed0f117
upgrade actions versions
valeriupredoi 6843567
add flask dep
valeriupredoi 88752d1
restrict to python 3.10
valeriupredoi ddeb0ea
add flask-cors
valeriupredoi 522bf7a
add h5 modules
valeriupredoi 22476e8
mark test as xfailed
valeriupredoi f28c68d
add docstrings
valeriupredoi 03183b7
Merge pull request #26 from NCAS-CMS/mock_s3fs
bnlawrence 2d312c1
Minor changes following V's S3 testing merge
742faf8
A framework for testing laziness.
6cd74e7
Merge branch 'h5netcdf' into h5netcdf-fh-2
davidhassell 7a13108
add test for threadsafe data access on posix and s3
davidhassell 1b6d670
note on number of threadsafe test iterations
davidhassell 0ec45d7
Test framework for pseudochunking plus starting to migrate test data …
344573b
Ok, we pass the pseudochunking test, but we don't actually do it yet.
f450776
Pseudo chunking in, with test support (and a missing make data file t…
007ac56
Merge branch 'pseudo_chunking' into h5netcdf
bnlawrence 33e21e1
Merge pull request #28 from NCAS-CMS/h5netcdf
bnlawrence 89ebd2f
Merge branch 'h5netcdf' into h5netcdf-fh-2
bnlawrence fee0759
Merge pull request #27 from NCAS-CMS/h5netcdf-fh-2
bnlawrence 13e5e39
Tidy up dependencies for testing
c4a38b9
Minor changes which come from upstream advice on my two pull requests…
49aa794
Suppress reference list warning. It's useless
dba2683
Using context manager for threadsafe test
2ba27dc
no returned memory maps
davidhassell e6b518b
Test for #16 and #29
b1ae323
More versions of the #29 tests
c7e157c
Better .gitignore
40c898b
Giving up on in-memory netcdf tests for #29
8acf067
explicitly close POSIX files
davidhassell 526c642
vlen strings data test case, vanilla version, and version with missin…
0e4a45b
add extra posix test for file closure
davidhassell 298edc7
Merge branch 'h5netcdf' into fix-memmap
davidhassell 1fb9c98
More on h5d and testing. The iter_chunks method is broken and we now …
82dc2a9
Support for pyactivestorage via a bespoke `get_chunk_info_from_chunk_…
838b0a5
better ignore
a633683
The first vlen data test passes with this code
fbdda40
closer to a solution for #29. These tests pass, but we need to deal w…
a20763f
Partially working vlen string support, issues with global heap usage …
e7c465e
Passing all vlen tests for #29, though we are ignoring the dtype of …
580e3df
Merge pull request #33 from NCAS-CMS/vlen
bnlawrence 0a4c801
Merge remote-tracking branch 'refs/remotes/origin/h5netcdf' into h5ne…
b955b4b
Merge branch 'fix-memmap' into h5netcdf
e40c7d7
Remaining tests for vlen and iterchunks, support for vlen dtypes (clo…
599db7b
dev
davidhassell 4b4fbc3
dev
davidhassell a2cfaeb
dev
davidhassell bd16147
vlen related fixes
davidhassell a50204d
Update pyfive/indexing.py
davidhassell 1f9b2c0
Merge pull request #37 from NCAS-CMS/vlen-dtype
bnlawrence 6c02408
Merge branch 'master' into wacasoft
eed7e99
install only in test mode
valeriupredoi 6255fc0
actual correct name for testing regime
valeriupredoi
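The headline feature of this PR is lazy reading: only the chunks that intersect a selection are fetched (see, e.g., commit 4b0ac08, "First working lazy read"). The underlying chunk arithmetic can be sketched as follows; `chunks_for_selection` is a hypothetical helper, not pyfive's actual indexing code, and it assumes a regular chunk grid and plain slice selections.

```python
import itertools
import math

def chunks_for_selection(shape, chunk_shape, selection):
    """Return the chunk-grid coordinates touched by a tuple of slices.

    Hypothetical sketch: pyfive's real orthogonal-indexing code differs,
    but the arithmetic is the same for a regular chunk grid.
    """
    ranges = []
    for size, csize, sl in zip(shape, chunk_shape, selection):
        start, stop, _ = sl.indices(size)
        first = start // csize                        # first chunk along this axis
        last = max(first, math.ceil(stop / csize) - 1)  # last chunk along this axis
        ranges.append(range(first, last + 1))
    # Cartesian product of per-axis chunk ranges gives the chunks to read.
    return list(itertools.product(*ranges))

# A 100x100 dataset in 10x10 chunks: rows 5..14, cols 0..9 touch 2 chunks.
needed = chunks_for_selection((100, 100), (10, 10), (slice(5, 15), slice(0, 10)))
```

Only those chunk coordinates then need to be looked up in the B-tree index and read from storage, which is what makes selection-driven reads cheap over S3.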
@@ -1,2 +1,10 @@
.coverage
.pyc
build
__pycache__/
*.egg-info
.idea
.DS_Store
test-reports/
<_io.Bytes*>
tests/__pycache__/
@@ -1,2 +1,3 @@
# Include the license file
include LICENSE.txt
include README.rst
@@ -0,0 +1,112 @@
import h5py
import pyfive
from pathlib import Path
import time
import s3fs

S3_URL = 'https://uor-aces-o.s3-ext.jc.rl.ac.uk/'
S3_BUCKET = 'bnl'


def test_speed(s3=False):

    mypath = Path(__file__).parent
    fname1 = 'da193o_25_day__grid_T_198807-198807.nc'
    vname1 = 'tos'
    p1 = mypath/fname1

    fname2 = 'ch330a.pc19790301-def-short.nc'
    vname2 = 'UM_m01s16i202_vn1106'
    p2 = Path.home()/'Repositories/h5netcdf/h5netcdf/tests/'/fname2

    do_run(p1, fname1, vname1, s3)
    do_run(p2, fname2, vname2, s3)


def do_s3(package, fname, vname):

    fs = s3fs.S3FileSystem(anon=True, client_kwargs={'endpoint_url': S3_URL})
    uri = S3_BUCKET + '/' + fname
    with fs.open(uri, 'rb') as p:
        t_opening, t_var, t_calc, t_tot = do_inner(package, p, vname)

    return t_opening, t_var, t_calc, t_tot


def do_inner(package, p, vname, withdask=False):
    h0 = time.time()
    pf1 = package.File(p)
    h3 = time.time()
    t_opening = 1000 * (h3 - h0)

    h5a = time.time()
    vp = pf1[vname]
    h5 = time.time()
    t_var = 1000 * (h5 - h5a)

    h6a = time.time()
    sh = sum(vp)
    h6 = time.time()
    t_calc = 1000 * (h6 - h6a)

    t_tot = t_calc + t_var + t_opening

    pf1.close()
    return t_opening, t_var, t_calc, t_tot


def do_run(p, fname, vname, s3):

    if s3:
        import s3fs

    # For POSIX, force this to be a comparison from memory by ensuring the
    # file is in the disk cache and ignoring the first access; we then do an
    # even number of accesses to make sure we are not biased by caching.
    n = 0
    datanames = ['h_opening', 'p_opening', 'h_var', 'p_var',
                 'h_calc', 'p_calc', 'h_tot', 'p_tot']
    results = {x: 0.0 for x in datanames}
    while n < 2:
        n += 1

        if s3:
            h_opening, h_var, h_calc, h_tot = do_s3(h5py, fname, vname)
            p_opening, p_var, p_calc, p_tot = do_s3(pyfive, fname, vname)
        else:
            h_opening, h_var, h_calc, h_tot = do_inner(h5py, p, vname)
            p_opening, p_var, p_calc, p_tot = do_inner(pyfive, p, vname)

        if n > 1:
            for x, r in zip(datanames, [h_opening, p_opening, h_var, p_var,
                                        h_calc, p_calc, h_tot, p_tot]):
                results[x] += r

    # Divide in place: the original `for v in results.values(): v = v/(n-1)`
    # only rebound the loop variable and left the dict unchanged.
    for x in datanames:
        results[x] /= (n - 1)

    print("File Opening Time Comparison ", fname, f' (ms, S3={s3})')
    print(f"h5py: {results['h_opening']:9.6f}")
    print(f"pyfive: {results['p_opening']:9.6f}")

    print(f'Variable instantiation for [{vname}]')
    print(f"h5py: {results['h_var']:9.6f}")
    print(f"pyfive: {results['p_var']:9.6f}")

    print('Access and calculation time for summation')
    print(f"h5py: {results['h_calc']:9.6f}")
    print(f"pyfive: {results['p_calc']:9.6f}")

    print('Total times')
    print(f"h5py: {results['h_tot']:9.6f}")
    print(f"pyfive: {results['p_tot']:9.6f}")


if __name__ == "__main__":
    test_speed()
    test_speed(s3=True)
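The benchmark above times intervals with `time.time()`, which reads the wall clock and can jump if the system clock is adjusted; `time.perf_counter()` is the usual monotonic choice for interval timing. A small reusable sketch of the same accumulate-milliseconds pattern (hypothetical helper, not part of this PR):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(results, key):
    """Accumulate elapsed milliseconds into results[key] using a monotonic clock."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        results[key] = results.get(key, 0.0) + 1000.0 * (time.perf_counter() - t0)

# Usage mirrors the benchmark's phases; the sleep stands in for package.File(...).
results = {}
with timed(results, 'h_opening'):
    time.sleep(0.01)
```

Each timed phase in `do_inner` could then be one `with timed(...)` block, and the accumulation loop in `do_run` would disappear.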
@@ -1,7 +1,13 @@
"""
pyfive : a pure python HDF5 file reader.
This is the public API exposed by pyfive,
which is a small subset of the H5PY API.
"""

from .high_level import File
from pyfive.high_level import File, Group, Dataset
from pyfive.h5t import check_enum_dtype, check_string_dtype, check_dtype
from pyfive.h5py import Datatype, Empty
from importlib.metadata import version

__version__ = '0.5.0.dev'
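The new `__init__.py` imports `importlib.metadata.version` yet still hard-codes `__version__`. A common pattern is to derive one from the other, with a fallback for when the distribution is not installed (a sketch only; `package_version` is a hypothetical helper and the PR may intend something different):

```python
from importlib.metadata import version, PackageNotFoundError

def package_version(name, fallback='0.0.0.dev'):
    """Return the installed distribution's version string, or a fallback.

    PackageNotFoundError is raised by version() when the named
    distribution is not installed (e.g. running from a source checkout).
    """
    try:
        return version(name)
    except PackageNotFoundError:
        return fallback

# e.g. in pyfive/__init__.py:
# __version__ = package_version('pyfive', fallback='0.5.0.dev')
```

This keeps the version in one place (package metadata) while still working from an uninstalled source tree.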
If filenames with < in them are generated, I'd like to see them.