Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python iRODS Client implementation of parallel transfer #236

Closed
wants to merge 6 commits into from

Conversation

d-w-moore
Copy link
Collaborator

@d-w-moore d-w-moore commented Dec 4, 2020

This pull request addresses a foundational issue #232 ( and a smaller one in #233 ) in the Python iRODS-Client (PRC) and then provides a multithreaded data-transfer implementation for the PRC that is similar (and comparable in performance) to the Parallel Transfer Engine integrated into the iRODS core code. The idea is to manage "N" connections to the server -- see iput and iget -N options -- each connection handling 1/N of the work of the (PUT|GET) operation. Because account credentials are available for reuse, this gets around the server having to encrypt the extra channels independent of, for example, the SSL encrypt capability inherent to the native iRODS connection.

@d-w-moore d-w-moore marked this pull request as draft December 4, 2020 03:46
@trel
Copy link
Member

trel commented Dec 5, 2020

getting there... codacy has some opinions.

@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Dec 5, 2020

Yes , going to work on the

  • codacy suggestions
  • tests
  • diverting parallel to single channel under certain conditions
  • updating README

then run the whole test suite and let you know!

@d-w-moore d-w-moore force-pushed the file_desc branch 3 times, most recently from 4affcbb to 2a71778 Compare December 6, 2020 14:24
@d-w-moore d-w-moore marked this pull request as ready for review December 7, 2020 13:15
@d-w-moore
Copy link
Collaborator Author

Tested and ready for merge, unless README is judge to need an update for the parallel transfer. (The demo script and progress bar niceties would be the only possibly compelling reason, as the data_objects manager put/get methods should transparently switch over to parallel mode when sizes are greater than 32MB. )

@trel trel changed the title Python iRods Client implementation of parallel transfer Python iRODS Client implementation of parallel transfer Dec 10, 2020
@d-w-moore d-w-moore force-pushed the file_desc branch 2 times, most recently from d3e95c3 to d5e5c74 Compare December 11, 2020 13:20
PYTHON_2_7.rst Outdated Show resolved Hide resolved
README.rst Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/parallel.py Outdated Show resolved Hide resolved
irods/test/admin_test.py Outdated Show resolved Hide resolved
irods/test/data_obj_test.py Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@d-w-moore d-w-moore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider as the environment variable... ENABLE_PARALLEL_TRANSFER_FOR_428.

leads the user/reader to ignore if not specifically interested in 4.2.8.

agreed

irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/parallel.py Outdated Show resolved Hide resolved
irods/parallel.py Outdated Show resolved Hide resolved
irods/parallel.py Outdated Show resolved Hide resolved
irods/pool.py Outdated Show resolved Hide resolved
PTE_notes.rst Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
irods/manager/data_object_manager.py Outdated Show resolved Hide resolved
@d-w-moore d-w-moore force-pushed the file_desc branch 7 times, most recently from cd6a76e to 8a3a892 Compare December 21, 2020 17:28
@d-w-moore
Copy link
Collaborator Author

d-w-moore commented Dec 21, 2020

This is working ideally now, and matches iput and iget for speed of parallelism on 4-2-stable when the environment variable IRODS_VERSION_OVERRIDE is set to "4,2,9". (this won't be necessary after the version bump.) Also, the test suite is passing on 4-2-stable; modifications were needed on a couple of tests, and those are included on this branch too.

For those who wish to benchmark parallel transfer but are running 4.2.8 release of iRODS server, you can select this branch with which to test, but be aware that it cannot be considered as production quality. For one, acPostProcForPut() is called potentially multiple times for a PUT (once per thread of a transfer).

@d-w-moore d-w-moore force-pushed the file_desc branch 4 times, most recently from b8a9064 to 17e6cdc Compare December 27, 2020 00:08
PYTHON_install_caveats.rst Outdated Show resolved Hide resolved
@trel
Copy link
Member

trel commented Jan 22, 2021

there are a couple merge conflicts - after those are cleaned up, we can squash down to clean commits and get every commit to have an issue number.

need to release 0.8.6 before we merge this (for what will be 0.9.0)

@d-w-moore
Copy link
Collaborator Author

I had planned on resolving those but was waiting to do everything at once ( test with 4-2-stable, assuming Alan's changes are pretty much all in.)

@trel
Copy link
Member

trel commented Jan 23, 2021

yeah, these are minor, and i don't expect too much more interaction with the alan server code.

@d-w-moore
Copy link
Collaborator Author

yeah, these are minor, and i don't expect too much more interaction with the alan server code.

ok will rebase and test this weekend

We also add the create argument in the data object open() call which
defaults to old behavior, ie letting it default True means we create
and open a new replica on the default or otherwise targeted
resource (DEST_RESC_NAME_KW if specified) if none already exists.
@trel
Copy link
Member

trel commented Feb 10, 2021

codacy and then we start to squash.

@d-w-moore
Copy link
Collaborator Author

ok going to run tests again also.

f.write(self.write_str1.encode('utf-8'))

## seek and re-read of just written bytes doesn't work in current 4-2-stable
## (see https://github.com/irods/irods/issues/5315)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5315 is now closed... so this can now be enabled i think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes , good point - will do that in upcoming testing !

@d-w-moore
Copy link
Collaborator Author

Closing in favor of #262 , which includes PRC parallel transfer.

@d-w-moore d-w-moore closed this May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants