fix multi-process reading of detection datasource and accelerate data preprocessing #23

Merged
merged 4 commits into from
Apr 26, 2022

Conversation

Cathy0908
Collaborator

No description provided.

Cathy0908 added the bug (Something isn't working) label Apr 22, 2022
Cathy0908 linked an issue that may be closed by this pull request Apr 24, 2022
Cathy0908 added the enhancement (New feature or request) label Apr 25, 2022
Cathy0908 linked an issue that may be closed by this pull request Apr 25, 2022
Cathy0908 changed the title from "[bugfix]: fix multi-process reading of detection datasource" to "fix multi-process reading of detection datasource and accelerate data preprocessing" Apr 25, 2022
Args:
source_item: item of source iterator
classes: classes list
parse_fn: parse pn to parse source_item, only accepts two params: source_item and classes
Collaborator

parse pn?

Collaborator Author

Changed it to "parse function", done!
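Because the docstring says parse_fn accepts exactly two parameters (source_item and classes), a parser that needs extra options can have them bound with functools.partial. The sketch below is illustrative only: parse_with_aliases, its aliases option, and the (filename, [class_name, ...]) shape of source_item are hypothetical, not the PR's code.

```python
from functools import partial
from typing import Dict, List, Sequence, Tuple


def parse_with_aliases(source_item: Tuple[str, List[str]],
                       classes: Sequence[str],
                       aliases: Dict[str, str]) -> Dict:
    """Hypothetical parse function with one option beyond (source_item, classes)."""
    filename, names = source_item
    labels = []
    for name in names:
        name = aliases.get(name, name)  # map synonyms such as 'puppy' -> 'dog'
        if name in classes:             # ignore objects outside the class list
            labels.append(list(classes).index(name))
    return {'filename': filename, 'gt_labels': labels}


# Bind the extra option so the callable matches the two-parameter contract.
parse_fn = partial(parse_with_aliases, aliases={'puppy': 'dog'})

result = parse_fn(('a.jpg', ['cat', 'puppy', 'car']), ['cat', 'dog'])
```

A partial over a module-level function is picklable, so the bound parser can still be shipped to worker processes.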


@abstractmethod
def get_source_iterator():
"""data list iterator, for multi-process read
Collaborator

A more detailed docstring is missing: what format of data does this iterator yield?

Collaborator Author

done!
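One way an abstract get_source_iterator can support multi-process reading: the base class collects the picklable source items and maps the parse function over them with a process pool. This is a minimal sketch under assumptions; SourceBase, InMemorySource, parse_name_list and num_workers are hypothetical names, not the datasource classes merged in this PR.

```python
import multiprocessing
from abc import ABC, abstractmethod
from functools import partial
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def parse_name_list(source_item: Tuple[str, List[str]], classes: Sequence[str]) -> Dict:
    """Hypothetical parse function: source_item is (filename, [class_name, ...])."""
    filename, names = source_item
    labels = [list(classes).index(n) for n in names if n in classes]
    return {'filename': filename, 'gt_labels': labels}


class SourceBase(ABC):
    """Hypothetical datasource base that parses source items, optionally in parallel."""

    def __init__(self, classes, parse_fn, num_workers=1):
        self.classes = list(classes)
        self.parse_fn = parse_fn
        self.num_workers = num_workers
        self.samples = self._load_samples()

    @abstractmethod
    def get_source_iterator(self) -> Iterable[Any]:
        """Yield one picklable item per sample, e.g. (image_path, label_path)."""

    def _load_samples(self) -> List[Dict]:
        # Bind classes once so each worker call only receives the source item.
        worker = partial(self.parse_fn, classes=self.classes)
        items = list(self.get_source_iterator())
        if self.num_workers <= 1:
            return [worker(item) for item in items]
        # Pool.map returns results in the same order as the input items.
        with multiprocessing.Pool(self.num_workers) as pool:
            return pool.map(worker, items)


class InMemorySource(SourceBase):
    """Toy subclass whose source items are already in memory."""

    def __init__(self, items, **kwargs):
        # Must be set before SourceBase.__init__ calls get_source_iterator().
        self.items = list(items)
        super().__init__(**kwargs)

    def get_source_iterator(self):
        return iter(self.items)
```

With num_workers > 1, parse_fn must be a module-level function so it can be pickled, and on platforms that use the spawn start method the datasource should be constructed under an `if __name__ == "__main__":` guard.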

Args:
source_item: item of source iterator
classes: classes list
parse_fn: parse function to parse source_item, only accepts two params: source_item and classes
Collaborator

Describe what kind of data should be returned: the key names and their value types.

Collaborator Author

done!
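The reviewer's request (key names and value types of the returned data) can be documented directly in code with a TypedDict. The schema below is an assumption modeled on common detection conventions (filename, gt_bboxes, gt_labels); the key names the PR actually documented may differ, and parse_boxes and the (class_name, x1, y1, x2, y2) annotation tuple are hypothetical.

```python
from typing import List, Sequence, Tuple, TypedDict


class DetSample(TypedDict):
    """Hypothetical schema of one parsed detection sample."""
    filename: str                 # path of the image file
    gt_bboxes: List[List[float]]  # one [x1, y1, x2, y2] box per object, in pixels
    gt_labels: List[int]          # class index of each box, aligned with gt_bboxes


Annotation = Tuple[str, float, float, float, float]  # (class_name, x1, y1, x2, y2)


def parse_boxes(source_item: Tuple[str, List[Annotation]],
                classes: Sequence[str]) -> DetSample:
    """Hypothetical parse_fn returning a DetSample."""
    filename, annotations = source_item
    bboxes: List[List[float]] = []
    labels: List[int] = []
    for name, x1, y1, x2, y2 in annotations:
        if name not in classes:
            continue  # skip objects whose class is not in the classes list
        bboxes.append([float(x1), float(y1), float(x2), float(y2)])
        labels.append(list(classes).index(name))
    return {'filename': filename, 'gt_bboxes': bboxes, 'gt_labels': labels}
```

Keeping gt_bboxes and gt_labels as parallel lists makes the alignment rule (the i-th label belongs to the i-th box) explicit in the schema comments.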

logging.warning(
'Something wrong with current sample %s,Try load next sample...'
% result_dict.get('filename', ''))
result_dict = self.get_sample(idx + 1)
Collaborator

wenmengzhou Apr 25, 2022

This recursive call may go out of bounds when the last example raises an exception: it will try to get the len(data_source)-th element.

Collaborator Author

Done! Also added _max_retry_num to avoid looping forever.
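The fix discussed in this thread (no out-of-bounds index, bounded retries) can be sketched as an iterative loop that wraps the index with modulo arithmetic and stops after a fixed number of retries. This is a sketch, not the merged code: RetryingSampler and load_fn are hypothetical, and treating either an exception or a None result as a bad sample is an assumption; only the _max_retry_num name comes from the thread.

```python
import logging
from typing import Any, Callable, Dict, Optional, Sequence


class RetryingSampler:
    """Hypothetical sampler that falls back to the next sample when one fails to load."""

    def __init__(self, samples: Sequence[Any],
                 load_fn: Callable[[Any], Optional[Dict]],
                 max_retry_num: int = 10):
        self.samples = samples
        self.load_fn = load_fn
        self._max_retry_num = max_retry_num

    def __len__(self) -> int:
        return len(self.samples)

    def get_sample(self, idx: int) -> Dict:
        # One initial attempt plus at most _max_retry_num retries; a loop
        # instead of recursion, so the call stack never grows.
        for attempt in range(self._max_retry_num + 1):
            # Wrap around so a failing last sample retries index 0
            # instead of reading index len(samples).
            cur = (idx + attempt) % len(self.samples)
            try:
                result = self.load_fn(self.samples[cur])
            except Exception as err:  # assumed: any load error marks the sample as bad
                logging.warning('Something went wrong with sample %s (%s), '
                                'trying the next sample...', cur, err)
                result = None
            if result is not None:
                return result
        raise RuntimeError(f'No valid sample found after {self._max_retry_num} '
                           f'retries starting at index {idx}')
```

Raising after the retry budget is spent surfaces a dataset that is mostly corrupt instead of silently cycling through it.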

Collaborator

wenmengzhou left a comment

The docstrings and code snippets should be refined.

Cathy0908 merged commit c6ad4c7 into alibaba:master Apr 26, 2022
Labels
bug (Something isn't working), enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

Accelerate data preprocessing
it is very slow to scan samples of detection dataset
2 participants