Add a way to create task with custom jobs #5536

Merged
merged 21 commits into from
Jan 13, 2023

@zhiltsov-max zhiltsov-max commented Dec 30, 2022

Motivation and context

This PR adds an option to specify the file-to-job mapping explicitly during task creation. This option is incompatible with most other job-related parameters, such as sorting_method and frame_step.

  • Added a new task creation parameter (job_file_mapping) to set a custom file-to-job mapping during task creation
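
As a rough illustration of what such a mapping might look like, the sketch below assumes that job_file_mapping is a list of jobs, each a non-empty list of file names, and that every uploaded file belongs to exactly one job. Both the shape and the rule are assumptions made for the example, and the helper name check_job_file_mapping and the file names are hypothetical; this is not the server's validation code.

```python
from typing import List, Sequence


def check_job_file_mapping(
    files: Sequence[str], job_file_mapping: List[List[str]]
) -> None:
    """Raise ValueError unless every file is assigned to exactly one job.

    Assumed rule for this sketch only; the real server may differ.
    """
    seen = set()
    for job_index, job_files in enumerate(job_file_mapping):
        if not job_files:
            raise ValueError(f"job {job_index} has no files")
        for name in job_files:
            if name in seen:
                raise ValueError(f"file {name!r} is mapped to more than one job")
            seen.add(name)

    missing = set(files) - seen
    unknown = seen - set(files)
    if missing or unknown:
        raise ValueError(
            "mapping does not match uploaded files: "
            f"missing={sorted(missing)}, unknown={sorted(unknown)}"
        )


# Hypothetical example: three files split into two jobs of different sizes.
uploaded = ["a.jpg", "b.jpg", "c.jpg"]
mapping = [["a.jpg", "b.jpg"], ["c.jpg"]]
check_job_file_mapping(uploaded, mapping)  # passes silently
```

Uneven job sizes like these are what a fixed segment size cannot express, which is presumably why the option conflicts with the other job-splitting parameters.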

How has this been tested?

Unit tests, manually

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

@zhiltsov-max changed the title from "[WIP] Add a way to create task with custom jobs" to "[do not merge] Add a way to create task with custom jobs" on Dec 30, 2022
@nmanovic changed the title from "[do not merge] Add a way to create task with custom jobs" to "[WIP] Add a way to create task with custom jobs" on Dec 31, 2022
@zhiltsov-max changed the title from "[WIP] Add a way to create task with custom jobs" to "Add a way to create task with custom jobs" on Jan 3, 2023
@@ -153,7 +153,7 @@ def __init__(self,
         if not source_path:
             raise Exception('No image found')

-        if stop is None:
+        if not stop:

Contributor

Why have you changed the condition? What is the reason to touch the code?

Contributor Author

When we import a backup, there is a call to reconcile, which chains to the constructor. stop_frame is then set to 0, the default value in DataSerializer.
In the reconcile arguments, stop_frame=0 makes little sense on its own, but it produces a dataset with only 1 frame.

The possible fixes were:

  • clean input values to the _create_thread function when importing backups
  • replace the input value with None or remove it when importing backups (which contradicts the serializer)
  • change the default value in the serializer (which refers to the model)

I decided to make 0 behave like None, because 0 is the default value.
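
To illustrate the difference between the two conditions, here is a minimal sketch. resolve_stop_strict and resolve_stop_lenient are hypothetical helpers, not the reader's constructor; they only mirror the `stop is None` and `not stop` checks from the diff, and the fallback to the last frame is an assumption.

```python
from typing import Optional


def resolve_stop_strict(stop: Optional[int], frame_count: int) -> int:
    # Old condition: only None falls back to the last frame.
    if stop is None:
        return frame_count - 1
    return stop


def resolve_stop_lenient(stop: Optional[int], frame_count: int) -> int:
    # New condition: None and 0 both fall back to the last frame.
    if not stop:
        return frame_count - 1
    return stop


frame_count = 10
# With start = 0, the number of frames kept is stop + 1.
# stop=0 (the serializer default) keeps a single frame under the old check...
print(resolve_stop_strict(0, frame_count) + 1)   # 1
# ...but is treated as "no explicit stop" under the new one.
print(resolve_stop_lenient(0, frame_count) + 1)  # 10
```

The trade-off is that an explicit stop of 0 can no longer be told apart from "not set".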

cvat/apps/engine/serializers.py (outdated)
Comment on lines +367 to +375
class JobFiles(serializers.ListField):
    """
    Read JobFileMapping docs for more info.
    """

    def __init__(self, *args, **kwargs):
        kwargs.setdefault('child', serializers.CharField(allow_blank=False, max_length=1024))
        kwargs.setdefault('allow_empty', False)
        super().__init__(*args, **kwargs)

Contributor

Does this need to be a distinct class? Seems like it could be reduced to serializers.ListField(child=..., allow_empty=False) and inlined.

Contributor Author

Yes, it can be inlined. The class is introduced to encapsulate the logic behind parameters. I'd do it to most of the other complex fields too, but it's quite a big change.

Contributor

The class is introduced to encapsulate the logic behind parameters.

What do you mean?

Contributor Author

Classes that use this one don't need to know its internal structure or construction details.
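
The two options discussed in this thread can be compared with a small sketch. To keep it runnable without the serializer framework, ListField below is a hypothetical stand-in class and the child value is a placeholder string; only the kwargs.setdefault pattern is taken from the code under review.

```python
class ListField:
    """Hypothetical stand-in for a serializer list field (illustration only)."""

    def __init__(self, child=None, allow_empty=True):
        self.child = child
        self.allow_empty = allow_empty


CHILD = "non-blank string, max length 1024"  # placeholder for the real child field


class JobFiles(ListField):
    """Preset defaults in one place; callers can still override them."""

    def __init__(self, *args, **kwargs):
        kwargs.setdefault("child", CHILD)
        kwargs.setdefault("allow_empty", False)
        super().__init__(*args, **kwargs)


# Inlined variant: every use site repeats the construction details.
inline = ListField(child=CHILD, allow_empty=False)

# Dedicated class: use sites only name the field type.
wrapped = JobFiles()

print((inline.child, inline.allow_empty) == (wrapped.child, wrapped.allow_empty))  # True
print(JobFiles(allow_empty=True).allow_empty)  # True
```

Both variants configure the field identically; the class only moves the construction details out of the use sites.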

cvat/apps/engine/task.py (outdated)
Comment on lines 153 to 161
if start_frame < 0:
    raise ValidationError(
        f"Failed to create segment: invalid start frame {start_frame}"
    )

slogger.glob.info("New segment for task #{}: start_frame = {}, \
    stop_frame = {}".format(db_task.id, start_frame, stop_frame))
if stop_frame >= db_task.data.size:
    raise ValidationError(
        f"Failed to create segment: stop frame {stop_frame} is beyond task size"
    )

Contributor

How can either of these be possible? Shouldn't _get_task_segment_data always return valid segments?

Contributor Author

Yes, it should, but it's not a question for this function. Hopefully, they should never fire.

Contributor

If they're never supposed to fire, shouldn't they be asserts?

Contributor Author (@zhiltsov-max, Jan 6, 2023)

Yes, they can be asserts for now. If new job creation methods appear, the checks may become more relevant.

Upd: I've decided to remove the checks, because these conditions are checked in the tests and they are required mostly for development.

@@ -205,6 +205,12 @@ class Data(models.Model):
    sorting_method = models.CharField(max_length=15, choices=SortingMethod.choices(), default=SortingMethod.LEXICOGRAPHICAL)
    deleted_frames = IntArrayField(store_sorted=True, unique_values=True)

    # Avoid storing whole mapping here, its redundant
    custom_segments = models.BooleanField(default=False)

Contributor

Can't this be determined by checking if segment_size is 0 on the task? Maybe we don't need this new field?

Contributor Author

I also thought that this variable can be excluded. Basically, we'd need to check the segment size and the number of jobs != 1. However, it turned out that in the case of 1 job in the task, the cases are hard to distinguish. I've decided then to add a clear indicator instead.

Contributor

Is it possible to have non-zero segment_size when there's no custom mapping? I thought segment_size defaulted to the number of frames.

Contributor Author

Is it possible to have non-zero segment_size when there's no custom mapping?

Yes, it represents the limit on the size of a single job. The comment on this field says:

Zero means that there are no limits (default)
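
One reading of the single-job ambiguity mentioned earlier in this thread can be sketched as follows. The helpers segments_from_size and segments_from_mapping are hypothetical, overlap between jobs is ignored, and the only behavior taken from the discussion is that segment_size == 0 means "no limit".

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_frame, stop_frame), both inclusive


def segments_from_size(frame_count: int, segment_size: int) -> List[Segment]:
    """Split frames by a fixed size; segment_size == 0 means no limit."""
    if frame_count == 0:
        return []
    if segment_size == 0:
        segment_size = frame_count
    return [
        (start, min(start + segment_size, frame_count) - 1)
        for start in range(0, frame_count, segment_size)
    ]


def segments_from_mapping(job_sizes: List[int]) -> List[Segment]:
    """Build segments from the number of files assigned to each job."""
    segments, start = [], 0
    for size in job_sizes:
        segments.append((start, start + size - 1))
        start += size
    return segments


# Five frames with default settings vs. a custom mapping with a single job:
print(segments_from_size(5, 0))       # [(0, 4)]
print(segments_from_mapping([5]))     # [(0, 4)]
# A custom mapping with uneven jobs differs from any fixed-size split:
print(segments_from_mapping([1, 4]))  # [(0, 0), (1, 4)]
```

With one job the resulting segments are identical, so the stored segments alone do not reveal whether a custom mapping was used.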

nmanovic commented Jan 5, 2023

@zhiltsov-max, #4869 (related issue)

if self._db_task.mode == 'annotation':
    files: Iterable[models.Image] = self._db_data.images.all().order_by('frame')
    segment_files = files[db_segment.start_frame : db_segment.stop_frame + 1]
    return {'files': list(frame.path for frame in segment_files)}

Contributor

Isn't this redundant information? The backups already have a manifest, which records the frames in order; given that and the segment boundaries, you should be able to reconstruct the list of files in each segment.

Contributor Author

Backups have such a manifest only in some cases. Also, it will be simpler to extend this parameter in the future if it is saved separately, for instance, if jobs can contain arbitrary files or have overlaps.
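
For reference, the reconstruction the reviewer describes (an ordered file list plus segment boundaries) could be sketched like this. files_per_segment is a hypothetical helper and the file names are made up; it assumes inclusive stop frames, as in the slicing code above.

```python
from typing import List, Tuple


def files_per_segment(
    ordered_files: List[str], segments: List[Tuple[int, int]]
) -> List[List[str]]:
    """Return the file names of each (start_frame, stop_frame) segment.

    stop_frame is inclusive, hence the + 1 in the slice.
    """
    return [ordered_files[start : stop + 1] for start, stop in segments]


ordered_files = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
segments = [(0, 0), (1, 3)]
print(files_per_segment(ordered_files, segments))
# [['a.jpg'], ['b.jpg', 'c.jpg', 'd.jpg']]
```

This only works when an ordered file list is available and jobs are contiguous and non-overlapping, which are the limitations the author points out.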

@nmanovic
Contributor

@zhiltsov-max, could you please resolve the conflicts?

@@ -387,11 +474,17 @@ def _create_thread(db_task, data, isBackupRestore=False, isDatasetImport=False):
    media = _count_files(data)
    media, task_mode = _validate_data(media, manifest_files)

    if job_file_mapping is not None and task_mode != 'annotation':

Contributor

Should it be a part of _validate_job_file_mapping?

Contributor Author

The task mode is unknown prior to the previous line. The _validate_job_file_mapping call can be moved later, but it looks like it's better to fail as soon as possible in case of invalid parameters.

manifest = ImageManifestManager(db_data.get_manifest_path())
manifest.set_index()
# Sort the files
if (isBackupRestore and (

Contributor

Probably it is time to move the code into a separate function?

Contributor Author

My suggestion is to do it in another PR, because it will simplify life in case of merge conflicts and failing tests.

@nmanovic
Contributor

Our conclusion after the internal discussion:

  • The custom_segments field needs to be removed (this should be addressed in the PR).
  • Basically, we have a configuration for a task (how to split it into multiple jobs). It can be defined in multiple ways. Right now we store extra information in the DB, such as segment_size, sorting_method, etc. Indeed, we should create segments and jobs and put images into the table in the right order.

@nmanovic nmanovic merged commit 31f0578 into develop Jan 13, 2023
@nmanovic nmanovic deleted the zm/custom-jobs branch January 13, 2023 16:24
mikhail-treskin pushed a commit to retailnext/cvat that referenced this pull request Jul 1, 2023