Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic Segmentation messages #71

Merged
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@ set(msg_files
msg/Detection3D.msg
msg/ObjectHypothesis.msg
msg/ObjectHypothesisWithPose.msg
msg/SemanticClass.msg
msg/SemanticSegmentation.msg
msg/Point2D.msg
msg/Pose2D.msg
msg/VisionInfo.msg
Expand Down
10 changes: 10 additions & 0 deletions msg/SemanticClass.msg
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# A key value pair that maps an integer class_id to a string class label.
# The class_id should be interpreted as the pixel value corresponding to a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's trailing whitespace in this line which makes the tests fail. Also both files are missing a newline at the end of the file.

# given class name in a segmentation mask

# Integer value corresponding to the value of pixels belonging
# to a given class in a segmentation mask
uint16 class_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a uint16 when the data field in SemanticSegmentation.msg is only uint8?

Limiting the number of classes to 255 is probably bad, but doubling the message size for the segmentation image by making it a uint16 is equally bad. Anyhow, the two should be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, it's probably better to leave it as uint16. If we use a sensor_msgs/Image as the segmentation image format, we can choose between mono8 and mono16 encodings as appropriate there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't speak to that exactly without his comment, but there can easily be more than 255 class values in segmentation algorithms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, they must be consistent. I will use uint16 for the final version of the PR


# Label corresponding to the class_id
string class_name
21 changes: 21 additions & 0 deletions msg/SemanticSegmentation.msg
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# A message to contain the output of a Semantic Segmentation process

std_msgs/Header header

uint32 height # mask image height, that is, number of rows
uint32 width # mask image width, that is, number of columns

# the bytes of the single-channel image made by
# the segmentation mask
uint8[] data #

# the confidence of the inference of each pixel
# between 0-100%
uint8[] confidence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the full range 0-255 here?

Copy link
Member

@SteveMacenski SteveMacenski Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not standard for the output of AI libraries, they usually give out a probability value. It would be a bit unnatural to convert it to 0-255 for use (e.g. I want to use predictions that are at least 80% confident). I'd argue this should actually be a float (?) from 0-1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a float, but does it really make a difference to have a confidence of say 60% (uint8) or 60.5% (float) knowing that this would increase the space used by this array by 4?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specially on an array field that is expected to contain a lot of elements. On a 640x360 mask changing this for a float32 would mean an increase of 640x360x3 bytes, almost 700kB per message with respect to uint8

Copy link
Member

@SteveMacenski SteveMacenski Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should make sure to explain these are uints not floats, otherwise people will just put their 0-1 percentages from TF or something into it and it'll always render as 0 since it needs to be multiplied by 100 and then cast to an int (in the comment above)


# an array of SemanticClasses specifying which integer value in the
# segmentation mask corresponds to each semantic class
vision_msgs/SemanticClass[] class_map

# the threshold value used in the segmentation model
float32 threshold