Fix a crash caused by malformed data #4850

Closed
wants to merge 1 commit

Conversation

bmribler (Contributor)

This PR

  • fixes an abort of the user's program (issue #4435, "allocation-size-too-big error in H5Centry.c"), which occurred because the malformed data caused a very large amount of memory to be allocated.
  • removes memory leaks that caused the free-list code to abort in addition to the reported issue.

@fortnern provided helpful code for the fix.
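For illustration only, here is a minimal standalone sketch of the failure class this PR targets (not HDF5 code; every name in it is hypothetical): a size field decoded from an untrusted file image is passed straight to the allocator, so a corrupted field becomes an enormous allocation request unless it is validated first.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical decoder for a length-prefixed block in a file image. */
    static void *decode_block(const uint8_t *image, size_t image_len)
    {
        uint64_t dblk_size;

        if (image_len < sizeof(dblk_size))
            return NULL;

        /* The size comes straight from untrusted file data */
        memcpy(&dblk_size, image, sizeof(dblk_size));

        /* The kind of sanity check being described: without it, a malformed
         * size on the order of 2^60 reaches malloc and trips an
         * allocation-size-too-big abort under sanitizers */
        if (dblk_size == 0 || dblk_size > image_len - sizeof(dblk_size))
            return NULL;

        return malloc((size_t)dblk_size);
    }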

@mattjala (Contributor)

Can we add a reproduction script to the tests?

@@ -594,8 +594,11 @@ H5C__flush_single_entry(H5F_t *f, H5C_cache_entry_t *entry_ptr, unsigned flags)
     else
         mem_type = entry_ptr->type->mem_type;

-    if (H5F_block_write(f, mem_type, entry_ptr->addr, entry_ptr->size, entry_ptr->image_ptr) < 0)
+    if (H5F_block_write(f, mem_type, entry_ptr->addr, entry_ptr->size, entry_ptr->image_ptr) < 0) {
+        H5C__REMOVE_ENTRY_FROM_SLIST(cache_ptr, entry_ptr, during_flush, FAIL);
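For context, a hedged reconstruction of how the patched error path would read in full; the hunk above is truncated, and the HGOTO_ERROR line is inferred from frame #20 of the error stack quoted later in this thread, not copied from the actual patch:

    if (H5F_block_write(f, mem_type, entry_ptr->addr, entry_ptr->size, entry_ptr->image_ptr) < 0) {
        /* Drop the dirty entry from the skip list so a failed write does not
         * leave a stale entry behind and trip the "slist not empty?" check
         * during cache shutdown */
        H5C__REMOVE_ENTRY_FROM_SLIST(cache_ptr, entry_ptr, during_flush, FAIL);
        HGOTO_ERROR(H5E_CACHE, H5E_CANTFLUSH, FAIL, "Can't write image to file");
    }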
@jhendersonHDF (Collaborator) Sep 18, 2024

This seems like something that should be handled by the caller rather than the function itself, as the caller should determine whether or not to remove the entry from the list.

Collaborator

There are likely other things to be considered here, such as:

  • the entry may also need to be removed from the collective list when operating in parallel
  • the entry may also need to go through the rest of the destroy sequence even on write failure, unless it should also be left up to the caller to handle that

Contributor

Agree

Contributor

Also - how is this change to writing cache entries related to the crash on reading?

@bmribler (Contributor, Author) Sep 18, 2024

The crash happened during the shutdown process. Some memory was not freed, which caused a failure because the factory list still had allocated objects, and processing that failure crashed with:
H5Eint.c:1464: H5E__push_stack: Assertion `cls_id > 0' failed.

This error stack shows how writing cache entries became involved:
...
#13: ../../src/H5Fint.c line 1538 in H5F__dest(): problems closing file
major: File accessibility
minor: Unable to release object
#14: ../../src/H5AC.c line 477 in H5AC_dest(): can't destroy cache
major: Object cache
minor: Unable to free object
#15: ../../src/H5C.c line 546 in H5C_dest(): disable slist on flush dest failure failed
major: Object cache
minor: Internal error detected
#16: ../../src/H5C.c line 1115 in H5C_set_slist_enabled(): slist not empty?
major: Object cache
minor: Internal error detected
#17: ../../src/H5C.c line 502 in H5C_dest(): unable to flush cache
major: Object cache
minor: Unable to flush data from cache
#18: ../../src/H5Cint.c line 1052 in H5C__flush_invalidate_cache(): flush invalidate ring failed
major: Object cache
minor: Unable to flush data from cache
#19: ../../src/H5Cint.c line 1349 in H5C__flush_invalidate_ring(): dirty entry flush destroy failed
major: Object cache
minor: Unable to flush data from cache
#20: ../../src/H5Centry.c line 598 in H5C__flush_single_entry(): Can't write image to file
major: Object cache
minor: Unable to flush data from cache
#21: ../../src/H5Fio.c line 219 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#22: ../../src/H5PB.c line 1001 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#23: ../../src/H5Faccum.c line 633 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#24: ../../src/H5FDint.c line 312 in H5FD_write(): addr overflow, addr = 4294967295, size=16, eoa=2848
major: Invalid arguments to routine
minor: Address overflowed
H5Fclose failed

I'd appreciate any suggestions.

@bmribler (Contributor, Author)

There are likely other things to be considered here, such as:

  • the entry may also need to be removed from the collective list when operating in parallel
  • the entry may also need to go through the rest of the destroy sequence even on write failure, unless it should also be left up to the caller to handle that

I'm looking into these suggestions.

Contributor

I guess my suggestion is to find out what is happening when the read occurs that causes memory to be corrupted (which is almost certainly why the eventual crash occurs).

@bmribler (Contributor, Author) Sep 18, 2024

The corrupted data was acquired in H5HL__hdr_deserialize() at:

H5_DECODE_LENGTH_LEN(image, heap->dblk_size, udata->sizeof_size);
H5_DECODE_LENGTH_LEN(image, heap->free_block, udata->sizeof_size);

during the call to H5Gcreate1(). Both heap->dblk_size and heap->free_block are huge. I put in the fix (with NF's help) to catch that, but during the shutdown process, memory leaks caused subsequent crashes; hence the need to close those as well.
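To make the shape of that read-time check concrete, here is a minimal sketch; the bound name H5HL_MAX_DBLK_SIZE, the H5HL_FREE_NULL sentinel, and the exact HGOTO_ERROR arguments are assumptions for illustration, not the code in this PR:

    H5_DECODE_LENGTH_LEN(image, heap->dblk_size, udata->sizeof_size);
    H5_DECODE_LENGTH_LEN(image, heap->free_block, udata->sizeof_size);

    /* Illustrative sanity checks on the decoded values (bounds and error
     * arguments are assumed, not taken from the actual patch) */
    if (heap->dblk_size == 0 || heap->dblk_size > H5HL_MAX_DBLK_SIZE)
        HGOTO_ERROR(H5E_HEAP, H5E_BADVALUE, FAIL, "bad data block size in local heap header");
    if (heap->free_block != H5HL_FREE_NULL && heap->free_block >= heap->dblk_size)
        HGOTO_ERROR(H5E_HEAP, H5E_BADVALUE, FAIL, "bad free block offset in local heap header");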

At this point, I just realized I missed something in the PR; let me take care of that while you're thinking. :-)

@jhendersonHDF added the labels "Merge - To 1.14", "Priority - 1. High 🔼", "Component - C Library", and "Type - Bug / Bugfix" on Sep 18, 2024
@qkoziol (Contributor) left a comment

Are you certain that you've reached the root cause for this failure?

@qkoziol (Contributor) commented Sep 18, 2024

Are you certain that you've reached the root cause for this failure?

After reviewing the example code in the bug report, it definitely seems like something that should be detected at read time.

@bmribler marked this pull request as draft on September 19, 2024 16:23
@bmribler (Contributor, Author)

I want to take more time to do this right.

@bmribler (Contributor, Author)

Can we add a reproduction script to the tests?

The bad file will be added to the cve_hdf5 repo.

@bmribler (Contributor, Author)

This is outdated now; I'll start a fresh PR when I'm ready.

@bmribler closed this on Sep 26, 2024
@bmribler deleted the fix_GH_4435 branch on October 30, 2024 18:28
@bmribler restored the fix_GH_4435 branch on November 3, 2024 07:06