Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

out_s3: daemon thread stability refactor rebased on top of AWS distro 2.31.10 #24

Open
wants to merge 3 commits into
base: 2_31_10_all_cherry_picks
Choose a base branch
from

Conversation

PettitWesley
Copy link
Owner

…g line


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

…g line

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@PettitWesley
Copy link
Owner Author

@mathewfala the diff in the PR on upstream is misleading since our distro has diverged. This is the S3 changes that I'm testing/reviewing on top of our 2.31.10 all cherry picks.

@PettitWesley PettitWesley changed the title out_s3: daemon thread stability refactor all in one commit May 2nd ta… out_s3: daemon thread stability refactor rebased on top of AWS distro 2.31.10 May 2, 2023
/* skip metadata stream */
if (fs_stream == ctx->stream_metadata) {
continue;
}

/* on startup, we only send old chunks in this routine */
if (is_startup == FLB_TRUE && fs_stream == ctx->stream_active) {
flb_info("put_all_chunks: stream_active has %d chunks", mk_list_size(&fs_stream->files));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember to remove log for testing purposes

return -1;
}

/* data was sent successfully- delete the local buffer */
if (chunk->locked == FLB_TRUE) {
/* remove from upload_queue */
if (chunk->_head.next != NULL && chunk->_head.prev != NULL) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a mk_list function for this check

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
…clay's PR

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@PettitWesley PettitWesley temporarily deployed to pr May 3, 2023 05:17 — with GitHub Actions Inactive
@PettitWesley PettitWesley temporarily deployed to pr May 3, 2023 05:17 — with GitHub Actions Inactive
@PettitWesley PettitWesley temporarily deployed to pr May 3, 2023 05:17 — with GitHub Actions Inactive
@PettitWesley PettitWesley temporarily deployed to pr May 3, 2023 05:33 — with GitHub Actions Inactive
* Puts an event on the event
* loop which will wake this coro back up
*/
flb_time_sleep(ctx->timer_ms);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the final version I think I may need to set this to be some static number that's always less than the Grace setting. Some fraction of it. Otherwise the sleep here can block shutdown until it resumes and eat up time that could be used to send data on shutdown.

if (ctx->compression == FLB_AWS_COMPRESS_GZIP) {
flb_free(payload_buf);
}
if (ret < 0) {
/* re-add chunk to list */
if (chunk) {
s3_store_file_unlock(chunk);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got re-added by my rebase and needs to be removed otherwise it will crash...

@@ -1114,29 +1149,41 @@ static int upload_data(struct flb_s3 *ctx, struct s3_file *chunk,
if (ctx->compression == FLB_AWS_COMPRESS_GZIP) {
flb_free(payload_buf);
}
m_upload->upload_errors += 1;
/* re-add chunk to list */
if (chunk) {
s3_store_file_unlock(chunk);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove unlock from s3_store.c, its not needed and shouldn't be a function anymore

@@ -1245,10 +1285,41 @@ static int put_all_chunks(struct flb_s3 *ctx)
if (ret < 0) {
s3_store_file_unlock(chunk);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -981,11 +1018,23 @@ static int upload_data(struct flb_s3 *ctx, struct s3_file *chunk,
int size_check = FLB_FALSE;
int part_num_check = FLB_FALSE;
int timeout_check = FLB_FALSE;
time_t create_time;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment above is wrong:

 * return value is one of 0, -1, -2.	
  • 0 means uploaded successfully.
  • -1 means failed to upload data, will retry.
  • -2 means failed to upload data, but can't retry because reach the retry_limit.

The may13th one commit branch has most up to date changes that need to be sync'd here.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return Value is wrong here too. Another issue from rebase.

Copy link
Owner Author

@PettitWesley PettitWesley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to sync all changes here from branch: s3-refactor-aws-distro-may13

There were more issues from rebasing which I found when I ran the file here through a diff against the one on upstream github (rebased on top of a different set of commits).

@github-actions
Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

Copy link

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Dec 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant