
Missing Docker Hub images for 4.1.0 #30919

Closed
3 tasks done
padbk opened this issue Nov 14, 2024 · 19 comments

@padbk
Contributor

padbk commented Nov 14, 2024

Bug description

The GitHub Actions run to upload the 4.1.0 release images to Docker Hub failed.
https://github.com/apache/superset/actions/runs/11826540775

It looks like this was due to the PR being incorrectly labelled.
https://github.com/apache/superset/actions/runs/11826540775/job/32952794218#step:9:41

Could a new PR be made to ensure the images are pushed to Docker Hub?

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@sadpandajoe
Member

@padbk I'm unsure why the Docker image hasn't been published. I see that 4.1.0rc4 was published fine, and this is using the same SHA. We may need someone with more knowledge of how our Docker pipeline works to take a look at it.

@michael-s-molina @mistercrunch any ideas? Wondering if this is something in the Dockerfile on the release branch or an issue with the Dockerfile on master.

@villebro
Member

@sadpandajoe for some reason the release pipeline didn't trigger correctly when 4.1.0 was released: https://github.com/apache/superset/actions/workflows/docker-release.yml. I wonder if something has changed in this workflow since the 4.0 branch that causes it not to trigger.

@sadpandajoe
Member

@sadpandajoe for some reason the release pipeline didn't trigger correctly when 4.1.0 was released: https://github.com/apache/superset/actions/workflows/docker-release.yml. I wonder if something has changed in this workflow since the 4.0 branch that causes it not to trigger.

Yeah, really weird that the 4.1.0rc releases aren't there either.

@villebro
Member

Hmm, it seems that one has been replaced by this workflow: https://github.com/apache/superset/actions/workflows/tag-release.yml. But now I'm confused: why is it publishing the RCs correctly, despite them having failed here, but not the actual release? 🤔

@sadpandajoe
Member

This issue is potentially related: #30928. The weird thing is that RC4 is on Docker Hub and the official release isn't. I can fast-follow with a 4.1.1 to address this, as I've already cherry-picked a fix and a few other things into it. Thoughts?

@mistercrunch
Member

mistercrunch commented Nov 15, 2024

I looked into it yesterday and it seems the Docker builds are running out of memory / getting killed. I checked whether that's the case on master too, but it seems to be fine there, so I was wondering if there have been Docker optimizations since the 4.1 cut that explain the difference.

To clarify, the GHA uses the Dockerfile from the release branch to build, meaning that if we improved the Dockerfile since then, those improvements wouldn't be in the release. Wondering if we could/should cherry-pick improvements to the Dockerfile/build process, along the lines of the sketch below.
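Something like this could work for the back-port (the branch name 4.1 and <sha> are placeholders, not actual Superset commits):

```bash
# Hypothetical back-port of Dockerfile improvements onto the 4.1 release branch.
# <sha> is a placeholder; pick real commits from the Dockerfile history on master.
git fetch origin
git checkout 4.1
git log origin/master --oneline -- Dockerfile   # list candidate Dockerfile commits on master
git cherry-pick <sha>                           # apply each relevant improvement
git push origin 4.1
```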

@mistercrunch
Member

mistercrunch commented Nov 15, 2024

This issue is potentially related: #30928.

To me it really looked like an out-of-memory issue, seeing randomly killed jobs in the middle of their runs.

As to why master doesn't run out of memory and 4.1 does, my guess is that the particular combo of OS-level deps and Python deps installation is somehow lighter on memory on master (?)

@mistercrunch
Member

mistercrunch commented Nov 15, 2024

As to what to cherry-pick to test my theory, looking at the change history of the Dockerfile seems like a decent start -> https://github.com/apache/superset/commits/master/Dockerfile

Also maybe changes to https://github.com/apache/superset/commits/master/.github/workflows/release.yml

Unclear how cherry-pickable those PRs are, though; I can't allocate time to help with this today, but I'm sharing the pointers.

@villebro
Member

What's really confusing is why 4.1.0rc4 ran but not 4.1.0, as they have identical SHAs. So this would have to be some type of intermittent issue that randomly works or doesn't work.

@mistercrunch
Member

mistercrunch commented Nov 15, 2024

I feel we've been on the edge of the memory cliff for Docker builds, to the point where failures are non-deterministic. Maybe there are some environment-related variations, where one worker may have more memory than another (?)

I tried to force sequential layer builds (Docker builds layers in parallel when it can), but didn't have success with what I tried. Similar results when trying to reduce parallelization in the apt-get installation. We could also look into limiting pip parallelism.

On the Docker side, one easy-ish option could be to run a sequence of build commands: say, if stage C depends on A and B, you issue a build --target A first, and then a build --target C, along the lines of the sketch below.
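Roughly something like this, with placeholder stage names (superset-node and lean are just examples, not necessarily the stages defined in the release branch's Dockerfile):

```bash
# Build an intermediate stage on its own first so its layers land in the build cache,
# then build the final stage; peak memory is spread across two smaller runs.
# Stage names are placeholders for whatever the Dockerfile actually defines.
docker buildx build --target superset-node -t superset:node-stage .
docker buildx build --target lean -t superset:4.1.0 .
```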

@villebro
Member

OK, then this makes sense. As a quick remedy, are we able to boost memory resources? Or are we already maxed out?

@mistercrunch
Member

As a quick remedy, are we able to boost memory resources?

That was also a dead end when I tried it. I think GHA resources are pretty fixed unless you run your own set of workers, which seemed like a gigantic rabbit hole. Also, we might be running on ASF-infra-hosted workers here, though I'm unclear on that, and I decided to stay away from anything that requires opening an ASF Infra Jira ticket.

@sadpandajoe
Member

Created a new 4.1.1 patch release to hopefully fix the issues. We can see that the Docker images were all created for 4.1.1rc1: https://hub.docker.com/r/apache/superset/tags?name=4.1.1. Once we get the votes, we can go ahead and push that out.

@dpgaspar
Member

Did a rerun of the action here: https://github.com/apache/superset/actions/runs/11826540775/job/33133371854

Seeing this error: Invalid: lock file's luxon@3.4.4 does not satisfy luxon@3.5.0
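That looks like the error npm ci typically reports when package-lock.json is out of sync with package.json. A generic way to regenerate the lock entries (not necessarily how this was actually resolved for the release branch) would be something like:

```bash
# Regenerate package-lock.json entries without installing node_modules,
# then confirm only the expected packages (e.g. luxon) changed.
# Assumes the frontend package lives in superset-frontend/.
cd superset-frontend
npm install --package-lock-only
git diff package-lock.json
```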

@sadpandajoe
Member

Did a rerun of the action here: https://github.com/apache/superset/actions/runs/11826540775/job/33133371854

Seeing this error: Invalid: lock file's luxon@3.4.4 does not satisfy luxon@3.5.0

@dpgaspar do we know why this was not a problem during our 4.1.0rc4 cut but it is on the official one? Nothing changed between cuts.

@gpchandran

Created a new 4.1.1 patch release to hopefully fix the issues. We can see that the Docker images were all created for 4.1.1rc1: https://hub.docker.com/r/apache/superset/tags?name=4.1.1. Once we get the votes, we can go ahead and push that out.

Hi @sadpandajoe - can you please help push this out ASAP?

@sadpandajoe
Member

Created a new 4.1.1 patch release to hopefully fix the issues. We can see that the Docker images were all created for 4.1.1rc1: https://hub.docker.com/r/apache/superset/tags?name=4.1.1. Once we get the votes, we can go ahead and push that out.

Hi @sadpandajoe - can you please help push this out ASAP?

@gpchandran we'll need 3 binding +1 votes before I can make it official.

@sadpandajoe
Member

The 4.1.1 Docker images are up, so those can now be used.
