-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Store the Notebook Image Name #1370
Comments
@kywalker-rh / @vconzola Thoughts on if we should denote "images that are old and untracked"? Essentially when we keep up-versioning images, the oldest ones fall off and the table represents them as Unknown -- but we should probably at least keep the name, but would we want to perhaps show "(deprecated)" or something like that beside it? Essentially the image as the name has either been deleted or the tag has moved on and you can't recreate this workbench anymore. Or do you guys just feel we shouldn't do anything and just leave it as-is. On Edit, we won't be able to select the image because it does not exist anymore. |
@andrewballantyne What happens to existing workbenches that are using "old" images? Do they automatically get updated to use the newer version of the image? |
@vconzola Nope, nothing gets automatically updated when it is created by the user; such as Workbenches. The items tracked by the Operator (our OOTB images for instance) are the only ones that get "auto updated". The way we render the name is a live representation to what is on the cluster (eg. the Workbench stores "I use image x, on tag y" then we ask image "x:y", what is your name and render that) -- which when we do the n / n+1 upgrades, eventually falls off the cluster. Thus the name is "Unknown". Anything using it works fine because it's already in the definition of the pod, but you cannot create more with that image; you'd need to use a later one. |
I have never seen image names (really imagestream names) such as jupyter-datascience-notebook getting deleted in the master branch. What has changed though was imagestream tags getting deleted and replaced, e.g. from py3.9 to 2023.1. This is probably what you mean. A main conceptual problem is that you all intuitively refer to images when you really mean imagestream tags (which have images behind them, coming from quai.io specified in tag.from.name). The only time an imagestream can really be deleted is in the case of BYON, importing your own notebook image, which results in an imagestream, which you can then delete. You cannot really expect imagestream or image tag details such as package versions to still be available then. First of all, which version of odh-manifests are you working with? The master branch or a version branch, e.g. v1.6? @vconzola regarding "at least keep the name (in the details tab)": putting imagestreamname:imagestreamtag in the pod env variable JUPYTER_IMAGE will do that. But ONLY as long as the imagestream names are non-changing, i.e. no imagestreams are deleted or renamed in any way.
I agree with Andrew that the name rendering is a live representation of what is on the cluster, i.e. which imagestreams and which tags they have. You have to be careful not to confuse images with imagestream name plus tag. I see what you mean by the name getting deleted ... at least, the tag was recently deleted and changed in the manifests master branch (i.e. the recent tag name changes of imagestreams from v.3.9 to 2023.1 or so) in which case there is no way you can get the infos on the old imagestream tag, because it was simply overwritten. The reason notebooks keep running: they run with the local copy of the image on the openshift node. When the notebook restarts for any reason, let's say notebook server is stopped (replica: 0) and started (replica:1) again, then the image is pulled again due to the container's imagePullPolicy: Always. @vconzola @harshad16 There are ways to keep the container image up to date, a combination of using image change trigger annotation with paused: false and imagestream tag importpolicy: scheduled: true. That way, your container restarts when changes from quai.io ripple though then. @VaishnaviHire wanted to avoid notebook restarts, that'd be either possible by not making Harshad's changes (odh manifests PRwith tag importpolicy scheduled: true) or by setting image change trigger annotation to paused: true after the notebook is initially created. You basically have to keep in mind that weekly image changes at quai.io and long-running notebooks are kind of in conflict. Regarding up-versioning images and the oldest ones falling off: That is currently most apparent when the image behind a recommended imagestream tag gets updated at quai.io, imagestream defs are applied anew to the cluster, and there are somewhere long-running notebooks. In that case, the imagestream (in master referring to the image by tag, e.g. cuda-jupyter-weekly) refers to the latest digest at time of imagestream creation in quai.io. Older created, still-running notebooks refer to old digests at the time they were created. That is fine as long as the digest behind the image is still there in the internal openshift registry. Usually, the last three digests for an image behind an imagestream tag are kept in the internal registry. You really need to formulate your expectations regarding past spawned notebooks in terms of use cases, especially with regards to the fast-changing (in terms of content / digest) images with a weekly tag at quai.io and with regards to the master branch of odh-manifests, which is what your developers are using. Whee the imagestream tag points to an image tag at quay.io, e.g. blabla-weekly, by definition we always get the digest of the image for the time the imagestream was created. When the digest / weekly image at quay.io gets updated, the old digest is not usable anymore. @atheo89 try if at all possible do not change existing imagestream tags in future, as those imagestream names and tags (not image-tags) are used by odh dashboard. My recent proposal PR-800, initially with regards to image-field values in digest format, of storing the initial imgestream name and tag value in the env var JUPYTER_IMAGE might at least help keep the imagestream name and tag persistent, but true, you will loose access to older -weekly image versions as the -weekly tag in external quay.io moves on currently. Pointing to an sha256 digest (the image trigger annotation always does that, even if from.name has a tag), which is also part of my proposal, can alleviate that problem by not pointing to the weekly tag in the internal openshift repo, which always points to the latest digest, but to the digest, which is kept in cache for a while ( last three digests values). (keepTagRevisions) @harshad16 suggesting to use referencePolicy: type: Source for the "recommended" n n-1 imagestream tags is ok because you always softlink requests for the image to the external repo that way. A lot of info, I realize. My main point at the end: do not expect to reliably use long-running notebooks with development images, i.e. those with weekly tag at quai.io. If you want to keep those running long-term, you have to use the possibility to set images to paused: true with the image change trigger annotation and you have to make sure to use the Openshift-internal registry cache, meaning referencePolicy: type: Local for a tag. Also, do not try to handle imagestream image versioning and image-field values with ODH Dashboard logic, let the image change trigger plug-in of Openshift handle that. |
@andrewballantyne Yes, I think that's a good idea. Deprecated even makes sense to me, although I'm open to suggestions if @kaedward thinks there is a better word for an image that is no longer supported. |
@andrewballantyne @kywalker-rh In RHOSAK, we used "Degraded" to refer to an instance that is experiencing a non-terminal error. It sounds like if this image is no longer supported, though, that "deprecated" might be more accurate. So thumbs up from me :) |
Thanks for the UX Input @kywalker-rh & @kaedward let me try to figure out how to make use of it. Okay, so I followed most of what you mentioned @shalberd -- here is my suggestion, let me know if I'm over complicating this.
That way it's better handling the "no reference" as well as the on-going "there is a reference" scenarios. On Edit of the workbench, we just need to make sure we don't "break" -- the user will need to reselect an image anyways. |
I am not sure what that means, it seemed to me so far you had similar logic for looking up imagestream name and tag from the container image field in backend and frontend. My main point there was that it is impossible to get the imagestream tag if ever the container image field value has sha256 digest notation, which would happen if we use the image change trigger plugin. But regarding notebook spec assembly and so on, I see no difference between the jupyter tile backend and dsp frontend logic.
Where do you want to add that annotation? In the pod spec of a notebook or in the notebook metadata annotations part? I assume "workbench create/update" is the assembleNotebook / updateNotebook part, where the notebook CR yaml is constructed. If yes, then there already is a notebook CR metadata annotation called If you want to put the human-readable name into that new notebook annotation, sounds good. My PR 800 with the new mechanism for container image field lookup from imagestream name and tag, independent of the presence of an openshift-internal docker registry, but works with the internal registry if present, ever makes it into production, has imagestreamname:imagestreamtag e.g. The advantage of putting imagestream name and tag e.g. Fully ACK on point 4. |
Hi all, After trying various combinations to reproduce the bug, the sources of truth, in terms of which image and tag were initially used to create a specific notebook, are as follows:
Furthermore, even if you remove the associated tag from the ImageStream, the utilized workbench continue to work as expected (checked also the case to change its status from Running to Stopped and vice versa). However, the field for the notebook image still displays Considering that there are alternatives available to identify the image:tag, is it necessary to include the opendatahub.io/image-display-name or the above sources are enough? If not, let us know where and what could be the format of the |
@atheo89 @harshad16 The problem is in that case, the imagestream tag or even the imagestream name itself changed, possibly even deleted, similar to here (tag name changes) .... there is no way to get the related info when an imagestream yaml was changed. This is purely a matter of fast-changing master-branch or comparable dev changes in my view, not a conceptual problem. Is the imagestream s2i-minimal-notebook with tag 1.2 still present in the namespace? image-registry.openshift-image-registry.svc:5000/redhat-ods-applications/ s2i-minimal-notebook:1.2 is cached, not pruned yet in the internal registry. That explains why the pod container is still running. But I am almost 100% certain that in this scenario, the imagestream tag changed, or the imagestream itself was deleted ... it is a lookup of all these annotations for a given tag ... |
The current tag-based notation is not suitable for on-prem and mirrored environments anyways, because it does not use sha256 notation. In adddition, it also always references the internal openshift registry, even when no internal openshift registry is present. That's why my proposed change in PR-800 But even that PR would not have solved the problem of image metadata getting lost when either the imagestream metadata.name changes or the tags in an imagestream change. |
@shalberd oh, sorry I corrected my comment above, I experimented with 1.2-new tag not 1.2. I tested with that changed tag then I deleted it and checked again, and all the above mentions shows the tag that the image has been created a d it seems that it persists. |
Yeah, in my approach with the image content change trigger annotation on the Notebook CR / StatefulSet CR, you'd have to update that annotation with the changed tag, too. The thing is: dashboard rightfully does not expect imagestream tags to change ... how would it be aware of those changes to tags in an imagestream? As mentioned, even my annotation approach assumes the imagestream tag does not change. Usually, tags get added to an imagestream, not changed or modified. docker.from.name tags and/or digests can change, but tags are supposed to stay in place and unchanged. |
And even if you continuously checked an imagestream tag's status conditions i.e. transitionTime for event ImportSuccess via the odh notebook controller (kubeflow notebook controller does not even have that annotation), even then you'd have to decide what to update the notebook with based on 1.2-new it was created with, now it is 1.2 ... but how would you know that it now needs to refer to new tag 1.2? Might just as well be another tag 1.3 or 2023-stable or whatever in the imagestream tags array and tag 1.2 is not there anymore in the imagestream yaml ... plus an imagestream tag alone says nothing about the underlying image. |
I use a different function to get all the image details, from env JUPYTER_IMAGE containing now the value In the meantime .. I will check my way of handling imagestreams on how the image change trigger reacts to changes in imagestream tag names themselves, meaning mostly, whether the container keeps running. |
I don't think it was addressed in the comments here... but We could theoretically try a fallback operation here and look at the image-stream to try and figure out a display name. Most likely case here is the tag slips off the cluster, not the whole image stream. But in the BYON use-case, you could lose the entire image-stream -- of which all data is lost. We can do this through a few stages and layers. To make this one simple, we could do my |
@harshad16 Yes, I just tried the same thing with my PR-800 image change trigger annotation. First referenced an image with an imagestream tag. Notebook starts up. Then changed that imagestream tag in the imagestream spec to a new value. Also, and that is even better, deleting the whole imagestream (what dashboard calls image) also leads to the notebook still running. Really good for stability. But not more than that. So, even there, no more details on the existing tag in odh dashboard. Makes sense because the spec.tags[x] with all the details is not present for the old tag. I agree with @andrewballantyne that in such a case, setting the image name to Unknown in dashboard image details is a good idea. That does not seem to be the case yet in the classical Jupyter tile, see my picture. I could also imagine just showing the content of imagestream annotation image-display-name, e.g. Jupyter Data Science, there, but not reading in the additional details on packages used and so on in case the imagestream tag name changed. If the imagestream yaml itself is gone and deleted, that's a different story. In that case, Unknown is most definitely warranted. You just won't have the additional info (packages, software and so on) anymore that was associated with the old imagestream tag. image-display-name is not really the image, though ... it is the imagestream info, https://github.com/opendatahub-io/odh-manifests/blob/master/notebook-images/base/jupyter-datascience-notebook-imagestream.yaml#L8 like Jupyter Data Science. @DaoDaoNoCode About the idea looking at imagestream status tag conditions: those conditions are not showing up at any time in my use case at least, I made so many changes to the imagestream defs and tags, there was never any sign of a conditions section in the imagestream status. At least not when not using the internal docker registry of Openshift. To further increase stability, we should change the imagePullPolicy from https://github.com/opendatahub-io/odh-dashboard/blob/main/frontend/src/api/k8s/notebooks.ts#L103 |
if my change in PR-800, putting imagestream name and tag in env var JUPYTER_IMAGE i.e. Same for the proposal of getting lastSelectedImage new from env var JUPYTER_IMAGE ... we'd have to either parse verbatim if there is the new value I opted for that most flexible approach in the part of dashboard that gets imagestream name and tag details in frontend and backend. No more parsing it from container image-field value, as that is in future im sha256 digest format, which does not have imagestream tag in it. I am thinking of integrating proposed changes in my PR-800 soon. We can then "kill multiple birds with one stone". |
@VaishnaviHire @harshad16 @andrewballantyne @LaVLaS on second thought, though, this here is an enhancement for tracking imagestream metadata on the GUI. Let's proceed step by step. |
Full ACK, good idea. We will cover the most cases this way, as you described succinctly and as I was meandering through :-) |
I finally caught up with the thread. With the imagestream tag deletion, the imagestream reference would be pruned from the internal registry with the garbage-collection runs, though as the workbench pod was already running, as the container was started, it would
@andrewballantyne from this comment, I take the action items as, including image-stream annotation Please let us know, if any further assistance is needed. |
@shalberd ack, i think as you described this seems to be a great feature to have and is responding a lot of the concerns around the imagestream. |
@andrewballantyne @harshad16 what is important is that those changes in kubeflow notebook controller are rolled out, the build image tag/digest reference, to odh-manifests in parallel. Once PR-800 is merged and used if dashboard, it only works in conjunction with the newest v1.6 version of kubeflow notebook controller. So, whatever the new release of odh-dashboard including Pr-800 is, it needs to go together with the to-come release of kubeflow-notebook-controller, e.g. v1.6.1-3 |
Eventually, yes. We could think about setting notebook container imagePullPolicy to IfNotPresent, which is a sensible change from Always, but at some point, even the cache on then Openshift nodes will not be there anymore. |
Ack, we will switch to imagePullPolicy IfNotPresent. |
@andrewballantyne there is an imagestream annotation for the imagestream / image called opendatahub.io/notebook-image-name I am not familiar with the BYON code for assembling imagestream, but that could be useful, right? You could add a notebook CR annotation |
@harshad16 @andrewballantyne @lucferbux changed imagePullPolicy to IfNotPresent in Notebook CR assembly. Also in some mock notebooks, except Deployment nvidia-dcgm-mock-exporter file, which uses an image with :latest tag, for which "Always" is needed. Regarding BYON: what if someone were to bring their own notebook image with a latest tag image behind it, and then has an imagestream again, same name and imagestream tag, with a new latest tag: Don't fret, as with PR-800, image-field value of container is in sha256 digest format anyways, that is, latest tag in from.name would change to a new digest and current ":latest" image would be pulled even with "IfNotPresent" |
This is still having friction and I'm bumping the priority so we can get to this sooner vs later. |
Is there an existing issue for this?
Current Behavior
In the table of workbenches, we explain the notebook image in play. This is a hard reference to images we have in play at this point in time.
However, when we keep moving n & n+1 images forward... older notebooks lose that reference and fall into "Unknown" image name. This is not ideal for upgrades and on-going improvements for the Dashboard to cause "loss" of image.
Expected Behavior
Store the image name for when we lose it, it can still represent the image in the table -- but it won't be able to find it in the list of edit.
Maybe UX can let us know if we should identify it in some way or just make it seamless.
Lots of discussions on this ticket, feel free to read to get a better idea of the scenario. Solution is prescribed here.
Steps To Reproduce
See the table as
Unknown
-- this is effectively the effort behind upgrades.Workaround (if any)
No response
What browsers are you seeing the problem on?
No response
Open Data Hub Version
Anything else
No response
The text was updated successfully, but these errors were encountered: