"operation not permitted" errors for batch operations #147
Comments
Hm, I'll investigate this now. I typically run tracee in the background to see which syscall is returning the EPERM and go from there.
I guess it's time for me to give tracee a spin as well :D
@kakkoyun Do you have an issue on parca-agent describing this, so we can discuss how to reproduce it?
I'm seeing that if the … Checking out the documentation for the libbpf function, 'count' is an input and output parameter, so you should be able to print that value to see whether anything is being read or deleted before the permission denied error occurs.
This is an amazing find; I can start from here. Thank you very much. I have the PR (parca-dev/parca-agent#326), and we have a demo setup to run it in minikube if that helps next time. I assume you debugged it using tracee, right? If that's the case, I want to add it to our debugging flow :) I'll update here with my findings.
@grantseltzer I think we were passing the capacity of the map when we need to pass its size (still trying to validate this). In any case, from the documentation I understand that even if the API returns an error, the call could be a partial success, and the number of elements processed before the error is indicated by the 'count' in/out parameter. Is this also what you understand from it? If so, the current batch APIs don't take this into account.
I would have, but no, I haven't tried running parca-agent yet. I do recommend tracee for debugging, though! It's easier to use for debugging than strace.
Yes, that's what I understand as well (I wrote that documentation, btw, haha).
Do you mean within libbpfgo? You may be right; I'm not sure whether an error would surface if the 'count' output value isn't checked. This should also be explained in the GoDocs for these functions. I will create an issue for improving them.
Thanks 👍 I'll be working on this throughout the week and will update here.
Hey @grantseltzer, I made it work to a degree. My previous mistake was to pass the capacity of the array as the batch size/count. The "somewhat" working version is below. The problem with it is that I need to know the actual number of elements in the map before determining the maximum allowed batch count. My first question is: is there a neater way to fetch the number of elements in a BPF map? The second, and maybe more important, question concerns this: https://github.com/parca-dev/parca-agent/blob/d44bf3134624064580b269c81621b85857bdd7e4/pkg/profiler/profiler.go#L356 Is there a reconciliation lag or some implicit behavior between kernel and user space regarding BPF maps? Without waiting between operations, it'd constantly give … Do you have any idea what's happening here? What am I doing wrong?
I've also opened this PR based on what I've gathered from the libbpf docs: #152
@grantseltzer Sorry to ping you again; any ideas about this part?
@kakkoyun Hi, sorry about the delay, I will get back to you on this a little later today!
Do I understand correctly that you get EPERM if you set the count to max_entries, and that it has to be equal to the actual number of entries? I wonder if it's possible to initialize the map with zero values. It doesn't seem that there's any API available to get the number of loaded entries.
Is it possible that the userspace program is updating the map in a different thread? Nowhere along the line down to the actual BPF system call invocation are threads being spawned. It's perhaps possible that the BPF syscall returns before completing its work, but I've never heard of that being done. Perhaps there's another way that EPERM can be returned besides the issue of … Overall, I don't have a very good answer for you :-/ I highly recommend asking both of these questions on the bpf mailing list, where you'll get a much more in-depth answer than I can give you. Please let me know if you're not familiar with the mailing list and I can guide you through it.
I'm also curious to see which syscall is actually returning the EPERM. Did you try running tracee in the background to see?
Yes, exactly: you need to pass a count value that's less than or equal to the number of elements in the BPF map; otherwise it seems you get an EPERM error, probably because it tries to read a memory chunk that's not allocated for the map. I'll check whether it is possible to initialize a BPF map with zero values; that would make the implementation on our side neater.
That's a bummer. Right now, we are just counting using an iterator. It does the job 🤷
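A minimal sketch of the iterator-based workaround described above: count the live entries once, then clamp the requested batch size so it never exceeds that count. keyIterator, sliceIter, countEntries and batchSize are hypothetical names for illustration; a real implementation would wrap whatever map iterator the BPF library in use provides.

```go
package main

import "fmt"

// keyIterator is a hypothetical interface shaped like a typical map-key
// iterator (Next advances, Err reports a failure); it is not a specific
// libbpfgo type.
type keyIterator interface {
	Next() bool
	Err() error
}

// countEntries walks the iterator once and returns the number of keys seen.
// The result is only a snapshot: the BPF program may add or delete entries
// concurrently, so the count can be stale by the time it is used.
func countEntries(it keyIterator) (uint32, error) {
	var n uint32
	for it.Next() {
		n++
	}
	return n, it.Err()
}

// batchSize clamps the requested batch size to the number of live entries,
// since in this thread requesting more than that led to an error.
func batchSize(capacity, live uint32) uint32 {
	if live < capacity {
		return live
	}
	return capacity
}

// sliceIter is a toy iterator over an in-memory key slice, for the example.
type sliceIter struct {
	keys []uint32
	pos  int
}

func (s *sliceIter) Next() bool {
	if s.pos >= len(s.keys) {
		return false
	}
	s.pos++
	return true
}

func (s *sliceIter) Err() error { return nil }

func main() {
	it := &sliceIter{keys: []uint32{1, 2, 3, 4, 5}}
	live, err := countEntries(it)
	if err != nil {
		panic(err)
	}
	fmt.Println("batch size:", batchSize(10240, live))
}
```

Because the count is a snapshot, the batch call's own error and 'count' output still need checking afterwards.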
I don't think we are knowingly using a different thread, but the Go runtime might be the culprit here. I'll make sure we lock the OS threads and test whether that's the case. Nice pointer, thanks!
Thanks, your blog post was helpful for dipping my toes into Linux mailing lists. I'll reach out if it comes to that 😊
I haven't done that; I guess I need to do that first. Is there a good place to start with tracee? A tutorial for running it on a minikube cluster, maybe?
We have one for running it with Vagrant, and a short sheet on installing it in Kubernetes. I'm not very experienced with k8s, but my teammates are, so if you have any issues feel free to ask and I'll tag the appropriate people!
I have been trying to debug the reason for this, but so far I haven't been successful, so I'm asking for help.
We try to use GetValuesAndBatch, and it fails with an "operation not permitted" error: https://github.com/parca-dev/parca-agent/blob/bd9807a3a0e16302b5944d570967ef5a828dfc80/pkg/profiler/profiler.go#L344-L358
I have tried bumping the rlimits (usually the culprit behind this error), but no luck with that either: https://github.com/parca-dev/parca-agent/blob/bd9807a3a0e16302b5944d570967ef5a828dfc80/pkg/profiler/profiler.go#L623-L642
Do you happen to have any pointers or guidelines for debugging this further? Or could this be related to error handling?