Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety Issue with "run_udf"!!! #81

Closed
MichaelBrueggemann opened this issue Jan 18, 2024 · 5 comments
Closed

Safety Issue with "run_udf"!!! #81

MichaelBrueggemann opened this issue Jan 18, 2024 · 5 comments

Comments

@MichaelBrueggemann
Copy link
Contributor

MichaelBrueggemann commented Jan 18, 2024

@PondiB

URGENT ISSUE !!!

In the current implementation, the Process "run_udf" takes an arbitrary string with a user-defined-function. This String isn't checked but rather is directly parsed into an expression by base::parse() and is then directly evaluated with base::eval().

This is a significant safety hazard for the operating system currently running an istance of "openeocubes". With this process ones could not only provide functions as intended by openEO but can also run any R-Code. In the following example i have created an UDF that uses base::system() to execute a shell command by the operating system. With this i was able to create a C-Program and also compile and execute this. This could be an potential entrypoint for Malware and other people with questionable intend (clone a git repo with malware code and execute it).

I recommend implementing some kind of safety, as to restrict the kind of R-functions that could be passed (e.g. forbid the use of system() and other similar functions) or rather remove this functionately entirely, until this issue is resolved.

Example Code:

con = openeo::connect("http://localhost:8000")
openeo::login(user = "user",
              password = "password")
p = openeo::processes()

# just some sample data to retrieve a datacube
xmin = 854523.1986700408
ymin = 6797063.516360302
xmax = 857831.1917130196
ymax = 6799315.301182906

bands = c("B02")

datacube_init1 = p$load_collection(id = "sentinel-s2-l2a-cogs",
                                   spatial_extent = list(west= xmin ,
                                                         south= ymin,
                                                         east= xmax,
                                                         north= ymax),
                                   crs = 3857,
                                   temporal_extent = c("2022-06-01", "2022-06-30"),
                                   bands = bands,
                                   resolution = 30)

  udf = 'function(x){system("cmd /C echo #include ^<stdio.h^> > test.c & echo int main(void){return 10;} >> test.c & gcc -o test.exe test.c & test.exe")
  return(x["B02"])}'


  udf_cube = p$run_udf(data = datacube_init1, udf = udf)

  formats = openeo::list_file_formats()

  result1 = p$save_result(data = udf_cube , format = formats$output$GTiff)

  # save file to local storage
  openeo::compute_result(graph = result1, output_file = "test.tif")

I recommend trying this with a local instance of openeocubes (with startLocal.R) and see the issue for yourself

@MichaelBrueggemann
Copy link
Contributor Author

A very simple solution would be to add this code to run_udf():

            # check, if udf string contains forbidden keywords
            forbidden_keywords = c("system", "Sys.", "processx")

            if (any(sapply(forbidden_keywords, grepl, udf)))
            {
              message("Forbidden keyword used!")
              stop()
            }

@PondiB
Copy link
Owner

PondiB commented Jan 18, 2024

@MichaelBrueggemann , nice suggestion. kindly create a branch of the current repo and update the UDF function then send a pull request.

@PondiB
Copy link
Owner

PondiB commented Jan 24, 2024

@MichaelBrueggemann , Taken care of.

@PondiB PondiB closed this as completed Jan 25, 2024
@goergen95
Copy link

@mikemahoney218 has come up with a nice solution for a similar problem in {rsi} including a whitelist instead of a blacklist which should be much safer. Maybe this can be expandable by users via an option?

https://github.com/Permian-Global-Research/rsi/blob/25e2dd293270767b720db1e46d6f6d49ad3049dd/R/calculate_indices.R#L77-L137

@PondiB
Copy link
Owner

PondiB commented Jan 25, 2024

https://github.com/Permian-Global-Research/rsi/blob/25e2dd293270767b720db1e46d6f6d49ad3049dd/R/calculate_indices.R#L77-L137

@goergen95 , Thanks for pointing out this alternative. I'll refactor with that approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants