RPA Runtime v1.0 rework #11

Open
10 tasks
marstamm opened this issue Oct 29, 2024 · 12 comments

@marstamm
Member

marstamm commented Oct 29, 2024

Rewrite the Python worker implementation in Java. This lets us integrate it more easily with other Camunda components and spin them up together, e.g. with camunda run.

  • Uses the Java worker: https://docs.camunda.io/docs/apis-tools/java-client/job-worker/
  • Fetches Scripts from Zeebe
    • Caches existing scripts
  • Resolves Secrets from Console API
    • This is optional if the Token has the Secrets scope
  • Uses Camunda RPA libraries
    • Has Camunda RPA Library for setting Variables and Document handling
    • Has Camunda-Namespaced RPA libraries
  • Offers local testing API
    • simple REST endpoint that allows execution of scripts
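The "fetch scripts, cache existing scripts" point can be sketched roughly as follows. This is an illustration in Python rather than the planned Java worker, and it assumes a script never changes under the same key; `ScriptCache` and the `fetch_script` callable are hypothetical names, not part of any Camunda API.

```python
from typing import Callable, Dict


class ScriptCache:
    """Caches script bodies by key so each script is fetched at most once."""

    def __init__(self, fetch_script: Callable[[str], str]) -> None:
        # fetch_script is a hypothetical callable that loads a script from Zeebe.
        self._fetch_script = fetch_script
        self._scripts: Dict[str, str] = {}

    def get(self, script_key: str) -> str:
        # Only call the loader on a cache miss.
        if script_key not in self._scripts:
            self._scripts[script_key] = self._fetch_script(script_key)
        return self._scripts[script_key]
```

If scripts can be redeployed under the same key, the cache key would need to include a version.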
@marstamm
Member Author

Scripts can take vastly different amounts of time to finish. We need to allow defining a Script timeout on a per-script basis, rather than a generic timeout for the Job. Since we only get the Script after Job activation, we can't set the Timeout in the usual way, e.g. with the topic subscription.

To allow Scripts to define their own Timeouts, we can use https://docs.camunda.io/docs/apis-tools/camunda-api-rest/specifications/update-a-job/

The sketch below shows which requests need to happen between Zeebe and the worker to facilitate this.

[image: sketch of the requests between Zeebe and the worker]
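A minimal sketch of the timeout update, assuming the worker already knows the timeout the script asks for. The `/v2/jobs/{jobKey}` path and the `changeset` body are my reading of the linked update-a-job specification and should be checked against it; `build_timeout_update` is a hypothetical helper that only describes the request and does not send it.

```python
import json
from typing import Any, Dict


def build_timeout_update(job_key: int, timeout_ms: int) -> Dict[str, Any]:
    """Describe the update-job request that sets a new timeout for an activated job."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    return {
        "method": "PATCH",
        # Assumed path, taken from the linked specification.
        "path": f"/v2/jobs/{job_key}",
        # Assumed body shape: the new timeout in milliseconds.
        "body": json.dumps({"changeset": {"timeout": timeout_ms}}),
    }
```

The worker would send this right after it has fetched the script and before it starts executing it.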

@marstamm marstamm mentioned this issue Nov 13, 2024
4 tasks
@marstamm marstamm added this to the 8.8 milestone Nov 25, 2024
@nikku
Member

nikku commented Dec 16, 2024

To consider (mentioned here):

  • Will it still be extensible, i.e. what is the story for pulling in additional libraries and extending our run-time?

@marstamm marstamm mentioned this issue Jan 6, 2025
4 tasks
@marstamm
Member Author

marstamm commented Jan 14, 2025

I investigated Document handling capabilities today.

Summary

  • A Document in a process instance is represented by a reference. The reference has the following structure:
    {
      "camunda.document.type": "camunda",
      "storeId": "gcp",
      "documentId": "someID",
      "metadata": {}
    }
  • You can use the REST API to create and fetch documents.
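Based on the reference structure above, detecting a document reference in a variable could look roughly like this. The check is an assumption on my side (type marker plus a string `documentId`); `is_document_reference` is a hypothetical helper name.

```python
from typing import Any


def is_document_reference(value: Any) -> bool:
    """Return True if value looks like a Camunda document reference."""
    return (
        isinstance(value, dict)
        and value.get("camunda.document.type") == "camunda"
        and isinstance(value.get("documentId"), str)
    )
```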

Problems we need to solve

How do we reference documents in RPA?

  • Opinion: documents should be downloaded before we run the script and replaced by a local file path
  • As a Document can be in a nested variable, we need to decide if
    • Document resolving is implicit, you can use documents as a nested variable. This would mean we need to iterate over all variables, find references and replace them.
    • Document handling is explicit: at modeling time, you provide a map of documents you want to use in the RPA script. This potentially also saves on bandwidth, since we don't download documents that are not needed for the Script.
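To make the cost of the implicit option concrete, here is a rough sketch of the iteration it needs: walk all variables, find references and replace them with local paths. `resolve_documents` and the `download` callable are hypothetical; the actual download would go through the REST API.

```python
from typing import Any, Callable


def resolve_documents(value: Any, download: Callable[[dict], str]) -> Any:
    """Return a copy of value with every document reference replaced by a local path."""
    if isinstance(value, dict):
        if value.get("camunda.document.type") == "camunda":
            # download is a hypothetical callable: reference in, local file path out.
            return download(value)
        return {key: resolve_documents(item, download) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_documents(item, download) for item in value]
    # Plain values (strings, numbers, booleans, None) pass through unchanged.
    return value
```

Note that this downloads every referenced document, whether or not the script uses it, which is the bandwidth concern mentioned for the implicit option.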

How do we create new documents

For simplicity, let's allow writing to a top-level variable only (e.g. Set Output Document invoice /path/to/invoice.pdf).

How do we update documents

As far as I could find, you cannot update documents with the API. So the lifecycle would be to upload a new document and change the existing variable to the new reference.

Technical considerations

Do we want to handle document upload in the Robot script or the Java worker?

  • I believe we should keep Camunda-related functionality in the Java worker.
  • Proposed API: We split output.yml into variables and files. files will contain a map of variable name => file path. Once the script has finished, the Java worker uploads each file and sets the given variable name to the created document reference.
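The proposed variables/files split could be processed roughly like this. The sketch works on the already parsed content of output.yml and is written in Python for brevity, not as the Java worker code; `collect_job_variables` and the `upload` callable are hypothetical names.

```python
from typing import Any, Callable, Dict


def collect_job_variables(
    output: Dict[str, Any], upload: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Merge plain variables with document references created from output files."""
    variables = dict(output.get("variables") or {})
    for name, path in (output.get("files") or {}).items():
        # upload is a hypothetical callable: local path in, document reference out.
        variables[name] = upload(path)
    return variables
```

In this sketch an entry in files overrides a plain variable with the same name; whether that should be an error instead is an open design choice.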

@natanielstrack

My suggestion is to go with Explicit.

Implicit seems more complex to implement and brings some "magic" that can lead to all kinds of corner-case bugs (what happens when the file is updated, removed or renamed?).

Explicit allows us to build a simpler and stronger solution. Since there's a mapping for the documents, we can easily extend it in the future to cover advanced document handling cases (versioning, TTL). It also forces the user to be more conscious when referencing files.

@natanielstrack

For separation of concerns, I also believe that document handling should be placed in the Java worker.

Do we want to handle document upload in the Robot script or the Java worker?

I believe we should keep Camunda related functionality in the Java Worker.
Proposed API: We split output.yml into variables and files. files will contain a map of variable name => file path. The Java worker uploads the file and sets the given variable name to the created doc reference once the script finished

@nikku
Member

nikku commented Jan 14, 2025

Opinion: documents should be downloaded before we run the script and replaced by a local file path

How would that work for the user? Let's provide an example.

@marstamm
Member Author

Let's provide an example.

Scenario:

A process takes customer data from an XLS file and updates it in a legacy ERP system.

  1. The user uploads the file customer.xlsx via the start form. The document reference is stored in customerFile.
  2. The user configures the RPA task to use customerFile (explicit or implicit).
  3. The RPA worker resolves the document reference and downloads the file to /path/to/job/id/customer.xlsx.
  4. In the Robot script, the user can use Open Workbook ${customerFile}. As we resolved the reference, this will be equivalent to Open Workbook "/path/to/job/id/customer.xlsx".

@nikku
Member

nikku commented Jan 15, 2025

Great example! What is missing for me is the mechanism by which customerFile is flagged as "a file", and hence converted. Is this:

  • 🅰 Because the user explicitly configures, as an additional input / task header, that a particular reference should be de-referenced, resulting in the locally available file path? ==> Explicit, but quite a burden on our users; also not part of the RPA script, but part of the task definition.
  • 🅱 Because the run-time automatically unwraps the customerFile variable and exposes it as a file prior to script execution? ==> Nice and simple mechanism, but magic, and it is unclear where the "unwrapping" of input stops.
  • 🍎 Because the run-time will fetch + unwrap the file in place (during execution), whenever it is "required as a file reference"? ==> Magic + may be hard to pull off / implement as an extension to the RPA framework.

I'd like to bring another option into play:

  • 🍊 We extend the robot file syntax to explicitly flag variables as "unwrap as file" or "wrap as a file", i.e.:
    • Open Workbook ${file:customerFile}
    • This allows us to still access the meta-data of the file in the robot script, or later on update it conveniently, preserving some of the relevant meta-data:
    • Set Output Document customerFile ${ref:localFilePath}

The beauty of the last approach is two-fold:

  • Users are in control and model a clear intent, no magic involved
  • We can parse the RPA script for the data model/contract, and infer both "where you expect a document" and "where you expect a plain variable", and produce it (ref)
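Inferring that contract from a script could be sketched like this. It covers only the static scan for the proposed ${file:...} and ${ref:...} markers, not how they would be resolved at run time; `document_contract` is a hypothetical helper.

```python
import re
from typing import Dict, Set

# Matches the proposed ${file:name} and ${ref:name} markers.
_MARKER = re.compile(r"\$\{(file|ref):([^}]+)\}")


def document_contract(script: str) -> Dict[str, Set[str]]:
    """Collect the names a script flags as 'unwrap as file' or 'wrap as a file'."""
    contract: Dict[str, Set[str]] = {"file": set(), "ref": set()}
    for kind, name in _MARKER.findall(script):
        contract[kind].add(name.strip())
    return contract
```

Plain variables such as ${customerName} are ignored by this scan and would stay regular inputs.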

Connectors explicitly model "operations on documents", i.e. creating links, too.


Curious what you think!

@marstamm
Member Author

Extending the syntax beyond keywords will be difficult, as ${file:customerFile} will resolve to a variable with the same name. To actually do something during variable resolution, we would need to change Robot Framework itself. I'd avoid that, as we would then need to maintain our own fork of Robot Framework.

@Poundex brought up another idea to make variable handling more explicit. In this example, Camunda.Documents would communicate with the local Java runtime to resolve document references at run time. We can do this without extending the syntax by providing a custom keyword implementation:

*** Settings ***

Library     Camunda.Documents

*** Tasks ***
Example
    ${localPath}=  Fetch Document    ${customerFile}
    
    ## or set as explicit path:
    # Fetch Document    ${customerFile}     "customers.xls"

    Open Workbook   ${localPath}
    
    ## Do some more work

    # Upload generated Invoice
    ${fileDescriptor}=   Upload Document      "./invoice.pdf"
    Set Output Variable   "invoice"     ${fileDescriptor}

I like this approach, as:

  • We can use nested variables with document references
  • Users don't need to explicitly add a list of documents to fetch at modeling time
  • We only fetch the files we need
  • Resolving and creation are explicit
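On the library side, such keywords could be sketched as a plain Python class, since Robot Framework exposes the public methods of a library class as keywords (fetch_document becomes Fetch Document). How the library talks to the local Java runtime is not decided in this thread, so the `runtime` callable and its "fetch"/"upload" action names are placeholders I made up for illustration.

```python
from typing import Any, Callable, Dict, Optional


def _no_runtime(action: str, payload: Dict[str, Any]) -> Any:
    # Placeholder: a real library would call the local Java runtime here.
    raise NotImplementedError("connect this library to the local runtime")


class Documents:
    """Keyword library sketch; public methods become Robot Framework keywords."""

    def __init__(self, runtime: Callable[[str, Dict[str, Any]], Any] = _no_runtime) -> None:
        # runtime is a hypothetical client: action name and payload in, answer out.
        self._runtime = runtime

    def fetch_document(self, reference: Dict[str, Any], target: Optional[str] = None) -> str:
        """Download the referenced document and return its local path."""
        return self._runtime("fetch", {"reference": reference, "target": target})

    def upload_document(self, path: str) -> Dict[str, Any]:
        """Upload a local file and return the created document reference."""
        return self._runtime("upload", {"path": path})
```

Passing the runtime client into the constructor keeps the keywords testable without a running worker.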

@Poundex

Poundex commented Jan 15, 2025

I think file actions should be explicit (in the Robot Script) and be fulfilled JIT by the script execution, not in advance by either scraping the script/diagram or requiring the user to provide (and manually maintain) manifests of files that will be used within the script.
Taking this approach and providing RF keywords for file handling means that the use of these files with other keywords does not have to change in any way.
So for instance, if we had the two keywords Fetch Zeebe File and Upload Zeebe File, then they could be used as follows:

${LOCAL_PATH_TO_FILE}    Fetch Zeebe File    "the_input_file.xlsx"
Open Workbook    ${LOCAL_PATH_TO_FILE}
# ... rest of steps...
Upload Zeebe File    "the_output_file"    "the_output_file.xlsx"

Because the keyword implementation has downloaded the file, library keywords (such as Open Workbook) work exactly the same way as they always do, operating on a local file. Users do not need to second-guess these, as nothing is happening behind the scenes; they work as documented.

@nikku
Member

nikku commented Jan 15, 2025

Thanks folks for chiming in. I like the direction that I see, too, as it checks the boxes I'd love a solution to check. ⭐

@nikku
Member

nikku commented Jan 15, 2025

    ${fileDescriptor}=   Upload Document      "./invoice.pdf"
    Set Output Variable   "invoice"     ${fileDescriptor}

Syntactic sugar for that could be

    Upload File "invoice" "invoice.pdf"
