Configuration file schema #137

Closed · tdoan2010 opened this issue Nov 14, 2022 · 10 comments
Labels: enhancement (New feature or request)
tdoan2010 commented Nov 14, 2022

This file is required by the Processing Broker at startup. By parsing this file, the broker knows:

  1. Whether it should deploy a queuing system or re-use a running one.
  2. Whether it should deploy a processor or re-use a running one.
  3. Where to deploy which processor.
  4. How to deploy a processor: via Docker or natively.
  5. How to connect to a machine via SSH: with username/password or username/private key.

Use ocrd_tool.schema.yml as a reference implementation when writing this schema.

joschrew commented Nov 17, 2022

My first draft of an example configuration file:


deploy-queue: true
queue-address: localhost:5672
processors:
    ocrd-cis-ocropy-binarize:
        deploy: true
        type: native
        host: some-url.gwdg.de
        port: 5051
        ssh:
            host: 123.123.123.123
            user: cloud
            keyfile: /path/to/a-keyfile-for-cloud
    ocrd-cis-ocropy-binarize:
        deploy: true
        type: docker
        host: some-url.gwdg.de
        port: 5052
        ssh:
            host: 123.123.123.123
            user: cloud
            password: the-password
    ocrd-olena-binarize:
        deploy: true
        type: docker
        host: localhost
        port: 5053
    ocrd-eynollah-segment:
        deploy: false
        host: 123.123.123.123
        port: 5052

Possibly open questions:

  • Do I need more information to start a queue than where to do it? (I don't think so, but I am not sure.)
  • Is it necessary to ensure (and if so, how) that a remote processor can access the queue?
  • I am currently not sure whether more information is needed to start a processor (I don't think so right now).
  • All Processing Servers must have access to the workspaces stored on an NFS share. Should/could that somehow be part of this configuration?

tdoan2010 (Author) commented:

Hi, why do you need host and port under each processor?

Your file structure was what I had in mind at the beginning, but then I realized that it's better to group items by host, since we tend to deploy multiple processors per host, and the code structure is also simpler later: just loop through all hosts and do things (see the sketch after the example below). So, I would suggest something like this:

message_queue:
  address: localhost
  port: 5672
  ssh:
    username: cloud
    password: 1234
hosts:
  - localhost:
      address: localhost
      username: cloud
      password: 1234
      deploy_processors:
        - name: ocrd-cis-ocropy-binarize
          number_of_instance: 2
          type: native
        - name: ocrd-olena-binarize
          number_of_instance: 1
          type: docker
  - vm01:
      address: 134.76.1.1
      username: tdoan
      path_to_privkey: /path/to/file
      deploy_processors:
        - name: ocrd-eynollah-segment
          number_of_instance: 1
          type: native
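
Just to illustrate the "loop through all hosts" point, a minimal sketch of how such a structure could be consumed (the file name config.yml is a placeholder; the snippet only prints the deployment plan):

import yaml

# Sketch only: read the host-grouped configuration and print the deployment plan.
with open("config.yml") as f:
    config = yaml.safe_load(f)

for entry in config["hosts"]:
    # each list item is a single-key mapping, e.g. {"localhost": {...}}
    for host_name, host in entry.items():
        for proc in host["deploy_processors"]:
            print(f"{host_name} ({host['address']}): start "
                  f"{proc['number_of_instance']}x {proc['name']} as {proc['type']}")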

Regarding your questions:

  1. I don't think so either, but we will see as we develop.
  2. If you can somehow test it, that would be nice, but I don't know how at the moment. For the first step, it would be enough to have them listen on the correct queue.
  3. No, they just need to know the address of the queuing system.
  4. No, the processing servers have no notion of workspaces at all. This information must reach them through the messages in the queue (see the sketch below).
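
Purely as an illustration of that last point (the message fields below are hypothetical, not a defined format), publishing such a message to a processor's queue with pika could look like this:

import json
import pika

# Connect to the RabbitMQ instance from the config (localhost:5672).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost", port=5672))
channel = connection.channel()
channel.queue_declare(queue="ocrd-cis-ocropy-binarize", durable=True)

# Hypothetical message body: workspace information travels here, not in the config file.
message = {
    "workspace_path": "/data/workspaces/ws1/mets.xml",
    "input_file_grp": "OCR-D-IMG",
    "output_file_grp": "OCR-D-BIN",
    "parameters": {},
}
channel.basic_publish(exchange="", routing_key="ocrd-cis-ocropy-binarize",
                      body=json.dumps(message))
connection.close()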

joschrew commented Nov 18, 2022

Why do you need host and port under each processor:

  • My thought was that the address for SSH may not be the same as the address for reaching the server; for example, you SSH into a server via its IP, while the processing server is reached via a domain/web address. But when I thought about your question, it became clear that it is not necessary to access the processing server at all (Hollywood principle) and all missing information can be sent through the queue.

joschrew commented:

Suggestion/discussion: remove path_to_privkey and password for SSH login from the config (e.g. hosts.localhost.password and hosts.localhost.path_to_privkey), and also message_queue.ssh.password, because the queue can be deployed with the docker-sdk, too.

How to replace it: ~/.ssh/config is read by default by the python-docker-sdk (and can be used with paramiko, too), and authentication is done entirely through it.

Reason: the python-docker-sdk cannot easily (if at all) be used with a password or with an explicit path to a keyfile: www.github.com/docker/docker-py/issues/2416. When I started using ~/.ssh/config for that reason, it seemed reasonable to me to rely completely on ~/.ssh/config. But maybe I am missing something and providing a password or keyfile must be possible; in that case we could think about dropping the docker-sdk and relying completely on paramiko and shell commands.

Drawbacks: no login without keyfile(s), which is better anyway in my opinion. And maybe more ...
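
A minimal sketch of relying on ~/.ssh/config only, assuming a host alias ocr-worker is defined there with HostName, User, and IdentityFile (the alias is a placeholder; the docker and paramiko calls are existing APIs):

import os
import docker
import paramiko

# docker-py resolves "ocr-worker" through ~/.ssh/config, including user and keyfile.
client = docker.DockerClient(base_url="ssh://ocr-worker")
print(client.info()["Name"])

# paramiko can read the same file explicitly and use it for a plain SSH connection.
ssh_config = paramiko.SSHConfig()
with open(os.path.expanduser("~/.ssh/config")) as f:
    ssh_config.parse(f)
host = ssh_config.lookup("ocr-worker")  # {"hostname": ..., "user": ..., "identityfile": [...]}

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host["hostname"], username=host.get("user"),
            key_filename=host.get("identityfile", [None])[0])
ssh.close()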

tdoan2010 (Author) commented:

We should not remove it, since it's a nice feature and it works well with the native deployment.

After checking the source code: the Python Docker SDK uses Paramiko under the hood for the SSH connection. I suppose one can easily implement a custom adapter based on its SSHHTTPAdapter to accept a private key and password. I can help you with that when the time comes.

At the moment, I would suggest focusing on the native deployment only.

joschrew commented:

OK, thanks for your opinion/decision. I will look deeper into the adapter; I have only tried it briefly so far. And I will change my implementation for the native deployment to use password and key path.
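
For the native deployment with password or key path, plain paramiko already covers both variants from the config; a sketch (host, credentials, and the command are taken from the examples above and purely illustrative):

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

# Key-based login, as in the vm01 example ...
ssh.connect("134.76.1.1", username="tdoan", key_filename="/path/to/file")
# ... or password-based login, as in the localhost example:
# ssh.connect("localhost", username="cloud", password="1234")

# Start (or here: just probe) a processor natively on the remote machine.
stdin, stdout, stderr = ssh.exec_command("ocrd-cis-ocropy-binarize --help")
print(stdout.read().decode())
ssh.close()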

tdoan2010 (Author) commented:

@joschrew: there should be a part in the configuration file for MongoDB as well. I think the Processing Broker should be able to deploy, re-use, and shut down the database in the same way as it does with the Message Queue and Processing Servers.
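
For example, deploying and shutting down MongoDB on a target host via the docker-sdk could look roughly like this (a sketch; the SSH URL and container name are placeholders, and how deploy vs. re-use is signalled is exactly what the configuration file has to express):

import docker

# Reach the Docker daemon on the target host over SSH (credentials resolved via ~/.ssh/config).
client = docker.DockerClient(base_url="ssh://cloud@localhost")

# Deploy: start MongoDB and expose the port from the config (27017).
mongo = client.containers.run("mongo:latest", detach=True,
                              ports={"27017/tcp": 27017}, name="ocrd-mongodb")

# ... and later, shutdown:
mongo.stop()
mongo.remove()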

tdoan2010 (Author) commented Dec 12, 2022

@joschrew: after thinking more about it, I think an example configuration file should look like this. The detailed description can be found in PR OCR-D/spec#222.

message_queue:
  address: localhost
  port: 5672
  ssh:
    username: cloud
    password: 1234
mongo_db:
  address: localhost
  port: 27017
  credentials:
    username: admin
    password: admin
  ssh:
    username: cloud
    password: 1234
hosts:
  - address: localhost
    username: cloud
    password: 1234
    deploy_processors:
      - name: ocrd-cis-ocropy-binarize
        number_of_instance: 2
        type: native
      - name: ocrd-olena-binarize
        number_of_instance: 1
        type: docker

  - address: 134.76.1.1
    username: tdoan
    path_to_privkey: /path/to/file
    deploy_processors:
      - name: ocrd-eynollah-segment
        number_of_instance: 1
        type: native

tdoan2010 (Author) commented:

@joschrew I have created a schema for the configuration file. You can find it in PR OCR-D/spec#222. Please try using it to validate the file. Basically, an example of a valid file would look like this:

message_queue:
  address: localhost
  port: 5672
  credentials:
    username: admin
    password: admin
  ssh:
    username: cloud
    path_to_privkey: /path/to/file
mongo_db:
  address: localhost
  port: 27017
  credentials:
    username: admin
    password: admin
  ssh:
    username: cloud
    password: "1234"
hosts:
  - address: localhost
    username: cloud
    password: "1234"
    deploy_processors:
      - name: ocrd-cis-ocropy-binarize
        number_of_instance: 2
        deploy_type: native
      - name: ocrd-olena-binarize
        number_of_instance: 1
        deploy_type: docker

  - address: 134.76.1.1
    username: tdoan
    path_to_privkey: /path/to/file
    deploy_processors:
      - name: ocrd-eynollah-segment
        number_of_instance: 1
        deploy_type: native
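
Validating the example against the schema could then be done along these lines (a sketch; config.schema.yml and config.yml are assumed local copies of the schema from OCR-D/spec#222 and of the file above):

import yaml
from jsonschema import ValidationError, validate

with open("config.schema.yml") as f:  # schema from the spec PR, saved locally
    schema = yaml.safe_load(f)
with open("config.yml") as f:         # the example configuration above
    config = yaml.safe_load(f)

try:
    validate(instance=config, schema=schema)
    print("configuration file is valid")
except ValidationError as err:
    print(f"configuration file is invalid: {err.message}")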

joschrew commented:

Closed, because this is now available in ocrd_network in core.
