Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaces flask with fastapi monitor #45

Merged
merged 32 commits into from
Feb 14, 2023

Conversation

SvenMarcus
Copy link
Collaborator

Adds first draft for FastAPI ocrd monitor implementation.
There are still several things that are still very WIP.

Settings

Until settings are finalized, I have hardcoded them in the monitor's init.sh. Eventually they should be parsed from existing variables or new variables that can be added to the main .env file.
When reviewing, please check if I got all the paths to workspaces and so on correctly.

I intended to use pydantic's automatic settings management that is able to automatically parse environment variables or an env file into a settings object (see pydantic docs). Unfortunately it doesn't handle environment variables that are named exactly like sub settings classes very well. Therefore I had to prefix my new setting environment variables with OCRD_, since we already had variables starting with CONTROLLER.

Jobs page

I'm not sure I got all the links on the jobs page correctly. Especially, the links to the log files were a bit wonky in my testing, but could be because of all the changes I made during the implementation.

Testing

While I've written some tests for core logic, I'd like to expand tests for the web application a bit more in the future. This will probably involve setting up some dummy workspaces and mocking out the controller.

Documentation

Since the end user will probably only have to deal with setting up some new variables, I haven't added any documentation for that until we finalize all the settings we need.

Copy link
Collaborator

@markusweigelt markusweigelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good implementation! Especially the separation of monitor server and lib I like quite well.
Could only contribute this review for now with my modest Python knowledge.

But I will have a look at the modules you used and then go deeper into the source code.

From my point of view it would make sense to document some functions later on. (As you have already mentioned in the PR description) Maybe we could also use "doctest" for the functionality tests of the modules.

I will check out the source code next week and try to get further into it. Currently I can't give more feedback.

@markusweigelt
Copy link
Collaborator

Here is the summary of the test on Monday. The initial condition was the test with the current version of the base repository and the Manager module on this branch.

  • workspaces are not listet atm
  • request of workflow from inactive jobs results in not found http://localhost:5000/workflows/detail/ocr-workflow-default.sh
  • request of log from inactive jobs results in not found http://localhost:5000/logs/view/ocr-d/data/3
  • linking of dozzle logs from home is missing
  • for tests it is necessary to use Python 3.10 so it would be better to test this over seperate Dockerfile with fixed version to avoid updating or extending the host
  • maybe small documentation of how to run test in Readme.md
  • adjustment of enviroment variables in base repro -> separate PR maybe after current PR is merged
  • documentation of components in base documentation -> separate PR maybe after current PR is merged

@SvenMarcus SvenMarcus force-pushed the feature/monitor-reimplementation branch from ddec95d to dcc340f Compare January 27, 2023 15:35
@bertsky
Copy link
Member

bertsky commented Feb 2, 2023

See SvenMarcus#1

  • workspaces are not listet atm

should be fixed now

  • request of workflow from inactive jobs results in not found http://localhost:5000/workflows/detail/ocr-workflow-default.sh

The problem here is that the workflow files are only in ocrd_manager (e.g. /usr/bin/ocr-workflow-default.sh), not in ocrd_monitor.

Since we want to make all of these editable and perhaps even version-control them, IMO we have to introduce an additional volume, shared by both the Manager and the Monitor.

  • request of log from inactive jobs results in not found http://localhost:5000/logs/view/ocr-d/data/3

should be fixed now

  • linking of dozzle logs from home is missing

should be fixed now

  • maybe small documentation of how to run test in Readme.md
  • adjustment of enviroment variables in base repro -> separate PR maybe after current PR is merged
  • documentation of components in base documentation -> separate PR maybe after current PR is merged

I agree.

Copy link
Member

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test results for me (running under SvenMarcus#1): 3 failed, 27 passed, 6 errors

(But I don't understand how I am supposed to set this up within the Compose framework properly. Also, it seems I'd have to include some .env Setting for ocrd_logview now...)

Integration test within ocrd_kitodo works, but I am having serious issues with the OCRD Browser:

  • if I open the same workspace on different tabs, each one steals the session of the other
  • if I open a workspace which fails the browser, e.g. testdata-presentation/mets.xml after make test-presentation due to Support remote images hnesk/browse-ocrd#52, then subsequently no browser will work anymore, anywhere

@SvenMarcus
Copy link
Collaborator Author

With the latest commits, I believe this PR should be ready to be merged now. While the code is not perfect yet, we should be able to further improve it from the main branch.

@SvenMarcus SvenMarcus marked this pull request as ready for review February 10, 2023 15:49
@markusweigelt
Copy link
Collaborator

Thx! 👍🏻

Unfortunately I get a 404 when I try to open the default workflow because the path is assembled incorrectly http://localhost:5000/workflows/detail//usr/bin/ocr-workflow-default.sh

While the code is not perfect yet, we should be able to further improve it from the main branch.

Can you open a issue to see what specifically needs to be changed? This would also have to be taken into account in further implementation.

@bertsky
Copy link
Member

bertsky commented Feb 13, 2023

Unfortunately I get a 404 when I try to open the default workflow because the path is assembled incorrectly http://localhost:5000/workflows/detail//usr/bin/ocr-workflow-default.sh

Like I said above, the reason for this is not in the Monitor (code), but in the way we currently set up the division of labour between containers. The workflows are only in the Manager. We'd first have to define a shared volume.

@bertsky
Copy link
Member

bertsky commented Feb 13, 2023

Can you open a issue to see what specifically needs to be changed?

Here's one: #46

@bertsky
Copy link
Member

bertsky commented Feb 13, 2023

One thing that must be updated before merging though: the ports section of docker-compose.yml!

@markusweigelt
Copy link
Collaborator

Like I said above, the reason for this is not in the Monitor (code), but in the way we currently set up the division of labour between containers. The workflows are only in the Manager. We'd first have to define a shared volume.

Ok I overlooked this. I think the adjustment is divided into OCR-D Manager and Monitor Repo and it is one repo atm. How had we made the workflows visible in the monitor before?

@bertsky
Copy link
Member

bertsky commented Feb 13, 2023

Ok I overlooked this. I think the adjustment is divided into OCR-D Manager and Monitor Repo and it is one repo atm.

That's independent, though. Containers/volumes are one thing, repos another.

How had we made the workflows visible in the monitor before?

We had a copy of the workflow in the process directory.

@markusweigelt
Copy link
Collaborator

markusweigelt commented Feb 13, 2023

Ok I overlooked this. I think the adjustment is divided into OCR-D Manager and Monitor Repo and it is one repo atm.

That's independent, though. Containers/volumes are one thing, repos another.

How had we made the workflows visible in the monitor before?

We had a copy of the workflow in the process directory.

Ah ok then it worked before because our demo data brought the workflow. By default, there is no workflow in the process dir, but only if it is configured in Kitodo.Production.

Then we fix that separately.

@markusweigelt
Copy link
Collaborator

markusweigelt commented Feb 13, 2023

How were we left with the Python test? Should a image or container be created with e.g. DockerfileTest here which contains the dependencies or should the depended code to Python 10 be adjusted?

In my case test do not run through yet.

weigelt@LDV163:~/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor$ nox
nox > Running session mypy-3.9
nox > Missing interpreters will error by default on CI systems.
nox > Session mypy-3.9 skipped: Python interpreter 3.9 not found.
nox > Running session mypy-3.10
nox > Creating virtual environment (virtualenv) using python3.10 in .nox/mypy-3-10
nox > python -m pip install -r requirements.txt
nox > Command python -m pip install -r requirements.txt failed with exit code 1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/__main__.py", line 19, in <module>
    sys.exit(_main())
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/cli/main.py", line 73, in main
    command = create_command(cmd_name, isolated=("--isolated" in cmd_args))
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/commands/__init__.py", line 96, in create_command
    module = importlib.import_module(module_path)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/commands/install.py", line 24, in <module>
    from pip._internal.cli.req_command import RequirementCommand
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/cli/req_command.py", line 15, in <module>
    from pip._internal.index.package_finder import PackageFinder
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/index/package_finder.py", line 21, in <module>
    from pip._internal.index.collector import parse_links
  File "/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_internal/index/collector.py", line 12, in <module>
    from pip._vendor import html5lib, requests
ImportError: cannot import name 'html5lib' from 'pip._vendor' (/home/weigelt/Work/OCR-D/test/ocrd_kitodo/_modules/ocrd_manager/ocrd_monitor/.nox/mypy-3-10/lib/python3.10/site-packages/pip/_vendor/__init__.py)

A small docu would be helpful here to perform the tests (#45 (comment)) so that not everything is assumed or you have to orientate yourself on the source.

From my point of view, we can solve the runnability of the tests later, because initially the Flask application came even without tests.

@SvenMarcus
Copy link
Collaborator Author

How were we left with the Python test? Should a image or container be created with e.g. DockerfileTest here which contains the dependencies or should the depended code to Python 10 be adjusted?

In my case test do not run through yet.

I think this may be a problem with an outdated pip installation on your system:
https://stackoverflow.com/questions/69503329/pip-is-not-working-for-python-3-10-on-ubuntu/69527217#69527217

The noxfile runs mypy and the tests for python versions 3.9, 3.10, and 3.11 and all of them are passing when I run it.

@markusweigelt
Copy link
Collaborator

markusweigelt commented Feb 13, 2023

How were we left with the Python test? Should a image or container be created with e.g. DockerfileTest here which contains the dependencies or should the depended code to Python 10 be adjusted?
In my case test do not run through yet.

I think this may be a problem with an outdated pip installation on your system: https://stackoverflow.com/questions/69503329/pip-is-not-working-for-python-3-10-on-ubuntu/69527217#69527217

Unfortunately this did not helped. Have the latest version and before that the previous bugfix version.

The noxfile runs mypy and the tests for python versions 3.9, 3.10, and 3.11 and all of them are passing when I run it.

My question was also whether we should install all the requirements on the host (then a venv would be great) or whether we should just build an image temporarily and start it as a container.

@markusweigelt
Copy link
Collaborator

markusweigelt commented Feb 13, 2023

my locale side-packages directory has the html5lib. But nox opens virtual environments which contains an older pip version and in the .nox/.../side-packages/ the html5lib is not available.

Which python version an pip version do you use?

@markusweigelt
Copy link
Collaborator

As workaround i can run the test with following command docker run --rm -it -v $(pwd):/src thekevjames/nox:2022.11.21 nox -f src/noxfile.py however a few still fail. Probably it is because they are running in the container and not on the host.

@markusweigelt
Copy link
Collaborator

@SvenMarcus if you still make the discussed documentation adjustment we can merge the branch. 🚀

@bertsky bertsky self-requested a review February 14, 2023 14:59
@markusweigelt markusweigelt changed the title [Draft] Replaces flask with fastapi monitor Replaces flask with fastapi monitor Feb 14, 2023
@markusweigelt markusweigelt merged commit bfb40dc into slub:main Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants