Implement processing worker in bashlib #1023

Open
kba opened this issue Mar 24, 2023 · 6 comments

Comments

@kba
Member

kba commented Mar 24, 2023

https://github.com/OCR-D/core/pull/974/files#r1140759640

Well, among the recent changes, @joschrew introduced is_bashlib_processor. Instead of trying to look inside the executable – which is error-prone, and behind the OCR-D CLI we might even be dealing with pure program code (binaries), where this would not work at all – I recommend simply adding the Processing Server functionality to bashlib's ocrd__wrap, so bashlib-based processors behave like Pythonic processors. Of course, this would simply delegate to ocrd processing-worker internally.

@kba
Member Author

kba commented Mar 24, 2023

Implementing this in bashlib processors will be very difficult. Honestly, at this point, I'm wondering whether we want to / can continue to support bashlib fully or whether it wouldn't be easier to convert ocrd_olena, ocrd_fileformat etc. to Python.

@kba kba mentioned this issue Mar 24, 2023
@bertsky
Collaborator

bertsky commented Mar 24, 2023

> Implementing this in bashlib processors will be very difficult.

This is absolutely not about (re-)implementing the Processing Server. It's merely about delegating to the ocrd processing-worker subcommand, implemented as part of #974, to make it available in bashlib, so I can do e.g. …

ocrd-olena-binarize --queue amqp://admin:admin@localhost:5672 --database mongodb://localhost:27018

…instead of…

ocrd processing-worker ocrd-olena-binarize --queue amqp://admin:admin@localhost:5672 --database mongodb://localhost:27018

The reason is not to save you the few extra characters to type, but to implement the (extended) OCR-D CLI fully within bashlib, so as a user/caller you would not need to know whether a certain processor to call is bashlib or Python – as has happened in https://github.com/OCR-D/core/pull/974/files#r1140759640
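For readers following along, the delegation described here amounts to prepending `ocrd processing-worker` to the call whenever the worker options are present. Below is a minimal sketch of that decision, written in Python purely for illustration (the actual change would live in bashlib's `ocrd__wrap`, in bash). The function name `delegate_command` and the rule "both `--queue` and `--database` were given as separate arguments" are assumptions for this sketch and are not taken from #974 or #1024.

```python
import shlex


def delegate_command(processor, args):
    """Return the argv to run instead of normal processing, or None.

    Assumption: worker mode is requested when both --queue and --database
    appear as separate arguments (the --opt=value form is not handled here).
    """
    if "--queue" in args and "--database" in args:
        # Hand the whole call over to the existing subcommand unchanged.
        return ["ocrd", "processing-worker", processor, *args]
    # No worker options: the wrapper would continue with regular processing.
    return None


if __name__ == "__main__":
    cmd = delegate_command(
        "ocrd-olena-binarize",
        ["--queue", "amqp://admin:admin@localhost:5672",
         "--database", "mongodb://localhost:27018"],
    )
    # Show the command a wrapper would exec; nothing is executed here.
    print(shlex.join(cmd))
```

The point of the sketch is that the wrapper only rewrites the command line; queue and database handling stay entirely inside `ocrd processing-worker`.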

@bertsky
Collaborator

bertsky commented Mar 24, 2023

> I'm wondering whether we want to / can continue to support bashlib fully or whether it wouldn't be easier to convert ocrd_olena, ocrd_fileformat etc. to Python.

I've said this before, but I think it cannot be overstated: Keeping bashlib is as important as it gets. We want core to be a framework for writing OCR-D compliant processors, which can mean implementing them in Python (if possible), but also merely integrating them via a tiny shell wrapper. The latter must always be possible, otherwise developers will have to reimplement our conventions for Java, C++, Go, what have you.

@kba
Member Author

kba commented Mar 26, 2023

> The latter must always be possible, otherwise developers will have to reimplement our conventions for Java, C++, Go, what have you.

I'm very fond of bashlib, don't get me wrong. But you could easily reimplement bashlib processors in Python by just delegating the calls of the actual tools to subprocess.run and have the full expressivity and OCR-D/core support of Python.

Be that as it may, I simply misunderstood the problem: #1024 is indeed straightforward, and there is no need for a mongo/queue CLI in bashlib, as I had feared.
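To make the subprocess.run idea concrete, here is a minimal sketch of calling an external tool from Python and surfacing its failure. The helper name `run_tool` and the convention that the tool takes the input and output paths as its last two arguments are assumptions for illustration; this is not OCR-D/core API, and a real processor would still need the workspace and PAGE handling around it.

```python
import subprocess


def run_tool(tool_argv, input_path, output_path):
    """Run an external tool on one file and return the completed process.

    Assumption: the tool accepts the input and output paths as its final
    two positional arguments.
    """
    argv = [*tool_argv, str(input_path), str(output_path)]
    # No shell involved: arguments are passed as a list, so paths with
    # spaces or special characters need no quoting.
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with code {result.returncode}: {result.stderr.strip()}"
        )
    return result
```

Passing the command as a list rather than a shell string sidesteps the quoting issues a bash wrapper has to handle by hand, and the captured stderr can be forwarded to whatever logging the caller uses.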

@bertsky
Collaborator

bertsky commented Mar 26, 2023

> But you could easily reimplement bashlib processors in Python by just delegating the calls of the actual tools to subprocess.run and have the full expressivity and OCR-D/core support of Python.

You mean provide a "wildcard processor" in Python which can do system calls to external tools – like ocrd_wrap's ocrd-preprocess-image, but not just for image processing, and as a base class to inherit from?

Ok, perhaps avoiding bash would make our life easier (only maintaining Python parts, being able to use the API). I thought keeping it would help external modules (no knowledge of Python necessary). Of course, at the moment our bashlib-enabled processor boilerplate (page looping, PAGE handling with xmlstarlet) is so huge that it's not feasible for outsiders anyway. (But perhaps this model can help overcome this.)
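A rough sketch of what such a base class could look like, so that a module author only declares the command and inherits the looping. Everything here is hypothetical: `ShellCallProcessor`, its `command` and `output_suffix` attributes, and the `{input}`/`{output}` placeholders are invented for illustration and are not part of OCR-D/core. The loop runs over plain files and stands in for the page looping and PAGE handling a real processor would get from core.

```python
import subprocess
import sys
from pathlib import Path


class ShellCallProcessor:
    """Hypothetical base class: subclasses only set `command`."""

    # argv template; "{input}" and "{output}" are substituted per file
    command = []
    output_suffix = ".out"

    def process(self, input_files, output_dir):
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        outputs = []
        for input_file in map(Path, input_files):
            output_file = output_dir / (input_file.stem + self.output_suffix)
            argv = [
                arg.replace("{input}", str(input_file))
                   .replace("{output}", str(output_file))
                for arg in self.command
            ]
            # check=True raises CalledProcessError if the tool fails
            subprocess.run(argv, check=True)
            outputs.append(output_file)
        return outputs


class UppercaseText(ShellCallProcessor):
    """Toy subclass; the 'external tool' is a Python one-liner."""

    command = [
        sys.executable, "-c",
        "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())",
        "{input}", "{output}",
    ]
    output_suffix = ".txt"
```

With this shape, the subclass carries no Python logic of its own, which is roughly the "no knowledge of Python necessary" property that the shell wrapper was meant to offer.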

@bertsky
Collaborator

bertsky commented Jun 4, 2023

The original issue was resolved via #1024. So do we want to repurpose (and rename) this for your new idea of providing a Python-only, general-purpose wrapper doing preconfigured shell calls, @kba?
