Refactor crawler.rb to depend on GHA workflows #35

Open
ronaldtse opened this issue Dec 18, 2024 · 7 comments

@ronaldtse

This code can be removed if we can re-use the well-maintained actions-mn workflows managed by @CAMOBAP

https://github.com/relaton/relaton-data-bipm/blob/9affdaf0d71dcc758666fe8a9ab9631f2b36f542/crawler.rb#L20C1-L31C4

@CAMOBAP
Contributor

CAMOBAP commented Dec 18, 2024

In my opinion this is something that should be part of RelatonBipm::DataFetcher.fetch (@andrew2net, I need your opinion on this).

If we decide to go with GHA we will lose the possibility to use the reusable crawler.yml workflow

@ronaldtse
Author

We won't re-use this code because all it does is run git clone and build Metanorma documents. I'd rather use actions-mn to build the documents and remove this responsibility from Relaton.

@CAMOBAP
Contributor

CAMOBAP commented Dec 18, 2024

@ronaldtse where else can this code be reused?

@andrew2net
Contributor

We definitely need to move the SI Brochure build somewhere else. It takes a long time to build, and if it fails it blocks updating the docs from other sources.
@CAMOBAP we need to run this part of the code separately:

Bundler.with_unbundled_env do
  fast_fail_system('ls', chdir: 'bipm-si-brochure')
  fast_fail_system('bundle update', chdir: 'bipm-si-brochure')
  fast_fail_system('bundle exec metanorma site generate --agree-to-terms', chdir: 'bipm-si-brochure')
  fast_fail_system('ls', chdir: 'bipm-si-brochure/_site/documents')
end

and make the generated documents available for fetching.
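
As a rough illustration of the "failure should not block other sources" point, here is a minimal sketch. The helper names `run_steps` and `fetch_independently` are hypothetical and are not part of crawler.rb or Relaton; the sketch only assumes that each source can be expressed as a callable that returns a truthy value on success.

```ruby
# Hypothetical helper: run shell commands in order inside `chdir`,
# stopping at the first failure. Returns true only if every step succeeded.
# Kernel#system returns true, false, or nil, so `all?` short-circuits on failure.
def run_steps(steps, chdir:)
  steps.all? { |cmd| system(*cmd, chdir: chdir) }
end

# Hypothetical helper: run every source's job and record its outcome,
# so that one failing (or raising) job does not prevent the others from running.
def fetch_independently(jobs)
  jobs.each_with_object({}) do |(name, job), results|
    results[name] =
      begin
        job.call ? :ok : :failed
      rescue StandardError => e
        warn "#{name} failed: #{e.message}"
        :failed
      end
  end
end
```

With something like this, the SI Brochure build could be passed as one job (for example a lambda that calls `run_steps` with the `bundle update` and `metanorma site generate` commands above, wrapped in `Bundler.with_unbundled_env`), while the other sources are fetched regardless of its result. Moving the build to a separate workflow, as proposed in this issue, would remove the need for this in the crawler altogether.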

@ronaldtse
Author

I agree with @andrew2net. In any case, we have a problem with this in general. Look at how many documents we need to compile for the CalConnect document repository website.

We need to provide a way to handle a publication workflow (published artifacts at stages), offering a Relaton dataset based on a set of Metanorma documents, plus some Relaton-only content (i.e. entries with bibdata only, where the document is unavailable or is not in Metanorma).

@ronaldtse
Author

I opened #36 to illustrate what I meant.

@ronaldtse
Author

@CAMOBAP the problem is that the building of all the documents takes a long time, and we cannot rebuild the entire document collection just to obtain bibliographic data from them.

We need to be able to publish bibliographic data in a document repository itself.
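
To make the idea concrete, here is a minimal sketch of the consumer side, assuming the document repository already publishes its bibliographic files somewhere the crawler can read. The function name `collect_bibdata`, the directory layout, and the `*.rxl` file pattern are assumptions for illustration only, not an existing Relaton or Metanorma API.

```ruby
require "fileutils"

# Hypothetical helper: copy already-published bibliographic files from a
# document repository checkout into the dataset directory, without building
# any documents. Returns the list of destination paths.
def collect_bibdata(published_dir, dataset_dir, pattern: "*.rxl")
  FileUtils.mkdir_p(dataset_dir)
  Dir.glob(File.join(published_dir, "**", pattern)).sort.map do |src|
    dest = File.join(dataset_dir, File.basename(src))
    FileUtils.cp(src, dest)
    dest
  end
end
```

The point of the sketch is that the dataset repository would only copy published artifacts; the long-running document build stays in the document repository.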
