Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google provider requires leveldb installed in system #15770

Closed
dstandish opened this issue May 10, 2021 · 18 comments
Closed

google provider requires leveldb installed in system #15770

dstandish opened this issue May 10, 2021 · 18 comments
Labels
kind:bug This is a clearly a bug

Comments

@dstandish
Copy link
Contributor

dstandish commented May 10, 2021

google provider now requires plyvel, which is used by leveldb hook.

to install plyvel, the user needs leveldb headers installed in the system, otherwise it will fail:

  plyvel/_plyvel.cpp:632:10: fatal error: 'leveldb/db.h' file not found
  #include "leveldb/db.h"
           ^~~~~~~~~~~~~~
  1 error generated.
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for plyvel

the average GCP user may not use leveldb and it seems a little burdernsome to require its installation just to use GCP hooks, when most of them don't need this library.

perhaps there is a way to make this dependency optional?

or perhaps is there a way to install only certain components of a provider?

update

after looking more closely, i see that levelDB isn't a GCP service

it seems to me levelDB might belong as a distinct provider, e.g. google-gcp vs google-level like there is currently with apache and microsoft? @potiuk any thoughts?

cc: @mik-laj
related: #14105

@dstandish dstandish added the kind:bug This is a clearly a bug label May 10, 2021
@dstandish dstandish changed the title google provider requires leveldb in host google provider requires leveldb installed in system May 10, 2021
@potiuk
Copy link
Member

potiuk commented May 11, 2021

I think it's a good idea. We already have the possibility of adding optional dependencies in providers, so it should be rather easy thing to do maybe you would like to add it @dstandish ?
See:

And

apache.beam: apache-beam[gcp]

It will be just a matter of adding "plyvel" additional extra and moving plyvel out of the 'google' extra dependencies (+ some documentation)..

@uranusjr
Copy link
Member

Doesn’t plyvel publish manylinux wheels and does not require compilation on most system?

@potiuk
Copy link
Member

potiuk commented May 11, 2021

Doesn’t plyvel publish manylinux wheels and does not require compilation on most system?

They do - but it does not work for MacOS.

@uranusjr
Copy link
Member

But is macOS a supported production system? I’ve thought it’s only supported as a development platform…

Either way, it’s a good idea to split plyvel out anyway. Not every Linux is manylinux and there might be some people having problems with this e.g. on Alpine.

@potiuk
Copy link
Member

potiuk commented May 11, 2021

It's mostly for development convenience. Plyvel is really such a niche thing that it's worth making it optional if only for the convenience of developers who want to develop GCP provider..

@dstandish
Copy link
Contributor Author

dstandish commented May 11, 2021

Plyvel is really such a niche thing that it's worth making it optional if only for the convenience of developers who want to develop GCP provider..

Keep in mind this is not just about airflow developers

This is about any dag developer that uses mac and virtualenv, and simply has a repo that requires GCP

That's a lot of people.

As a cluster maintainer, you have to manage dev env setup scripts and documentation. While I might have steps for installing certain odbc drivers to connect to a specific resource, I am generally able to avoid having setup steps where you need to brew install this or that before airflow will even install. You want your minimal setup ideally to be pip install -r requirements.txt.

@dstandish
Copy link
Contributor Author

dstandish commented May 11, 2021

I think it's a good idea. We already have the possibility of adding optional dependencies in providers, so it should be rather easy thing to do maybe you would like to add it @dstandish ?

Consider me interested. I will try to find a minute to do so.

To be clear, you recommend making leveldb an optional extra within the google provider? But not splitting google into google-gcp and google-leveldb?

@potiuk
Copy link
Member

potiuk commented May 11, 2021

To be clear, you recommend making leveldb an optional extra within the google provider? But not splitting google into google-gcp and google-leveldb?

Correct.

@potiuk
Copy link
Member

potiuk commented May 12, 2021

I hit the same problem on linux today. When installing python 3.6 version of airflow for whatever reason plyvel failed to compile :(

@potiuk
Copy link
Member

potiuk commented May 12, 2021

We SHOULD do it.

@dstandish
Copy link
Contributor Author

dstandish commented May 12, 2021

I may not be able to get to it this weekend ... Traveling... but will let you know... anyone else please take if you have time

@javatarz
Copy link
Contributor

javatarz commented Oct 7, 2021

Until this one's fixed on Mac, it might be worth updating pip install -e ".[devel_all]" to pip install -e ".[devel]" in breeze.

@potiuk
Copy link
Member

potiuk commented Oct 7, 2021

Why not - will you make PR for that @javatarz ? I kept on using devel anyway.

@javatarz
Copy link
Contributor

javatarz commented Oct 7, 2021

I'd be happy to. Incoming PR!

javatarz added a commit to javatarz/airflow that referenced this issue Oct 7, 2021
Fixes apache#15770 unblocking `./breeze initialize-local-virtualenv` on Mac OSX
@potiuk potiuk closed this as completed in 4c9964c Oct 7, 2021
@rodrigo-fss
Copy link

rodrigo-fss commented May 25, 2022

Hello guys
any updates? I'm still struggling with the leveldb requirement to install the google provider and I can see that it wasn't split apart yet. Do you have any recommendations?

plyvel/_plyvel.cpp:703:10: fatal error: leveldb/db.h: No such file or directory
#14 3.011         703 | #include "leveldb/db.h"

Airflow 2.0.2
Python 3.9.6

Thanks!

@uranusjr
Copy link
Member

You would probably have better luck reaching out to the plyvel maintainers. https://github.com/wbolster/plyvel

@dstandish
Copy link
Contributor Author

@rodrigo-fss pretty sure that leveldb was made optional in google provider. You might try installing a more recent version of the provider. Which one did you try to install?

@potiuk
Copy link
Member

potiuk commented May 26, 2022

Yeah. As mentioned in #15933 the right solution to your problem is to upgrade to Airflow 2.1+.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug This is a clearly a bug
Projects
None yet
Development

No branches or pull requests

5 participants