Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Status of testing Providers that were prepared on July 15, 2021 #17037

Closed
5 of 26 tasks
potiuk opened this issue Jul 15, 2021 · 35 comments · Fixed by #17061
Closed
5 of 26 tasks

Status of testing Providers that were prepared on July 15, 2021 #17037

potiuk opened this issue Jul 15, 2021 · 35 comments · Fixed by #17061
Labels
testing status Status of testing releases

Comments

@potiuk
Copy link
Member

potiuk commented Jul 15, 2021

have a kind request for all the contributors to the latest provider packages release.
Could you help us to test the RC versions of the providers and let us know in the comment,
if the issue is addressed there.

Providers that need testing

Those are providers that require testing as there were some substantial changes introduced:

Provider amazon: 2.1.0rc1

Provider apache.hive: 2.0.1rc1

Provider apache.sqoop: 2.0.1rc1

Provider cncf.kubernetes: 2.0.1rc1

Provider docker: 2.1.0rc1

- [ ] [Adds option to disable mounting temporary folder in DockerOperator (#16932)(https://github.com//pull/16932): @potiuk: bug found.

Provider google: 4.1.0rc1

Provider jenkins: 2.0.1rc1

Provider microsoft.azure: 3.1.0rc1

Provider mysql: 2.1.0rc1: Marking for RC2 release

- [ ] Added template_fields_renderers for MySQL Operator (#16914): @oyarushe
- [ ] Extended template_fields_renderers for MySQL provider (#16987): @oyarushe

Provider postgres: 2.1.0rc1

Provider sftp: 2.1.0rc1

Provider snowflake: 2.1.0rc1

Provider ssh: 2.1.0rc1

Provider tableau: 2.1.0rc1

New Providers

@potiuk potiuk added the kind:bug This is a clearly a bug label Jul 15, 2021
@raphaelauv
Copy link
Contributor

raphaelauv commented Jul 17, 2021

for the docker provider

apache-airflow-providers-docker==2.1.0rc1

still failing for docker in docker

I tried with airflow 2.1.2

t3 = DockerOperator(
    api_version='auto',
    docker_url="unix://var/run/docker.sock",
    command='/bin/sleep 30',
    image='centos:latest',
    network_mode='bridge',
    task_id='docker_op_tester',
    dag=dag,
)

give

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 268, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.9/site-packages/requests/models.py", line 953, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.41/containers/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 253, in _run_image
    return self._run_image_with_mounts(self.mounts + [tmp_mount], add_tmp_variable=True)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 272, in _run_image_with_mounts
    self.container = self.cli.create_container(
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 430, in create_container
    return self.create_container_from_config(config, name)
  File "/usr/local/lib/python3.9/site-packages/docker/api/container.py", line 441, in create_container_from_config
    return self._result(res, True)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 274, in _result
    self._raise_for_status(response)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.9/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error for http+docker://localhost/v1.41/containers/create: Bad Request ("invalid mount config for type "bind": bind source path does not exist: /tmp/airflowtmp3d8fxnte")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 346, in execute
    return self._run_image()
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/docker/operators/docker.py", line 255, in _run_image
    if self.host_tmp_dir in str(e):
TypeError: 'in <string>' requires string as left operand, not NoneType

the line

if self.host_tmp_dir in str(e):

is buggy

a fix -> #17061

@raphaelauv
Copy link
Contributor

raphaelauv commented Jul 17, 2021

for the cncf.kubernetes provider

apache-airflow-providers-cncf-kubernetes==2.0.1rc1

still failing with the an arg containing ".json"

I tried with airflow 2.1.2

    k = KubernetesPodOperator(
        namespace=namespace,
        image="hello-world",
        labels={"foo": "bar"},
        arguments=["vivi.json"],
        name="airflow-test-pod",
        task_id="task-one",
        ...
        )

give

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1294, in _prepare_and_execute_task_with_callbacks
    self.render_templates(context=context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1793, in render_templates
    self.task.render_template_fields(context)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 992, in render_template_fields
    self._do_render_template_fields(self, self.template_fields, context, jinja_env, set())
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1005, in _do_render_template_fields
    rendered_content = self.render_template(content, context, jinja_env, seen_oids)
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in render_template
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1056, in <listcomp>
    return [self.render_template(element, context, jinja_env) for element in content]
  File "/usr/local/lib/python3.9/site-packages/airflow/models/baseoperator.py", line 1040, in render_template
    return jinja_env.get_template(content).render(**context)
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 883, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 857, in _load_template
    template = self.loader.load(self, name, globals)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 115, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/usr/local/lib/python3.9/site-packages/jinja2/loaders.py", line 197, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: vivi.json

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

if self.host_tmp_dir in str(e):

Indeed - I remove docker provider from this round and prepare a next RC. My mistake :)

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

jinja2.exceptions.TemplateNotFound: vivi.json

@raphaelauv Are you sure you had vivi.json file locally ? For me that error looks like it simply was not available locally?

@raphaelauv
Copy link
Contributor

There is no file vivi.json in local, it's just an argument for the pod

the issue is #16922 -> Using the string ".json" in a dag makes KubernetesPodOperator worker unable to trigger the pod

it's still the case even with this MR #16930

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

the issue is #16922 -> Using the string ".json" in a dag makes KubernetesPodOperator worker unable to trigger the pod

it's still the case even with this MR #16930

Not really. With ".json", this is expected behaviour. As I understand it, previously the problem was that json could appear somewhat randomly (for example the argument could be "aaaajson". When a templated field value ends with one of the "template_extensions", then all operators (including KPO) will try to read the file with this extension, process it by the template engine and fail when it does not exist (which is precisely what we see). Do you think it is a problem ? And how do you think it should work (and keep backwards compatibility)?

@raphaelauv
Copy link
Contributor

raphaelauv commented Jul 17, 2021

With ".json", this is expected behaviour

then it could need a new option to let the user disable the templating on the field arguments or anything else.

But the issue should not be considered done since the OP asked for ".json"

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

Sometimes the description is not precise enough. Actually what you propose is a new feature not a bug - would you like to propose PR for that?

@raphaelauv
Copy link
Contributor

I propose to re-open the issue #16922 since it's still happening.

@raphaelauv
Copy link
Contributor

otherwise

    k = KubernetesPodOperator(
        namespace=namespace,
        image="hello-world",
        labels={"foo": "bar"},
        arguments=["vivijson"],
        ...
        )

do not throw anymore an error with apache-airflow-providers-cncf-kubernetes==2.0.1rc1

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

I will let @kaxil and @eladkal tell here what they think. In here we are very focused on getting community opinions on issues (committers have binding votes). In the meantime, I strongly encourage you @raphaelauv to open a PR with proposed new feature. Due to backwards compatibility this anyhow needs to be discussed and opinion of at least a few people has to be taken into account if we are going to implement such a change, as it impacts ~70 providers and ~600 operators, so it is highly unlikely to be resolved (and issue reopened) with an issue related to one operator.

Our community works on reaching consensus and discussing such things, their consequences, backwards compatibilities etc.

@raphaelauv
Copy link
Contributor

I don't need that feature and not asking for, just caring about the guy that opened the issue and that think that is problem is solved. I just don't want miss communication.

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

I don't need that feature and not asking for, just caring about the guy that opened the issue and that think that is problem is solved. I just don't want miss communication.

i started discussion there. Airflow is a community managed project, and it is quite common when you "want" something, you are asked to contribute that, especially if it is not a big fix. A lot of people here who use airflow and comment here, also contribute back fixes and features so this is very common that we are asking people to help with what they think is a good feature. If you make a statement that something "should" be done, I think it is only natural that you "want" this to happen and in open-source world, the natural consequence might be that you are asked to contribute it.

@raphaelauv
Copy link
Contributor

Again , I don'k ask for anything , just proposing to let know the guy of the issue.

I have probably not been clear in my messages, I just wanted to contribute by helping to check the new RC providers.

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

I have probably not been clear in my messages, I just wanted to contribute by helping to check the new RC providers.

This is cool that you want to help. I did not want to discourage it, I just politely asked if you would like to contribute it (which you do not have to). And look what happened - I started the discussion in the original issue #16922 and also asked the author what he thinks :). So I think we are pretty aligned here. My question to you was just the normal thing that happens in open-source - we are asking people who raise concerns and expres opinions, to help. This is how our community works :).

Not sure why the frowning faces here, I was just acting with a very good intent here. It's ok to have different opinions and discuss, and it's also OK to ask people who raise concerns, if they are ok to follow up and contribute, I think.

@raphaelauv
Copy link
Contributor

raphaelauv commented Jul 17, 2021

frowning faces

it's the emoji : confused

cause I was not understanding why you explain me all this common sense about OSS , let me remove them.

@potiuk
Copy link
Member Author

potiuk commented Jul 17, 2021

Really - what we are here up to is to get help from everyone who participates, and we are gently and politely (I hope) asking people to participate. If my question was understood differently - I think this is more of the medium :). Not everyone understands it - apologies if it created confusion

@jnturton
Copy link
Contributor

jnturton commented Jul 18, 2021

Hi, in a virtualenv in which I installed RC1, the apache.drill: 1.0.0rc1 provider does work. I do have two notes of things I didn't expect.

  1. There was no predefined drill_default connection and I had to define it myself, yet I did add a drill_default connection to airflow/utils/db.py.
  2. I noticed that my Drill password was included in the clear in the logging output from base.py, example shown below.
[2021-07-18 09:15:05,050] {base.py:69} INFO - Using connection to: id: drill_default. Host: localhost, Port: 8047, Schema: , Login: james, Password: **should-be-secret**, extra: {'driver_dialect': 'drill+sadrill', 'storage_plugin': 'dfs'}

@potiuk
Copy link
Member Author

potiuk commented Jul 18, 2021

Thanks for testing @dzamo! And thanks for being so thorough! Really appreciated!

  1. There was no predefined drill_default connection and I had to define it myself, yet I did add a drill_default connection to airflow/utils/db.py.

Yep. Totally expected. Default connections is one thing that is not included in "provider packages". The default connections are part of "airflow" package as they are really only useful during testing and airflow development (so in main branch) and possibly for quick airflow testing with docker-compose etc. The default drill connection is already present in main (you can see it with breeze start-airflow --load-default-connections for example). It will be automatically included in Airflow when we release 2.2.

  1. I noticed that my Drill password was included in the clear in the logging output from base.py, example shown below.

Yeah. This is a known issue that will be addressed in 2.1.3. #16579. It's internal webserver logs only (not task logs visible via Airflow UI) so while important to fix, it is not critical.

@potiuk potiuk reopened this Jul 19, 2021
@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2021

Closed by mistake :).

@pmalafosse @baolsen @hewe @scottypate @codenamestif @ciancolo @kaxil @Daniel-Han-Yang @namjals @malthe @BKronenbitter @oyarushe @LukeHong @saurasingh @freget @ashb @ciancolo @samgans - it would be great if at least some of those changes are tested :). Appreciate your help with that.

@oyarushe
Copy link
Contributor

Tested the MySQL operator and found the issue, could you please review fix #17080.

@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2021

Tested the MySQL operator and found the issue, could you please review fix #17080.

Thanks. I will mark it also for rc2 and release together with Docker Provider separately.

@pavelhlushchanka
Copy link
Contributor

#16820 looks good

@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2021

Some more tests would be nice :)

@oyarushe
Copy link
Contributor

Regarding the Amazon provider, looks like not all template_fields_renderers were fixed.
Please, check my PR #17087

@potiuk
Copy link
Member Author

potiuk commented Jul 19, 2021

Sounds like we might want to release all the packages together with rc2 :D

@kaxil
Copy link
Member

kaxil commented Jul 19, 2021

Not really. With ".json", this is expected behaviour. As I understand it, previously the problem was that json could appear somewhat randomly (for example the argument could be "aaaajson".

Correct, the issue was with json randomly used at end of string, not with .json the extension. (For KubernetesPodOperator)

@hewe
Copy link
Contributor

hewe commented Jul 19, 2021

Verified #16767 and it is working as expected 😃

@ciancolo
Copy link
Contributor

ciancolo commented Jul 21, 2021

there is a problem with #16365. In the case of Extra parameter in format {'verify': true} the conversion str to bool is not necessary.

With Extra parameters as string, the PR looks good.

Check #17125 for the bug fix.

[2021-07-21, 07:15:33 UTC] {taskinstance.py:1455} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/airflow/models/taskinstance.py", line 1182, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/opt/airflow/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/opt/airflow/airflow/providers/tableau/operators/tableau_refresh_workbook.py", line 70, in execute
    with TableauHook(self.site_id, self.tableau_conn_id) as tableau_hook:
  File "/opt/airflow/airflow/providers/tableau/hooks/tableau.py", line 70, in __init__
    verify = bool(strtobool(verify))
  File "/usr/local/lib/python3.8/distutils/util.py", line 313, in strtobool
    val = val.lower()
AttributeError: 'bool' object has no attribute 'lower'

@ciancolo
Copy link
Contributor

Tested #16350 on my local Sqoop env and it works as expected

@potiuk
Copy link
Member Author

potiuk commented Jul 21, 2021

There were quite a bit too many problems with this wave, I am going to release a second rc2 wave with all the providers with fixes soon (plus few more changes). Closing this one for now.

@potiuk potiuk closed this as completed Jul 21, 2021
@potiuk
Copy link
Member Author

potiuk commented Jul 21, 2021

Check #17125 for the bug fix.

Cool. Thanks!

@pmalafosse
Copy link
Contributor

There is a bug with #16685 I will create a fix by tomorrow

@potiuk
Copy link
Member Author

potiuk commented Jul 21, 2021

Ahh. Seems like REALLY buggy release candidate set this time :). I will wait for the fix then with rc2 @pmalafosse !. Thanks for letting me know!

@pmalafosse
Copy link
Contributor

Ahh. Seems like REALLY buggy release candidate set this time :). I will wait for the fix then with rc2 @pmalafosse !. Thanks for letting me know!

Thanks! The fix is in that PR #17141

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing status Status of testing releases
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants