🐛 Bug: Cache table missing expected columns #9

Closed
ThaliaBarrera opened this issue Feb 6, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@ThaliaBarrera

Found this issue when reading from source-google-analytics-data-api. The error is seen in all streams I tried for this source.

Here's an example error trace for the pages stream:

[/usr/local/lib/python3.10/dist-packages/airbyte_lib/caches/base.py](https://localhost:8080/#) in _ensure_compatible_table_schema(self, stream_name, raise_on_error)
    411         if missing_columns:
    412             if raise_on_error:
--> 413                 raise exc.AirbyteLibCacheTableValidationError(
    414                     violation="Cache table is missing expected columns.",
    415                     context={

AirbyteLibCacheTableValidationError: AirbyteLibCacheTableValidationError: Cache table validation failed.
    Violation: 'Cache table is missing expected columns.'
    Missing Columns: {'pagePathPlusQueryString', 'screenPageViews', 'bounceRate', 'hostName'}

The interesting thing is that all cache files do seem to have those columns:

[Screenshot 2024-02-06 at 19 38 32]

👉 Colab for reference

@ThaliaBarrera ThaliaBarrera added the bug Something isn't working label Feb 6, 2024
@aaronsteers
Contributor

@ThaliaBarrera - Thanks for logging!

@aaronsteers
Contributor

@ThaliaBarrera - This could be caused by a case sensitivity issue. Could you try installing AirbyteLib from branch aj/airbyte-lib/case-insensitive-missing-col-check and let me know if you have any luck?

Related PR (reopening):

@aaronsteers
Contributor

aaronsteers commented Feb 6, 2024

Update:

I don't think that PR will fix it actually, but I'd like to ask if you could still retry using that branch. I've just expanded the error message on that branch so it should give us better debug info now:

https://github.com/airbytehq/airbyte/blob/4de4498385ba11c47bb2be077caf24d4ea221ae5/airbyte-lib/airbyte_lib/caches/base.py#L385-L397

                raise exc.AirbyteLibCacheTableValidationError(
                    violation="Cache table is missing expected columns.",
                    context={
                        "stream_column_names": stream_column_names,
                        "table_column_names": table_column_names,
                        "missing_columns": missing_columns,
                    },
                )

@aaronsteers aaronsteers changed the title Cache table missing expected columns 🐛 Bug: Cache table missing expected columns Feb 6, 2024
@ThaliaBarrera
Author

ThaliaBarrera commented Feb 6, 2024

@aaronsteers it indeed seems like a case sensitivity issue:

    Violation: 'Cache table is missing expected columns.'
    Missing Columns: {'pagePathPlusQueryString', 'hostName', 'screenPageViews', 'bounceRate'}
    Stream Column Names: {'screenPageViews', 'bounceRate', 'pagePathPlusQueryString', 'property_id', 'date', 'hostName'}
    Table Column Names: {'hostname', 'pagepathplusquerystring', 'property_id', 'screenpageviews', 'bouncerate', 'date'}

Not fixed when using branch aj/airbyte-lib/case-insensitive-missing-col-check, as suspected
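
To illustrate the mismatch in the output above, here is a minimal sketch (not the AirbyteLib implementation) that compares a case-sensitive set difference with a case-insensitive one. The helper name `find_missing_columns` is hypothetical; the column names are copied from the error output.

```python
def find_missing_columns(stream_columns, table_columns):
    """Return stream columns that are absent from the table, ignoring case."""
    table_lower = {name.lower() for name in table_columns}
    return {name for name in stream_columns if name.lower() not in table_lower}


# Column names taken from the error message in this thread.
stream_column_names = {
    "screenPageViews", "bounceRate", "pagePathPlusQueryString",
    "property_id", "date", "hostName",
}
table_column_names = {
    "hostname", "pagepathplusquerystring", "property_id",
    "screenpageviews", "bouncerate", "date",
}

# A plain set difference is case-sensitive and flags the mixed-case names.
case_sensitive_missing = stream_column_names - table_column_names

# Comparing lowercased names treats 'hostName' and 'hostname' as the same column.
case_insensitive_missing = find_missing_columns(stream_column_names, table_column_names)

print(sorted(case_sensitive_missing))
print(sorted(case_insensitive_missing))
```

The case-sensitive difference reproduces the four "missing" columns from the error, while the case-insensitive comparison reports none.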

@aaronsteers
Contributor

aaronsteers commented Feb 7, 2024

@ThaliaBarrera - This should be resolved now by:

Can you retest and report back? Thanks!

@ThaliaBarrera
Author

@aaronsteers I think we are past the schema compatibility issue 🚀

However, I'm now seeing another problem with the same stream, and others, during what I believe is the emulated merge phase. Here's the error I see:

[/usr/local/lib/python3.10/dist-packages/airbyte_lib/caches/base.py](https://localhost:8080/#) in <listcomp>(.0)
    866         # Craft the WHERE clause for composite primary keys
    867         join_conditions = [
--> 868             getattr(final_table.c, pk_column) == getattr(temp_table.c, pk_column)
    869             for pk_column in pk_columns
    870         ]

[/usr/local/lib/python3.10/dist-packages/sqlalchemy/sql/base.py](https://localhost:8080/#) in __getattr__(self, key)
   1223             return self._index[key]
   1224         except KeyError as err:
-> 1225             util.raise_(AttributeError(key), replace_context=err)
   1226 
   1227     def __contains__(self, key):

[/usr/local/lib/python3.10/dist-packages/sqlalchemy/util/compat.py](https://localhost:8080/#) in raise_(***failed resolving arguments***)
    209 
    210         try:
--> 211             raise exception
    212         finally:
    213             # credit to

AttributeError: hostName

And the Colab if you'd like to take a look.

Happy to open another issue for this one. Let me know 🙂

@aaronsteers
Contributor

Thanks, @ThaliaBarrera !

It thinks that hostName doesn't exist in either the final table or the stage table - but I'm not able to tell which because the error occurs on a line that references both:

getattr(final_table.c, pk_column) == getattr(temp_table.c, pk_column)

Could you try querying both tables in SQL to see if they have a column called "hostName"?

I think you can see the list of table names with something like %sqlcmd show tables (exact syntax is in one of the earlier demo notebooks).
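
One way to tell the two cases apart is to check each table's column list separately rather than in a single expression. A rough sketch, assuming the column names of each table have already been fetched; `locate_missing_columns`, the temp table name `pages_temp`, and both column lists are hypothetical and only for illustration:

```python
def locate_missing_columns(pk_columns, columns_by_table):
    """Map each table name to the primary-key columns it lacks (exact-case match)."""
    return {
        table_name: [pk for pk in pk_columns if pk not in column_names]
        for table_name, column_names in columns_by_table.items()
    }


# Hypothetical column lists, not taken from the actual cache.
columns_by_table = {
    "pages": {"hostname", "date", "property_id"},
    "pages_temp": {"hostName", "date", "property_id"},
}

report = locate_missing_columns(["hostName", "date"], columns_by_table)
for table_name, missing in report.items():
    print(f"{table_name}: missing {missing or 'nothing'}")
```

Reporting per table makes it clear which side of the comparison is failing, which the combined `getattr(...) == getattr(...)` line cannot show.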

@ThaliaBarrera
Author

@aaronsteers It's weird, but I don't see a stage table. When I run %sql show tables this is what I get:

_airbytelib_state
_airbytelib_streams
pages

Then, if I query the pages table, it does have the hostName column, but in lowercase: hostname (maybe that's expected)
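
If the cache stores column names in lowercase, a lookup by the stream's original name has to tolerate the case difference. A minimal sketch of one possible approach, not the fix that was actually applied; `resolve_column_name` is a hypothetical helper:

```python
def resolve_column_name(requested, actual_columns):
    """Return the table's own spelling of `requested`, matching case-insensitively."""
    matches = [name for name in actual_columns if name.lower() == requested.lower()]
    if not matches:
        raise KeyError(f"No column matching {requested!r}")
    if len(matches) > 1:
        # Two columns differing only by case cannot be resolved safely.
        raise ValueError(f"Ambiguous column {requested!r}: {matches}")
    return matches[0]


# Table column names as reported earlier in this thread.
pages_columns = [
    "hostname", "pagepathplusquerystring", "property_id",
    "screenpageviews", "bouncerate", "date",
]

resolved = resolve_column_name("hostName", pages_columns)
print(resolved)
```

Raising on ambiguous matches is a deliberate choice here: silently picking one of two columns that differ only by case could join on the wrong data.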

@aaronsteers
Contributor

> @aaronsteers It's weird, but I don't see a stage table.

That's okay - I think the temp table may be cleaning itself up after failure, as designed.

> Then, if I query the pages table, it does have the hostName column, but in lowercase: hostname (maybe that's expected)

Thank you thank you! Yes, I think this confirms the root cause. It definitely points to another case sensitivity issue. I can work from this... 🙏

@aaronsteers
Contributor

@ThaliaBarrera I'm closing this issue as resolved, and will handle follow-ups on this new issue:
