investigate slow adls #359

Closed
twuebi opened this issue Sep 19, 2024 · 1 comment

twuebi commented Sep 19, 2024

Creating a warehouse in ADLS is approximately 5 times slower than creating a warehouse in AWS S3 (STS-enabled).

Creating 10 warehouses (times in milliseconds):

s3 took: '13386', mean: '1338', all: '[1687, 1215, 1204, 2282, 1204, 1131, 1159, 1140, 1122, 1242]
adls took: '66873', mean: '6687', all: '[6824, 6644, 6545, 6673, 6895, 6682, 6546, 6597, 6643, 6824]
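As a quick sanity check on the "approximately 5 times" figure, the totals and means can be recomputed from the per-warehouse timings listed above. This is only a sketch of the arithmetic; the variable and function names are illustrative and not part of the benchmark that produced the numbers.

```python
from statistics import mean

# Timings in milliseconds, copied from the measurements above.
s3_ms = [1687, 1215, 1204, 2282, 1204, 1131, 1159, 1140, 1122, 1242]
adls_ms = [6824, 6644, 6545, 6673, 6895, 6682, 6546, 6597, 6643, 6824]


def summarize(samples):
    """Return (total, mean) for a list of millisecond timings."""
    return sum(samples), mean(samples)


s3_total, s3_mean = summarize(s3_ms)
adls_total, adls_mean = summarize(adls_ms)

# Ratio of mean creation times: how much slower ADLS is than S3.
slowdown = adls_mean / s3_mean

print(f"s3:   total={s3_total} ms, mean={s3_mean:.1f} ms")
print(f"adls: total={adls_total} ms, mean={adls_mean:.1f} ms")
print(f"adls/s3 slowdown: {slowdown:.2f}x")
```

The ADLS timings are tightly clustered, while the S3 run has one visible outlier (2282 ms), so the gap is consistent rather than driven by a few slow requests.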

It seems that 7 times as many connections are opened to Azure as to AWS (140 vs. 20):

cat log | grep  "starting new connection" | rev | cut -f 1 -d" " | rev | sort | uniq -c
  30 https://ht0cht0tabular0eastus.blob.core.windows.net/
  50 https://ht0cht0tabular0eastus.dfs.core.windows.net/
  60 https://login.microsoftonline.com/
  20 https://s3.eu-north-1.amazonaws.com/
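To make the Azure vs. AWS comparison explicit, the per-host counts above can be grouped by provider. The sketch below assumes that `login.microsoftonline.com` and the `*.core.windows.net` hosts all count as Azure; the `provider_of` grouping is an illustrative helper, not something taken from the project.

```python
from collections import Counter
from urllib.parse import urlparse

# Output of the `uniq -c` pipeline above, copied verbatim.
UNIQ_OUTPUT = """\
  30 https://ht0cht0tabular0eastus.blob.core.windows.net/
  50 https://ht0cht0tabular0eastus.dfs.core.windows.net/
  60 https://login.microsoftonline.com/
  20 https://s3.eu-north-1.amazonaws.com/
"""


def provider_of(url):
    """Classify a URL by host suffix (assumed grouping for this tally)."""
    host = urlparse(url).hostname or ""
    if host.endswith(".amazonaws.com"):
        return "aws"
    if host.endswith(".core.windows.net") or host == "login.microsoftonline.com":
        return "azure"
    return "other"


def connections_per_provider(text):
    """Sum the `uniq -c` counts per provider."""
    totals = Counter()
    for line in text.splitlines():
        if not line.strip():
            continue
        count, url = line.split()
        totals[provider_of(url)] += int(count)
    return totals


totals = connections_per_provider(UNIQ_OUTPUT)
ratio = totals["azure"] / totals["aws"]
print(dict(totals), f"azure/aws = {ratio:.1f}")
```

Note that the largest single bucket is the login endpoint, which suggests that token acquisition accounts for a large share of the extra connections.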

Recreating FileIOs means their HTTP clients are also recreated every time, which may mean that connection pooling is not really working: a freshly created client has no pooled connections to reuse.
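The difference between recreating a client on every call and reusing one per endpoint can be sketched as follows. This is a generic illustration of the caching pattern, not code from this project: `HttpClient`, `ClientCache`, and the placeholder endpoint are all hypothetical, and the stand-in client only records its identity instead of opening real connections.

```python
import itertools


class HttpClient:
    """Hypothetical stand-in for an HTTP client that owns a connection pool."""

    _ids = itertools.count(1)

    def __init__(self, endpoint):
        self.endpoint = endpoint
        # Each new instance would start with an empty connection pool.
        self.client_id = next(HttpClient._ids)


class ClientCache:
    """Hands out one shared client per endpoint (not thread-safe as written)."""

    def __init__(self):
        self._clients = {}

    def get(self, endpoint):
        client = self._clients.get(endpoint)
        if client is None:
            client = HttpClient(endpoint)
            self._clients[endpoint] = client
        return client


# Placeholder endpoint, assumed for illustration only.
ENDPOINT = "https://example.dfs.core.windows.net/"

# Recreating the client per operation: three operations, three clients.
fresh = [HttpClient(ENDPOINT) for _ in range(3)]

# Reusing a cached client: three operations, one client.
cache = ClientCache()
shared = [cache.get(ENDPOINT) for _ in range(3)]

distinct_fresh = len({c.client_id for c in fresh})
distinct_shared = len({c.client_id for c in shared})
print(distinct_fresh, distinct_shared)
```

In a concurrent server, the cache would need a lock or a concurrent map, and possibly an eviction policy if the set of endpoints is unbounded.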

twuebi commented Oct 25, 2024

#425 reuses HTTP clients against Azure, which reduces fetching a SAS from 800 ms to 200 ms for subsequent calls. It also makes table creation faster, down from 1700 ms to 1200 ms.
