The response json returns the empty columns #2
Comments
Actually, this causes the JSON to have wrong data. |
Good use case. Let me look into it. Could you please share the SQL query you are invoking and the sample data that you have in Athena - so I can replicate? |
The query is pretty simple: `SELECT * FROM db`. That is the CSV as it is saved after the query.
And that is what the JSON should be when I did a CSV-to-JSON conversion with an online tool.
And below is what is exported by the athena-express module.
|
Which version of athena-express are you using? I just loaded your CSV into Athena and queried it. This is what I get: {
Items: [{
resource: 'module',
name: 'util',
bucket_name: '',
directory_path: '',
file_name: '',
comments: 'mpla mpla mpla',
path_name: '',
function_name: 'login_mm',
index_name: '',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'dynamo',
name: 'customers',
bucket_name: '',
directory_path: '',
file_name: '',
comments: 'mpla',
path_name: '',
function_name: '',
index_name: 'table',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'dynamo',
name: 'storeCustomers',
bucket_name: '',
directory_path: '',
file_name: '',
comments: 'mpla',
path_name: '',
function_name: '',
index_name: 'table',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'athena',
name: '',
bucket_name: '',
directory_path: '',
file_name: '',
comments: '',
path_name: '',
function_name: '',
index_name: '',
lambda: 'format_athena_table',
branch: 'master'
}, {
resource: 's3',
name: '',
bucket_name: 'reporesources',
directory_path: '',
file_name: 'resources.json',
comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
path_name: '',
function_name: '',
index_name: '',
lambda: 'resources_github_webhook',
branch: 'master'
}, {
resource: 'api',
name: 'api.github.com',
bucket_name: '',
directory_path: '',
file_name: '',
comments: 'We use the api of github to get the resources.json file of each repo that comes from the webhook',
path_name: 'getContents',
function_name: '',
index_name: '',
lambda: 'resources_github_webhook',
branch: 'master'
}]
} |
And I just pushed 3.1.0 that removes empty column values from final JSON. So the above JSON response looks like this now: {
Items: [{
resource: 'module',
name: 'util',
comments: 'mpla mpla mpla',
function_name: 'login_mm',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'dynamo',
name: 'customers',
comments: 'mpla',
index_name: 'table',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'dynamo',
name: 'storeCustomers',
comments: 'mpla',
index_name: 'table',
lambda: 'fetchCustomer',
branch: 'production'
}, {
resource: 'athena',
lambda: 'format_athena_table',
branch: 'master'
}, {
resource: 's3',
bucket_name: 'reporesources',
file_name: 'resources.json',
comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
lambda: 'resources_github_webhook',
branch: 'master'
}, {
resource: 'api',
name: 'api.github.com',
comments: 'We use the api of github to get the resources.json file of each repo that comes from the webhook',
path_name: 'getContents',
lambda: 'resources_github_webhook',
branch: 'master'
}]
} Try it out and let me know. |
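The empty-column stripping described above can be sketched as follows (a minimal illustration under assumptions — the function name is hypothetical and this is not the library's actual code):

```javascript
// Hedged sketch: drop empty-string and null values from each row object,
// approximating the 3.1.0 behavior described above.
function stripEmptyColumns(rows) {
  return rows.map(row =>
    Object.fromEntries(
      // Keep only entries whose value is neither '' nor null/undefined
      Object.entries(row).filter(([, value]) => value !== '' && value != null)
    )
  );
}

const items = [
  { resource: 'athena', name: '', index_name: '', lambda: 'format_athena_table' },
];
console.log(stripEmptyColumns(items));
// [ { resource: 'athena', lambda: 'format_athena_table' } ]
```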
Hello. This is what I get from the response of the query.
And below is the CSV.
|
Another example that returned weird data is:
which returned a `row` attribute for some reason.
Trying the same CSV online gives me the following: |
Quick question: In your config object, do you have |
I have the exact same issue, version 3.2.5 installed |
This is what I get from athena-express with select * query on your other sample data {
Items: [{
bucket_name: 'bucket_name',
directory_path: 'directory_path',
file_name: 'file_name',
comments: 'comments',
lambda: 'lambda',
branch: 'branch'
}, {
bucket_name: 'reporesources',
file_name: 'resources.json',
comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
lambda: 'resources_github_webhook',
branch: 'master'
}, {
bucket_name: 'reporesources',
file_name: 'validation.json',
comments: 'Uploads the validation.json so that the other lambdas can read it',
lambda: 'resources_github_webhook',
branch: 'master'
}]
} How are you importing your CSV into Athena? The following is how I added it. Does your
|
I use AWS Firehose. When I query the data in Athena through the AWS web interface, the table looks fine. By the way, I use structs as column types in some cases. |
Just pushed 3.2.5, which fixed the |
I wanna try and replicate, can you send me your create table query and export the csv from Athena? |
Have you pushed the new version to npm? |
I followed this tutorial: https://aws.amazon.com/blogs/machine-learning/build-a-social-media-dashboard-using-machine-learning-and-bi-services/ It looks like the CSV parser is not working properly.
|
Yeah, 35 minutes ago @3.2.5 |
I am testing the 3.2.5 version. Nothing has changed yet. |
The problem is in the statementType, which in my case is undefined and not DML. |
3.2.5 is currently the latest with the only change being the fix for |
Oh, interesting. what is the specific query you are sending to Athena? |
SELECT bucket_name, directory_path, file_name, comments, lambda, branch FROM repos.resources WHERE resource='s3' AND bucket_name='reporesources'; |
OK, so the problem is that AWS Athena's GetQueryExecution does not respond by default with a value for StatementType. So you should consider the following,
which solves the whole open issue with the data, etc. |
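The suggested workaround can be sketched like this (illustrative only — the function name and shape are assumptions, not athena-express internals):

```javascript
// Hedged sketch: treat a missing StatementType in Athena's GetQueryExecution
// response as 'DML', so result parsing is not skipped when the field is absent.
function isDmlStatement(queryExecution) {
  const statementType = queryExecution.StatementType || 'DML';
  return statementType === 'DML';
}

console.log(isDmlStatement({ StatementType: 'DDL' })); // false
console.log(isDmlStatement({}));                       // true
```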
@deliverymanager I tried it out; it did not change anything for me. The problem still exists: the CSV is not properly turned into JSON. This is the output I get. Everything is messed up from the first field on.
|
Yes, after initializing the StatementType to DML when the response did not contain it, the only time it worked perfectly was when all the keys and values were filled with data. After removing some values, the same problem reappeared. Would you consider using the csvtojson module?
|
@deliverymanager That is exactly what I am doing now. Just deactivate jsonFormat and parse the CSV with csvtojson. @ghdna It would be great if you used csvtojson instead of a self-written parser. Your rule for cutting the strings is not robust enough.
|
I'm trying out these scenarios and will post an update. |
Let me tackle this one by one. { Items:
[ { bucket_name: 's3',
file_name: 'reporesources',
lambda: 'resources.json',
branch:
'Uploads the resources.gz for the specific repo that comes from the webhook' } ] } So I'd like to understand what's different between the csv you shared above and the one I imported into Athena. |
The reason this works is that every key has a value. PS: it is not possible to do a session right now... |
Yes exactly. Those are the empty ones |
Yeah, but as you can see in the screenshot, it's parsing it out and ignoring anything that is empty. That was the change done in 3.1.0. |
That's why I'd like to see what's going on on your side. Let me know when you have 30 mins later today. There is something else going on here. |
@wasserholz I'm trying to replicate your use case. With the create table command, when I import your .csv file, it comes up as empty in Athena - see this. Can you confirm if this is the correct file that you are importing with your create table command? |
Published v3.4.0 that defaults to DML. |
@ghdna like I said, I don't use csv imports. I gave you the csv output of the table, as I have a stream input setup. Take the lineReader and read the csv file I sent you. Then run it through your cleanUp functions and look at the output. I am now reading the raw csv through athena-express and just use csvtojson to convert it to json as a workaround. |
@ghdna So I debugged the code now and, like I said, the CSV parser is not robust enough.
This line messes up. |
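For illustration, here is why a naive `split(',')` parser breaks on this data: quoted fields (like the `comments` column above) can themselves contain commas, so the splitter has to be quote-aware. This is a minimal sketch, not athena-express code — a library like csvtojson handles this and many more edge cases:

```javascript
// Quote-aware CSV field splitter (sketch). Handles commas inside quoted
// fields and RFC 4180 style escaped quotes (""), unlike a plain split(',').
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"') {
        if (line[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;                           // closing quote
      } else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(field); field = '';
    } else field += ch;
  }
  fields.push(field);
  return fields;
}

console.log(splitCsvLine('"s3","reporesources","a, b",""'));
// [ 's3', 'reporesources', 'a, b', '' ]
```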
Just pushed v4.0.0 that parses using csvtojson. |
When some columns in a row are empty / null, the response JSON returned from the query has empty keys.
I think you should create an array with all falsey values removed,
so that only the columns that actually exist in a specific row are returned in the response.