
The response json returns the empty columns #2

Closed
deliverymanager opened this issue Jan 27, 2019 · 37 comments
Assignees
Labels
question Further information is requested

Comments

@deliverymanager

When some columns in a row are empty/null, the response JSON returned from the query contains empty indexes.

I think you should create an array with all falsey values removed, so that only the columns that actually exist in a specific row are returned in the response.

{ resource: 'dynamo',
       name: 'customers",,,,"mpla",,,"table',
       bucket_name: 'fetchCustomer',
       directory_path: 'production' }
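A minimal sketch of that suggestion (a hypothetical helper, not the library's actual code): drop falsey column values from a parsed row object so only populated columns survive.

```javascript
// Hypothetical helper illustrating the suggested clean-up: keep only
// the columns of a row that have a truthy value.

function stripEmptyColumns(row) {
  const cleaned = {};
  for (const [key, value] of Object.entries(row)) {
    if (value) cleaned[key] = value; // skip '', null, undefined, etc.
  }
  return cleaned;
}

// A row shaped like the sample data in this issue.
const row = {
  resource: 'dynamo',
  name: 'customers',
  bucket_name: '',
  directory_path: '',
  comments: 'mpla',
};

console.log(stripEmptyColumns(row));
// { resource: 'dynamo', name: 'customers', comments: 'mpla' }
```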
@deliverymanager deliverymanager changed the title The response json returns null object The response json returns some empty elements Jan 27, 2019
@deliverymanager deliverymanager changed the title The response json returns some empty elements The response json returns the empty columns Jan 27, 2019
@deliverymanager
Author

Actually, this causes the JSON to contain wrong data.

@ghdna
Owner

ghdna commented Jan 27, 2019

Good use case. Let me look into it. Could you please share the SQL query you are invoking and the sample data that you have in Athena - so I can replicate?

@deliverymanager
Author

deliverymanager commented Jan 27, 2019

The query is pretty simple "SELECT * FROM db"

This is the CSV as it is saved after the query.

resource,"name","bucket_name","directory_path","file_name","comments","path_name","function_name","index_name","lambda","branch"
--
module,"util",,,,"mpla   mpla   mpla",,"login_mm",,"fetchCustomer","production"
dynamo,"customers",,,,"mpla",,,"table","fetchCustomer","production"
dynamo,"storeCustomers",,,,"mpla",,,"table","fetchCustomer","production"
athena,,,,,,,,,"format_athena_table","master"
s3,,"reporesources","","resources.json","Uploads   the resources.gz for the specific repo that comes from the   webhook",,,,"resources_github_webhook","master"
api,"api.github.com",,,,"We   use the api of github to get the resources.json file of each repo that comes   from the   webhook","getContents",,,"resources_github_webhook","master"

and this is what the JSON should look like, based on a CSV-to-JSON conversion with an online tool:

[
  {
    "resource": "module",
    "name": "util",
    "bucket_name": "",
    "directory_path": "",
    "file_name": "",
    "comments": "mpla mpla mpla",
    "path_name": "",
    "function_name": "login_mm",
    "index_name": "",
    "lambda": "fetchCustomer",
    "branch": "production"
  },
  {
    "resource": "dynamo",
    "name": "customers",
    "bucket_name": "",
    "directory_path": "",
    "file_name": "",
    "comments": "mpla",
    "path_name": "",
    "function_name": "",
    "index_name": "table",
    "lambda": "fetchCustomer",
    "branch": "production"
  },
  {
    "resource": "dynamo",
    "name": "storeCustomers",
    "bucket_name": "",
    "directory_path": "",
    "file_name": "",
    "comments": "mpla",
    "path_name": "",
    "function_name": "",
    "index_name": "table",
    "lambda": "fetchCustomer",
    "branch": "production"
  },
  {
    "resource": "athena",
    "name": "",
    "bucket_name": "",
    "directory_path": "",
    "file_name": "",
    "comments": "",
    "path_name": "",
    "function_name": "",
    "index_name": "",
    "lambda": "format_athena_table",
    "branch": "master"
  },
  {
    "resource": "s3",
    "name": "",
    "bucket_name": "reporesources",
    "directory_path": "",
    "file_name": "resources.json",
    "comments": "Uploads the resources.gz for the specific repo that comes from the webhook",
    "path_name": "",
    "function_name": "",
    "index_name": "",
    "lambda": "resources_github_webhook",
    "branch": "master"
  },
  {
    "resource": "api",
    "name": "api.github.com",
    "bucket_name": "",
    "directory_path": "",
    "file_name": "",
    "comments": "We use the api of github to get the resources.json file of each repo that comes from the webhook",
    "path_name": "getContents",
    "function_name": "",
    "index_name": "",
    "lambda": "resources_github_webhook",
    "branch": "master"
  }
]

and below is what the athena-express module exports:

[
    {
      "resource": "module",
      "name": "util\",,,,\"mpla mpla mpla\",,\"login_mm\",,\"fetchCustomer",
      "bucket_name": "production"
    },
    {
      "resource": "dynamo",
      "name": "customers\",,,,\"mpla\",,,\"table",
      "bucket_name": "fetchCustomer",
      "directory_path": "production"
    },
    {
      "resource": "dynamo",
      "name": "storeCustomers\",,,,\"mpla\",,,\"table",
      "bucket_name": "fetchCustomer",
      "directory_path": "production"
    },
    {
      "resource": "s3\",,\"reporesources",
      "name": "",
      "bucket_name": "resources.json",
      "directory_path": "Uploads the resources.gz for the specific repo that comes from the webhook\",,,,\"resources_github_webhook",
      "file_name": "master"
    },
    {
      "resource": "api",
      "name": "api.github.com\",,,,\"We use the api of github to get the resources.json file of each repo that comes from the webhook",
      "bucket_name": "getContents\",,,\"resources_github_webhook",
      "directory_path": "master"
    },
    {
      "resource": "athena\",,,,,,,,,\"format_athena_table",
      "name": "master"
    }
  ]

@ghdna
Owner

ghdna commented Jan 27, 2019

Which version of athena-express are you using? I just loaded your csv into Athena and queried select * and got this:

{
    Items: [{
        resource: 'module',
        name: 'util',
        bucket_name: '',
        directory_path: '',
        file_name: '',
        comments: 'mpla mpla mpla',
        path_name: '',
        function_name: 'login_mm',
        index_name: '',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'dynamo',
        name: 'customers',
        bucket_name: '',
        directory_path: '',
        file_name: '',
        comments: 'mpla',
        path_name: '',
        function_name: '',
        index_name: 'table',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'dynamo',
        name: 'storeCustomers',
        bucket_name: '',
        directory_path: '',
        file_name: '',
        comments: 'mpla',
        path_name: '',
        function_name: '',
        index_name: 'table',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'athena',
        name: '',
        bucket_name: '',
        directory_path: '',
        file_name: '',
        comments: '',
        path_name: '',
        function_name: '',
        index_name: '',
        lambda: 'format_athena_table',
        branch: 'master'
    }, {
        resource: 's3',
        name: '',
        bucket_name: 'reporesources',
        directory_path: '',
        file_name: 'resources.json',
        comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
        path_name: '',
        function_name: '',
        index_name: '',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }, {
        resource: 'api',
        name: 'api.github.com',
        bucket_name: '',
        directory_path: '',
        file_name: '',
        comments: 'We use the api of github to get the resources.json file of each repo that comes from the webhook',
        path_name: 'getContents',
        function_name: '',
        index_name: '',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }]
}

@ghdna
Owner

ghdna commented Jan 27, 2019

And I just pushed 3.1.0 that removes empty column values from final JSON. So the above JSON response looks like this now:

{
    Items: [{
        resource: 'module',
        name: 'util',
        comments: 'mpla mpla mpla',
        function_name: 'login_mm',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'dynamo',
        name: 'customers',
        comments: 'mpla',
        index_name: 'table',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'dynamo',
        name: 'storeCustomers',
        comments: 'mpla',
        index_name: 'table',
        lambda: 'fetchCustomer',
        branch: 'production'
    }, {
        resource: 'athena',
        lambda: 'format_athena_table',
        branch: 'master'
    }, {
        resource: 's3',
        bucket_name: 'reporesources',
        file_name: 'resources.json',
        comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }, {
        resource: 'api',
        name: 'api.github.com',
        comments: 'We use the api of github to get the resources.json file of each repo that comes from the webhook',
        path_name: 'getContents',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }]
}

Try it out and let me know.

@deliverymanager
Author

Hello.
It is a little strange but I don't see any difference.
I have installed the 3.1.5 version.

This is what I get from the response of the query.

[
    {
      "resource": "s3\",,\"reporesources",
      "bucket_name": "resources.json",
      "directory_path": "Uploads the resources.gz for the specific repo that comes from the webhook\",,,,\"resources_github_webhook",
      "file_name": "master"
    },
    {
      "resource": "api",
      "name": "api.github.com\",,,,\"We use the api of github to get the resources.json file of each repo that comes from the webhook",
      "bucket_name": "getContents\",,,\"resources_github_webhook",
      "directory_path": "master"
    },
    {
      "resource": "module",
      "name": "util\",,,,\"mpla mpla mpla\",,\"login_mm\",,\"fetchCustomer",
      "bucket_name": "production"
    },
    {
      "resource": "dynamo",
      "name": "customers\",,,,\"mpla\",,,\"table",
      "bucket_name": "fetchCustomer",
      "directory_path": "production"
    },
    {
      "resource": "dynamo",
      "name": "storeCustomers\",,,,\"mpla\",,,\"table",
      "bucket_name": "fetchCustomer",
      "directory_path": "production"
    },
    {
      "resource": "athena\",,,,,,,,,\"format_athena_table",
      "name": "master"
    }
  ]

and below is the csv

"resource","name","bucket_name","directory_path","file_name","comments","path_name","function_name","index_name","lambda","branch"
"athena",,,,,,,,,"format_athena_table","master"
"s3",,"reporesources","","resources.json","Uploads the resources.gz for the specific repo that comes from the webhook",,,,"resources_github_webhook","master"
"api","api.github.com",,,,"We use the api of github to get the resources.json file of each repo that comes from the webhook","getContents",,,"resources_github_webhook","master"
"module","util",,,,"mpla mpla mpla",,"login_mm",,"fetchCustomer","production"
"dynamo","customers",,,,"mpla",,,"table","fetchCustomer","production"
"dynamo","storeCustomers",,,,"mpla",,,"table","fetchCustomer","production"

@deliverymanager
Author

Another example that returned weird data is:

bucket_name,"directory_path","file_name","comments","lambda","branch"
reporesources,"","resources.json","Uploads the resources.gz for the specific repo that comes from the webhook","resources_github_webhook","master"
reporesources,"","validation.json","Uploads the validation.json so that the other lambdas can read it","resources_github_webhook","master"

which returned a row attribute for some reason:

[
    {
      "row": "\"bucket_name\",\"directory_path\",\"file_name\",\"comments\",\"lambda\",\"branch\""
    },
    {
      "row": "\"reporesources\",\"\",\"resources.json\",\"Uploads the resources.gz for the specific repo that comes from the webhook\",\"resources_github_webhook\",\"master\""
    },
    {
      "row": "\"reporesources\",\"\",\"validation.json\",\"Uploads the validation.json so that the other lambdas can read it\",\"resources_github_webhook\",\"master\""
    }
  ]

Trying the same csv online gives me the following:
https://www.csvjson.com/csv2json/9e28b64040aba3e73406ef38cfcde994

@ghdna
Owner

ghdna commented Jan 28, 2019

Quick question: In your config object, do you have formatJson: false?

@wasserholz

wasserholz commented Jan 28, 2019

I have the exact same issue, version 3.2.5 installed, with jsonFormat: true.
If jsonFormat is false, I get no response at all.

@ghdna
Owner

ghdna commented Jan 28, 2019

This is what I get from athena-express with a select * query on your other sample data:

{
    Items: [{
        bucket_name: 'bucket_name',
        directory_path: 'directory_path',
        file_name: 'file_name',
        comments: 'comments',
        lambda: 'lambda',
        branch: 'branch'
    }, {
        bucket_name: 'reporesources',
        file_name: 'resources.json',
        comments: 'Uploads the resources.gz for the specific repo that comes from the webhook',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }, {
        bucket_name: 'reporesources',
        file_name: 'validation.json',
        comments: 'Uploads the validation.json so that the other lambdas can read it',
        lambda: 'resources_github_webhook',
        branch: 'master'
    }]
}

How are you importing your CSV into Athena?

The following is how I added it. Do your SERDEPROPERTIES match mine?

CREATE EXTERNAL TABLE IF NOT EXISTS results4 (
bucket_name string,
directory_path string, 
file_name string,
comments string,
lambda string,
branch string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'quoteChar' = '"',
  'field.delim' = ','
) LOCATION 's3://mybucketname/myfolder/'
TBLPROPERTIES ('has_encrypted_data'='false');

@wasserholz

I use AWS Firehose. When I query the data in Athena through the AWS web interface, the table looks fine. By the way, I use structs as column types in some cases.

@ghdna
Owner

ghdna commented Jan 28, 2019

I have the exact same issue, version 3.2.5 installed, with jsonFormat: true.
If jsonFormat is false, I get no response at all.

Just pushed 3.2.5, which fixed the jsonFormat: false issue. That is the raw data in S3, so it will look messed up.

@ghdna
Owner

ghdna commented Jan 28, 2019

I use AWS Firehose. When I query the data in Athena through the AWS web interface, the table looks fine. By the way, I use structs as column types in some cases.

I want to try to replicate this. Can you send me your create-table query and export the CSV from Athena?

@deliverymanager
Author

Have you pushed the new version to npm?

@wasserholz

wasserholz commented Jan 28, 2019

I followed this tutorial. https://aws.amazon.com/blogs/machine-learning/build-a-social-media-dashboard-using-machine-learning-and-bi-services/

It looks like the CSV parser is not working properly.
Edit: empty cells seem to be ignored somehow.

CREATE EXTERNAL TABLE socialanalyticsblog.tweets ( coordinates STRUCT< type: STRING, coordinates: ARRAY< DOUBLE > >, retweeted BOOLEAN, source STRING, entities STRUCT< hashtags: ARRAY< STRUCT< text: STRING, indices: ARRAY< BIGINT > > >, urls: ARRAY< STRUCT< url: STRING, expanded_url: STRING, display_url: STRING, indices: ARRAY< BIGINT > > > >, reply_count BIGINT, favorite_count BIGINT, geo STRUCT< type: STRING, coordinates: ARRAY< DOUBLE > >, id_str STRING, timestamp_ms BIGINT, truncated BOOLEAN, text STRING, retweet_count BIGINT, id BIGINT, possibly_sensitive BOOLEAN, filter_level STRING, created_at STRING, place STRUCT< id: STRING, url: STRING, place_type: STRING, name: STRING, full_name: STRING, country_code: STRING, country: STRING, bounding_box: STRUCT< type: STRING, coordinates: ARRAY< ARRAY< ARRAY< FLOAT > > > > >, favorited BOOLEAN, lang STRING, in_reply_to_screen_name STRING, is_quote_status BOOLEAN, in_reply_to_user_id_str STRING, user STRUCT< id: BIGINT, id_str: STRING, name: STRING, screen_name: STRING, location: STRING, url: STRING, description: STRING, translator_type: STRING, protected: BOOLEAN, verified: BOOLEAN, followers_count: BIGINT, friends_count: BIGINT, listed_count: BIGINT, favourites_count: BIGINT, statuses_count: BIGINT, created_at: STRING, utc_offset: BIGINT, time_zone: STRING, geo_enabled: BOOLEAN, lang: STRING, contributors_enabled: BOOLEAN, is_translator: BOOLEAN, profile_background_color: STRING, profile_background_image_url: STRING, profile_background_image_url_https: STRING, profile_background_tile: BOOLEAN, profile_link_color: STRING, profile_sidebar_border_color: STRING, profile_sidebar_fill_color: STRING, profile_text_color: STRING, profile_use_background_image: BOOLEAN, profile_image_url: STRING, profile_image_url_https: STRING, profile_banner_url: STRING, default_profile: BOOLEAN, default_profile_image: BOOLEAN >, quote_count BIGINT ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

01fb8c5f-054d-4dc8-96ae-7262d1a88384.csv.zip

@ghdna
Owner

ghdna commented Jan 28, 2019

Have you pushed the new version to npm?

Yeah, 35 minutes ago @3.2.5

@deliverymanager
Author

I have the exact same issue, version 3.2.5 installed, with jsonFormat: true.
If jsonFormat is false, I get no response at all.

Just pushed 3.3.5, which fixed the jsonFormat: false issue. That is the raw data in S3, so it will look messed up.

I am testing the 3.2.5 version. Nothing has changed yet.
Did you create a 3.3.5 version?

@deliverymanager
Author

The problem is in the statementType, which in my case is undefined, not DML.
You should probably set the statementType param to DML by default?

@ghdna
Owner

ghdna commented Jan 28, 2019

I have the exact same issue, version 3.2.5 installed, with jsonFormat: true.
If jsonFormat is false, I get no response at all.

Just pushed 3.3.5, which fixed the jsonFormat: false issue. That is the raw data in S3, so it will look messed up.

I am testing the 3.2.5 version. Nothing has changed yet.
Did you create a 3.3.5 version?

3.2.5 is currently the latest, with the only change being the fix for formatJson: false.
Try querying the sample data, e.g. select * from elb_logs LIMIT 5 in sampledb.

@ghdna
Owner

ghdna commented Jan 28, 2019

The problem is in the statementType, which in my case is undefined, not DML.
You should probably set the statementType param to DML by default?

Oh, interesting. What is the specific query you are sending to Athena?

@deliverymanager
Author

SELECT bucket_name, directory_path, file_name, comments, lambda, branch FROM repos.resources WHERE resource='s3' AND bucket_name='reporesources';

@deliverymanager
Author

OK, so the problem is that AWS Athena's GetQueryExecution does not always respond with a value for StatementType.
So if it does not return the DML value, the JSON transformation fails.

So you should consider the following:

			const s3Output =
				queryStatus.QueryExecution.ResultConfiguration.OutputLocation,
				statementType =
					queryStatus.QueryExecution.StatementType || "DML";

which solves this whole open issue with the data.

@wasserholz

@deliverymanager I tried it out; it did not change anything for me. The problem that the CSV is not properly turned into JSON still exists.

This is the output I get. Everything is messed up from the first field on.
The raw CSV output looks fine.

[{"coordinates":"\"false","retweeted":"<a href=\"\"https://www.sprinklr.com\"\" rel=\"\"nofollow\"\">Sprinklr</a>","source":"{hashtags=[], urls=[{url=https://t.co/MGEy1JOFfe, expanded_url=https://twitter.com/i/web/status/1085862649034915841, display_url=twitter.com/i/web/status/1…, indices=[116, 139]}]}","entities":"0","reply_count":"0\",,\"1085862649034915841","favorite_count":"1547724804539","geo":"true","id_str":"@Rlhoohah Thanks for your interest in this, Rich! I've made our RDS team aware of your request. Please keep an eye… https://t.co/MGEy1JOFfe","timestamp_ms":"0","truncated":"1085862649034915800","text":"false","retweet_count":"low","id":"Thu Jan 17 11:33:24 +0000 2019\",,\"false","possibly_sensitive":"en","filter_level":"Rlhoohah","created_at":"false","place":"733014462551216130","favorited":"{id=120967386, id_str=120967386, name=AWS Support, screen_name=AWSSupport, location=null, url=https://aws.amazon.com/premiumsupport/, description=We welcome AWS support questions and will guide you to the help you need., translator_type=none, protected=false, verified=true, followers_count=27029, friends_count=8, listed_count=350, favourites_count=1439, statuses_count=19059, created_at=Mon Mar 08 04:06:05 +0000 2010, utc_offset=null, time_zone=null, geo_enabled=false, lang=en, contributors_enabled=false, is_translator=false, profile_background_color=C0DEED, profile_background_image_url=http://abs.twimg.com/images/themes/theme1/bg.png, profile_background_image_url_https=https://abs.twimg.com/images/themes/theme1/bg.png, profile_background_tile=false, profile_link_color=1DA1F2, profile_sidebar_border_color=C0DEED, profile_sidebar_fill_color=DDEEF6, profile_text_color=333333, profile_use_background_image=true, profile_image_url=http://pbs.twimg.com/profile_images/1052324554171764736/LQoMp4Xr_normal.jpg, profile_image_url_https=https://pbs.twimg.com/profile_images/1052324554171764736/LQoMp4Xr_normal.jpg, 
profile_banner_url=https://pbs.twimg.com/profile_banners/120967386/1509397772, default_profile=true, default_profile_image=false}","lang":"0"}]

@deliverymanager
Author

Yes, after defaulting the StatementType to DML when the response did not contain it, the only time it worked perfectly was when all the keys and values were filled with data.

After removing some values, the same problem reappeared.

Would you consider using the csvtojson module?

const csvFilePath = '<path to csv file>';
const csv = require('csvtojson');
csv()
	.fromFile(csvFilePath)
	.then((jsonObj) => {
		console.log(jsonObj);
		/**
		 * [
		 * 	{a:"1", b:"2", c:"3"},
		 * 	{a:"4", b:"5", c:"6"}
		 * ]
		 */
	});

// Async / await usage
const jsonArray = await csv().fromFile(csvFilePath);

@wasserholz

wasserholz commented Jan 28, 2019

@deliverymanager That is exactly what I am doing now: deactivate jsonFormat and parse the CSV with csvtojson.

@ghdna It would be great if you used csvtojson instead of a self-written parser. Your rule for splitting the strings is not robust enough.

function cleanUpDML(lineReader) {
	let headerList = [],
		isFirstRecord = true,
		cleanJson = [],
		noOfColumns = 0,
		singleJsonRow = {};

	return new Promise(function(resolve, reject) {
		lineReader
			.on("line", line => {
				line = line.substring(1, line.length - 1).split('","');

				if (isFirstRecord) {
					headerList = line;
					isFirstRecord = false;
				} else {
					singleJsonRow = {};
					noOfColumns = line.length;
					for (let i = 0; i < noOfColumns; i++) {
						if (line[i].length) {
							singleJsonRow[[headerList[i]]] = line[i];
						}
					}
					cleanJson.push(singleJsonRow);
				}
			})
			.on("close", function() {
				resolve(cleanJson);
			});
	});
}
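To illustrate the fragility (an illustrative snippet, not part of the library): Athena writes empty columns as bare commas with no quotes, so the `","` delimiter never appears around them and the substring/split step fuses the neighbouring values together.

```javascript
// Illustrative snippet: feed one of the rows from this issue through
// the same substring/split step that cleanUpDML uses. Empty columns
// have no surrounding quotes, so '","' never occurs between them and
// adjacent values end up fused into a single field.

function naiveSplit(line) {
  // strip first/last char, split on '","' — the original logic
  return line.substring(1, line.length - 1).split('","');
}

const row = '"dynamo","customers",,,,"mpla",,,"table","fetchCustomer","production"';
const fields = naiveSplit(row);

console.log(fields.length); // 4 — the header has 11 columns
console.log(fields[1]);     // 'customers",,,,"mpla",,,"table' — the broken "name" value reported above
```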

@ghdna
Owner

ghdna commented Jan 28, 2019

I'm trying out these scenarios and will post an update.

@ghdna
Owner

ghdna commented Jan 28, 2019

Let me tackle this one by one.
@deliverymanager : Are you able to get on a join.me session with me? I'd like to screenshare and troubleshoot this with you. I imported the data you provided and ran SELECT bucket_name, directory_path, file_name, comments, lambda, branch FROM test2 WHERE bucket_name='s3' and got

{ Items:
   [ { bucket_name: 's3',
       file_name: 'reporesources',
       lambda: 'resources.json',
       branch:
        'Uploads the resources.gz for the specific repo that comes from the webhook' } ] }

So I'd like to understand what's different between the csv you shared above and the one I imported into Athena.

@deliverymanager
Author

The reason this works is that every key has a value.
Try it without giving values to some keys.
Just give them an empty string.

PS: It is not possible to do a session right now...

@ghdna
Owner

ghdna commented Jan 28, 2019

Running the same query on Athena, I can see that the keys directory_path & comments are empty. Is this what you mean by empty key values?

Athena: (screenshot)

Athena-express: (screenshot)

@deliverymanager
Author

Yes exactly. Those are the empty ones

@ghdna
Owner

ghdna commented Jan 28, 2019

Yes exactly. Those are the empty ones

Yeah, but as you can see in the screenshot, it's parsing it out and ignoring anything that is empty. That was the change done in 3.1.0.

@ghdna
Owner

ghdna commented Jan 28, 2019

Yes exactly. Those are the empty ones

Yeah, but as you can see in the screenshot, it's parsing it out and ignoring anything that is empty. That was the change done in 3.1.0.

That's why I'd like to see what's going on on your side. Let me know when you have 30 mins later today. There is something else going on here.

@ghdna
Owner

ghdna commented Jan 28, 2019

@wasserholz I'm trying to replicate your use case. With the create table command, when I import your .csv file, it comes up as empty in Athena - see this.

Can you confirm if this is the correct file that you are importing with your create table command?

@ghdna
Owner

ghdna commented Jan 28, 2019

The problem is in the statementType, which in my case is undefined, not DML.
You should probably set the statementType param to DML by default?

Published v3.4.0 that defaults to DML.

@ghdna ghdna self-assigned this Jan 28, 2019
@ghdna ghdna added the question Further information is requested label Jan 28, 2019
@wasserholz

wasserholz commented Jan 29, 2019

@ghdna Like I said, I don't use CSV imports. I gave you the CSV output of the table, as I have a stream-input setup.
athena-express returns the correct raw CSV data. That means the data received from Athena is correct. The problem is the code inside athena-express that converts it to JSON.

Take the lineReader and read the csv file I sent you. Then run it through your cleanUp functions and look at the output.

I am now reading the raw csv through athena-express and just use csvtojson to convert it to json as a workaround.

@wasserholz

@ghdna So I debugged the code now, and like I said, the CSV parser is not robust enough.

line = line.substring(1, line.length - 1).split('","');

This line messes things up.
Replace the whole process with csvtojson and it will work.
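For reference, a dependency-free sketch of the quote-aware splitting that a CSV parser like csvtojson performs (an illustration of the technique, not csvtojson's implementation): treat commas as delimiters only while outside a quoted field.

```javascript
// Sketch of quote-aware CSV field splitting: walk the line character
// by character, toggling quoted state on '"', and only split on commas
// that occur outside quotes. Empty unquoted fields are preserved.

function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === '"') {
      if (inQuotes && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a field = escaped quote
        i++;
      } else {
        inQuotes = !inQuotes; // opening or closing quote
      }
    } else if (ch === ',' && !inQuotes) {
      fields.push(current); // delimiter outside quotes: end the field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const row = '"dynamo","customers",,,,"mpla",,,"table","fetchCustomer","production"';
console.log(splitCsvLine(row).length); // 11 — one entry per column, empties intact
```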

@ghdna
Owner

ghdna commented Jan 29, 2019

Just pushed v4.0.0 that parses using csvtojson.

@ghdna ghdna closed this as completed Jan 31, 2019