-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PLUGIN-632] - Added: ServiceAccountType as JSON, unit tests, validations #13
[PLUGIN-632] - Added: ServiceAccountType as JSON, unit tests, validations #13
Conversation
lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please test source and sink plugins in the following scenarios:
- In CDF with file path set to auto-detect.
- In CDF, specify file path
- In CDF, specify Service Account JSON
- In CDAP, specify file path
- In CDAP, specify Service Account JSON
- Test with service account type, file path/JSON set as macro param.
@@ -18,6 +18,9 @@ | |||
|
|||
import com.github.rholder.retry.RetryException; | |||
import com.google.api.services.drive.model.File; | |||
import com.google.gson.JsonObject; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these imports added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These imports were left behind after we did some additional tests. Thanks for letting me know. I removed them.
@@ -27,7 +27,7 @@ | |||
* Util class for building pipeline schema. | |||
*/ | |||
public class SchemaBuilder { | |||
public static final String SCHEMA_ROOT_RECORD_NAME = "FileFromFolder"; | |||
public static final String SCHEMA_ROOT_RECORD_NAME = "etlSchemaBody"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema was not being updated in Google Drive Source and initially we thought that this was the case, but then we found out that the issue was related with widget, not with SCHEMA_ROOT_RECORD_NAME
. Reverted to its initial name in the updated PR.
@@ -98,7 +105,7 @@ protected Credential getCredentials() throws IOException { | |||
} | |||
|
|||
protected List<String> getRequiredScopes() { | |||
return Collections.singletonList(DriveScopes.DRIVE_READONLY); | |||
return Collections.singletonList(DriveScopes.DRIVE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did we need to make this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was also as part of the testings that we have done. We were dealing with this error message when testing the pipeline:
404 Not Found { "code" : 404, "errors" : [ { "domain" : "global", "location" : "fileId", "locationType" : "parameter", "message" : "File not found: 1ZqAr0q6gITvjqvfOJqkSIxd-usTNhvJy.", "reason" : "notFound" } ], "message" : "File not found: 1ZqAr0q6gITvjqvfOJqkSIxd-usTNhvJy." }. Please check the system logs for more details.
...and checking the documentation of Google Drive API for 404 error type, stating that:
To fix this error:
1. Inform the user they don't have read access to the file or that the file doesn't exist.
2. Instruct the user to ask the owner for permission to the file.
...we changed the SCOPE from DRIVE_READONLY
to DRIVE
.
But afterwards, when we verified that in fact we were not in the right path when testing the pipelines and DRIVE_READONLY was enough for read/write, we changed back to its initial SCOPE.
In the last commit, we reverted back the scope to DRIVE_READONLY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With SCOPE set to DRIVE
does everything work correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, everything also works correctly with SCOPE set to DRIVE_READONLY
, that's why we reverted back the code.
private boolean validateAccountFilePath(FailureCollector collector) { | ||
if (!containsMacro(ACCOUNT_FILE_PATH)) { | ||
private boolean validateServiceAccount(FailureCollector collector) { | ||
if (isServiceAccountFilePath() != null && isServiceAccountFilePath()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store the return value in a variable so you don't have to call isServiceAccountFilePath()
twice. Same thing for isServiceAccountJson()
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR updated.
google-drive/src/main/java/io/cdap/plugin/google/drive/source/GoogleDriveInputFormatProvider.java Lines 40 to 43 in a9be711
After debugging this issue, we found out that the entire config was serialized. Taking a consideration that we have to do a deserialization of the entire config and this is time consuming (after this fix, the entire functionality of source and sink plugin needs to be tested), we removed Macros at this moment of time. We were told that the requirements were prioritized and the fix needs to be delivered as soon as possible. However, we can proceed with the deserialization of config and we can include Macros in this PR if needed. |
@flakrimjusufi |
Also are you noticing permissions issues at pipeline runtime? If so, could you please make sure dataproc service account (GCE default service account) has permissions to access drive directories/files. |
Macros are supported, so we should make sure they work correctly. Please fix it as part of this PR. |
With our partnership accounts, we don't have access in any of gmail, google drive, etc. So, to test Google Drive Source/Sink in data proc, I need one of the followings:
Permission issues are showing up in validation when file path is set to |
In the last update, we deserialized the config and added Macros. |
|
||
@Nullable | ||
public String getServiceAccountFilePath() { | ||
if (containsMacro(ACCOUNT_FILE_PATH) || accountFilePath == null || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: replace accountFilePath == null || accountFilePath.isEmpty() -> Strings.insNullOrEmpty(accountFilePath)
} | ||
|
||
@Nullable | ||
public String getServiceAccount() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in GoogleBasicAuthConfig.java where you're checking if the returned string is null or empty. The string value is unused.
It seems this is unnecessary. You just need something like this:
if (config.isServiceAccountFilePath()) {
...
} else if (config. isServiceAccountJson()) {
...
} else {
// use default credentials
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in the last update.
import org.junit.Test; | ||
|
||
public class GoogleAuthBaseConfigTest { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add tests for validation error handling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for validation error handling in the last update.
In the last commit, GoogleSheets plugins are also updated with serviceAccountType as JSON, macros and documentation. Both Source and Sink were tested thoroughly and everything is working well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an ETL test with macros for google drive and sheets?
public static final String NAME_SERVICE_ACCOUNT_JSON = "serviceAccountJSON"; | ||
public static final String SERVICE_ACCOUNT_FILE_PATH = "filePath"; | ||
public static final String SERVICE_ACCOUNT_JSON = "JSON"; | ||
public static final String SCHEMA = "schema"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCHEMA
is used in deserialization of config:
google-drive/src/main/java/io/cdap/plugin/google/drive/source/GoogleDriveSourceConfig.java
Lines 358 to 360 in 3074341
if (properties.has(GoogleDriveSourceConfig.SCHEMA)) { | |
googleDriveSourceConfig.setSchema(properties.get(GoogleDriveSourceConfig.SCHEMA).getAsString()); | |
} |
/** | ||
* Returns the instance of Schema. | ||
* @return The instance of Schema | ||
*/ | ||
public Schema getSchema() { | ||
if (schema == null) { | ||
if (dataSchemaInfo.isEmpty()) { | ||
throw new RuntimeException("There are no headers to process. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not relevant anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look relevant to me because we are already checking if Source folder (directory) contains any spreadsheets files in here:
google-drive/src/main/java/io/cdap/plugin/google/sheets/source/GoogleSheetsSourceConfig.java
Lines 348 to 354 in 3074341
private void validateSourceFolder(FailureCollector collector, List<File> spreadsheetsFiles) { | |
if (spreadsheetsFiles.isEmpty()) { | |
collector.addFailure(String.format("No spreadsheets found in '%s' folder with '%s' filter.", | |
getDirectoryIdentifier(), getFilter()), null) | |
.withConfigProperty(DIRECTORY_IDENTIFIER).withConfigProperty(FILTER); | |
} | |
} |
...but I will revert this RuntimeException in getSchema().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm why are we throwing a RuntimeException here? If config validation fails, we should add the failure to the failure collector (pass in FailureCollector, call collector.addFailure()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR updated.
throw new RuntimeException("There are no headers to process. " + | ||
"Perhaps no validation step was executed before schema generation."); | ||
} | ||
if (schema == null && !containsMacro("serviceAccountType")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if service account type is not macro (set to default, i.e. file path), but the file path is a macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. In this scenario the validation would fail. In the next update, I will also check if file path or service acccount json contains Macro.
I'm seeing lots of dependencies that are missing in |
You can add a test in a separate PR if that's easier for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please squash commits
…ions, documentation and re-factoring of GoogleDrive and GoogleSheets plugin
9892bea
to
2e92d32
Compare
Commits squashed. |
Make Google drive/sheet plugins production ready
JIRA Ticket: https://cdap.atlassian.net/browse/PLUGIN-632