Skip to content

S106 init#1

Merged
eschultink merged 49 commits intomainfrom
s106-init
Oct 6, 2021
Merged

S106 init#1
eschultink merged 49 commits intomainfrom
s106-init

Conversation

@eschultink
Copy link
Member

Initial push of working version into psoxy github repo. Includes:

  • various project files, including setup for repo (Github) / IDE (intellij)
  • terraform code to setup infra (with example config for a personal dev project)
  • CI test suite to cover terraform code (just init, validate) and java code (basic tests of pseudonymization/redaction of examples)
  • working proxy deployments for gmail/google-chat, with example invocations via curl

@@ -0,0 +1,3 @@
SOURCE: gmail
TARGET_HOST: gmail.googleapis.com
OAUTH_SCOPES: https://www.googleapis.com/auth/gmail.metadata
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these files fill environment variables for the cloud function. in essence, they are a config.

README.md Outdated
## Development

Can run locally via IntelliJ + maven, using run config:
- `psoxy - run gmail (located in `.idea/runConfigurations`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing backtick

Copy link
Contributor

@aperez-worklytics aperez-worklytics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers, just some comments. Impressive work, congrats!

public class PrebuiltSanitizerOptions {

static final Sanitizer.Options GMAIL_V1 = Sanitizer.Options.builder()
.pseudonymization(Pair.of("\\/gmail\\/v1\\/users\\/.*?\\/messages\\/.*",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed and maybe not for the PoC, but that will be better if parametrized in files

*/
@Builder
@Value
public class Pseudonym {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again and not for now, but we could create a shared package for pseudonym thing to we can reuse code for creating hashes between psoxy and worklytics

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, probably sooner than later

import java.util.stream.Collectors;

@Log
public class Route implements HttpFunction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, function is not actually routing as we are transforming the results, I'd rename it as Sanitize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 well, it does something beyond sanitizing:

  • route the request to the source API (removes the host + cloud function name; adds the source API host)
  • add authentication for source, as needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Route is ok, other names could be Transponder (transmit and responds, more or less) or ApiCallTranslator(probably more accurate than router)

com.google.api.client.http.HttpRequest sourceApiRequest =
requestFactory.buildGetRequest(targetUrl);

//TODO: what headers to forward???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that can be included as part of the config

.setConnectTimeout(SOURCE_API_REQUEST_CONNECT_TIMEOUT)
.setReadTimeout(SOURCE_API_REQUEST_READ_TIMEOUT);

//q: add exception handlers for IOExceptions / HTTP error responses, so those retries
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it should happen in both but with different purpose... proxy should check the request and then worklytics should check the connection against proxy as any other external endpoint.
Proxy should return the status of the request to the source as it is done in L132; if after consuming its retries source request still not working I'll return a 500 or 502 + header. In Worklytics with that code and with the header we may know that we should not retry the connection again, as something happened.


Configuration jsonConfiguration;

public void initConfiguration() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm so better if this is the class constructor instead of an init method that should be called by someone

// return response
response.setStatusCode(sourceApiResponse.getStatusCode());

if (sourceApiResponse.getStatusCode() == HttpStatusCodes.STATUS_CODE_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better with Response.Status.Family.familyOf(sourceApiResponse.getStatusCode()).equals(Response.Status.Family.SUCCESSFUL); as it will cover all 2xx status

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which pkg is this from? base java, or we need to add a dependency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this? https://docs.oracle.com/javaee/7/api/javax/ws/rs/core/Response.html

that requires us to bring in the jax-rs API. not sure that's better than just checking the 200-300 range.

I do question whether it's what we want, as 204 is conventionally 'No Content' right, so will our attempt to parse content crash?

} else {
//write error, which shouldn't contain PII, directly
//TODO: could run this through DLP to be extra safe
sourceApiResponse.getContent().transferTo(response.getOutputStream());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add log here to catch info about why request was not successful; the customer could check the function logs to have more details



GenericUrl buildTarget(HttpRequest request) {
String targetUri = "https://"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Jose mentioned yesterday, better with a uriBuilder

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, i'll improve a bit by using URL - but URLBuilder (java) would require us to parse the query string for the request, and then add it back via addParam - seems pointless

URIBuilder is apache http-client, which isn't currently a dependency of this repo


Pseudonym.PseudonymBuilder builder = Pseudonym.builder();
String canonicalValue;
//q: this auto-detect a good idea? Or invert control and let caller specify with a header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment it is, I think. Let's see how this evolve in the future; probably as you say we could specify what we want as a header - and if not specified use the default, which is the one implemented here

@eschultink eschultink merged commit 1259c53 into main Oct 6, 2021
@eschultink eschultink deleted the s106-init branch October 6, 2021 16:58

project_id = var.project_id
connector_service_account_id = "psoxy-gmail-dwd"
display_name = "Psoxy Connector - GMail Dev Erik"
Copy link
Member

@jlorper jlorper Oct 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use developer name as variable? Whenever there is "Erik" replace (maybe done in other PR)


# todo needed bc as of Sept 2021, no way to expose secret via Cloud Function Maven plugin or
# terraform
resource "local_file" "todo" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +6 to +7
"per connection" basis (connection in this context is defined as the abstract concept of ongoing
data import from a data source to a Worklytics account).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth to do a glossary markdown file?

Comment on lines +16 to +20
<gcpProjectId>psoxy-dev-erik</gcpProjectId>
<sourceApi>gmail</sourceApi>
<!-- serviceAccount that function will run as. For Google Workspace use cases, should be the
one configured as OAuth Client and authorized for your instance. -->
<serviceAccount>psoxy-gmail-dwd@psoxy-dev-erik.iam.gserviceaccount.com</serviceAccount>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all "erik" related stuff ok for ease development, but should be wiped out and use parameters in the future

Comment on lines +32 to +33
final int SOURCE_API_REQUEST_CONNECT_TIMEOUT = 30_000;
final int SOURCE_API_REQUEST_READ_TIMEOUT = 300_000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the amount, I assume are millis. But good to document or set as suffix

Comment on lines +89 to +95
void initSanitizer() {
//TODO: pull salt from Secret Manager
sanitizer = new SanitizerImpl(
PrebuiltSanitizerOptions.MAP.get(getRequiredConfigProperty(ConfigProperty.SOURCE))
.withPseudonymizationSalt("salt")
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably done already, but remember the static initializer to provide global context

Comment on lines +158 to +161
String path =
request.getPath()
.replace(System.getenv(RuntimeEnvironmentVariables.K_SERVICE.name()) + "/", "")
+ request.getQuery().map(s -> "?" + s).orElse("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use UriBuilder

Copy link
Member

@jlorper jlorper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, sorry for the late review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants