Support bulk import/export from the local filesystem #1055

Closed
lmsurpre opened this issue May 8, 2020 · 2 comments

lmsurpre commented May 8, 2020

Is your feature request related to a problem? Please describe.
While less relevant to true integration scenarios, sometimes it's useful just to load some ndjson files from a local filesystem or a volume mount.

Describe the solution you'd like
Via configuration, it should be possible to:

  1. Allow the server to import from file URLs that start with file://
  2. Export to files in a designated directory instead of exporting them to some S3-compatible object store

In fact, I'd like the local filesystem to be the "default" configuration so that we have a working import/export feature out of the box.
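
For illustration only, the kind of $import request this would enable might look something like the following. This is a hypothetical sketch; the parameter names are borrowed from the bulk data import ("ping and pull") proposal and are not a committed design.

    // Hypothetical sketch only: parameter names follow the bulk import
    // proposal; the file:// path shown here is illustrative.
    {
      "resourceType": "Parameters",
      "parameter": [
        { "name": "inputFormat", "valueString": "application/fhir+ndjson" },
        {
          "name": "input",
          "part": [
            { "name": "type", "valueString": "Patient" },
            { "name": "url", "valueString": "file:///fhir-server/input/Patient.ndjson" }
          ]
        }
      ]
    }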

Describe alternatives you've considered
This is an alternative to the existing options that are already supported

Additional context
This would be especially handy for #1054


prb112 commented Oct 9, 2020

Let's confirm this works right now; I think it's technically supported.

@prb112 prb112 self-assigned this Feb 17, 2021
@prb112 prb112 added this to the Sprint 2021-03 milestone Feb 27, 2021
prb112 added a commit that referenced this issue Feb 27, 2021
- Change Runtime Check for ANTLR Version to 4.9.1
- Change confusing $import error message to reflect actual error
- Check the HttpStatusCode for $import/$export job creation for 201, and
send a useful message back
- Normalize the directory structure for the jbatch webapp
- Associate the jbatch webapp with the operation to normalize the code
- Move the IndexGenerator to src/test/java in the tools package
- Remove prior FHIRBasicAuthenticator
- Test clients must use custom VM arguments to use the TLS key
    -Djavax.net.ssl.keyStore=/wlp/usr/servers/fhir-server/resources/security/fhirTrustStore.p12
    -Djavax.net.ssl.trustStore=/wlp/usr/servers/fhir-server/resources/security/fhirTrustStore.p12
    -Djavax.net.ssl.keyStoreType=p12
    -Djavax.net.ssl.trustStoreType=p12
    -Djavax.net.ssl.trustStorePassword=change-password
    -Djavax.net.ssl.keyStorePassword=change-password
- Remove the implied dependency on the Apache CXF client in the bulkdata
test client
- Remove DummpyImportExportImpl as it is unused
- Refactor the BulkDataClient to use Apache Http Client
- Created an Adapter to manage changes in the Configuration
- Added an Enabled Configuration Setting that is per tenant
- Added logging for the BulkData Client
- Refactor the batch-jobs definitions to include a consistent and
minimally sensitive Configuration.
- Normalize the OperationFields into a registry
- Update JobParameters to reflect the OperationFields
- Refactor Duplicate StorageType classes
- Move the tools to src/test/java
- Mark fhir-server provided in the webapp for bulkdata
- Bulk Data Audit Logging #1332
	- Add BulkAuditLogger and fhir-audit dependency
	- Add to Import
- NullPointer in ImportJobListener.afterJob #1361
	- Refactor the AfterJob ExitStatus and Reporter
	- Replicate and harden the partition handling in the
ImportPartitionMapper which feeds the transientUserData object
	- Occasionally we don't set the importResourceType on the
PartitionPlan.  I've added code to make sure this is set.
- Enforce StorageType source and destination for BulkData using
FHIR_BULKDATA_ALLOWED_TYPES #1982
	- Introduce configuration from System.env FHIR_BULKDATA_ALLOWED_TYPES
	- Add Operation specific checks for enabled/verify allowed source types
- Added Preflight Configuration Check for S3, Https, File
- Ambiguous dependencies when starting IBM FHIR Server 4.6.0 #1978
	- Handles the JobContext, StepContext to BulkDataContext using a
BatchContextAdapter
	- InjectionPoints are now specific versus generic String injection
points, which caused the conflict.
	- Must inject the PartitionPlan properties
- Remove the default Bucketname
- Refactoring of the fhirServer/bulkdata configuration #1390
	- Introduce Legacy Configuration
	- Introduce V2 Configuration
- Refactor Unique Calls to S3 Client into a Single Control Wrapper
- Exclude database-utils derby/db2 from the webapp.
- Job still in STARTED status after pod crash #1363
	- Add web.xml to block uncovered HTTP/S methods
	- Add ServletContextListener to Log Startup and hook into the local
BatchRuntime
- Added logging shunts where a try-catch surrounds and rethrows the
implemented JavaBatch methods, so the logging can push a warning message
to the main System.out.
	- $import is now covered
	- Introduced StepChunkListener to unify the logging for Export/Import
- Increase logging in bulkdataclient
- Adding to TrustStore for S3
- Remove hardcoded password in the HttpWrapper
- Support bulk import/export from the local filesystem #1055
	- FileWrapper and a new file configuration enable writing to a local
file.
	- FilePreflight checks to see if the configuration supports write or
read
	- Also add local file export support to the bulkdata export batch job
#1211

Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
@prb112 prb112 linked a pull request Feb 27, 2021 that will close this issue
@prb112 prb112 removed a link to a pull request Mar 9, 2021
@lmsurpre (Member, Author) commented:

Import/Export from the filesystem is now the default.

Whereas I was thinking that users could just pass a URL that starts with file:// in their $import request, Paul opted to have them pass a separate parameter named storageDetail.
To signify an import from IBM COS, this should be set to ibm-cos (which I think is also the default). For AWS S3 (or an S3-compatible provider), it's aws-s3.
And to import from the filesystem, this parameter must be set to file.
In any case, this setting must match the actual configuration on the server.
I think it was done this way to be a bit more secure (we wouldn't want users to be able to ask the server to load just any file from its filesystem; it makes sense to sandbox that to some configured directory).
However, I think we'll need some additional user feedback on this feature to determine how well the current design is working.
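
For example, an $import request targeting the filesystem might look roughly like this. The storageDetail values (ibm-cos, aws-s3, file) are as described above; the other parameters shown (inputFormat, inputSource, input) follow the bulk data import proposal and are a sketch, so check the server's bulk data guide for the exact shape in a given release.

    // Sketch only: field names follow the bulk import proposal;
    // the exact parameter shapes may differ by release.
    {
      "resourceType": "Parameters",
      "parameter": [
        { "name": "inputFormat", "valueString": "application/fhir+ndjson" },
        { "name": "inputSource", "valueUri": "https://my-source-system.example.com" },
        {
          "name": "input",
          "part": [
            { "name": "type", "valueString": "Patient" },
            { "name": "url", "valueString": "Patient.ndjson" }
          ]
        },
        { "name": "storageDetail", "valueString": "file" }
      ]
    }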

My one complaint with this implementation is that it still isn't functional 'out-of-the-box'.
Specifically, because our default config sets "fileBase": "/fhir-server/output", it requires users to either:
A. change this default setting in fhir-server-config.json; or
B. create this absolute directory for the output

I've opened #2086 for this.
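
For reference, the relevant knob lives in the bulk data section of fhir-server-config.json. Abridged, and with surrounding keys that reflect the reworked configuration and may vary by release, it is roughly:

    // Abridged sketch: the surrounding keys (storageProviders, type, enabled)
    // may differ by release; fileBase is the setting discussed above.
    {
      "fhirServer": {
        "bulkdata": {
          "enabled": true,
          "storageProviders": {
            "default": {
              "type": "file",
              "fileBase": "/fhir-server/output"
            }
          }
        }
      }
    }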

Finally, I noticed that the files created in a file-based export are placed directly in the configured fileBase directory, with a filename that matches the random string generated for that particular export. This is probably OK, but I think it might be better for each export to get its own subdirectory under the configured fileBase directory. I opened #2087 for that.
