Support bulk import/export from the local filesystem #1055

Closed
lmsurpre opened this issue May 8, 2020 · 2 comments

lmsurpre commented May 8, 2020

Is your feature request related to a problem? Please describe.
While less relevant to true integration scenarios, sometimes it's useful just to load some ndjson files from a local filesystem or a volume mount.

Describe the solution you'd like
Via configuration, it should be possible to:

  1. Allow the server to import from file URLs that start with file://
  2. Export to files in a designated directory instead of exporting them to some S3-compatible object store

In fact, I'd like the local filesystem to be the "default" configuration so that we have a working import/export feature out of the box.
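
For illustration only, the kind of $import request this would enable might look something like the following. This is a hypothetical sketch; the parameter names are borrowed from the bulk data import ("ping and pull") proposal and are not a committed design.

    // Hypothetical sketch only: parameter names follow the bulk import
    // proposal; the file:// path shown here is illustrative.
    {
      "resourceType": "Parameters",
      "parameter": [
        { "name": "inputFormat", "valueString": "application/fhir+ndjson" },
        {
          "name": "input",
          "part": [
            { "name": "type", "valueString": "Patient" },
            { "name": "url", "valueString": "file:///fhir-server/input/Patient.ndjson" }
          ]
        }
      ]
    }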

Describe alternatives you've considered
This is an alternative to the existing options that are already supported

Additional context
This would be especially handy for #1054


prb112 commented Oct 9, 2020

Let's confirm this works right now; I think it's technically supported.

@prb112 prb112 self-assigned this Feb 17, 2021
@prb112 prb112 added this to the Sprint 2021-03 milestone Feb 27, 2021
prb112 added a commit that referenced this issue Feb 27, 2021
- Change Runtime Check for ANTLR Version to 4.9.1
- Change confusing $import error message to reflect actual error
- Check the HttpStatusCode for $import/$export job creation for 201, and
send a useful message back
- Normalize the directory structure for the jbatch webapp
- Associate the jbatch webapp with the operation to normalize the code
- Move the IndexGenerator to src/test/java in the tools package
- Remove prior FHIRBasicAuthenticator
- Test clients must use custom VM arguments to use the TLS key
    -Djavax.net.ssl.keyStore=/wlp/usr/servers/fhir-server/resources/security/fhirTrustStore.p12
    -Djavax.net.ssl.trustStore=/wlp/usr/servers/fhir-server/resources/security/fhirTrustStore.p12
    -Djavax.net.ssl.keyStoreType=p12
    -Djavax.net.ssl.trustStoreType=p12
    -Djavax.net.ssl.trustStorePassword=change-password
    -Djavax.net.ssl.keyStorePassword=change-password
- Remove the implied dependency on the Apache CXF client in the bulkdata
test client
- Remove DummpyImportExportImpl as it is unused
- Refactor the BulkDataClient to use Apache Http Client
- Created an Adapter to manage changes in the Configuration
- Added an Enabled Configuration Setting that is per tenant
- Added logging for the BulkData Client
- Refactor the batch-jobs definitions to include a consistent and
minimally sensitive Configuration.
- Normalize the OperationFields into a registry
- Update JobParameters to reflect the OperationFields
- Refactor Duplicate StorageType classes
- Move the tools to src/test/java
- Mark fhir-server provided in the webapp for bulkdata
- Bulk Data Audit Logging #1332
	- Add BulkAuditLogger and fhir-audit dependency
	- Add to Import
- NullPointer in ImportJobListener.afterJob #1361
	- Refactor the AfterJob ExitStatus and Reporter
	- Replicate and harden the partition handling in the
ImportPartitionMapper which feeds the transientUserData object
	- Occasionally we don't set the importResourceType on the
PartitionPlan.  I've added code to make sure this is set.
- Enforce StorageType source and destination for BulkData using
FHIR_BULKDATA_ALLOWED_TYPES #1982
	- Introduce configuration from System.env FHIR_BULKDATA_ALLOWED_TYPES
	- Add Operation specific checks for enabled/verify allowed source types
- Added Preflight Configuration Check for S3, Https, File
- Ambiguous dependencies when starting IBM FHIR Server 4.6.0 #1978
	- Handles the JobContext, StepContext to BulkDataContext using a
BatchContextAdapter
	- InjectionPoints are now specific versus generic String injection
points, which caused the conflict.
	- Must inject the PartitionPlan properties
- Remove the default Bucketname
- Refactoring of the fhirServer/bulkdata configuration #1390
	- Introduce Legacy Configuration
	- Introduce V2 Configuration
- Refactor Unique Calls to S3 Client into a Single Control Wrapper
- Exclude database-utils derby/db2 from the webapp.
- Job still in STARTED status after pod crash #1363
	- Add web.xml to block uncovered HTTP/S methods
	- Add ServletContextListener to Log Startup and hook into the local
BatchRuntime
- Added logging shunts where a try-catch surrounds and rethrows the
implemented JavaBatch methods, so the logging can push a warning message
to the main System.out.
	- $import is now covered
	- Introduced StepChunkListener to unify the logging for Export/Import
- Increase logging in bulkdataclient
- Adding to TrustStore for S3
- Remove hardcoded password in the HttpWrapper
- Support bulk import/export from the local filesystem #1055
	- FileWrapper and a new file configuration enable writing to a local
file.
	- FilePreflight checks to see if the configuration supports write or
read
	- Also add local file export support to the bulkdata export batch job
#1211

Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
@prb112 prb112 linked a pull request Feb 27, 2021 that will close this issue
@prb112 prb112 removed a link to a pull request Mar 9, 2021
@lmsurpre (Member, Author) commented:

Import/Export from the filesystem is now the default.

Whereas I was thinking that users could just pass a URL that starts with file:// in their $import request, Paul opted to have them pass a separate parameter named storageDetail.
To signify an import from IBM COS, this should be set to ibm-cos (which I think is also the default). For AWS S3 (or an S3-compatible provider), it's aws-s3.
And to import from the filesystem, this parameter must be set to file.
In any case, this setting must match the actual configuration on the server.
I think it was done this way to be a bit more secure (we wouldn't want users to be able to ask the server to load just any file from its filesystem; it makes sense to sandbox that to some configured directory).
However, I think we'll need some additional user feedback on this feature to determine how well the current design is working.
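
For example, an $import request targeting the filesystem might look roughly like this. The storageDetail values (ibm-cos, aws-s3, file) are as described above; the other parameters shown (inputFormat, inputSource, input) follow the bulk data import proposal and are a sketch, so check the server's bulk data guide for the exact shape in a given release.

    // Sketch only: field names follow the bulk import proposal;
    // the exact parameter shapes may differ by release.
    {
      "resourceType": "Parameters",
      "parameter": [
        { "name": "inputFormat", "valueString": "application/fhir+ndjson" },
        { "name": "inputSource", "valueUri": "https://my-source-system.example.com" },
        {
          "name": "input",
          "part": [
            { "name": "type", "valueString": "Patient" },
            { "name": "url", "valueString": "Patient.ndjson" }
          ]
        },
        { "name": "storageDetail", "valueString": "file" }
      ]
    }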

My one complaint with this implementation is that it still isn't functional 'out-of-the-box'.
Specifically, because our default config sets "fileBase": "/fhir-server/output", it requires users to either:
A. change this default setting in fhir-server-config.json; or
B. create this absolute directory for the output

I've opened #2086 for this.
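
For reference, the relevant knob lives in the bulk data section of fhir-server-config.json. Abridged, and with surrounding keys that reflect the reworked configuration and may vary by release, it is roughly:

    // Abridged sketch: the surrounding keys (storageProviders, type, enabled)
    // may differ by release; fileBase is the setting discussed above.
    {
      "fhirServer": {
        "bulkdata": {
          "enabled": true,
          "storageProviders": {
            "default": {
              "type": "file",
              "fileBase": "/fhir-server/output"
            }
          }
        }
      }
    }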

Finally, I noticed that the files created in a file-based export are placed directly in the configured fileBase directory, with a filename that matches the random string generated for that particular export. This is probably OK, but I think it might be better for each export to get its own subdirectory under the configured fileBase directory. I opened #2087 for that.
