
Storage of Related Documents

Matthew Logan edited this page Oct 31, 2023 · 21 revisions

GWELLS has accepted, and will continue to accept, supporting documents with well construction, alteration, and decommission reports. These are PDFs, JPGs, etc., and are currently stored on the Z:\ LAN share:

  • \\level\s400005\WPS\

The filenames currently take the form "WTN 69913_Well Construction.pdf" or "WTN 69953_Well Alteration.pdf" (i.e. WTN, a space, the Well Tag Number, an underscore, the document type, and the file extension), and are organized into subfolders:

  • 010000
  • 020000
  • 030000
  • ...

There are approximately (<enter thousands of documents) historical documents. Although they are not publicly accessible, "Report 1"-type reports have historically received 15,000 webpage views.

In GWELLS, we envision two buckets of documents: one for publicly releasable documents and one for documents not yet approved for public release. Once staff have approved a document for release, it would be moved to the public bucket.

See Conceptual Requirements for more details.

OpenShift Opaque Secrets are used to specify settings:

  • S3_HOST=s3.ca-central-1.amazonaws.com
  • S3_ROOT_BUCKET=gwells-docs
  • MINIO_ACCESS_KEY=<>
  • MINIO_SECRET_KEY=<>

By definition, the S3_* secrets are for the publicly viewable documents, and the MINIO_* secrets are for the internal Minio S3 server (deployed on the gwells-minio pod). Because the Minio SDK is used to access both S3 servers (public and private), a second pair of keys is needed:

  • S3_ACCESS_KEY=<>
  • S3_SECRET_KEY=<>
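The sketch below shows one way an application could load the two credential pairs. It assumes the OpenShift secrets are exposed to the pod as environment variables with exactly the names listed above. `StorageSettings` and `load_storage_settings` are hypothetical names, not part of the GWELLS code base.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Secret names from the lists above. Assumption: each one is exposed to the
# pod as an environment variable with the same name.
REQUIRED_KEYS = (
    "S3_HOST",
    "S3_ROOT_BUCKET",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
)


@dataclass(frozen=True)
class StorageSettings:
    """Two credential pairs: public S3 and the internal Minio server."""
    s3_host: str
    s3_root_bucket: str
    s3_access_key: str
    s3_secret_key: str
    minio_access_key: str
    minio_secret_key: str


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read the storage secrets, failing fast if any are missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing storage settings: {', '.join(missing)}")
    return StorageSettings(
        s3_host=env["S3_HOST"],
        s3_root_bucket=env["S3_ROOT_BUCKET"],
        s3_access_key=env["S3_ACCESS_KEY"],
        s3_secret_key=env["S3_SECRET_KEY"],
        minio_access_key=env["MINIO_ACCESS_KEY"],
        minio_secret_key=env["MINIO_SECRET_KEY"],
    )
```

Each pair would then be passed to its own SDK client. For example, the public side could use `Minio(settings.s3_host, access_key=settings.s3_access_key, secret_key=settings.s3_secret_key)`. Failing at startup gives a clear error if a secret is missing, rather than an opaque authentication error on the first upload.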

NOTE: CSNR InfoSec has asked about security; some of this is addressed on the Amazon AWS FAQ page. The GWELLS S3 bucket will also have an additional security policy enabled (see Denies Access to AWS Based on the Source IP):

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"NotIpAddress": {"aws:SourceIp": [
      "192.0.2.0/24",
      "203.0.113.0/24"
    ]}}
  }
}
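To make the policy's condition concrete, the sketch below evaluates only its `NotIpAddress` / `aws:SourceIp` check. The Deny applies when the caller's address falls outside every listed range. This is a simplification, not AWS's full policy evaluation logic, which also weighs other statements and condition keys. It also handles only the single-object `Statement` form used above. The function names are hypothetical.

```python
import ipaddress
import json

# The policy from above, embedded so the sketch is self-contained.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Deny",
    "Action": "*",
    "Resource": "*",
    "Condition": {"NotIpAddress": {"aws:SourceIp": [
      "192.0.2.0/24",
      "203.0.113.0/24"
    ]}}
  }
}
"""


def allowed_networks(policy: dict):
    """Extract the CIDR ranges listed under NotIpAddress / aws:SourceIp."""
    cidrs = policy["Statement"]["Condition"]["NotIpAddress"]["aws:SourceIp"]
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_denied(source_ip: str, policy: dict) -> bool:
    """True when the Deny statement applies, i.e. the IP is outside every range."""
    ip = ipaddress.ip_address(source_ip)
    return not any(ip in network for network in allowed_networks(policy))


policy = json.loads(POLICY_JSON)
```

With this policy, a request from `192.0.2.15` is not caught by the Deny, while one from `198.51.100.7` is.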

How Attachments are grabbed in GWELLS

sequenceDiagram
    actor User
    participant Browser
    participant API Endpoint
    participant Controller
    participant Minio
    participant S3

    User ->> Browser: User loads Well page
    Browser ->> API Endpoint: API Call using QS Params /{resource}/{id}/files with Bearer Token
    API Endpoint ->> Controller: Request Forwarded to Controller
    Controller ->> Controller: API Checks permissions from token for private files
    Controller ->> Minio: getDocuments({id}, {resource}, private_files)
    Minio ->> Minio: ID converted to prefix based on resource type
    Minio ->> Minio: Bucket determined from ID
    Minio ->> S3: MINIO Calls list_objects for [public] files in bucket
    S3 -->> Minio: Return all files in format "{resource}{id}..." marked public
    alt authenticated user
        Minio ->> Minio Server: MINIO Calls list_objects for [private] files in bucket
        Minio Server -->> Minio: Return all files in format "{resource}{id}..." marked private
    end
    Minio -->> Controller: List of objects are iterated and formatted
    Controller -->> API Endpoint: API returns an array of objects
    API Endpoint -->> Browser: Links to filenames are displayed on the page
    Browser -->> User: User can access Files
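The listing step in the diagram above can be sketched as follows. Public objects are always listed. Private objects are listed only when the token grants access. This is a sketch under stated assumptions, not the GWELLS implementation. Both clients are assumed to expose minio-py's `list_objects(bucket_name, prefix=..., recursive=...)`, yielding objects with an `object_name` attribute. `list_attachments`, `Attachment`, and `FakeClient` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    name: str
    private: bool


def list_attachments(prefix: str, public_client, public_bucket: str,
                     private_client=None, private_bucket: Optional[str] = None,
                     include_private: bool = False) -> List[Attachment]:
    """Public files always; private files only when the caller is authorized.

    Each client is assumed to offer minio-py's
    list_objects(bucket_name, prefix=..., recursive=...).
    """
    found = [Attachment(obj.object_name, private=False)
             for obj in public_client.list_objects(public_bucket, prefix=prefix, recursive=True)]
    if include_private and private_client is not None:
        found += [Attachment(obj.object_name, private=True)
                  for obj in private_client.list_objects(private_bucket, prefix=prefix, recursive=True)]
    return found


# In-memory stand-in for a Minio client, for trying the function without a server.
@dataclass
class FakeObject:
    object_name: str


class FakeClient:
    def __init__(self, names: List[str]):
        self.names = names

    def list_objects(self, bucket_name, prefix=None, recursive=False):
        return [FakeObject(n) for n in self.names if n.startswith(prefix or "")]
```

The `include_private` flag stands in for the permission check the controller performs on the bearer token. Keeping that decision outside the listing function means an unauthenticated request never reaches the private server at all.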


Key Notes:

  • GWELLS uses URL parameters to identify wells, aquifers, and other resources.

  • Attachments are stored in buckets based on resource type and ID

    • Resource Wells -> /WTN_
    • ID 83246 -> 080000
  • The API uses credentials obtained from the Bearer Token to determine if private files will be shared.
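The ID-to-folder mapping in the notes above (83246 -> 080000) matches the 010000, 020000, ... subfolders on the LAN share. It can be read as "round down to the nearest 10,000 and zero-pad to six digits". The sketch below assumes that reading. The full key layout `<folder>/WTN <id>_` in `well_prefix` is also an assumption based on the existing filenames, and both function names are hypothetical.

```python
def well_folder(well_tag_number: int) -> str:
    """Round down to the nearest 10,000 and zero-pad to six digits."""
    if well_tag_number < 0:
        raise ValueError("well tag numbers are non-negative")
    return f"{(well_tag_number // 10_000) * 10_000:06d}"


def well_prefix(well_tag_number: int) -> str:
    """Hypothetical object-key prefix: '<folder>/WTN <id>_'.

    The trailing underscore keeps 'WTN 1200_' from also matching
    'WTN 12000_...' when the prefix is used in a listing call.
    """
    return f"{well_folder(well_tag_number)}/WTN {well_tag_number}_"
```

Because the folder is derived purely from the number, no lookup table is needed. A six-digit well tag number such as 123456 simply lands in `120000`.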

Conclusion:

The storage method does not need to change, since it already groups objects by Well ID. However, appending a timestamp to the end of each filename would let us change how files are displayed on the page, prevent naming collisions, and distinguish objects by upload date.
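A minimal sketch of appending the timestamp follows. It assumes milliseconds since the Unix epoch, which is one plausible reading of the long numbers in the proposal below, and inserts the value just before the file extension. `timestamped_name` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamped_name(filename: str, uploaded_at: datetime) -> str:
    """Append '_<milliseconds since the Unix epoch>' just before the extension."""
    if uploaded_at.tzinfo is None:
        raise ValueError("uploaded_at must be timezone-aware")
    # timedelta // timedelta gives an exact integer, avoiding float rounding.
    millis = (uploaded_at - EPOCH) // timedelta(milliseconds=1)
    path = PurePosixPath(filename)
    return f"{path.stem}_{millis}{path.suffix}"
```

Two uploads of "WTN 1200_Well Record.pdf" made even one millisecond apart get distinct keys. Uploads within the same millisecond would still collide, so a random suffix could be added if that ever matters.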

Sequence Diagram

  sequenceDiagram
    participant Browser
    participant WellDetail.Vue
    participant ApiService
    participant ListFiles
    participant Shortcuts
    participant Minio

    Browser ->> WellDetail.Vue: fetchWellData() 
    WellDetail.Vue ->> ApiService: get(resource, id)
    ApiService -->> WellDetail.Vue: <<response>>
    WellDetail.Vue ->> ApiService: query(url)
    ApiService ->> ListFiles: get(self, request, label)
    ListFiles ->> Shortcuts: get_object_or_404()
    Shortcuts -->> ListFiles: <<response>>
    alt object exists
      ListFiles ->> Minio: get_documents()
      Minio -->> ListFiles: <<response>>
    end
    ListFiles -->> ApiService: <<response>>
    ApiService -->> WellDetail.Vue: <<response>> 
    WellDetail.Vue -->> Browser: <<response>>

How Attachments are added in GWELLS

  sequenceDiagram
      actor User
      participant Browser
      participant API
      participant S3

      User ->> Browser: Uploads file(s)
      loop For each file uploaded
      Browser ->> API: Request Presigned Put URL
      API ->> API: format_object_name()
      API -->> Browser: Returns URL based on ID/Resource/Filetype with 5min TTL 
      Browser ->> Browser: Strips 'Authorization' header from the upload request
      Browser ->> S3: PUT request {URL and file}
      S3 -->> Browser: Request Response [HTTP]
      end
      Browser -->> User: Success Message

Key Notes:

  • GWELLS uses URL parameters to identify wells, aquifers, and other resources.
  • Attachments are stored in buckets based on resource type and ID
    • Resource Wells -> /WTN_
    • ID 83246 -> 080000
  • Presigned URLs are generated by the API, returned to the browser, and used by the browser to upload directly to S3
  • Users uploading files can choose whether they are private. Driller data is always private

Conclusion

  • The API generates a presigned URL that embeds the object's key, i.e. its filename or unique identifier in S3
    • Adding labels to filenames would require changing this step
  • The API responds with the generated presigned URL; the browser then PUTs the file to that URL
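The upload step can be sketched as below. It picks the private or public target (driller data always private), then asks that server's client for a presigned PUT URL with the 5-minute TTL from the diagram. This is a sketch, not the GWELLS implementation. Both clients are assumed to expose minio-py's `presigned_put_object(bucket_name, object_name, expires)`. The bucket name `gwells-private` and the function names are hypothetical.

```python
from datetime import timedelta

UPLOAD_TTL = timedelta(minutes=5)    # the 5 min TTL shown in the diagram
PUBLIC_BUCKET = "gwells-docs"        # S3_ROOT_BUCKET from the secrets above
PRIVATE_BUCKET = "gwells-private"    # hypothetical private bucket name


def is_private_upload(user_marked_private: bool, is_driller_data: bool) -> bool:
    """Driller data is always private; otherwise the uploader decides."""
    return is_driller_data or user_marked_private


def presign_upload(public_client, private_client, object_name: str,
                   user_marked_private: bool, is_driller_data: bool) -> str:
    """Return a presigned PUT URL on the public S3 or the private Minio server.

    Both clients are assumed to expose minio-py's
    presigned_put_object(bucket_name, object_name, expires).
    """
    if is_private_upload(user_marked_private, is_driller_data):
        client, bucket = private_client, PRIVATE_BUCKET
    else:
        client, bucket = public_client, PUBLIC_BUCKET
    # The key is part of the signature, so the browser can only PUT to this exact object.
    return client.presigned_put_object(bucket, object_name, expires=UPLOAD_TTL)
```

Because the object key is fixed when the URL is signed, any labelling scheme has to be applied by the API here, not by the browser.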

Proposed File Naming Structure

flowchart TD
  A(WTNID-Labels-Date) --> B(Well Tag Number)
  A --> E(Labels)
  A --> D(Date obtained from upload time)
  B  -- Result --> G(12345)
  E -- Result --> F(Label1 Label2 Label3...)
  D -- Result --> H(12312412451515141)

Data would be displayed as a table on the Well page:

| Well ID | Labels | Date | File |
| --- | --- | --- | --- |
| 1200 | Well Record | January 1st, 1970 | WTN 1200_Well Record_18515971514132.pdf |
| 1200 | Well Record | June 16th, 2001 | WTN 1200_Well Record_9014587145235.pdf |
| 1200 | Well Record | September 13th, 2018 | WTN 1200_Well Record_31341241241241.pdf |
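Producing these rows means reversing the proposed naming structure. The sketch below splits `WTN <id>_<labels>_<timestamp>.<ext>` into the table's columns and orders rows by upload date. It assumes the timestamp is milliseconds since the Unix epoch (the sample values in the table are illustrative and do not decode to the listed dates). It also assumes labels sit between the first and last underscores. All names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath
from typing import List, NamedTuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class FileRow(NamedTuple):
    well_id: int
    labels: str
    uploaded: datetime
    file: str


def parse_file_row(filename: str) -> FileRow:
    """Split 'WTN <id>_<labels>_<epoch ms>.<ext>' into the table's columns."""
    stem = PurePosixPath(filename).stem
    head, sep, rest = stem.partition("_")        # 'WTN 1200' | rest
    labels, sep2, millis = rest.rpartition("_")  # labels | timestamp
    if not (sep and sep2 and head.startswith("WTN ")):
        raise ValueError(f"Not in the proposed naming format: {filename!r}")
    well_id = int(head[len("WTN "):])
    uploaded = EPOCH + timedelta(milliseconds=int(millis))
    return FileRow(well_id, labels, uploaded, filename)


def table_rows(filenames: List[str]) -> List[FileRow]:
    """Parse every file and order the rows oldest first."""
    return sorted((parse_file_row(name) for name in filenames),
                  key=lambda row: row.uploaded)


def display_date(when: datetime) -> str:
    """Render the Date column, without the ordinal suffix used in the table."""
    return f"{when:%B} {when.day}, {when.year}"
```

Splitting on the first and last underscores, rather than every underscore, keeps multi-word or multi-label values such as "Label1 Label2 Label3" intact in the Labels column.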