[Upload Improvements] Always send a zip stream to the Importer #8638

Closed
afabiani opened this issue Jan 19, 2022 · 1 comment
Labels: enhancement, master, performance (Issues regarding server Performance and Speed)

Comments

@afabiani (Member) commented Jan 19, 2022

Currently, every time we need to add files to an Importer session, we rely on the client library's upload_task method, which looks something like this:

    def upload_task(self, files, use_url=False, initial_opts=None):
        """create a task with the provided files
        files - collection of files to upload or zip file
        use_url - if true, post a URL to the uploader
        """
        # @todo getting the task response updates the session tasks, but
        # neglects to retrieve the overall session status field
        fname = os.path.basename(files[0])
        _, ext = os.path.splitext(fname)

        def addopts(base):
            if initial_opts:
                # pass options in as value:key parameters, this allows multiple
                # options per key
                base = base + '&' + '&'.join(['option=%s:%s' % (v, k) for k, v in initial_opts.items()])
            return base
        if use_url:
            if ext == '.zip':
                upload_url = files[0]
            else:
                upload_url = os.path.dirname(files[0])
            url = self._url("imports/%s/tasks?expand=3" % self.id)
            upload_url = "file://%s" % os.path.abspath(upload_url)
            resp = self._client().post_upload_url(url, upload_url)
        elif ext == '.zip':
            url = self._url("imports/%s/tasks/%s?expand=3" % (self.id, fname))
            resp = self._client().put_zip(addopts(url), files[0])
        else:
            url = self._url("imports/%s/tasks?expand=3" % self.id)
            resp = self._client().post_multipart(addopts(url), files)
 ...
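
The nested addopts helper appends every entry of initial_opts to the task URL as an option=value:key query parameter. The sketch below is a standalone copy of that logic, pulled out of upload_task only so its output can be inspected; the example URL and the myKey/myValue option are hypothetical placeholders, not real Importer options.

```python
def addopts(base, initial_opts=None):
    """Append Importer options to a URL as option=value:key query parameters.

    Standalone copy of the helper nested inside upload_task, extracted
    here only so its output can be inspected.
    """
    if initial_opts:
        # value:key ordering mirrors the original helper's comment:
        # it allows multiple options per key
        base = base + '&' + '&'.join(
            'option=%s:%s' % (v, k) for k, v in initial_opts.items()
        )
    return base


# The URL and the "myKey"/"myValue" option are hypothetical placeholders
print(addopts("http://example.com/imports/1/tasks?expand=3", {"myKey": "myValue"}))
# prints http://example.com/imports/1/tasks?expand=3&option=myValue:myKey
```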

From the upload_task snippet above we can see that:

  1. The method always reads the files from local storage; this is currently mandatory because the GeoServer Importer is not able to read streams from different sources.
  2. GeoNode never uses the option use_url=True, meaning that it always assumes the GeoServer Importer can access the files on a shared file-system.
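
To make these observations concrete, the following sketch reproduces only the branch selection of upload_task as a pure function. upload_route is a hypothetical name, and it returns the name of the client method each branch would call instead of calling it. Every branch starts from a local path in files[0], and the use_url branch hands GeoServer a file:// URL that it has to resolve on its own file-system.

```python
import os


def upload_route(files, use_url=False):
    """Return (client_method, target) mirroring the branches of upload_task.

    Hypothetical helper for illustration; nothing is sent anywhere.
    """
    fname = os.path.basename(files[0])
    _, ext = os.path.splitext(fname)
    if use_url:
        # GeoServer receives a file:// URL and must read that path itself
        target = files[0] if ext == '.zip' else os.path.dirname(files[0])
        return 'post_upload_url', 'file://%s' % os.path.abspath(target)
    if ext == '.zip':
        # A single zip archive is PUT to imports/<id>/tasks/<fname>
        return 'put_zip', files[0]
    # Any other set of local files is POSTed as multipart form data
    return 'post_multipart', list(files)
```

On a POSIX system, upload_route(["/data/roads.shp"], use_url=True) yields ('post_upload_url', 'file:///data'), i.e. only a directory reference reaches GeoServer.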

The proposal is to change this by:

  1. Zip the spatial files to be sent to the Importer into a local path, possibly managed by the StorageManager.
  2. Stream the zip file to the Importer, so that GeoNode and GeoServer no longer need a shared file-system.
  3. Make sure the local zip files are deleted once the upload has finished.
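
Steps 1 and 3 could look roughly like the sketch below, assuming the archive is built with Python's standard zipfile module inside a fresh temporary directory. zipped_upload is a hypothetical helper, and work_dir only stands in for a location that the StorageManager might provide; it is not an existing StorageManager API.

```python
import contextlib
import os
import shutil
import tempfile
import zipfile


@contextlib.contextmanager
def zipped_upload(files, work_dir=None, zip_name="upload.zip"):
    """Zip `files` into a fresh temporary directory and yield the archive path.

    work_dir is a hypothetical stand-in for a StorageManager-managed location;
    None means the system temporary directory.
    """
    tmp_dir = tempfile.mkdtemp(dir=work_dir)
    try:
        zip_path = os.path.join(tmp_dir, zip_name)
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in files:
                # Store each file under its base name so the archive is flat
                zf.write(path, arcname=os.path.basename(path))
        yield zip_path
    finally:
        # Step 3: remove the archive and its directory even if the upload failed
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Because the yielded path ends in .zip, passing it as upload_task([zip_path]) on an existing Importer session would take the put_zip branch of the snippet above, so the archive travels in the HTTP request body instead of being referenced by path. That branch also puts the archive's base name into the task URL, which is why zip_name is exposed. Whether the body is actually streamed in chunks depends on the HTTP client the library uses.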
afabiani added the performance, master, and enhancement labels (adding and then removing the feature label) on Jan 19, 2022.
@mattiagiupponi (Contributor)

The import flow has been changed via #10474.
