/vsiswift/ large file support (segments) #2202

Closed
constantinius opened this issue Jan 31, 2020 · 14 comments


@constantinius
Contributor

The OpenStack Swift protocol provides for splitting large files into "segments". The VSI Swift implementation does not seem to recognize these files, so my guess is that some client-side functionality is missing to support this. Read support would be a good start; for writing, there are other tools.

As far as I can tell, the reading side would just entail using the X-Object-Manifest header plus some JSON parsing and interpretation.

How much effort would that entail? I would be willing to contribute.

@rouault
Member

rouault commented Jan 31, 2020

I'm only discovering this now.
For "dynamic large objects", I'd say the reading side should work out of the box, judging from the example they mention:

# And now we can download the segments as a single object
curl -H 'X-Auth-Token: <token>'         http://<storage_url>/container/myobject

For "static large objects", the same seems to hold: "A GET request to the manifest object will return the concatenation of the objects from the manifest much like DLO".
But maybe I'm wrong.
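
To illustrate what a reader such as /vsiswift/ relies on here, the following is a minimal Python sketch of a ranged GET against the manifest object, mirroring the curl example above. It only builds the request and does not send it. The function name, the storage URL, and the token value are hypothetical placeholders, not part of GDAL or of any Swift client.

```python
import urllib.request


def build_range_request(storage_url, container, obj, token, start, end):
    """Build (but do not send) a ranged GET for a Swift object.

    For a large-object manifest, the server is expected to answer as if
    the segments were one contiguous object.
    """
    url = f"{storage_url.rstrip('/')}/{container}/{obj}"
    req = urllib.request.Request(url, method="GET")
    req.add_header("X-Auth-Token", token)
    # Inclusive byte range, as used by HTTP range requests.
    req.add_header("Range", f"bytes={start}-{end}")
    return req


# Hypothetical endpoint and token, for illustration only.
req = build_range_request(
    "http://swift.example.invalid/v1/AUTH_demo",
    "container", "myobject", "<token>", 0, 16383,
)
print(req.full_url, req.get_header("Range"))
```

Passing the request to urllib.request.urlopen against a real endpoint should yield a 206 response whose body holds the requested bytes, whether or not the object is segmented.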

@constantinius
Contributor Author

constantinius commented Jan 31, 2020

Thanks @rouault for looking into this.

We have the strange issue that we can access large TIFFs (i.e., open them), but the actual raster access fails. We get error messages like:

band 1: IReadBlock failed at X offset 4, Y offset 4: TIFFReadEncodedTile() failed
band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedTile() failed
Corrupt, empty or missing file
ERROR 1: TIFFFetchDirectory: Can not read TIFF directory count

Also, gdalinfo produces unfamiliar output:

Driver: GTiff/GeoTIFF
Files: /vsiswift/...
Size is 39876, 30862
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (13.711828703703706,45.716439814814812)
Pixel Size = (0.000004629629630,-0.000004629629630)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=PIXEL
GTiff: ScanDirectories()
VSICURL: Request at offset 748, after end of file
ERROR 1: TIFFFetchDirectory:/vsiswift/...: Can not read TIFF directory count
ERROR 1: TIFFReadDirectory:Failed to read directory at offset 748
VSICURL: Request at offset 200, after end of file
ERROR 1: TIFFFetchDirectory:/vsiswift/...: Can not read TIFF directory count
ERROR 1: TIFFReadDirectory:Failed to read directory at offset 200
OGRCT: Source: +proj=longlat +datum=WGS84 +no_defs
OGRCT: Target: +proj=longlat +datum=WGS84 +no_defs
Corner Coordinates:
Upper Left  (  13.7118287,  45.7164398) ( 13d42'42.58"E, 45d42'59.18"N)
Lower Left  (  13.7118287,  45.5735602) ( 13d42'42.58"E, 45d34'24.82"N)
Upper Right (  13.8964398,  45.7164398) ( 13d53'47.18"E, 45d42'59.18"N)
Lower Right (  13.8964398,  45.5735602) ( 13d53'47.18"E, 45d34'24.82"N)
Center      (  13.8041343,  45.6450000) ( 13d48'14.88"E, 45d38'42.00"N)
Band 1 Block=512x512 Type=UInt16, ColorInterp=Red
  Image Structure Metadata:
    NBITS=12
Band 2 Block=512x512 Type=UInt16, ColorInterp=Green
  Image Structure Metadata:
    NBITS=12
Band 3 Block=512x512 Type=UInt16, ColorInterp=Blue
  Image Structure Metadata:
    NBITS=12
Band 4 Block=512x512 Type=UInt16, ColorInterp=Undefined
  Image Structure Metadata:
    NBITS=12

When inspecting via CPL_CURL_VERBOSE, we can see a number of range requests that all produce partial responses (206). I have not inspected the actual binary responses and, to be honest, don't know how to. I'll investigate further next week.

Thanks again.
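
One possible way to sanity-check the partial responses, sketched in Python with only the standard library: compare the Content-Range header of each 206 response (visible in the CPL_CURL_VERBOSE output) with the range that was requested and with the length of the returned body. The function name and the sample values are hypothetical; only the `bytes first-last/total` header syntax is taken from the HTTP range-request format.

```python
import re

# Matches e.g. "bytes 0-16383/1000000" or "bytes 0-16383/*".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def check_partial_response(req_start, req_end, content_range, body_len):
    """Return a list of inconsistencies found in a 206 response."""
    match = _CONTENT_RANGE.match(content_range.strip())
    if match is None:
        return [f"unparseable Content-Range: {content_range!r}"]

    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))

    problems = []
    if first != req_start:
        problems.append(f"range starts at {first}, requested {req_start}")
    # A server may legitimately cut the range at the last byte of the object.
    expected_last = req_end if total is None else min(req_end, total - 1)
    if last != expected_last:
        problems.append(f"range ends at {last}, expected {expected_last}")
    if body_len != last - first + 1:
        problems.append(
            f"body has {body_len} bytes, header implies {last - first + 1}"
        )
    return problems


# Hypothetical values: a consistent response yields an empty list.
print(check_partial_response(0, 16383, "bytes 0-16383/1000000", 16384))
```

A reported total that is far smaller than the expected file size would be one thing to look for, since it would make later reads appear to fall after the end of the file.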

@rouault
Member

rouault commented Jan 31, 2020

206 responses are expected when doing Range requests, so they are not a sign of the issue you are encountering here.

@constantinius
Contributor Author

Yes, I know. I meant that I suspect there is something fishy with the returned byte ranges.

@talaj
Contributor

talaj commented Feb 12, 2020

I'm also facing an issue with segmented objects, but a different one than @constantinius. For me it doesn't work at all.

GDAL makes a call to list all objects in a bucket, something like:

https://<storage url>/dem_test?delimiter=%2F&limit=10000

In my case this returns:

[{
"hash": "d41d8cd98f00b204e9800998ecf8427e",
"last_modified": "2020-02-12T13:19:10.286260",
"bytes": 0,
"name": "world_gmerc_611m_ocean.tif",
"content_type": "image/tiff"
}]

The problem here is "bytes": 0, which causes GDAL to fail with:

VSICURL: Request at offset 0, after end of file
ERROR 4: `/vsiswift/dem_test/world_gmerc_611m_ocean.tif' not recognized as a supported file format.

I'm wondering why it reads at least something in @constantinius's case. Maybe a different version of OpenStack?

Otherwise, as @rouault writes, the object GET interface works the same whether or not the object is segmented.
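
The listing above can be checked mechanically. The following Python sketch parses that exact JSON and reports entries whose "bytes" value is 0; the helper name is hypothetical. It also confirms that the reported hash is the MD5 digest of empty content, which is consistent with the listing describing a zero-length object rather than the full file.

```python
import hashlib
import json

# Listing response copied from the comment above.
LISTING = """
[{
"hash": "d41d8cd98f00b204e9800998ecf8427e",
"last_modified": "2020-02-12T13:19:10.286260",
"bytes": 0,
"name": "world_gmerc_611m_ocean.tif",
"content_type": "image/tiff"
}]
"""


def zero_byte_objects(listing_json):
    """Return names of listed objects that report a size of 0 bytes."""
    entries = json.loads(listing_json)
    # Entries without a "name" key (e.g. pseudo-directories) are skipped.
    return [e["name"] for e in entries if "name" in e and e.get("bytes") == 0]


suspects = zero_byte_objects(LISTING)
print(suspects)

# The listed hash equals the MD5 of an empty byte string.
empty_md5 = hashlib.md5(b"").hexdigest()
print(json.loads(LISTING)[0]["hash"] == empty_md5)
```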

@talaj
Contributor

talaj commented Feb 12, 2020

I tried hardcoding the content size of the particular object I was testing into GDAL, and then gdalinfo worked fine.

@rouault
Member

rouault commented Feb 12, 2020

> The problem here is "bytes": 0

Yes, GDAL will trust that value. That sounds like a bug on the server side to me, if the rest of the API works transparently for segmented objects.

@talaj
Contributor

talaj commented Feb 13, 2020

@rouault I found the problem, and it was on my side. I was uploading the file as a Dynamic Large Object (DLO). These have "bytes": 0 in the listing, which makes some sense. If I use a Static Large Object (SLO) instead, the listing contains the actual object size. There is an option in the Python swift client for that:

swift upload --use-slo -S 524288000 dem_test world_gmerc_611m_ocean.tif

So for me GDAL works with SLO just fine.
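
For reference, the -S value in the command above is a segment size in bytes, and 524288000 is exactly 500 MiB. The short Python sketch below shows how many segments a file of a given size would be split into under that setting. The helper name and the sample file size are hypothetical, and the sketch assumes that every segment except the last is filled completely.

```python
SEGMENT_SIZE = 524288000  # value passed to -S above, i.e. 500 * 1024 * 1024


def segment_layout(file_size, segment_size=SEGMENT_SIZE):
    """Return (number of segments, size of the last segment) in bytes."""
    if file_size <= 0 or segment_size <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division using integers only, to stay exact for large files.
    count = -(-file_size // segment_size)
    last = file_size - (count - 1) * segment_size
    return count, last


# Hypothetical file size of 1.3 GB, for illustration only.
print(segment_layout(1_300_000_000))
```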

@constantinius
Contributor Author

I can confirm that it works with the --use-slo option. I assume there is a server-side issue when dealing with DLOs, but I will not spend more time investigating it.

Thanks for your help!

@constantinius
Contributor Author

@rouault What do you think we should do with this issue? It seems to be a problem with Swift, but I'm apparently not the only one experiencing it.

Shall we close this issue since there is a workaround? Shall we add a hint to the documentation?

@rouault
Member

rouault commented Feb 14, 2020

> Shall we add some hint in the documentation?

That might be good, if enhancements to deal with DLOs aren't being considered for now.

@rouault
Member

rouault commented Apr 5, 2020

@constantinius ping

@constantinius
Contributor Author

I'll prepare a PR for the doc update (a hint to use the --use-slo switch when uploading the file).
I have not come up with any more concrete information.

@rouault
Member

rouault commented Apr 7, 2020

Fixed per doc addition of #2385
