[meta-ticket] GDAL 4.0 potential changes related to API or behaviour breaks #8440
Comments
About this default without overviews, it can be really frustrating. |
Is this any different than with a stripped tif also without overviews ? |
When answering users' questions I have often thought that if we had no defaults at all, users would first have to think about what they want. But I do not really want to write things like BLOCKXSIZE=256 BLOCKYSIZE=256 every time. With the data that I play with I would say that TILED=YES could be a better default, but I am not sure about COMPRESS=LZW. LZW does make files smaller, and maybe it does not do any harm in common use cases. What makes it better than DEFLATE?
Yes, scanline-organized TIFFs are faster to downscale (when using nearest neighbour), as GDAL will only try to access 1 line out of N when downscaling by a factor of N. For tiled datasets, unless N is at least twice the block height, you need to read all tiles.
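To make that access pattern concrete, here is a rough Python sketch. It is a simplified model, not GDAL's actual I/O code: it assumes nearest neighbour keeps the centre line of each group of N source lines, and the helper name `rows_touched` is made up for illustration. It counts how many scanlines a scanline-organized file has to read versus how many rows of tiles a tiled file has to touch.

```python
def rows_touched(height, factor, block_height):
    """Return (scanlines_read, tile_rows_read, total_tile_rows).

    Simplified model (an assumption, not GDAL's exact logic):
    nearest neighbour keeps one source line per group of `factor`
    lines, namely the line at the centre of the group.
    """
    sampled = [min(height - 1, i * factor + factor // 2)
               for i in range(height // factor)]
    # A tile row must be read as soon as one sampled line falls in it.
    tile_rows = {line // block_height for line in sampled}
    total_tile_rows = -(-height // block_height)  # ceiling division
    return len(sampled), len(tile_rows), total_tile_rows


# 10k lines, 256-line tiles, downscale by 8: the scanline file reads
# 1 line out of 8, while every row of tiles still has to be read.
lines, tiles, total = rows_touched(10_000, 8, 256)
print(lines, tiles, total)
```

In this model the tiled layout only starts skipping rows of tiles once the downscaling factor is larger than the block height.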
There are tons of settings that should be set then: for a multiband dataset, INTERLEAVE ; for a non-Byte dataset, endianness; for compressed method, a number of compression options, etc. Think of JPEG2000 which has like 20 independent settings. For small enough datasets, in general you don't care about the settings. They become more relevant for big datasets where they have visible effects.
Thought to be faster to compress/decompress. But just trying on one dataset, I find deflate to be actually faster (using libdeflate) |
Definitely. |
I don't know of any software that does not support LZW. I do know at least one (IDL) that does not support deflate. Aside from that I don't have any strong preference over one or another, especially since deflate has become so much faster with Even's libdeflate integration. |
Yes, I was not serious. Maybe compression is the default that astonishes users most, and by changing the default we could save a gigantic amount of bits in the world because so many users run with the defaults. There are loads of questions like this one: https://gis.stackexchange.com/questions/310149/gdal-nearblack-increases-file-size-is-this-expected-behaviour. One could say that we should just use reasonable defaults, but what is reasonable for one may not be reasonable for others. A lossless compression that does not alter data feels like a good default for GeoTIFF, but JPEG2000 drivers use lossy compression by default and I think that is reasonable, too. Thinking about it now, it may be a bit questionable, because someone may have unintentionally lost the smallest details from their data. On the other hand, a 30% reduction in file size with one of the best compression methods that exist could feel disappointing for users. When it comes to this ticket as a whole, the GeoTIFF defaults may not be the most important items on the list. But because they are so concrete, it is natural that they gather many comments.
There is one I would like to propose - although I would almost certainly have to implement it: API Behaviour change
Discussion: There is an incompatibility between SWIG-generated C# code and Mono AOT compilers that causes failures when there are certain types of callbacks. See swig/swig#1262 and related. The particular usage scenario that I have come across is
This affects any AOT in Mono. The main consumer of this is probably IL2CPP in Unity but it is also a problem in Xamarin on iOS. The fixes - as discussed here https://forum.htc.com/topic/9139-unity-il2cpp-and-callbacks/ are not complicated and actually are a matter of metadata as opposed to functional changes. I have been implementing the changes manually for the UPM Package for GDAL that I support - without any problems so far. Doing this manually is less than ideal but I have been reluctant to port this change into the main repo because of the potential for breaking something that is not being tested (i.e. the "you don't know what you don't know" problem). I would say that 4.0 would be the time to implement the change. |
Are you sure this is still true? It looks like the reference docs indicate IDL can now write Deflate (it would be weird if it could not read it): https://www.nv5geospatialsoftware.com/docs/WRITE_TIFF.html
@wildintellect Support for deflate seems to have been added in version 8.8.2, released in March 2022, which can be considered very recent given their release velocity and adoption rate.
I'd like to suggest ditching |
Attempt at fixing that in #8718. Apparently on Windows setuptools creates .exe launchers
Adding another potential topic:
Today the default TILED=YES came up again on the mailing list. Based on his answer, I have the impression that @tbonfort was not aware of the consequences of using tiles without overviews. And probably many users are not either. However, it is really frustrating, even with something not huge like 10k * 10k pixels. I have seen that several times at work. I want to think there are two types of users: those who really go into the options and configure the driver a lot, and those who just create a GeoTIFF using the minimal configuration... because it is complicated. (At the beginning I was the latter; I want to think that now I am the former.) For the first type we have all those options. It is really powerful. The second type rely on the defaults. It is simpler.
This is not a hill I am willing to die on. Large tiffs that should have overviews but are missing them suck, and in that case stripped tiffs suck a bit less. In all other cases tiled tiffs are a better option, but my fingers have sufficient muscle memory to type |
When it comes to potential GDAL 4.0 changes I think that the TILED=YES or TILED=NO is not the most critical topic. Defaults can never be optimal for all users. My opinion is that TILED=YES would be better for more users than TILED=NO, but I do not know all users. My typical use case for smaller or bigger TIFFs is to look at them at the resolution that is close to the native resolution, either on the screen or by using them as source data for WMS.
I tried to reproduce with an image of "Size is 48000, 24000", file size on disk 3.5 GB. It takes 5 seconds to open with QGIS on Windows (Intel64 Family 6 Model 142 Stepping 10 GenuineIntel ~1910 Mhz, 32 GB of memory, SSD drive). It does feel a bit slow but acceptable. Panning when zoomed close to the native resolution is naturally fast. Maybe a part of your frustration is caused by the hardware. I know from gis.stackexchange that it is not uncommon to have severe troubles when trying to warp or otherwise process big striped TIFF files. In that case the only solution is to rewrite the image as tiled, because adding overviews does not fix the processing issue even if it helps with viewing. The biggest problem with changing the default of TILED= is the change itself. Users will continue to use versions 3.x and 4.x side by side for several years, and during that time those users who rely on the defaults would not know what will happen, because a) users do not usually know which GDAL version they are using and b) they would need to be aware of the change of the default value between v3 and v4. That would certainly frustrate some users. Maybe the majority of users would not notice anything.
This makes me wonder whether a (non-breaking) improvement could be to extend the GDAL configuration file with a new section where one could add:
that would be used for command line utilities. I believe they should display a message in the console in non quiet mode to recall those options $ gdal_translate in.tif out.tif That said it might perhaps only be desirable to use them for a command directly started from an interactive shell, and not from a script, and looking a bit, it seems tricky/impossible to detect reliably in which case we are, and even if we can, it might be too confusing. |
Thanks Even for looking into it, but I think that the issues arising from the variability between environments with different configuration files would be more problematic than having to repeatedly type a few creation options. |
Other ideas that have been floated around in the past:
Can we drop some of the deprecated functions like |
Other candidate: removing support for direct calls to the Python bindings' setup.py script, and rather using "pip install" or another "modern"/recommended way of packaging Python? (Expert with 20 years of Python packaging experience required.) Cf #8926
Here's a sorta breaking change that shouldn't be too controversial: make My opinion is that the option to fully download the resource at a URI and open it from a temporary location should exist at a level above GDALOpenEx, not within GDALOpenEx or driver code. |
gdaladdo / GDALBuildOverviews does not take creation options; instead, these have to be passed as configuration options. This leads to a de-synchronization between the options available for main datasets and those available for overview levels, and to a clunky API and documentation. Latest example to date: #8976
Another potential change is to get rid of the wkbPoint25D, wkbLineString25D, etc. constants of the wkbGeometryType enumeration that pre-date ISO WKB, and that have "funny" values that are wkbPoint, wkbLineString or'ed with 0x80000000, which leads to unusual enumeration values not fitting into a signed int (apparently C23 makes it legal... cf #2322 (comment)). It could be best to have wkbPointZ = wkbPoint + 1000 to be ISO WKB compliant. That would impact the C & C++ API & ABI.
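A small Python sketch of the arithmetic behind that remark. It assumes the legacy flag bit is 0x80000000 (wkb25DBit in GDAL's ogr_core.h) and that wkbPoint is 1; the constant names below are local to this example.

```python
INT32_MAX = 2**31 - 1

WKB_POINT = 1              # wkbPoint
WKB_25D_BIT = 0x80000000   # legacy "2.5D" flag bit (assumed wkb25DBit value)

# Legacy style: geometry type or'ed with the high bit.
legacy_point_25d = WKB_POINT | WKB_25D_BIT
# ISO WKB style: Z variants are the 2D code plus 1000.
iso_point_z = WKB_POINT + 1000

print(hex(legacy_point_25d), legacy_point_25d > INT32_MAX)
print(iso_point_z, iso_point_z <= INT32_MAX)
```

The legacy value needs the 32nd bit, so it cannot be represented as a positive signed 32-bit int, while the ISO-style value is an ordinary small integer.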
Remove unused OFTWideString and OFTWideStringList from OGRFieldType enumeration |
Installing headers in ${CMAKE_INSTALL_INCLUDEDIR}/gdal : #9276 |
Interesting GTiff discussion here. I've been looking at GeoTIFF read performance lately and some things I've noticed (from C# using MaxRev.Gdal) which might be of interest to GDAL 4.0 planning are
It's not entirely clear to me what GDAL expects of its callers for efficient IO, but none of the numerous combinations I've tried yields uncompressed GeoTIFF read throughput much above 1 GB/s per thread (tested with a 5950X and PCIe 4.0 x4 NVMe). Multithreaded scaling could likely also be increased, as DDR bandwidth demand is fairly high (7–10 GB/s per GB/s read) compared to analogous non-GDAL cases. Also, as an aside, SWIG isn't making use of .NET types such as
It would be good if 'gdal_translate -projwin' didn't act like 'gdalwarp -te ' for some resample algs, applying the projwin to the output rather than aligning to input pixels. I don't get why this was mixed up, having gdal_translate always be in alignment seems like a good default and let warp do prescriptive target extent. 🙏 |
I am curious, could you show an understandable example about what is wrong, maybe with some images? |
For sure. If you request a window that is not aligned to the source pixels, it snaps to the source grid when the resampling is 'near':
gdal_translate vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 near.vrt -projwin 5.5 15.5 10.3 0
gdalinfo near.vrt
...
Origin = (5.000000000000000,16.000000000000000)
Pixel Size = (1.000000000000000,-1.000000000000000)
...
When you use 'bilinear', it uncouples from the source grid and gives the provided extent (like warp always does):
gdal_translate vrt://gcore/data/rgbsmall.tif?a_ullr=0,50,50,0 bilinear.vrt -projwin 5.5 15.5 10.3 0 -r bilinear
gdalinfo bilinear.vrt
...
Origin = (5.500000000000000,15.500000000000000)
Pixel Size = (0.960000000000000,-0.968750000000000)
...
So, to get alignment you have to do your own calcs and provide the right snapped projwin. I was gobsmacked when this was pointed out to me ... but it's in a Note here, and I understand the concern about the shift, but I think it was the wrong fix: https://gdal.org/en/latest/programs/gdal_translate.html#cmdoption-gdal_translate-projwin
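For illustration, the "own calcs" could look something like the following Python sketch. The helper `snap_projwin` is hypothetical (not part of GDAL), and it makes one particular choice: expanding the window outward to whole source pixels, which is not necessarily the rounding gdal_translate applies internally.

```python
import math


def snap_projwin(origin, pixel_size, projwin):
    """Expand a (ulx, uly, lrx, lry) window outward to the source pixel grid.

    origin: (x, y) of the top-left corner of the source raster.
    pixel_size: (xres, yres), with yres negative for north-up rasters.
    """
    ox, oy = origin
    xres, yres = pixel_size
    ulx, uly, lrx, lry = projwin
    # Convert to pixel/line space, round outward, convert back.
    col_min = math.floor((ulx - ox) / xres)
    row_min = math.floor((uly - oy) / yres)
    col_max = math.ceil((lrx - ox) / xres)
    row_max = math.ceil((lry - oy) / yres)
    return (ox + col_min * xres, oy + row_min * yres,
            ox + col_max * xres, oy + row_max * yres)


# Same georeferencing and window as in the rgbsmall.tif example above.
snapped = snap_projwin((0, 50), (1, -1), (5.5, 15.5, 10.3, 0))
print(snapped)
```

With the example values, the snapped upper-left corner is (5, 16), the same Origin that gdalinfo reports for near.vrt. The snapped values can then be passed to -projwin together with any resampling method.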
Inconsistent use of the String(JSON) type in the GeoJSON driver: if there's a mix of data types, a String(JSON) field is reported to mean that. The only annoying thing is that, for backward compatibility with the past behaviour of GDAL 3.5 where we silently homogenized to a string, we didn't go as far as actually quoting strings, so this unfortunately isn't fully JSON compliant. I mean, if we have out.geojson with:
{
"type": "FeatureCollection",
"features": [
{ "type": "Feature", "properties": { "foo": "str" }, "geometry": null },
{ "type": "Feature", "properties": { "foo": 0 }, "geometry": null },
{ "type": "Feature", "properties": { "foo": ["a", "b"] }, "geometry": null }
]
}
$ ogrinfo -al out.geojson -q
Layer name: out
OGRFeature(out):0
foo (String(JSON)) = str
OGRFeature(out):1
foo (String(JSON)) = 0
OGRFeature(out):2
foo (String(JSON)) = [ "a", "b" ]
In theory, we should report "str", not just str.
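As a point of comparison, this is what a JSON-compliant serialization of the three `foo` values looks like, using Python's standard `json` module (nothing GDAL-specific here, just an illustration of the quoting issue):

```python
import json

# The three "foo" property values from the features above.
values = ["str", 0, ["a", "b"]]

for value in values:
    # json.dumps() emits valid JSON text: strings come out quoted.
    print(json.dumps(value))
```

The string is emitted with its surrounding double quotes, which is what distinguishes it from a bare token that a JSON parser would reject.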
Deprecate use of term "dateline" |
If we can't standardize on one pair, maybe we fix |
Any config option should accept any of these values. But the documentation probably does not give that impression.
As a non-breaking change, maybe we could add a new config option with more obvious semantics, and only probe |
Should or already does? At the moment, |
Why do you hate "1"/"0" and short forms "t"/"f", "y"/"n"? |
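For readers following along, here is a small Python model of how a boolean option string can end up being interpreted. It mirrors my understanding of CPLTestBool(), which treats only NO, FALSE, OFF and 0 (case-insensitive) as false and everything else as true; treat that as an assumption and check cpl_string.cpp for the authoritative behaviour. The function name `cpl_test_bool_like` is made up for this sketch.

```python
# Strings assumed to be recognized as "false" (compared case-insensitively).
FALSE_STRINGS = {"NO", "FALSE", "OFF", "0"}


def cpl_test_bool_like(value):
    """Return False only for the recognized false strings, True otherwise."""
    return value.upper() not in FALSE_STRINGS


for text in ["YES", "ON", "true", "1", "no", "off", "0", "n", "f"]:
    print(text, cpl_test_bool_like(text))
```

Under that rule, short forms such as "n" or "f" are not recognized as false, so they would be treated as true.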
This ticket is probably just a dust bin of things that will never happen, but let's add to it:
List of tickets tagged with 4.0 milestone: https://github.com/OSGeo/gdal/milestone/33
Below are just proposed ideas. Nothing decided
API breakage:
std::unique_ptr<OGRFeature>: involves lots of internal changes, affects C++ users
OGRFeature& that they update. May help a bit performance in bulk reading operations, but probably not spectacular. EDIT: might not be doable, as some drivers return an instance of a subclass of OGRFeature
std::shared_ptr<OGRSpatialReference> (or std::shared_ptr<const OGRSpatialReference>?) instead: involves lots of internal changes, affects C++ users of the geometry and layer classes, although we might have to propagate it everywhere since OGRSpatialReferenceH would be an opaque type for std::shared_ptr<OGRSpatialReference>
std::shared_ptr<OGRFeatureDefn> instead: involves lots of internal changes, affects C++ users, may impact slightly the C API (OGR_FD_GetReferenceCount)
API behaviour changes:
Driver changes:
TILED=YES? #10750
OGR SQL:
Driver removals?
Others :