Possible enhancement: hashmap for fast dim and var query by name #234

gsjaardema · 2016-03-09T15:11:24Z

This issue is to continue some discussion started in pull request #229 related to the addition of a hashmap to speed up dim and var query by name.

A few more observations:

Two types of queries need to be fast:
- query by name, which returns either the var or the varid
- query by varid, which returns the var (similarly for dimid/dim)

It looks like the current nc4 implementation assigns dimids globally, based on the next_dimid field in the NC_HDF5_FILE_INFO_T struct.
- dimids are unique at file scope.
- To provide fast lookup of a dim from a dimid, it might be possible to use the dimarray concept from the nc3 implementation and store dims on the NC_HDF5_FILE_INFO_T struct, instead of in the doubly-linked list stored on the group.

The nc4 implementation assigns varids locally within a group, using the nvars field in the NC_GRP_INFO_T struct.
- This means a file can contain multiple vars with the same varid; varid scope is the group.
- Could possibly use the nc3 vararray concept to store vars at the group level instead of the doubly-linked list currently used. This would give fast lookup of a var via its varid.
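
As a rough illustration of the array-instead-of-list idea for vars, here is a minimal sketch of a growable per-group array indexed by varid. The same shape would apply to a file-level array of dims indexed by dimid. All names (`demo_var`, `demo_vararray`, `demo_vararray_add`, `demo_vararray_get`) are hypothetical and are not the actual netcdf-c structs or functions; the sketch only assumes that ids are handed out sequentially, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a variable record; not the real NC_VAR_INFO_T. */
typedef struct demo_var {
    int varid;
    const char *name;
} demo_var;

/* Hypothetical growable array owned by one group. */
typedef struct demo_vararray {
    demo_var **value; /* value[varid] is the var with that id */
    size_t nelems;    /* number of vars stored; also the next varid */
    size_t nalloc;    /* number of allocated slots */
} demo_vararray;

/* Append a var and give it the next group-local varid.
 * Returns the varid, or -1 if memory could not be allocated. */
static int demo_vararray_add(demo_vararray *arr, demo_var *var)
{
    assert(arr != NULL && var != NULL);
    if (arr->nelems == arr->nalloc) {
        size_t newcap = arr->nalloc ? arr->nalloc * 2 : 8;
        demo_var **tmp = realloc(arr->value, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        arr->value = tmp;
        arr->nalloc = newcap;
    }
    var->varid = (int)arr->nelems;
    arr->value[arr->nelems++] = var;
    return var->varid;
}

/* Constant-time lookup by varid; no list traversal.
 * Returns NULL if the varid is out of range. */
static demo_var *demo_vararray_get(const demo_vararray *arr, int varid)
{
    assert(arr != NULL);
    if (varid < 0 || (size_t)varid >= arr->nelems)
        return NULL;
    return arr->value[varid];
}
```

The array stores pointers rather than structs, so growing it with `realloc` does not invalidate var pointers held elsewhere.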

Fast lookup of varid and dimid via a name query is also needed. This could be provided with a hashmap.
- This is relatively easy for nc3 files since there is a single namespace.
- In nc4 files, there can be multiple dims and vars with the same name; names are unique only within a group.
- One implementation is a hashmap per type (var, dim) per group, but this could result in a lot of overhead if there are many groups with only a few dims or vars each.
- Another possibility is to have a single hashmap at the file level for dims and another for vars.
  - The key would be the var/dim name concatenated with the full group path name, which would then be hashed.
  - There is overhead in building the full group path for var/dim creation and for inquiry, but overall overhead is reduced since there are only 2 hashmaps instead of 2 per group.
- Could also use a hash key based on the hash of the name combined with the group id instead of the group name.

I have a prototype hashmap usage for nc3 files that is currently passing all tests. It would need some cleanup for general use, but I wanted to see how doable it was. It basically provides a quick lookup of dimid or varid from a name; going from the dimid or varid to the dim or var is then a quick lookup based on the dimarray and vararray that nc3 files use. I hope to extend this to nc4 files, but I am not sure when I will get a chance.
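
To make the group-id option concrete, here is a minimal sketch of a single file-level table whose key folds the group id into a hash of the name. It is not the prototype mentioned above and not netcdf-c code: every name (`demo_hashmap`, `demo_key`, `demo_hashmap_put`, `demo_hashmap_get`, `demo_hashmap_free`) is hypothetical, the hash is an FNV-1a-style mix chosen only for brevity, and the fixed bucket count is an assumption that keeps the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NBUCKETS 1024u /* fixed size for brevity; a real table would grow */

typedef struct demo_entry {
    int grpid;               /* group that owns the name */
    char *name;              /* private copy of the var or dim name */
    int id;                  /* varid or dimid to return */
    struct demo_entry *next; /* chain of entries sharing a bucket */
} demo_entry;

typedef struct demo_hashmap {
    demo_entry *bucket[DEMO_NBUCKETS];
} demo_hashmap;

/* FNV-1a-style hash of the name, with the group id mixed in first so that
 * the same name in different groups usually lands in different buckets. */
static uint32_t demo_key(int grpid, const char *name)
{
    uint32_t h = 2166136261u;
    h = (h ^ (uint32_t)grpid) * 16777619u;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = (h ^ *p) * 16777619u;
    return h;
}

/* Insert (grpid, name) -> id. Does not check for an existing entry.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int demo_hashmap_put(demo_hashmap *map, int grpid, const char *name, int id)
{
    assert(map != NULL && name != NULL);
    size_t len = strlen(name) + 1;
    demo_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->name, name, len);
    e->grpid = grpid;
    e->id = id;
    uint32_t b = demo_key(grpid, name) % DEMO_NBUCKETS;
    e->next = map->bucket[b];
    map->bucket[b] = e;
    return 0;
}

/* Look up the id for a name within one group. Returns -1 if not found.
 * The group id and name are both compared, so hash collisions are harmless. */
static int demo_hashmap_get(const demo_hashmap *map, int grpid, const char *name)
{
    assert(map != NULL && name != NULL);
    uint32_t b = demo_key(grpid, name) % DEMO_NBUCKETS;
    for (const demo_entry *e = map->bucket[b]; e != NULL; e = e->next)
        if (e->grpid == grpid && strcmp(e->name, name) == 0)
            return e->id;
    return -1;
}

/* Release every entry and reset the buckets. */
static void demo_hashmap_free(demo_hashmap *map)
{
    assert(map != NULL);
    for (size_t b = 0; b < DEMO_NBUCKETS; b++) {
        demo_entry *e = map->bucket[b];
        while (e != NULL) {
            demo_entry *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
        map->bucket[b] = NULL;
    }
}
```

Compared with hashing the full group path, this avoids building a path string on every create or inquiry, at the cost of storing the group id in each entry.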


WardF · 2016-04-04T20:55:47Z

This was addressed in the recent pull request I believe; closing out unless I hear different.

gsjaardema · 2016-04-04T21:00:09Z

The recent pull request was for nc3 files; the above discussion is for nc4 (netcdf-4) files.

WardF · 2016-04-04T21:04:05Z

So it is; I glanced over it when I should have read it more carefully.

edhartnett · 2016-11-16T16:23:03Z

I think this is a good idea and I like the changes in your PR.

I am amazed that linked lists should be so much slower. Especially for small numbers of variables.

WardF closed this as completed Apr 4, 2016

WardF reopened this Apr 4, 2016

gsjaardema mentioned this issue Nov 16, 2016

Replace linked list with array for var storage in netcdf-4 format #328

Merged

WardF closed this as completed Jun 27, 2017
