I am running deduplication on a single dataset of 1M records on an m5.4xlarge machine (16 cores, 64 GB RAM). I have set up the following matching config, but it runs out of memory:
Indexing sortedneighbourhood for AddressTypeDescription with window=3
Indexing block for ['Designation', 'Department', 'City', 'Gender', 'Country', 'Region']
Error (out of memory):
Unable to allocate 165. GiB for an array with shape (22179322464,) and data type int64
Unable to allocate 14.2 GiB for an array with shape (1906374956,) and data type int64
Unable to allocate 23.0 GiB for an array with shape (1, 3092850189) and data type object
Unable to allocate 23.0 GiB for an array with shape (3092850193, 1) and data type object
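As a quick sanity check on the messages above: each int64 element, and each pointer slot of an object array on a 64-bit build, takes 8 bytes, and NumPy reports sizes in GiB (2**30 bytes). The sketch below uses a hypothetical helper, `array_gib`, to reproduce the reported sizes from the shapes. Even the smallest failed allocation is more than a third of the machine's 64 GB, and the largest is well over twice it.

```python
# Each int64 element, and each pointer slot of an object array on a 64-bit
# build, takes 8 bytes; NumPy reports the size in GiB (2**30 bytes).
BYTES_PER_ELEMENT = 8
GIB = 2 ** 30


def array_gib(n_elements: int, itemsize: int = BYTES_PER_ELEMENT) -> float:
    """Return the size in GiB of a dense array holding n_elements items."""
    return n_elements * itemsize / GIB


# Element counts copied from the error messages.
shapes = {
    "int64 (22179322464,)": 22_179_322_464,
    "int64 (1906374956,)": 1_906_374_956,
    "object (1, 3092850189)": 3_092_850_189,
}

for label, n in shapes.items():
    print(f"{label}: {array_gib(n):.1f} GiB")  # 165.2, 14.2, 23.0
```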
Basically, the process gets stuck or stops at the indexing step for a large dataset.
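For a rough picture of why indexing blows up, the number of candidate pairs each rule creates can be estimated from value counts alone, before any index is built. The sketch below uses only the standard library; the function names are hypothetical. The sorted-neighbourhood count rests on an assumption about how that index is defined (records are paired when their key values lie within `window // 2` positions of each other in the sorted list of unique values), so treat it as an approximation, not the toolkit's exact behaviour.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def block_pair_count(keys: Iterable[Hashable]) -> int:
    """Candidate pairs from exact blocking when deduplicating one table.

    Every pair of records sharing a key is a candidate, so a block of
    k records contributes k * (k - 1) / 2 pairs.
    """
    return sum(k * (k - 1) // 2 for k in Counter(keys).values())


def sorted_neighbourhood_pair_count(keys: Iterable[Hashable], window: int = 3) -> int:
    """Approximate candidate pairs from a sorted-neighbourhood index (dedup).

    Assumption: records are paired when the ranks of their key values,
    among the sorted unique values, differ by at most window // 2.
    """
    counts = Counter(keys)
    values = sorted(counts)
    radius = window // 2
    total = 0
    for i, value in enumerate(values):
        c = counts[value]
        total += c * (c - 1) // 2  # pairs within the same key value
        for j in range(i + 1, min(i + radius + 1, len(values))):
            total += c * counts[values[j]]  # pairs with a neighbouring value
    return total


# Toy data standing in for AddressTypeDescription and a composite block key.
address_type = ["Home"] * 4 + ["Office"] * 3 + ["Other"] * 2
block_keys = [("Eng", "IT")] * 3 + [("Eng", "HR")] * 2 + [("Mgr", "IT")]

print(sorted_neighbourhood_pair_count(address_type, window=3))  # 28
print(block_pair_count(block_keys))  # 4
```

If `AddressTypeDescription` has only a handful of distinct values, almost every record falls in the same or a neighbouring window, and the count approaches n(n-1)/2, about 5 × 10^11 pairs for 1M records. Running the same counts on the real columns would show which rule dominates.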
Could you please suggest how to overcome this scenario?
Regards
Sid