A resizable and readable numpy array on disk.
This module is built on numpy memmap, which is used for accessing and modifying small segments of large files on disk without reading the entire file into memory.
It also supports appending numpy arrays to the disk array at any time.
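To make the memmap idea concrete, here is a minimal sketch using plain numpy (not diskarray). The file name and the variable names are made up for this illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "memmap_demo.array")

# Create a file-backed, zero-filled array of 1000 x 3 float32 values.
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 3))

# Modify a small segment without loading the whole file.
mm[10:12] = [[2, 3, 4], [1, 2, 3]]
mm.flush()  # write pending changes back to the file
del mm      # drop the mapping

# Reopen read-only and read back just that segment.
ro = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 3))
segment = np.array(ro[10:12])  # copy the slice into a regular ndarray
print(segment)
```

A plain memmap has a fixed shape once created, which is the limitation that a resizable disk array is meant to address.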
Prerequisites: Python(">=3.8.0")
$ sudo pip install diskarray
# For Python("<3.8.0"), use an older version of diskarray
# e.g.: sudo pip install diskarray==0.1.9
>>> import numpy as np
>>> from diskarray import DiskArray
>>> data = np.array([[2 , 3, 4], [1, 2, 3]])
>>> da = DiskArray('/tmp/disk.array', shape=(0, 3), dtype=np.float32)
>>> da.extend(data)
>>> print(da[:])
DiskArray supports two methods, extend and append.
extend is used to append several arrays (rows) to the disk array at once.
append is used to append a single array at a time.
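As a rough in-memory analogy in plain numpy (not diskarray itself), extend corresponds to adding a block of rows and append to adding one row. The names stored, block and row are hypothetical.

```python
import numpy as np

# Plays the role of an empty disk array with 3 columns.
stored = np.empty((0, 3), dtype=np.float32)

block = np.array([[2, 3, 4], [1, 2, 3]], dtype=np.float32)  # several rows
row = np.array([5, 6, 7], dtype=np.float32)                 # a single row

# "extend": add every row of a 2-D array.
stored = np.concatenate([stored, block], axis=0)

# "append": add one 1-D array as a new row.
stored = np.concatenate([stored, row[np.newaxis, :]], axis=0)

print(stored.shape)
```

Unlike this analogy, which copies the whole array on every call, a disk-backed array writes the new rows into the file.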
Example1:
>>> import numpy as np
>>> from diskarray import DiskArray
>>> data = np.array([[2 , 3, 4], [1, 2, 3]])
# creating object to disk array
>>> da = DiskArray('/tmp/disk.array', shape=(0, 3), capacity=(10, 3), growby=200, dtype=np.float32)
# extend the data to disk array
>>> da.extend(data)
# Get the full array
>>> print(da[:])
# Get the row at index 1 (the second row)
>>> print(da[1])
# Get the rows from index 1 up to (but not including) index 3
>>> print(da[1:3])
# Get the value at row index 1, column index 1
>>> print(da[1][1])
- /tmp/disk.array is the file which holds the disk array.
- shape is the size of the disk array.
- capacity is the total capacity of the disk array. When an extend would grow the array beyond its current size, DiskArray has to create the memmap to the file again, which is a costly operation. Setting capacity creates the disk array with that size up front, so the memmap does not have to be recreated until the capacity is used up.
- capacity and growby are optional. When they are not given, capacity defaults to shape and growby defaults to 10000.
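The role of capacity and growby can be sketched with plain numpy. GrowableMemmap below is a hypothetical class written for this explanation, not diskarray's actual implementation, and its growth policy (grow by at least growby rows) is an assumption.

```python
import os
import tempfile

import numpy as np


class GrowableMemmap:
    """Hypothetical illustration only; not diskarray's implementation."""

    def __init__(self, path, ncols, capacity_rows, growby, dtype=np.float32):
        self.path = path
        self.ncols = ncols
        self.growby = growby
        self.dtype = np.dtype(dtype)
        self.nrows = 0    # rows in use (the "shape")
        self.remaps = 0   # how often the memmap had to be recreated
        self._map(capacity_rows, mode="w+")

    def _map(self, capacity_rows, mode):
        self.capacity_rows = capacity_rows
        self.data = np.memmap(self.path, dtype=self.dtype, mode=mode,
                              shape=(capacity_rows, self.ncols))

    def extend(self, rows):
        rows = np.asarray(rows, dtype=self.dtype)
        needed = self.nrows + len(rows)
        if needed > self.capacity_rows:
            # Costly path: flush, grow the file, create a new memmap.
            extra = max(self.growby, needed - self.capacity_rows)
            new_capacity = self.capacity_rows + extra
            self.data.flush()
            del self.data
            with open(self.path, "r+b") as f:
                f.truncate(new_capacity * self.ncols * self.dtype.itemsize)
            self._map(new_capacity, mode="r+")
            self.remaps += 1
        # Cheap path: write into space that is already mapped.
        self.data[self.nrows:needed] = rows
        self.nrows = needed

    def __getitem__(self, idx):
        return self.data[:self.nrows][idx]


path = os.path.join(tempfile.mkdtemp(), "growable_demo.array")
arr = GrowableMemmap(path, ncols=3, capacity_rows=2, growby=4)
arr.extend([[2, 3, 4], [1, 2, 3]])  # fits in the initial capacity
arr.extend([[5, 6, 7]])             # exceeds it, so one remap happens
arr.extend([[8, 9, 10]])            # fits in the grown capacity
print(arr.capacity_rows, arr.remaps)
print(arr[:])
```

In this sketch a larger initial capacity or a larger growby means fewer remaps, at the price of a larger file on disk.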
Example2:
>>> import numpy as np
>>> from diskarray import DiskArray
>>> dtype = [('token', np.uint32), ('count', np.uint32), ('vec', np.float32)]
>>> data = np.array([[(1, 0, 0.), (0, 2, 0.), (0, 2, 0.)], [(1, 0, 0.), (0, 2, 0.), (0, 2, 0.)]], dtype=dtype)
>>> da = DiskArray('/tmp/disk.array', shape=(0, 3), capacity=(10, 3), dtype=dtype)
>>> da.extend(data)
# Get the full array
>>> print(da[:])
# Get the count values of the row at index 1
>>> print(da[1]['count'])
# Get the token value at row index 1, column index 2
>>> print(da[1][2]['token'])
# Modify the vec value at row index 1, column index 2
>>> da[1][2]['vec'] = 10.0
Example:
>>> import numpy as np
>>> from diskarray import DiskArray
>>> data = np.array([2, 3, 4])
# creating object to disk array
>>> da = DiskArray('/tmp/disk.array', shape=(0, 3), capacity=(10, 3), growby=200, dtype=np.float32)
# append 1 dimensional array to disk array
>>> da.append(data)
>>> da.append(data + 1)
# Get the full array
>>> print(da[:])
# Get the row at index 1 (the second row)
>>> print(da[1])
# Get the rows from index 1 up to (but not including) index 3
>>> print(da[1:3])
# Get the value at row index 1, column index 1
>>> print(da[1][1])
growby is the amount by which the disk array grows when it reaches its maximum capacity.
# diskarray provides a command to interact with a disk array directly
$ diskarray interact <fpath> <shape> <dtype> --capacity <capacity> --growby <growby> --mode <mode>
# <fpath> is the file which is used to store the disk array.
# <shape> is the size of the disk array.
# <dtype> is the data type of the disk array.
# <capacity> is the total capacity of the disk array.
# <growby> is the amount by which the disk array grows when it reaches its maximum capacity.
# <mode> is the mode in which the disk array is opened.
Example:
$ diskarray interact /tmp/test '(0, 3)' np.float32 --capacity '(10, 3)' --growby 5 --mode r+
DiskArray Console
>>> import numpy as np
>>> da.append(np.array([1, 2, 3]))
$ python setup.py test