Save NumPy array to file

1. Intro

Another tutorial covers creating a NumPy array from plain text files such as CSV and TSV. In this tutorial, we look at methods for saving a NumPy array to the file system, so that we can load the array again later.

A few techniques are critical for a data analyst, such as saving an array in .npy or .npz format. Creating a NumPy array from a .npy file is much faster than creating it from a text file such as CSV. Hence it's advisable to save NumPy arrays in this format if we want to use them again in the future.

2. Save NumPy array as plain text file like CSV

We can save a NumPy array as a plain text file like CSV or TSV. We tend to use this method when we want to share the results of an analysis. Most analyses pass through multiple steps, and a CSV file lets key stakeholders see the end result easily.

We can also provide custom delimiters.

We use the numpy.savetxt() method to save a NumPy array as a CSV or TSV file.

numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)

#%%
import numpy as np

# Saving NumPy array as a csv file
array_rain_fall = np.loadtxt(fname="rain-fall.csv", delimiter=",")
np.savetxt(fname="saved-rain-fall-row-col-names.csv", delimiter=",", X=array_rain_fall)
# Check generated csv file after loading it
array_rain_fall_csv_saved = np.loadtxt(
    fname="saved-rain-fall-row-col-names.csv", delimiter=","
)
print("NumPy array: \n", array_rain_fall_csv_saved)
print("Shape: ", array_rain_fall_csv_saved.shape)
print("Data Type: ", array_rain_fall_csv_saved.dtype.name)

OUTPUT:

NumPy array: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
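
The custom delimiter mentioned earlier in this section, along with the fmt and header parameters from the signature, can be sketched as follows. This is a minimal illustration, not part of the original notebook: the sample values, the file name, and the column names are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: two rows of rainfall readings.
rain = np.array([[12.0, 11.5, 14.0], [13.0, 16.7, 12.5]])

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "rain-fall.tsv")  # illustrative name

    # Tab delimiter, one decimal place, and a header line.
    # The header is prefixed with "# " by default (the comments parameter).
    np.savetxt(tsv_path, rain, fmt="%.1f", delimiter="\t", header="jan\tfeb\tmar")

    with open(tsv_path) as f:
        tsv_text = f.read()

    # loadtxt skips lines starting with "#", so the header is ignored.
    rain_loaded = np.loadtxt(tsv_path, delimiter="\t")

print(tsv_text)
print("Shape: ", rain_loaded.shape)
```

Choosing a short fmt such as "%.1f" keeps the file readable for people, at the cost of dropping digits beyond the first decimal place.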

3. Save and read NumPy Binary file

We can save a NumPy array in a raw binary format using the numpy_array.tofile() method. However, this format does not store the endianness, precision (data type), or shape of the array, so it is not recommended for archiving data or transferring it between machines. It's better to use the .npy or .npz format for archival and retrieval.

We use the numpy.fromfile() method to create a NumPy array from a binary file. It reads the data as float64 by default and returns a one-dimensional array.

#%%
# Saving array as binary file and reading it
array_rain_fall.tofile("saved-rain-fall-binary")
array_rain_fall_binary = np.fromfile("saved-rain-fall-binary")
print("NumPy array: \n", array_rain_fall_binary)
print("Shape: ", array_rain_fall_binary.shape)
print("Data Type: ", array_rain_fall_binary.dtype.name)

OUTPUT:

NumPy array: 
 [12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5 13. 11.
 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]
Shape:  (24,)
Data Type:  float64
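
Because the raw file carries no data type or shape, the reader has to supply both. The sketch below illustrates this with a small int32 array; the array contents and file name are assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2 x 3 array of 32-bit integers (24 bytes of raw data).
original = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw-array.bin")  # illustrative name
    original.tofile(raw_path)

    # With the default dtype (float64) the 24 bytes are misread as
    # three 8-byte floats with meaningless values.
    misread = np.fromfile(raw_path)

    # Passing the right dtype recovers the values, but only as a flat array.
    flat = np.fromfile(raw_path, dtype=np.int32)

    # The shape has to be restored by hand.
    restored = flat.reshape(2, 3)

print("Misread shape: ", misread.shape)
print("Flat shape: ", flat.shape)
print("Restored: \n", restored)
```

The rainfall example above works without a dtype argument only because the array was already float64, which happens to be the default of numpy.fromfile().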

4. Save and read npy file

We recommend using .npy and .npz files to save NumPy arrays on disk for easy persistence and fast retrieval. Creating an array from a .npy file is faster than creating it from CSV or other plain text files.

We use the numpy.save() method to save an array in .npy format.

numpy.save(file, arr, allow_pickle=True, fix_imports=True)

We create a NumPy array from a .npy file using the numpy.load() method. Since NumPy 1.16.3, allow_pickle defaults to False for security reasons.

numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')

#%%
# Saving array as .npy and reading it
np.save("saved-rain-fall-binary.npy", array_rain_fall)
array_rain_fall_npy = np.load("saved-rain-fall-binary.npy")
print("NumPy array: \n", array_rain_fall_npy)
print("Shape: ", array_rain_fall_npy.shape)
print("Data Type: ", array_rain_fall_npy.dtype.name)

OUTPUT:

NumPy array: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
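
Besides speed, a .npy file stores the data type and shape in its header, so the array comes back exactly as it was saved. The sketch below contrasts this with a CSV round trip; the int16 sample array and file names are assumptions made up for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical array of 16-bit integers.
counts = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "counts.npy")  # illustrative names
    csv_path = os.path.join(tmp_dir, "counts.csv")

    # .npy round trip: dtype and shape are read back from the file header.
    np.save(npy_path, counts)
    from_npy = np.load(npy_path)

    # CSV round trip: loadtxt parses the text as float64 unless told otherwise.
    np.savetxt(csv_path, counts, fmt="%d", delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",")

print("From .npy: ", from_npy.dtype.name, from_npy.shape)
print("From CSV:  ", from_csv.dtype.name, from_csv.shape)
```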

5. Save multiple arrays in one npz file

NumPy provides numpy.savez() to save multiple arrays in one file. We can load the .npz file with the numpy.load() method. Arrays passed as positional arguments are stored under the names arr_0, arr_1, and so on.

numpy.savez(file, *args, **kwds)

Combining several NumPy arrays into one .npz file lets us load them with a single call, which can be faster and more convenient than loading individual .npy files.

#%%
# Saving multiple arrays in npz format. Loading and reading the array.
np.savez("saved-rain-fall-binary.npz", array_rain_fall, np.array([1, 2, 3, 4, 5]))
array_rain_fall_npz = np.load("saved-rain-fall-binary.npz")
print("NumPy array 1: \n", array_rain_fall_npz["arr_0"])
print("Shape of Array 1: ", array_rain_fall_npz["arr_0"].shape)
print("Data Type of Array 1: ", array_rain_fall_npz["arr_0"].dtype.name)
print("NumPy array 2: \n", array_rain_fall_npz["arr_1"])
print("Shape of Array 2: ", array_rain_fall_npz["arr_1"].shape)
print("Data Type of Array 2: ", array_rain_fall_npz["arr_1"].dtype.name)

OUTPUT:

NumPy array 1: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape of Array 1:  (2, 12)
Data Type of Array 1:  float64
NumPy array 2: 
 [1 2 3 4 5]
Shape of Array 2:  (5,)
Data Type of Array 2:  int64

We use the _[numpy.savez_compressed()](https://www.numpy.org/devdocs/reference/generated/numpy.savez.html#numpy.savez)_ method to save a compressed .npz file.
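
As a minimal sketch of the two .npz variants, the example below saves the same arrays with numpy.savez() and numpy.savez_compressed(), using keyword arguments so the arrays get readable names instead of arr_0 and arr_1. The array contents, keyword names, and file names are assumptions chosen for illustration; an all-zero array is used only because it compresses very well.

```python
import os
import tempfile

import numpy as np

# Hypothetical data: a large, highly compressible array and a small one.
rain = np.zeros((1000, 100))
months = np.arange(1, 13)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "arrays.npz")  # illustrative names
    packed_path = os.path.join(tmp_dir, "arrays-compressed.npz")

    # Keyword arguments become the keys used to look the arrays up later.
    np.savez(plain_path, rain=rain, months=months)
    np.savez_compressed(packed_path, rain=rain, months=months)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Loading works the same way for both variants; the context manager
    # closes the underlying file when we are done.
    with np.load(packed_path) as archive:
        names = sorted(archive.files)
        months_loaded = archive["months"]

print("Array names: ", names)
print("Uncompressed size in bytes: ", plain_size)
print("Compressed size in bytes: ", packed_size)
```

How much space compression saves depends on the data, and compressed archives take extra time to write and read, so it is worth measuring on your own arrays.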

6. Conclusion

This tutorial covered several methods for saving NumPy arrays to disk. Saving arrays in .npy or .npz format and loading them back quickly can improve the efficiency and performance of your code.

Please download the source code related to this tutorial here. You can run the Jupyter notebook for this tutorial here.

Mrityunjay
