
Save NumPy array to file

1. Intro

Another tutorial covers creating a NumPy array from plain text files like CSV and TSV. In this tutorial, we will see methods for saving a NumPy array to the file system, so that we can later load the array back from those files.

A few techniques are critical for a data analyst, like saving an array in the .npy or .npz format. Creating a NumPy array from a .npy file is much faster than creating it from a text file like CSV. Hence it's advisable to save NumPy arrays in this format if we want to refer to them in the future.

2. Save NumPy array as plain text file like CSV

We can save a NumPy array as a plain text file like CSV or TSV. We tend to use this method when we want to share some analysis. Most analyses pass through multiple steps, and key stakeholders can easily see the end result in a CSV file.

We can also provide custom delimiters.

We use the numpy.savetxt() method to save a NumPy array as a CSV or TSV file.

numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)

#%%
import numpy as np

# Load the sample rainfall data, then save the NumPy array as a csv file
array_rain_fall = np.loadtxt(fname="rain-fall.csv", delimiter=",")
np.savetxt(fname="saved-rain-fall-row-col-names.csv", delimiter=",", X=array_rain_fall)
# Check generated csv file after loading it
array_rain_fall_csv_saved = np.loadtxt(
    fname="saved-rain-fall-row-col-names.csv", delimiter=","
)
print("NumPy array: \n", array_rain_fall_csv_saved)
print("Shape: ", array_rain_fall_csv_saved.shape)
print("Data Type: ", array_rain_fall_csv_saved.dtype.name)

OUTPUT:

NumPy array: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
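
The custom delimiters mentioned above can be sketched as follows. This is a minimal example, assuming a small made-up array and a temporary file path (both hypothetical, not the rainfall file from this tutorial). It writes a tab-separated file with a header line and one decimal place per value.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data standing in for the rainfall array
rain_fall = np.array([[12.0, 11.5, 14.0], [13.0, 13.5, 16.7]])

# Hypothetical output path inside a temporary directory
tsv_path = os.path.join(tempfile.mkdtemp(), "rain-fall.tsv")

# Tab delimiter, one decimal place, and a header line.
# savetxt prefixes the header with the comments string ('# ' by default).
np.savetxt(tsv_path, rain_fall, fmt="%.1f", delimiter="\t", header="jan\tfeb\tmar")

with open(tsv_path) as f:
    tsv_text = f.read()
print(tsv_text)

# loadtxt skips lines starting with '#', so the header is ignored on load
rain_fall_loaded = np.loadtxt(tsv_path, delimiter="\t")
print("Shape: ", rain_fall_loaded.shape)
```

The fmt argument controls how each value is written, so a shorter format gives a more readable file at the cost of precision.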

3. Save and read NumPy Binary file

We can save a NumPy array in a raw binary format using the numpy_array.tofile() method. This format is not recommended for archival or for transfer between machines, because it loses the precision and endianness information. As the output below shows, the shape of the array is not stored either. It's better to use the .npy or .npz format for archival and retrieval.

We use the numpy.fromfile() method to create a NumPy array from a binary file.

#%%
# Saving array as binary file and reading it
array_rain_fall.tofile("saved-rain-fall-binary")
array_rain_fall_binary = np.fromfile("saved-rain-fall-binary")
print("NumPy array: \n", array_rain_fall_binary)
print("Shape: ", array_rain_fall_binary.shape)
print("Data Type: ", array_rain_fall_binary.dtype.name)

OUTPUT:

NumPy array: 
 [12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5 13. 11.
 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]
Shape:  (24,)
Data Type:  float64
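
Because the raw file holds only the bytes of the array, we have to supply the data type and shape ourselves when reading it back. The sketch below assumes a small integer array and a temporary file path (both hypothetical). It shows one way to restore the original array, and what happens when the default data type is used instead.

```python
import os
import tempfile

import numpy as np

# Hypothetical 2 x 3 integer array
original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# Hypothetical output path inside a temporary directory
raw_path = os.path.join(tempfile.mkdtemp(), "raw-array.bin")
original.tofile(raw_path)

# Pass the dtype explicitly and reshape, since neither is stored in the file
restored = np.fromfile(raw_path, dtype=original.dtype).reshape(original.shape)
print("Restored: \n", restored)

# With the default dtype (float64) the same 24 bytes are read as 3 floats
misread = np.fromfile(raw_path)
print("Shape with default dtype: ", misread.shape)
```

The example in this section works without an explicit dtype only because the rainfall array is already float64, which is the default for numpy.fromfile().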

4. Save and read npy file

We recommend using .npy and .npz files to save NumPy arrays on disk for easy persistence and fast retrieval. Creating an array from a .npy file is faster than creating it from CSV or other plain text files.

We use the numpy.save() method to save an array in the .npy format.

numpy.save(file, arr, allow_pickle=True, fix_imports=True)

We create a NumPy array from a .npy file using the numpy.load() method.

numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII', *, max_header_size=10000)

#%%
# Saving array as .npy and reading it
np.save("saved-rain-fall-binary.npy", array_rain_fall)
array_rain_fall_npy = np.load("saved-rain-fall-binary.npy")
print("NumPy array: \n", array_rain_fall_npy)
print("Shape: ", array_rain_fall_npy.shape)
print("Data Type: ", array_rain_fall_npy.dtype.name)

OUTPUT:

NumPy array: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
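
To check the speed claim on your own machine, a rough timing comparison can be sketched as below. It assumes a randomly generated array and temporary file paths (all hypothetical). The measured times depend on the machine and the array size, so no particular numbers are implied.

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical test data: 2000 rows x 50 columns of random floats
rng = np.random.default_rng(seed=0)
data = rng.random((2000, 50))

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
npy_path = os.path.join(tmp_dir, "data.npy")

# Save the same array in both formats
np.savetxt(csv_path, data, delimiter=",")
np.save(npy_path, data)

# Time loading from the CSV file
start = time.perf_counter()
from_csv = np.loadtxt(csv_path, delimiter=",")
csv_seconds = time.perf_counter() - start

# Time loading from the .npy file
start = time.perf_counter()
from_npy = np.load(npy_path)
npy_seconds = time.perf_counter() - start

print(f"CSV load: {csv_seconds:.4f} s")
print(f"npy load: {npy_seconds:.4f} s")
```

A single run like this is only a rough indication. Repeating the measurement several times gives a more reliable comparison.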

5. Save multiple arrays in one npz file

NumPy provides numpy.savez() to save multiple arrays in one file. We can load the .npz file with the numpy.load() method.

numpy.savez(file, *args, **kwds)

Combining several NumPy arrays into a single .npz file lets us load them with one call, instead of opening individual .npy files one by one.

#%%
# Saving multiple arrays in npz format. Loading and reading the array.
np.savez("saved-rain-fall-binary.npz", array_rain_fall, np.array([1, 2, 3, 4, 5]))
array_rain_fall_npz = np.load("saved-rain-fall-binary.npz")
print("NumPy array 1: \n", array_rain_fall_npz["arr_0"])
print("Shape of Array 1: ", array_rain_fall_npz["arr_0"].shape)
print("Data Type of Array 1: ", array_rain_fall_npz["arr_0"].dtype.name)
print("NumPy array 2: \n", array_rain_fall_npz["arr_1"])
print("Shape of Array 2: ", array_rain_fall_npz["arr_1"].shape)
print("Data Type of Array 2: ", array_rain_fall_npz["arr_1"].dtype.name)

OUTPUT:

NumPy array 1: 
 [[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
 [13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape of Array 1:  (2, 12)
Data Type of Array 1:  float64
NumPy array 2: 
 [1 2 3 4 5]
Shape of Array 2:  (5,)
Data Type of Array 2:  int64
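
Arrays passed positionally get the default names arr_0, arr_1, and so on. Passing them as keyword arguments stores them under names of our choice. The sketch below assumes two small made-up arrays, the names rain_fall and months, and a temporary file path (all hypothetical).

```python
import os
import tempfile

import numpy as np

# Hypothetical output path inside a temporary directory
npz_path = os.path.join(tempfile.mkdtemp(), "named-arrays.npz")

# Keyword arguments become the names of the arrays inside the .npz file
np.savez(
    npz_path,
    rain_fall=np.array([[12.0, 11.5], [13.0, 12.5]]),
    months=np.array([1, 2]),
)

# The loaded object can be used as a context manager, which closes the file
with np.load(npz_path) as archive:
    names = sorted(archive.files)
    rain_fall = archive["rain_fall"]
    months = archive["months"]

print("Array names: ", names)
print("rain_fall shape: ", rain_fall.shape)
print("months: ", months)
```

Named arrays make the loading code easier to read, because it no longer depends on the order in which the arrays were saved.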

We use the _[numpy.savez_compressed()](https://www.numpy.org/devdocs/reference/generated/numpy.savez.html#numpy.savez)_ method to save a compressed .npz file.
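
A small sketch of the difference between the two methods follows. It assumes an array of zeros, chosen because it compresses very well, and temporary file paths (all hypothetical). Real data usually compresses far less.

```python
import os
import tempfile

import numpy as np

# Hypothetical, highly compressible data: one million zeros
zeros = np.zeros((1000, 1000))

tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.npz")
compressed_path = os.path.join(tmp_dir, "compressed.npz")

# Same array, saved without and with compression
np.savez(plain_path, data=zeros)
np.savez_compressed(compressed_path, data=zeros)

plain_size = os.path.getsize(plain_path)
compressed_size = os.path.getsize(compressed_path)
print("Uncompressed size in bytes: ", plain_size)
print("Compressed size in bytes: ", compressed_size)

# Both files are loaded the same way with numpy.load()
with np.load(compressed_path) as archive:
    loaded = archive["data"]
print("Shape: ", loaded.shape)
```

Compression saves disk space, but reading and writing take extra work, so it is a trade-off rather than a default choice.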

6. Conclusion

This tutorial covered methods for saving NumPy arrays to disk and loading them back. Saving arrays in the .npy or .npz format lets us load them quickly, which improves code efficiency and performance.

Please download the source code related to this tutorial here. You can run the Jupyter notebook for this tutorial here.
