Introduction to Safetensors – KDnuggets


Hugging Face has developed a new serialization format called Safetensors, aimed at simplifying and streamlining the storage and loading of large and complex tensors. Tensors are the primary data structure used in deep learning, and their size can pose challenges when it comes to efficiency.

Safetensors stores tensors as raw bytes behind a small JSON header, which avoids the overhead of pickle and lets files be read with little or no copying. According to Hugging Face's benchmark, loading model.safetensors is 76.6X faster on CPU and 2X faster on GPU than loading pytorch_model.bin, the traditional PyTorch serialization format. Check out the Speed Comparison for details.

Ease of use

Safetensors has a simple and intuitive API to serialize and deserialize tensors in Python. This means that developers can focus on building their deep learning models instead of spending time on serialization and deserialization.

Cross-platform compatibility

You can serialize in Python and conveniently load the resulting files in various programming languages and platforms, such as C++, Java, and JavaScript. This allows for seamless sharing of models across different programming environments.

Speed

Safetensors is optimized for speed and can efficiently handle the serialization and deserialization of large tensors. As a result, it is an excellent choice for applications that use large language models.

Size Optimization

A Safetensors file consists of a compact JSON header followed by the raw tensor bytes, so it adds very little overhead on top of the tensor data itself. Because it does not go through pickle, reading and writing are faster and more efficient than with pickle-based serialization formats.
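To make the on-disk layout concrete, here is a minimal sketch that uses only the Python standard library. It follows the layout described in the Safetensors README: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. The file name handmade.safetensors, the tensor name weight, and both helper functions are hypothetical and exist only for this illustration; the official library may also pad the header, so this is not meant to reproduce its output byte for byte.

```python
import json
import os
import struct


def write_minimal_safetensors(path, name, values):
    """Write one 1-D float32 tensor by hand, mimicking the documented layout."""
    # Raw tensor bytes: little-endian float32 values, back to back.
    data = struct.pack(f"<{len(values)}f", *values)
    # The header maps each tensor name to its dtype, shape, and byte range.
    # Offsets are relative to the start of the data section, not the file.
    header = {
        name: {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [0, len(data)],
        }
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)
        f.write(data)


def read_safetensors_parts(path):
    """Return (header_len, header_dict, payload_bytes) without any library."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    return header_len, header, payload


write_minimal_safetensors("handmade.safetensors", "weight", [1.0, 2.0, 3.0])
header_len, header, payload = read_safetensors_parts("handmade.safetensors")

start, end = header["weight"]["data_offsets"]
values = struct.unpack("<3f", payload[start:end])
file_size = os.path.getsize("handmade.safetensors")

print(header)
print(values)
print(file_size, "bytes in total;", len(payload), "of them are tensor data")
```

Everything in the file other than the tensor bytes is the 8-byte length plus the JSON header, which is why the format stays close to the size of the data it stores.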

Secure

Unlike pickle, a Safetensors file contains only tensor data and metadata, so loading a file cannot execute arbitrary code. The format also validates what it reads: the header size is capped and tensor offsets are checked so that they do not overlap. Moreover, this helps prevent DOS attacks from maliciously crafted files.
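A large part of the security argument is that pickle-based formats can run code while a file is being loaded. The sketch below shows this with a deliberately harmless payload, using only the standard library. The class name Payload and the string it evaluates are made up for the demonstration.

```python
import pickle


class Payload:
    # pickle calls __reduce__ to learn how to rebuild an object. Whatever
    # callable it returns is invoked when the bytes are loaded.
    def __reduce__(self):
        # Harmless here, but a real attacker could return any callable.
        return (eval, ("'code ran during load: ' + str(6 * 7)",))


blob = pickle.dumps(Payload())

# Loading does not give back a Payload object. It runs eval() instead.
result = pickle.loads(blob)
print(result)
```

A format that holds only a JSON header and raw bytes has no equivalent hook, which is why loading an untrusted tensor file in that format is considerably less risky than unpickling one.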

Lazy loading

When working in distributed settings with multiple nodes or GPUs, it is helpful to load only the tensors, or parts of tensors, that each device needs. BLOOM used this format to load the model on 8 GPUs in just 45 seconds, compared to 10 minutes with regular PyTorch weights.

In this section, we will look at the safetensors API and how you can save and load tensor files.

We can install safetensors with the pip package manager by running pip install safetensors in a terminal.

We will use the example from Torch shared tensors to build a simple neural network and save the model using the safetensors.torch API for PyTorch.

<code>from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(100, 100)
        self.b = self.a

    def forward(self, x):
        return self.b(self.a(x))


model = Model()
print(model.state_dict())</code>

As we can see, we have successfully created the model.

<code>OrderedDict([('a.weight', tensor([[-0.0913, 0.0470, -0.0209, ..., -0.0540, -0.0575, -0.0679], [ 0.0268, 0.0765, 0.0952, ..., -0.0616, 0.0146, -0.0343], [ 0.0216, 0.0444, -0.0347, ..., -0.0546, 0.0036, -0.0454], ...,</code>

Now, we will save the model by providing the model object and the file name. After that, we will load the saved file back into the model object created using nn.Module.

<code>from safetensors.torch import load_model, save_model

save_model(model, "model.safetensors")

load_model(model, "model.safetensors")
print(model.state_dict())</code>

<code>OrderedDict([('a.weight', tensor([[-0.0913, 0.0470, -0.0209, ..., -0.0540, -0.0575, -0.0679], [ 0.0268, 0.0765, 0.0952, ..., -0.0616, 0.0146, -0.0343], [ 0.0216, 0.0444, -0.0347, ..., -0.0546, 0.0036, -0.0454], ...,</code>

In the second example, we will save tensors created using torch.zeros. For that, we will use the save_file function.

<code>import torch
from safetensors.torch import save_file, load_file

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
save_file(tensors, "new_model.safetensors")</code>

And to load the tensors, we will use the load_file function.

<code>load_file("new_model.safetensors")</code>

<code>{'weight1': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]]),
 'weight2': tensor([[0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         ...,
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.],
         [0., 0., 0.,  ..., 0., 0., 0.]])}</code>

The safetensors API is available for PyTorch, TensorFlow, PaddlePaddle, Flax, and NumPy. To learn more, read the Safetensors documentation.


In short, Safetensors is a new way to store large tensors used in deep learning applications. Compared to other techniques, it is faster, more efficient, and more user-friendly. Additionally, it is safer to load than pickle-based formats while supporting various programming languages and platforms. By using Safetensors, machine learning engineers can save time and concentrate on developing better models.

I highly recommend using Safetensors for your projects. Many top AI companies, such as Hugging Face, EleutherAI, and StabilityAI, utilize Safetensors for their projects.


Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in Technology Management and a bachelor’s degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
