
What is a tensor, exactly?

Most deep learning practitioners know about them but can't pinpoint an exact definition.

From TensorFlow to PyTorch, every deep learning framework relies on the same basic object: the tensor. Tensors are used to store almost everything in deep learning: input data, weights, biases, predictions, etc.

And yet, their definition is incredibly fuzzy: the Wikipedia category alone has over 100 pages related to tensors.

In this article, we'll give a definitive answer to the following question: what is a tensor in neural networks?

💻 Tensors in computer science

So why are there so many definitions?

It's quite simple: different fields have different definitions. Tensors in mathematics are not quite the same as tensors in physics, which are different from tensors in computer science.

Data structure vs. Objects

These definitions can be divided into two categories: tensors as a data structure or as objects (in an object-oriented programming sense).

  • Data structure: this is the definition we use in computer science. Tensors are multidimensional arrays that store a specific type of value.
  • Objects: this is the definition used in other fields. In mathematics and physics, tensors are not just a data structure: they also have a list of properties, like a specific product.

This is why you see a lot of people (sometimes quite pedantically) saying "tensors are not n-dimensional arrays/matrices": they're not talking about data structures, but about objects with properties.

Even the same words have different meanings. For instance, in computer science, a 2D tensor is a matrix (it's a tensor of rank 2). In linear algebra, an object with 2 dimensions, such as a 2-dimensional vector, only stores two values. The rank also has a completely different definition: the rank of a matrix is the maximum number of its linearly independent column (or row) vectors.
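To make the two meanings of "rank" concrete, here is a minimal sketch using NumPy. The matrix `m` and the variable names are made up for illustration: `ndim` gives the number of axes (the computer science sense), while `np.linalg.matrix_rank` gives the linear algebra rank.

```python
import numpy as np

# A 2 x 2 matrix whose second row is twice the first one.
m = np.array([[1, 2],
              [2, 4]])

# Computer science "rank": the number of axes (dimensions) of the array.
cs_rank = m.ndim

# Linear algebra rank: the number of linearly independent rows (or columns).
la_rank = int(np.linalg.matrix_rank(m))

print(f"Number of axes: {cs_rank}")  # 2
print(f"Matrix rank:    {la_rank}")  # 1
```

The same object is "rank 2" as a data structure and "rank 1" as a matrix, because its two rows are not linearly independent.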

In computer science, we're only interested in a definition focused on the data structure. From this point of view, tensors truly are a generalization in $n$ dimensions of matrices.
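As a quick illustration of this data structure view, the sketch below builds arrays with 0 to 3 dimensions and prints their number of axes and shape. The variable names (`scalar`, `vector`, `matrix`, `cube`) are only illustrative.

```python
import numpy as np

scalar = np.array(5)                       # 0D: a single value
vector = np.array([1, 2, 3])               # 1D: a list of values
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2D: rows and columns
cube = np.zeros((2, 3, 4))                 # 3D: a stack of two 3 x 4 matrices

examples = [("scalar", scalar), ("vector", vector),
            ("matrix", matrix), ("cube", cube)]
for name, t in examples:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
```

Each additional axis simply adds one more entry to the shape, which is all "generalization in n dimensions" means here.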

But we're still missing an important nuance when talking about tensors specifically in the context of deep learning...

🧠 Tensors in deep learning

[Figure: Array vs. Tensor. Icons created by Freepik and smashingstocks (Flaticon).]

So why are they called "tensors" instead of "multidimensional arrays"? OK, it is shorter, but is that all there is to it? Actually, people make an implicit assumption when they talk about tensors.

PyTorch's official documentation gives us a practical answer:

The biggest difference between a numpy array and a PyTorch Tensor is that a PyTorch Tensor can run on either CPU or GPU.

In deep learning, we need performance to compute a lot of matrix multiplications in a highly parallel way. These matrices (and n-dimensional arrays in general) are generally stored and processed on GPUs to speed up training and inference times.

This is what was missing in our previous definition: tensors in deep learning are not just n-dimensional arrays. There's also the implicit assumption that they can be run on a GPU.
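In PyTorch, this assumption shows up as an explicit device attached to every tensor. The snippet below is a minimal sketch, assuming PyTorch is installed; it falls back to the CPU when no CUDA GPU is visible, so the values `x` and `y` are illustrative only.

```python
import torch

# Use the GPU when one is visible to PyTorch; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
x = x.to(device)  # copies the data to the GPU if device is CUDA

y = x @ x  # the product is computed on whichever device holds x
print(y.device)
```

A NumPy array has no equivalent of `.to(device)`: it always lives in host memory and is processed by the CPU.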

⚔️ NumPy vs PyTorch

Let's see the difference between NumPy arrays and PyTorch tensors.

Scalar, vector, matrix

These two objects are very similar: we can initialize a 1D array and a 1D tensor with nearly the same syntax. They also share a lot of methods and can be easily converted into one another.
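The conversion in both directions can be sketched as follows. This example uses `torch.from_numpy`, `torch.tensor`, and `Tensor.numpy()`; the variable names (`shared`, `copied`, `back`) are illustrative. One detail worth knowing is that `torch.from_numpy` shares memory with the source array, whereas `torch.tensor` makes a copy.

```python
import numpy as np
import torch

array = np.array([1, 2, 3])

shared = torch.from_numpy(array)  # shares memory with the NumPy array
copied = torch.tensor(array)      # independent copy of the data

array[0] = 100  # modify the original array in place
print(shared)   # first element is now 100
print(copied)   # first element is still 1

back = copied.numpy()  # CPU tensor -> NumPy array
print(type(back))
```

A tensor stored on a GPU has to be moved back to the CPU (for example with `.cpu()`) before calling `.numpy()`.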

You can find the code used in this article at this address

import numpy as np
import torch

array = np.array([1, 2, 3])
print(f'NumPy Array: {array}')

tensor = torch.tensor([1, 2, 3])
print(f'PyTorch Tensor: {tensor}')

NumPy Array: [1 2 3]
PyTorch Tensor: tensor([1, 2, 3])

Initializing 2D arrays and 2D tensors is not more complicated.

x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(f'NumPy Array:\n{x}')

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
print(f'\nPyTorch Tensor:\n{x}')

NumPy Array:
[[1 2 3]
 [4 5 6]]

PyTorch Tensor:
tensor([[1, 2, 3],
        [4, 5, 6]])

We said that the key difference between tensors and arrays is that tensors can be run on GPUs. So in the end, this distinction is based on performance. But is this boost that important?

Let's compare the performance between NumPy arrays and PyTorch tensors on matrix multiplication. In the following example, we randomly initialize 4D arrays/tensors and multiply them.
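Before looking at the timings, it helps to recall what "multiplying" 4D arrays means. Both `np.matmul` and `torch.matmul` treat the last two axes as matrices and the leading axes as batch dimensions, so a shape of (100, 100, 100, 100) corresponds to 10,000 independent products of 100 x 100 matrices. The sketch below shows this with deliberately small, made-up shapes.

```python
import numpy as np

a = np.random.rand(2, 3, 4, 5)  # a 2 x 3 batch of 4 x 5 matrices
b = np.random.rand(2, 3, 5, 6)  # a 2 x 3 batch of 5 x 6 matrices

c = np.matmul(a, b)
print(c.shape)  # (2, 3, 4, 6)

# Each entry of the batch is an ordinary matrix product.
same = np.allclose(c[1, 2], a[1, 2] @ b[1, 2])
print(same)  # True
```

This batch structure is what makes the operation so easy to parallelize on a GPU.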

# This benchmark assumes a CUDA-capable GPU is available
device = torch.device("cuda")

# 4D arrays (float64 by default)
array1 = np.random.rand(100, 100, 100, 100)
array2 = np.random.rand(100, 100, 100, 100)

# 4D tensors (float32 by default), moved to the GPU
tensor1 = torch.rand(100, 100, 100, 100).to(device)
tensor2 = torch.rand(100, 100, 100, 100).to(device)

%%timeit
np.matmul(array1, array2)

1 loop, best of 5: 1.32 s per loop

%%timeit
torch.matmul(tensor1, tensor2)

1000 loops, best of 5: 25.2 ms per loop

As we can see, PyTorch tensors completely outperformed NumPy arrays: they completed the multiplication about 52 times faster!

This is the true power of tensors: they're blazingly fast! Performance might vary depending on the dimensions, the implementation, and the hardware, but this speed is the reason why tensors (and not arrays) are so common in deep learning.
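If you want to reproduce this kind of comparison outside a notebook, two details are worth controlling: the data type (`np.random.rand` returns float64 while `torch.rand` returns float32 by default) and the fact that CUDA operations are launched asynchronously, so the GPU should be synchronized before reading the clock. The following is a rough sketch under those assumptions, not the benchmark used in this article; the helper names (`sync`, `time_numpy`, `time_torch`) and the smaller shape are made up for illustration, and it falls back to the CPU when no GPU is available.

```python
import time

import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Smaller shape than in the article so the sketch runs quickly anywhere.
shape = (10, 10, 100, 100)

# Use the same dtype (float32) on both sides for a fairer comparison.
array1 = np.random.rand(*shape).astype(np.float32)
array2 = np.random.rand(*shape).astype(np.float32)
tensor1 = torch.from_numpy(array1).to(device)
tensor2 = torch.from_numpy(array2).to(device)


def sync():
    """Wait for pending GPU work; does nothing on the CPU."""
    if device.type == "cuda":
        torch.cuda.synchronize()


def time_numpy(repeats=5):
    """Average wall-clock time of np.matmul over a few runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        np.matmul(array1, array2)
    return (time.perf_counter() - start) / repeats


def time_torch(repeats=5):
    """Average wall-clock time of torch.matmul over a few runs."""
    torch.matmul(tensor1, tensor2)  # warm-up call, not timed
    sync()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(tensor1, tensor2)
    sync()  # make sure the device has actually finished
    return (time.perf_counter() - start) / repeats


numpy_time = time_numpy()
torch_time = time_torch()
print(f"NumPy:   {numpy_time * 1e3:.2f} ms per loop")
print(f"PyTorch: {torch_time * 1e3:.2f} ms per loop ({device.type})")
```

The exact ratio you measure will depend on your hardware, the shapes, and the library versions, so treat any single number as an order of magnitude rather than a constant.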

Conclusion

In this article, we wrote a definition of tensors based on:

  1. Their use in computer science (data structure);
  2. More specifically, in deep learning (they can run on GPUs).

Here's how we can summarize it in one sentence:

Tensors are n-dimensional arrays with the implicit assumption that they can run on a GPU.

Finally, we saw the difference in performance between tensors and arrays, which motivates the need for tensors in deep learning.

So next time someone tries to explain to you that tensors are not exactly a generalization of matrices, you'll know that they're right in a particular definition of tensors, but not in the computer science/deep learning one.

If you're looking for more data science and machine learning content in n-dimensions, please follow me on Twitter @maximelabonne. 📣