I’ve recently been tinkering with deep learning models in TensorFlow, and have accordingly been introduced to managing data as tensors.
As a Data Engineer who works all day in tables that I can easily slice, dice, and visualize, I had absolutely no intuition for working with tensors, and I seemed to constantly run into the same errors that, especially at first, went way over my head.
However, deep diving them has taught me a lot about tensors and TensorFlow, and I wanted to consolidate those learnings here to use as a reference.
If you have a favorite error, solution, or debugging tip, please leave a comment!
Before we dive into the errors themselves, I wanted to document a few of the light-weight, simple bits and pieces of code that I’ve found helpful in debugging. (Although it must be stated for legal reasons that we of course always debug with official debugging features and never just dozens of print statements 🙂)
Seeing inside our Tensorflow Datasets
First off, looking at our actual data. When we print a DataFrame or SELECT * in SQL, we see the data! When we print a tensor dataset we see…
<_TensorSliceDataset element_spec=(TensorSpec(shape=(2, 3), dtype=tf.int32, name=None), TensorSpec(shape=(1, 1), dtype=tf.int32, name=None))>
This is all quite useful information, but it doesn’t help us understand what’s actually going on in our data.
To print a single tensor within the execution graph we can leverage tf.print. This article is a wonderful deep dive into tf.print that I highly recommend if you plan to use it often: Using tf.Print() in TensorFlow
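As a minimal sketch (a toy function of my own, not pulled from that article), tf.print runs inside the graph, so it prints values at execution time even under @tf.function:
import tensorflow as tf

@tf.function
def scaled_sum(x):
    total = tf.reduce_sum(x)
    # tf.print is part of the graph, so this fires when the function actually executes
    tf.print("running total:", total)
    return total * 2

scaled_sum(tf.constant([1.0, 2.0, 3.0]))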
But when working with Tensorflow datasets during development, sometimes we need to see a few values at a time. For that we can loop through and print individual pieces of data like this:
import numpy as np
import tensorflow as tf

# Generate dummy 2D data
np.random.seed(42)
num_samples = 100
num_features = 5
X_data = np.random.rand(num_samples, num_features).astype(np.float32)
y_data = 2 * X_data[:, 0] + 3 * X_data[:, 1] - 1.5 * X_data[:, 2] + 0.5 * X_data[:, 3] + np.random.randn(num_samples)

# Turn it into a TensorFlow Dataset
dataset = tf.data.Dataset.from_tensor_slices((X_data, y_data))

# Print the first 10 rows
for i, (features, label) in enumerate(dataset.take(10)):
    print(f"Row {i + 1}: Features - {features.numpy()}, Label - {label.numpy()}")
We can also use skip to jump to a specific index (bearing in mind our dummy dataset only has 100 rows):
mini_dataset = dataset.skip(50).take(20)
for i, (features, label) in enumerate(mini_dataset):
    print(f"Row {i + 1}: Features - {features.numpy()}, Label - {label.numpy()}")
Knowing our tensors’ specs
When working with tensors we also need to know their shape, rank, dimension, and data type (if some of that vocabulary is unfamiliar, as it was to me initially, don’t worry, we’ll get back to it later in the article). Anyway, below are a few lines of code to gather this information:
# Create a sample tensor
sample_tensor = tf.constant([[1, 2, 3], [4, 5, 6]])

# Get the size of the tensor (total number of elements)
tensor_size = tf.size(sample_tensor).numpy()
# Get the rank of the tensor
tensor_rank = tf.rank(sample_tensor).numpy()
# Get the shape of the tensor
tensor_shape = sample_tensor.shape
# Get the dimensions of the tensor
tensor_dimensions = sample_tensor.shape.as_list()
# Print the results
print("Tensor Size:", tensor_size)
print("Tensor Rank:", tensor_rank)
print("Tensor Shape:", tensor_shape)
print("Tensor Dimensions:", tensor_dimensions)
The above outputs:
Tensor Size: 6
Tensor Rank: 2
Tensor Shape: (2, 3)
Tensor Dimensions: [2, 3]
Augmenting model.summary()
Finally, it is always helpful to be able to see how data moves through a model, and how shape changes across the inputs and outputs of its layers. The source of many an error is a mismatch between these expected input and output shapes and the shape of a given tensor.
model.summary() of course gets the job done, but we can supplement that information with the following snippet, which adds a bit more context with model and layer inputs and outputs:
print("###################Input Shape and Datatype#####################")
[print(i.shape, i.dtype) for i in model.inputs]
print("###################Output Shape and Datatype#####################")
[print(o.shape, o.dtype) for o in model.outputs]
print("###################Layer Input Shape and Datatype#####################")
[print(l.name, l.input, l.dtype) for l in model.layers]
So let’s jump into some errors!
Rank
ValueError: Shape must be rank x but is rank y….
Ok, first of all, what is a rank? Rank is just the unit of dimensionality we use to describe tensors. A rank 0 tensor is a scalar value; a rank 1 tensor is a vector; a rank 2 tensor is a matrix; and so on for all n-dimensional structures.
Take, for example, a five-dimensional (rank 5) tensor.
rank_5_tensor = tf.constant([[[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]],
[[[[17, 18], [19, 20]], [[21, 22], [23, 24]]], [[[25, 26], [27, 28]], [[29, 30], [31, 32]]]]])
print("\nRank 5 Tensor:", rank_5_tensor.shape)
Rank 5 Tensor: (2, 2, 2, 2, 2)
The code above shows that each dimension of the five has a size of two. If we wanted to index it, we could do so along any of these axes. To get at the last element, 32, we would run something like:
rank_5_tensor.numpy()[1][1][1][1][1]
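The same element can also be reached with a single multi-axis index, which reads a bit more cleanly:
print(rank_5_tensor[1, 1, 1, 1, 1].numpy())  # 32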
The official tensor documentation has some really helpful visualizations to make this a bit more comprehensible.
Back to the error: it is just flagging that the tensor provided has a different rank than the one expected by a particular function. For example, if the error declares that the “Shape must be rank 1 but is rank 0…” it means we are providing a scalar value where a 1-D tensor is expected.
Take the example below where we are trying to multiply tensors together with the matmul method.
import tensorflow as tf
import numpy as np
# Create a TensorFlow dataset with random matrices
num_samples = 5
matrix_size = 3
dataset = tf.data.Dataset.from_tensor_slices(np.random.rand(num_samples, matrix_size, matrix_size))
mul = [1, 2, 3, 4, 5, 6]

# Define a function that uses tf.matmul
def matmul_function(matrix):
    return tf.matmul(matrix, mul)
# Apply the matmul_function to the dataset using map
result_dataset = dataset.map(matmul_function)
If we take a peek at the documentation, matmul expects at least a rank 2 tensor, so multiplying the matrix by [1,2,3,4,5,6], which is just an array, will raise this error.
ValueError: Shape must be rank 2 but is rank 1 for '{{node MatMul}} = MatMul[T=DT_DOUBLE, transpose_a=false, transpose_b=false](args_0, MatMul/b)' with input shapes: [3,3], [2].
A great first step for this error is to dive into the documentation and understand what the function you are using is looking for (here’s a nice list of the functions available on tensors: raw_ops).
Then use the rank method to determine what we are actually providing.
print(tf.rank(mul))
tf.Tensor(1, shape=(), dtype=int32)
As far as fixes go, tf.reshape is often a good option to start with. Let’s take a brief moment to talk a little bit about tf.reshape, since it will be a faithful companion throughout our Tensorflow journey: tf.reshape(tensor, shape, name=None)
Reshape simply takes in the tensor we want to reshape, and another tensor containing what we want the shape of the output to be. For example, let’s reshape our multiplication input:
mul = [1,2,3,4,5,6]
tf.reshape(mul, [3, 2]).numpy()
array([[1, 2],
[3, 4],
[5, 6]], dtype=int32)
Our variable will turn into a (3, 2) tensor (3 rows, 2 columns). A quick note: tf.reshape(mul, [3, -1]).numpy() will produce the same thing, because the -1 tells TensorFlow to compute that dimension’s size so that the total number of elements stays constant. The number of elements in the shape argument is the rank of the output.
Once we create a tensor with the proper rank, our multiplication will work just fine!
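For completeness, here is a minimal sketch of the fix, reusing the dataset from the snippet above (note the cast to float64 so the dtype matches the random matrices):
mul = tf.reshape(tf.constant([1, 2, 3, 4, 5, 6], dtype=tf.float64), [3, 2])

def matmul_function(matrix):
    # (3, 3) x (3, 2) -> (3, 2): both operands are now rank 2
    return tf.matmul(matrix, mul)

result_dataset = dataset.map(matmul_function)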
Shape
ValueError: Input of layer is incompatible with layer….
Having an intuitive understanding of tensor shape, and of how it interacts with and changes across model layers, has made life with deep learning significantly easier.
First, getting basic vocab out of the way: the shape of a tensor refers to the number of elements along each dimension, or axis of the tensor. For example, a 2D tensor with 3 rows and 4 columns has a shape of (3, 4).
So what can go wrong with shape? Glad you asked, quite a few things!
First and foremost the shape and rank of your training data must match the input shape expected by the input layer. Let’s take a look at an example, a basic CNN:
import tensorflow as tf
from tensorflow.keras import layers, models

# Create a function to generate sample data
def generate_sample_data(num_samples=100):
    for _ in range(num_samples):
        features = tf.random.normal(shape=(64, 64, 3))
        labels = tf.one_hot(tf.random.uniform(shape=(), maxval=10, dtype=tf.int32), depth=10)
        yield features, labels
# Create a TensorFlow dataset using the generator function
sample_dataset = tf.data.Dataset.from_generator(generate_sample_data, output_signature=(tf.TensorSpec(shape=(64, 64, 3), dtype=tf.float32), tf.TensorSpec(shape=(10,), dtype=tf.float32)))
# Create a CNN model with an input layer expecting (128, 128, 3)
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Fit the model using the dataset
model.fit(sample_dataset.batch(32).repeat(), epochs=5, steps_per_epoch=100)
Trying to run the code above will result in:
ValueError: Input 0 of layer "sequential_5" is incompatible with the layer: expected shape=(None, 128, 128, 3), found shape=(None, 64, 64, 3)
This is because our model is expecting the input tensor to be of the shape (128, 128, 3) and our generated data is (64, 64, 3).
In a situation like this, our good friend reshape, or another TensorFlow function, tf.image.resize, can help. If, as in the case above, we are working with images, we can simply resize them, or change the expectations of our model’s input:
target_shape = (128, 128)  # the spatial size the model's input layer expects

def resize_image(image, label):
    resized_image = tf.image.resize(image, size=target_shape)
    return resized_image, label

# Apply the resize function to the entire dataset
resized_dataset = sample_dataset.map(resize_image)
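Or, going the other direction, we could leave the data at (64, 64, 3) and rebuild the model so its input layer matches the data:
# Rebuild the model with an input layer that accepts the generated image size
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
# ... remaining layers as before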
In this context, it is helpful to know a little about how common types of models and model layers expect input of different shapes, so let’s take a little detour.
Deep neural networks of Dense layers take in 1-dimensional tensors (or 2-dimensional if you include the batch size, but we will talk about batch size in a bit) of the format (feature_size, ), where feature_size is the number of features in each sample.
Convolutional neural networks take in data representing images, using 3-dimensional tensors of (height, width, channels), where channels is the color scheme: 1 for grayscale and 3 for RGB.
And finally, recurrent neural networks such as LSTMs take in 2 dimensions, (time_steps, feature_size).
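As a rough sketch of those expectations (toy layer stacks of my own, just to make the shapes concrete):
from tensorflow.keras import layers, models

# Dense network: each sample is a flat vector of feature_size features
dense_model = models.Sequential([layers.Dense(16, activation='relu', input_shape=(20,))])

# CNN: each sample is an image of (height, width, channels)
cnn_model = models.Sequential([layers.Conv2D(8, (3, 3), activation='relu', input_shape=(64, 64, 3))])

# LSTM: each sample is a sequence of (time_steps, feature_size)
lstm_model = models.Sequential([layers.LSTM(8, input_shape=(10, 4))])

for m in (dense_model, cnn_model, lstm_model):
    print(m.input_shape)  # the leading None is the batch dimension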
But back to errors! Another common culprit in Tensorflow shape errors has to do with how shape changes as data passes through the model layers. As previously mentioned, different layers take in different input shapes, and they can also reshape output.
Returning to our CNN example from above, let’s break it again, by seeing what happens when we remove the Flatten layer. If we try to run the code we will see
ValueError: Shapes (None, 10) and (None, 28, 28, 10) are incompatible
This is where printing all of our model input and output shapes along with our data shapes comes in handy to help us pinpoint where there is a mismatch.
model.summary() will show us
Layer (type) Output Shape Param #
=================================================================
conv2d_15 (Conv2D) (None, 126, 126, 32) 896
max_pooling2d_10 (MaxPooling2D) (None, 63, 63, 32) 0
conv2d_16 (Conv2D) (None, 61, 61, 64) 18496
max_pooling2d_11 (MaxPooling2D) (None, 30, 30, 64) 0
conv2d_17 (Conv2D) (None, 28, 28, 64) 36928
flatten_5 (Flatten) (None, 50176) 0
dense_13 (Dense) (None, 64) 3211328
dense_14 (Dense) (None, 10) 650
=================================================================
Total params: 3268298 (12.47 MB)
Trainable params: 3268298 (12.47 MB)
Non-trainable params: 0 (0.00 Byte)
And our further diagnostic will reveal
###################Input Shape and Datatype#####################
(None, 128, 128, 3) <dtype: 'float32'>
###################Output Shape and Datatype#####################
(None, 10) <dtype: 'float32'>
###################Layer Input Shape and Datatype#####################
conv2d_15 KerasTensor(type_spec=TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name='conv2d_15_input'), name='conv2d_15_input', description="created by layer 'conv2d_15_input'") float32
max_pooling2d_10 KerasTensor(type_spec=TensorSpec(shape=(None, 126, 126, 32), dtype=tf.float32, name=None), name='conv2d_15/Relu:0', description="created by layer 'conv2d_15'") float32
conv2d_16 KerasTensor(type_spec=TensorSpec(shape=(None, 63, 63, 32), dtype=tf.float32, name=None), name='max_pooling2d_10/MaxPool:0', description="created by layer 'max_pooling2d_10'") float32
max_pooling2d_11 KerasTensor(type_spec=TensorSpec(shape=(None, 61, 61, 64), dtype=tf.float32, name=None), name='conv2d_16/Relu:0', description="created by layer 'conv2d_16'") float32
conv2d_17 KerasTensor(type_spec=TensorSpec(shape=(None, 30, 30, 64), dtype=tf.float32, name=None), name='max_pooling2d_11/MaxPool:0', description="created by layer 'max_pooling2d_11'") float32
flatten_5 KerasTensor(type_spec=TensorSpec(shape=(None, 28, 28, 64), dtype=tf.float32, name=None), name='conv2d_17/Relu:0', description="created by layer 'conv2d_17'") float32
dense_13 KerasTensor(type_spec=TensorSpec(shape=(None, 50176), dtype=tf.float32, name=None), name='flatten_5/Reshape:0', description="created by layer 'flatten_5'") float32
dense_14 KerasTensor(type_spec=TensorSpec(shape=(None, 64), dtype=tf.float32, name=None), name='dense_13/Relu:0', description="created by layer 'dense_13'") float32
It is a lot of output, but we can see that the dense_13 layer is looking for input of shape (None, 50176), while the conv2d_17 layer outputs (None, 28, 28, 64).
Flatten layers transform the multi-dimensional output from previous layers into a one-dimensional (flat) vector that the Dense layer expects.
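A quick sanity check of that arithmetic, since 28 * 28 * 64 is exactly the flattened size the Dense layer was built against:
x = tf.random.normal((1, 28, 28, 64))        # stand-in for the conv2d_17 output
print(tf.keras.layers.Flatten()(x).shape)    # (1, 50176), i.e. 28 * 28 * 64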
Conv2D and MaxPooling layers change their input data in other interesting ways as well, but those are out of scope for this article. For an awesome breakdown take a look at: Ultimate Guide to Input shape and Model Complexity in Neural Networks
But what about batch size?! I haven’t forgotten!
If we break our code one more time by removing the .batch(32) from the dataset in model.fit we will get the error:
ValueError: Input 0 of layer "sequential_10" is incompatible with the layer: expected shape=(None, 128, 128, 3), found shape=(128, 128, 3)
That is because the first dimension of a layer’s input is reserved for the batch size, or the number of samples we want the model to work through at a time. For a great deep dive, read through Difference between batch and epoch.
Batch size defaults to None prior to fitting, as we can see in the model summary output, and our model expects us to set it elsewhere, depending on how we tune the hyperparameter. We can also force it in our input layer by using batch_input_shape instead of input_shape, but that decreases our flexibility when testing out different batch sizes.
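For reference, a minimal sketch of pinning the batch size in the input layer (with the caveat above about flexibility):
# The first dimension is now fixed to 32 instead of being left as None
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', batch_input_shape=(32, 128, 128, 3)))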
Type
TypeError: Failed to convert object of type to Tensor. Unsupported object type
Finally, let’s talk a bit about some data type specifics in Tensors.
The error above is another that can be a bit baffling if you’re used to working in database systems with tables built from all sorts of data types, but it is one of the simpler ones to diagnose and fix, although there are a couple of common causes to look out for.
The main issue is that, although tensors support a variety of data types, when we convert a NumPy array to a tensor (a common flow within deep learning), the array needs a numeric dtype that TensorFlow supports; an object dtype, which NumPy falls back to for mixed, string, or missing data, cannot be converted. The script below initializes a contrived example of a dataframe with None and with string data points. Let’s walk through some issues and fixes for this example:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
data = [
[None, 0.2, '0.3'],
[0.1, None, '0.3'],
[0.1, 0.2, '0.3'],
]
X_train = pd.DataFrame(data=data, columns=["x1", "x2", "x3"])
y_train = pd.DataFrame(data=[1, 0, 1], columns=["y"])

# Create a TensorFlow dataset
train_dataset = tf.data.Dataset.from_tensor_slices((X_train.to_numpy(), y_train.to_numpy()))
# Define the model
model = Sequential()
model.add(Dense(1, input_dim=X_train.shape[1], activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Fit the model using the TensorFlow dataset
model.fit(train_dataset.batch(3), epochs=3)
Running this code will flag to us that:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
The most obvious issue is that we are sending in a NumPy array that contains a non-float type: an object. If you have an actual column of categorical data, there are many ways to convert it to numeric data (one-hot encoding, etc.), but that is out of scope for this discussion.
We can confirm the culprit by running print(X_train.dtypes), which tells us what’s in our dataframe that TensorFlow doesn’t like:
x1 float64
x2 float64
x3 object
dtype: object
If we are running into non float data points, the line below will magically solve all of our problems:
X_train = np.asarray(X_train).astype('float32')
Another thing to check for is whether we have None or np.nan anywhere.
To find out, we can use a few lines of code such as:
null_mask = X_train.isnull().any(axis=1)
null_rows = X_train[null_mask]
print(null_rows)
Which tells us that we have nulls on rows 0 and 1:
x1 x2 x3
0 NaN 0.2 0.3
1 0.1 NaN 0.3
If so, and if that is expected/intentional, we need to replace those values with an acceptable alternative. The DataFrame fillna method can help us here:
X_train.fillna(value=0, inplace=True)
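Putting the two fixes together before building the dataset (a sketch reusing the X_train and y_train frames from above):
# Replace missing values, then force everything to a numeric dtype
X_train.fillna(value=0, inplace=True)
X_train = np.asarray(X_train).astype('float32')
y_train = np.asarray(y_train).astype('float32')

train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))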
With these changes to the code above, our NumPy arrays will successfully convert to a tensor dataset and we can train our model!
I often find that I learn the most about a particular technology when I have to work through errors, and I hope this has been somewhat helpful to you too!
If you have cool tips and tricks or fun Tensorflow errors please pass them along!