Road to ML Engineer #16 - PyTorch vs TensorFlow

For beginners in deep learning, one of the biggest decisions is which framework to learn first: PyTorch or TensorFlow, the two most popular frameworks in the deep learning community. Here, I would like to offer some insights by building an example ML pipeline with both frameworks.

Step 1 & 2. Data Exploration & Preprocessing

The dataset we use here is the MNIST dataset, which we have seen before. It contains 60,000 training images and 10,000 test images of handwritten digits. Let's look at some examples from the training set:

import keras
import matplotlib.pyplot as plt
 
# Download MNIST
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
 
# Display 10 samples
plt.figure(figsize=(10, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_train[i], cmap='gray')
    plt.axis('off')
plt.tight_layout()
plt.show()

The images above may look blurry because each one is only 28 by 28 pixels, or 784 pixels in total. Before preprocessing, the pixel values range from 0 to 255, and the class labels are integers from 0 to 9 rather than one-hot vectors. Hence, we flatten each image into a vector of 784 values, standardize the pixel values with a z-score, and one-hot encode the labels:

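import numpy as np

# Flatten each 28x28 image into a vector of 784 values
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1]*X_train.shape[2])
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1]*X_test.shape[2])
 
def zscore(X, axis=None):
    # With axis=None, the mean and std are computed over every value in X
    X_mean = X.mean(axis=axis, keepdims=True)
    X_std = np.std(X, axis=axis, keepdims=True)
    return (X - X_mean) / X_std
 
# Each split is standardized with its own global mean and std
X_train = zscore(X_train)
X_test = zscore(X_test)
 
# Turn integer labels (0-9) into one-hot vectors of length 10
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)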
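
To make the effect of these two steps concrete, here is a minimal NumPy-only sketch on toy data. The `one_hot` helper and the toy arrays are hypothetical and only stand in for `keras.utils.to_categorical` and the real MNIST arrays:

```python
import numpy as np

def zscore(X, axis=None):
    # Same idea as the function above: subtract the mean, divide by the std
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    return (X - mean) / std

def one_hot(labels, num_classes):
    # Hypothetical helper: row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

# Two toy "images" of three pixels each, with values in the 0-255 range
toy_pixels = np.array([[0, 128, 255], [64, 64, 192]], dtype=np.float64)
toy_labels = np.array([2, 0])

normalized = zscore(toy_pixels)
encoded = one_hot(toy_labels, num_classes=3)

print("mean:", normalized.mean(), "std:", normalized.std())
print(encoded)
```

After standardization the values have a mean of roughly 0 and a standard deviation of roughly 1, and each label becomes a row with a single 1 at the index of its class.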

To check for potential overfitting and underfitting, we typically set up a validation dataset and track loss and metrics on it. We can use a portion of the training dataset to create a validation dataset as follows:

from sklearn.model_selection import train_test_split
 
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=10000, random_state=101)

For TensorFlow, the data preprocessing is already done at this point, since a model built with TensorFlow accepts NumPy arrays directly. For PyTorch, however, we need to convert the NumPy arrays into tensors:

import torch
import torch.nn as nn
 
X_train, X_val, X_test = map(lambda X: torch.tensor(X, dtype=torch.float32), (X_train, X_val, X_test))
y_train, y_val, y_test = map(lambda y: torch.tensor(y, dtype=torch.int64), (y_train, y_val, y_test))

Technically, the above is already sufficient for PyTorch. However, it is highly recommended to build a Dataset and DataLoader from the tensors to configure batch sizes and other hyperparameters related to data:

train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
val_dataset = torch.utils.data.TensorDataset(X_val, y_val)
test_dataset = torch.utils.data.TensorDataset(X_test, y_test)
 
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(dataset=val_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=1, shuffle=True)
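
To get a feel for what a DataLoader yields, the following sketch iterates over one built from random stand-in tensors. The tensor names and the sample count of 100 are assumptions chosen for illustration, not part of the pipeline above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 100 flattened "images" with integer class labels
features = torch.randn(100, 784)
labels = torch.randint(0, 10, (100,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Each iteration yields one (inputs, targets) pair of at most batch_size rows
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
for x_shape, y_shape in batch_shapes:
    print(tuple(x_shape), tuple(y_shape))
```

With 100 samples and a batch size of 32, the loader produces three full batches and a final batch of 4. Passing `drop_last=True` to the DataLoader would discard that smaller last batch.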

Here, we use a batch size of 32 for both the PyTorch and TensorFlow models. The other hyperparameters are also set the same for both frameworks.

Step 3. Models

Let's create classification models using TensorFlow and PyTorch to see how they differ.
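
As a rough idea of what the PyTorch side can look like, here is a minimal sketch of a small classifier and a single training step. The `MLP` class, its layer sizes, the Adam optimizer, and the learning rate are all assumptions for illustration and are not the implementation used in this post. It also assumes `nn.CrossEntropyLoss`, which expects integer class indices when the targets are `int64`, so the one-hot labels are converted back with `argmax`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MLP(nn.Module):
    """Hypothetical two-layer classifier; the sizes are illustrative."""

    def __init__(self, in_features=784, hidden=128, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),  # raw logits; the loss applies log-softmax
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in batch shaped like 32 flattened MNIST images with one-hot int64 labels
xb = torch.randn(32, 784)
yb_one_hot = torch.eye(10, dtype=torch.int64)[torch.randint(0, 10, (32,))]

# Convert one-hot rows back to class indices for CrossEntropyLoss
yb = yb_one_hot.argmax(dim=1)

# One explicit training step: forward pass, loss, backward pass, parameter update
model.train()
optimizer.zero_grad()
logits = model(xb)
loss = loss_fn(logits, yb)
loss.backward()
optimizer.step()

print("loss:", loss.item())
```

In a full training run, this step would sit inside a loop over the batches of a DataLoader, repeated for several epochs. Writing that loop by hand is the main place where PyTorch code is more explicit than the Keras equivalent.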

Here, I will omit the training results and Step 4 (model evaluation) since there isn't much to discuss. I highly recommend you try it yourself as practice.
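
If you want a starting point for the evaluation step in PyTorch, the sketch below computes classification accuracy over a DataLoader. The `accuracy` function and the demo model and data are hypothetical, and the function assumes the labels are integer class indices. With one-hot labels, apply `argmax(dim=1)` to them first:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    """Fraction of samples whose highest-scoring class matches the label."""
    model.eval()  # switch layers such as dropout to evaluation behavior
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for xb, yb in loader:
            preds = model(xb).argmax(dim=1)
            correct += (preds == yb).sum().item()
            total += yb.size(0)
    return correct / total

torch.manual_seed(0)

# Untrained stand-in model and random data, used only to exercise the function
demo_model = nn.Linear(784, 10)
demo_data = TensorDataset(torch.randn(64, 784), torch.randint(0, 10, (64,)))
demo_loader = DataLoader(demo_data, batch_size=32)

demo_acc = accuracy(demo_model, demo_loader)
print("accuracy on random data:", demo_acc)
```

On random data with an untrained model the result carries no meaning. The point is only the structure of the loop, which can be reused with a trained model and the real test loader.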

Conclusion

In general, TensorFlow is said to be the framework for getting to production quickly. Code is fast to write thanks to predefined functions and objects, TensorBoard is available for monitoring training, and models can be deployed to multiple platforms with tools like TensorFlow Lite and TensorFlow.js.
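
To illustrate what "predefined functions" means in practice, here is a minimal Keras sketch in which `compile` and `fit` replace a hand-written training loop. The architecture, the optimizer, and the random stand-in data are assumptions for illustration and are not the model used in this post:

```python
import numpy as np
import keras

# Stand-in data with MNIST-like shapes: 64 flattened images and one-hot labels
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((64, 784)).astype("float32")
y_demo = keras.utils.to_categorical(rng.integers(0, 10, size=64), num_classes=10)

# Hypothetical two-layer classifier
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

# compile() sets the optimizer, loss, and metrics; fit() runs the training loop
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_demo, y_demo, batch_size=32, epochs=1, verbose=0)

print(history.history["loss"])
```

The batching, gradient computation, and parameter updates all happen inside `fit`, and `history.history` records the loss and metrics for each epoch.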

It is also said that PyTorch is for research, where we build new custom models, layers, and other custom components, due to its high customizability. However, both frameworks are constantly improving, and both are fast to write and easy to customize (at least, that is my opinion after using them both). Therefore, it is completely up to you to decide which one to use. Personally, I would recommend learning both at the same time by going back and forth between them since they are similar enough, and you are likely to use both anyway.