Introduction
Heatmaps are informative figures for conveying quantitative data. They present values in an easy-to-read format and give a concise summary of a dataset.
Python has a number of tools to facilitate the production of publication-quality heatmaps. These include the Seaborn and Matplotlib libraries, along with Matplotlib's subplot2grid function, which provides a convenient way to lay out the parts of a heatmap figure on a grid.
In this tutorial, I will detail the steps required to produce a heatmap which focuses on the presence/absence of key elements. To do this, I will use a CSV file containing fictitious data about a selection of bacterial isolates. These bacterial strains have a number of features including antibiotic resistance genes, virulence genes, and certain capsule types. A heatmap allows quick inspection and comparison of the various strains.
While the example used focuses on bacterial strains, the techniques applied can be used more broadly for other datasets to help you visualise your data using a heatmap. Throughout the following tutorial, all images are by the author.
Objective
To create a publication-quality heatmap displaying the presence/absence of key genes in fictitious bacterial strains.
This tutorial uses the CSV file ‘Bacterial_strain_heatmap_tutorial_data.csv’, available from the GitHub repository.
Getting started
To begin, a few imports are needed to read in the data and, later, to style the figure. All of the import statements are grouped together here.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
from matplotlib.patches import Rectangle
Next, we read the CSV file into a DataFrame, set the ‘Strain’ column as the index, and view the first 5 rows.
df = pd.read_csv('Bacterial_strain_heatmap_tutorial_data.csv').set_index('Strain')…
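If you do not have the tutorial CSV to hand, the loading step can be tried on a small stand-in. The sketch below is only an illustration: the column names (blaTEM, mecA, capsule_K1) and the 0/1 values are invented for this example and are not taken from the tutorial file. It assumes a layout with one row per strain and one presence/absence column per feature.

```python
import io

import pandas as pd

# Hypothetical stand-in for the tutorial CSV.
# Column names and values are invented for illustration only.
csv_text = """Strain,blaTEM,mecA,capsule_K1
Strain_01,1,0,1
Strain_02,0,1,0
Strain_03,1,1,0
"""

# Read the CSV text and use the 'Strain' column as the row index,
# mirroring the read_csv(...).set_index('Strain') pattern used above.
df = pd.read_csv(io.StringIO(csv_text)).set_index("Strain")

# Inspect the first rows and the overall shape (strains x features).
print(df.head())
print(df.shape)
```

Setting the strain name as the index means that each remaining column holds only feature data, which is a convenient shape for passing to a heatmap function later.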