
A Single stacked bar chart in Matplotlib

Somewhat recently I wanted to generate a chart that had just a single bar in it, but with different segments on the bar to denote the amount of time spent in different parts of an overall process. This is a variant of the nested horizontal bar chart I wrote about last time.

The particular things I was after in this chart were:

  • I didn’t want a y-axis label. There’s only one bar in this case (albeit subdivided), so a label seems redundant.
  • I did still want a larger shadow bar to show the overall time. Sometimes my segments don’t quite add up to the total amount of time; in my scenario that was okay, but I still wanted to see the overall time.
  • I wanted to put the legend somewhere convenient; in this case I opted to spread the items across the top of the chart.
  • I also wanted to minimize the vertical size of the chart since I only had one bar in it.

Sample Data

To start with let’s say we have this as our sample data set.

segment_values = [
    {'value': 12, 'label': 'A', 'color': '#FF0000'},
    {'value': 8, 'label': 'B', 'color': '#00FF00'},
    {'value': 5, 'label': 'C', 'color': '#0000FF'},
    {'value': 5, 'label': 'D', 'color': '#33A6CC'},
    {'value': 16, 'label': 'E', 'color': '#A82279'},
]

Adding the bars

As in the previous post on embedding bars within bars, we add each segment at the same vertical position (y_pos) but shift its horizontal starting point (left) by the combined length of the segments before it.

# chart_ax, y_pos and width are set up as in the full listing at the end of this post.
left_pos = 0
for segdata in segment_values:
    # Each segment starts where the previous one ended.
    chart_ax.barh(y_pos, [segdata['value']], width, align='center',
                  color=segdata['color'], label=segdata['label'],
                  left=left_pos, edgecolor='black', linewidth=0.5)
    left_pos += segdata['value']

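The one problem, though, is that our single bar will take up all the vertical space in the chart, which just doesn’t look good.

To prevent that we need to trick Matplotlib just a bit. The way we do that is to add an invisible bar with zero length but a height of 1.0. Matplotlib autoscales the y-axis to fit every bar, so this spacer stretches the vertical range, and our real bar (only 0.05 high in the full code below) ends up occupying a thin band in the middle.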

chart_ax.barh(y_pos, 0, 1.0, align='center', color='white', ecolor='black', label=None)
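To see why the spacer works, here is a small sketch that draws the thin data bar on two axes, one without the zero-length spacer and one with it, and measures how much of the visible y-range the 0.05-high bar occupies. It assumes a headless Agg backend and a bar length of 46 (the total of the sample data); bar_fraction is a hypothetical helper written just for this comparison.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

BAR_HEIGHT = 0.05  # same thickness as the data bar in this post


def bar_fraction(ax, bar_height=BAR_HEIGHT):
    """Hypothetical helper: share of the visible y-range taken by one bar."""
    y_low, y_high = ax.get_ylim()
    return bar_height / abs(y_high - y_low)


fig, (plain_ax, spacer_ax) = plt.subplots(1, 2)

# Without a spacer, autoscaling fits the y-range tightly around the thin bar.
plain_ax.barh([0], [46], BAR_HEIGHT, align='center')

# With a zero-length spacer of height 1.0, the y-range grows to roughly -0.5..0.5.
spacer_ax.barh([0], 0, 1.0, align='center', color='white')
spacer_ax.barh([0], [46], BAR_HEIGHT, align='center')

print(f"without spacer: {bar_fraction(plain_ax):.2f}")
print(f"with spacer:    {bar_fraction(spacer_ax):.2f}")
```

Setting the limits directly with chart_ax.set_ylim(-0.5, 0.5) should have a similar effect without adding an extra artist to the chart.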

Omitting the y-axis labels

If we set up our xlabel and title but don’t do anything about the y-axis, Matplotlib still draws its default numeric ticks and labels along it, which mean nothing for a single bar.

To omit them we just set the y ticks to an empty list.

chart_ax.set_yticks([])

Adding the legend

Normally adding a legend is pretty trivial. Getting the legend spread across the top of the chart is a little more involved: we need to supply a set of anchor values. But what are anchor values?

The anchor values in this example are a tuple of floating point values ranging from 0 to 1.0, expressed in axes coordinates, where (0, 0) is the lower-left corner of the plotting area and (1, 1) is the upper-right. The actual parameters are:

(x0, y0, width, height)

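# Set up the legend so it is arranged across the top of the chart.
# anchor_vals is (x0, y0, width, height) in axes coordinates.
anchor_vals = (0.01, 0.6, 0.95, 0.2)
plt.legend(bbox_to_anchor=anchor_vals,
           loc='lower right',   # same as loc=4
           ncol=4,              # lay the entries out in up to 4 columns
           mode="expand",       # stretch the legend across the full width of the box
           borderaxespad=0.0)   # no padding between the legend and the box edges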
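As a quick check on what these four numbers describe, the small helper below (a hypothetical name, not part of Matplotlib) turns an (x0, y0, width, height) anchor into the corners of the box in axes coordinates. For the values above, the box spans from about 1% to 96% of the axes width and from 60% to 80% of its height; with loc 'lower right' and mode "expand", the legend sits along the bottom edge of that box and stretches across its width.

```python
def anchor_box_corners(anchor):
    """Convert an (x0, y0, width, height) anchor into (x0, y0, x1, y1) corners."""
    x0, y0, width, height = anchor
    return (x0, y0, x0 + width, y0 + height)


anchor_vals = (0.01, 0.6, 0.95, 0.2)
x0, y0, x1, y1 = anchor_box_corners(anchor_vals)
print(f"x: {x0:.2f} to {x1:.2f}, y: {y0:.2f} to {y1:.2f}")
```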

There is a good StackOverflow discussion about the 4-tuple anchor values. The Matplotlib legend documentation is also good and is the authoritative reference for the legend location options. In practice it probably still takes a bit of playing around to get a feel for how this works.
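If you would rather have the legend sit just above the plotting area instead of inside it, a pattern along the lines of the one in the Matplotlib legend guide is to anchor a full-width box slightly above the top of the axes (y0 a little over 1.0) and pin the legend's lower-left corner to it. This sketch assumes a headless Agg backend and reuses this post's sample segments; the offsets 1.02 and 0.1 and the subplots_adjust value are just starting points to tweak.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

segments = [(12, 'A', '#FF0000'), (8, 'B', '#00FF00'), (5, 'C', '#0000FF'),
            (5, 'D', '#33A6CC'), (16, 'E', '#A82279')]

fig, ax = plt.subplots(figsize=(10, 3.5))
left = 0
for value, label, color in segments:
    ax.barh([0], [value], 0.05, left=left, color=color, label=label,
            edgecolor='black', linewidth=0.5)
    left += value
ax.set_ylim(-0.5, 0.5)
ax.set_yticks([])

# Anchor box starts just above the axes (y0=1.02) and spans its full width.
legend = ax.legend(bbox_to_anchor=(0.0, 1.02, 1.0, 0.1), loc='lower left',
                   ncol=len(segments), mode="expand", borderaxespad=0.0)
fig.subplots_adjust(top=0.75)  # leave room above the axes for the legend

# Check where the legend landed, in axes coordinates.
fig.canvas.draw()
legend_box = legend.get_window_extent().transformed(ax.transAxes.inverted())
print(f"legend bottom edge at y = {legend_box.y0:.2f} (axes coordinates)")
```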

Summary

The resulting chart served my needs. There are limits to the number of items that can comfortably fit in this kind of chart, though. If the legend identifies every segment, it will of course take up more room as segments are added. In addition, I found that you have to pay attention to color selection when adding more segments.
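One way to keep segment colors distinguishable as the count grows (a suggestion, not what the chart above does) is to draw them from one of Matplotlib's qualitative colormaps such as tab10 rather than picking hex codes by hand. The sketch assumes Matplotlib 3.5 or later for the matplotlib.colormaps registry; assign_colors is a hypothetical helper.

```python
from matplotlib import colormaps
from matplotlib.colors import to_hex


def assign_colors(segments, cmap_name='tab10'):
    """Hypothetical helper: return copies of the segment dicts with colors
    taken in order from a qualitative (listed) colormap, wrapping around
    if there are more segments than colors."""
    palette = colormaps[cmap_name].colors
    return [dict(seg, color=to_hex(palette[i % len(palette)]))
            for i, seg in enumerate(segments)]


segment_values = [
    {'value': 12, 'label': 'A'},
    {'value': 8, 'label': 'B'},
    {'value': 5, 'label': 'C'},
    {'value': 5, 'label': 'D'},
    {'value': 16, 'label': 'E'},
]
for seg in assign_colors(segment_values):
    print(seg['label'], seg['color'])
```

tab10 only has ten colors, so past that point the helper starts repeating them; tab20 offers twenty if you need more.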

The resulting sample code in full is shown below.

import matplotlib.pyplot as plt

# Set the vertical dimension to be smaller.
# 3.5 seems to work after a bit of experimenting.
fig, chart_ax = plt.subplots(figsize=(10, 3.5))

# Sample Data
# -------------------

segment_values = [
    {'value': 12, 'label': 'A', 'color': '#FF0000'},
    {'value': 8, 'label': 'B', 'color': '#00FF00'},
    {'value': 5, 'label': 'C', 'color': '#0000FF'},
    {'value': 5, 'label': 'D', 'color': '#33A6CC'},
    {'value': 16, 'label': 'E', 'color': '#A82279'},
]

# Sum up the value total for the outer 'shadow' bar.
outer_bar_length = sum(segitem['value'] for segitem in segment_values)
outer_bar_label = 'Total Time'

# In this case we expect only 1 item in the entries list.
y_pos = [0]
width = 0.05

# Set the 'empty' bar: zero length, but a height of 1.0. This is here to
# coerce Matplotlib to keep the size of the bar smaller on our actual data.
# Otherwise the bar will use all available space.
chart_ax.barh(y_pos, 0, 1.0, align='center', color='white', label=None)

# Is there an 'outer' or container bar? (-1 means "no outer bar".)
if outer_bar_length != -1:
    chart_ax.barh(y_pos, outer_bar_length, 0.12,
                  align='center', color='#D9DCDE', label=outer_bar_label, left=0)

# Now go through and add in the actual segments of data.
# Each segment starts where the previous one ended.
left_pos = 0
for segdata in segment_values:
    chart_ax.barh(y_pos, [segdata['value']], width, align='center',
                  color=segdata['color'], label=segdata['label'],
                  left=left_pos, edgecolor='black', linewidth=0.5)
    left_pos += segdata['value']

# An empty tick list hides the meaningless y-axis ticks and labels.
chart_ax.set_yticks([])
chart_ax.invert_yaxis()
chart_ax.set_xlabel('Time')
chart_ax.set_title('Single Stacked Bar Chart')
plt.tight_layout()

# Set up the legend so it is arranged across the top of the chart.
anchor_vals = (0.01, 0.6, 0.95, 0.2)
plt.legend(bbox_to_anchor=anchor_vals, loc='lower right', ncol=4,
           mode="expand", borderaxespad=0.0)

plt.show()

 

Nested horizontal bar charts in Matplotlib

Matplotlib is pretty much the primary library for creating charts in Python. It has been around for a while and there is a good amount of documentation on using it. However, examples of some less common chart types are hard to find.

One kind of chart I am using for some data visualization work embeds a smaller set of bars inside a larger set. This is similar to a stacked bar chart, but with a couple of differences that make it better at representing relationships in the actual data. This type of chart lets you convey information about an overall metric and then break it down into component metrics, or show a related set of metrics right on top of it.

And, truth be told, another motivating factor is that it can save you some vertical space in a chart.

Since I haven’t found many examples of this I thought I would post one.

A basic horizontal bar chart

import matplotlib.pyplot as plt

title = "A Basic Horizontal Bar Chart"
bar_labels = ["A", "B", "C", "D", "E"]

# some random data...
mock_data = [1764.498, 819.882, 300.161, 452.0789, 305.345]

fig, ax = plt.subplots()
plt.rcdefaults()
y_pos = range(len(bar_labels))

primary_bar_height = 0.8
ax.barh(y_pos,
        mock_data,
        primary_bar_height,
        align='center',
        color='green',
        ecolor='black')

ax.set_yticks(y_pos)
ax.set_yticklabels(bar_labels)

# Order the labels on the 
# chart from top to bottom
ax.invert_yaxis()

ax.set_xlabel('Count of ... stuff')
ax.set_title(title)
plt.tight_layout()
plt.show()

This will produce the following chart.

A basic horizontal bar chart in matplotlib.

Embedding bars within bars

From the basic horizontal bar chart above it isn’t that far a leap to embedding bars. All we are doing is adding the additional bars with a smaller ‘height’ (you can think of it as width if you prefer) but using the same vertical position as the first bar.

ax.barh(y_pos,                 # don't add an offset here
        nested_data_p2,
        nested_bar_height,     # make the nested bar smaller
        color="#29a1d5",
        label="Records Query",
        left=nested_data_p1)   # offset the starting point on the x-axis

Additionally, for each nested bar after the first one we need to adjust the horizontal starting point by passing the combined length of the nested bars before it as left.

To illustrate this we’ll add two smaller bars inside each of the main bars from the first chart above. The additional code is the pair of nested ax.barh() calls in the middle of the listing.
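With more than two nested series the offsets become running totals. Here is a small sketch of that bookkeeping in plain Python; nested_left_offsets is a hypothetical helper, and each list it returns could be passed as left= to the corresponding ax.barh() call.

```python
def nested_left_offsets(series):
    """Hypothetical helper. Given a list of nested data series (one value per
    primary bar), return the left offset for each series: the first starts
    at 0 and each later series starts where the previous ones ended."""
    running = [0.0] * len(series[0])
    offsets = []
    for values in series:
        offsets.append(list(running))
        running = [start + value for start, value in zip(running, values)]
    return offsets


nested_data_p1 = [134.5, 45.0, 75.4, 128.0, 68.3]
nested_data_p2 = [435.0, 345.0, 105.4, 30.0, 55.3]
for lefts in nested_left_offsets([nested_data_p1, nested_data_p2]):
    print(lefts)
```

For two series the second list of offsets is simply nested_data_p1, which is what the two-series example in this post passes as left.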

import matplotlib.pyplot as plt

title = "A Basic Horizontal Bar Chart"
bar_labels = ["A", "B", "C", "D", "E"]

# some random data...
mock_data = [1764.498, 819.882, 300.161, 452.0789, 305.345]

nested_data_p1 = [134.5, 45.0, 75.4, 128.0, 68.3]
nested_data_p2 = [435.0, 345.0, 105.4, 30.0, 55.3]

fig, ax = plt.subplots()
plt.rcdefaults()
y_pos = range(len(bar_labels))

# Set up the primary bars.
primary_bar_height = 0.8
ax.barh(y_pos,
        mock_data,
        primary_bar_height,
        align='center',
        color='green',
        ecolor='black')

# Now add the first set of nested bars.
nested_bar_height = 0.4
ax.barh(y_pos, 
        nested_data_p1,
        nested_bar_height, 
        color="#f6ff33",
        label="Count Query")

# Add the second set. Adjust the 
# starting point on the left
# side by setting left appropriately. 
# In this case we can just
# use the size of the first set of
# nested bars above.
ax.barh(y_pos,
        nested_data_p2,
        nested_bar_height,
        color="#33ffff",
        label="Records Query",
        left=nested_data_p1)

# And finally complete the chart. This part
# is the same as the basic bar chart.
ax.set_yticks(y_pos)
ax.set_yticklabels(bar_labels)

# Order the labels on the chart from top to bottom
ax.invert_yaxis()

ax.set_xlabel('Count of ... stuff')
ax.set_title(title)
plt.tight_layout()
plt.show()

In the resulting chart, each green primary bar contains a yellow “Count Query” bar followed by a cyan “Records Query” bar.

And that’s it. There are some other variations that can be done as well. You aren’t obligated to start the inner bar at the left, for instance; you can position it more toward the middle of the bar if that makes sense for your data.
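For example, to center each inner bar inside its primary bar, the left offset is half the difference between the two lengths. A minimal sketch using this post's mock data and the first nested series, assuming a headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

bar_labels = ["A", "B", "C", "D", "E"]
mock_data = [1764.498, 819.882, 300.161, 452.0789, 305.345]
nested_data_p1 = [134.5, 45.0, 75.4, 128.0, 68.3]

# Center each nested bar: start it halfway through the leftover space.
centered_left = [(outer - inner) / 2
                 for outer, inner in zip(mock_data, nested_data_p1)]

fig, ax = plt.subplots()
y_pos = range(len(bar_labels))
ax.barh(y_pos, mock_data, 0.8, align='center', color='green')
inner_bars = ax.barh(y_pos, nested_data_p1, 0.4, align='center',
                     color="#f6ff33", left=centered_left, label="Count Query")
ax.set_yticks(y_pos)
ax.set_yticklabels(bar_labels)
ax.invert_yaxis()
```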

Finally, this kind of bar chart is similar to a Bullet chart, as described by Stephen Few. With a bit more work the code above could produce a bullet chart too.
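As a rough sketch of that extra work (one possible approach, not code from the original charts): draw the qualitative ranges as wide gray bars, the measure as a thinner nested bar, and the target as a short vertical line. The range, measure and target values below are made up purely for illustration, and the Agg backend is assumed so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Illustrative values only; none of these come from real data.
range_bounds = [150, 225, 300]                    # upper edges of poor / fair / good
range_colors = ['#999999', '#bbbbbb', '#dddddd']  # darker = worse
measure = 270
target = 250

fig, ax = plt.subplots(figsize=(8, 1.5))

# Qualitative bands: draw the widest first so the narrower ones land on top.
for bound, color in zip(reversed(range_bounds), reversed(range_colors)):
    ax.barh([0], [bound], 0.8, color=color)

# The measure is a thinner bar nested inside the bands, as in this post.
ax.barh([0], [measure], 0.3, color='black')

# The target is a short vertical line across the measure bar.
ax.vlines(target, -0.3, 0.3, colors='red', linewidth=2)

ax.set_yticks([])
```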