Time Series Visualization

What you’ll learn in this module

This module explores how to visualize time series data effectively, moving beyond simple line charts to reveal true underlying patterns.

You’ll learn:

  • How line plots create continuity illusions and when to use discrete representations instead.
  • Why small multiples solve the spaghetti problem when comparing multiple time series.
  • The crucial difference between linear and log scales and when each tells the honest story.
  • How moving averages balance noise and trend to reveal underlying patterns.
  • Techniques for showing uncertainty through ribbon plots and prediction intervals.
  • How heatmaps and cycle plots expose temporal rhythms like daily patterns and seasonality.
  • What lag plots reveal about autocorrelation and the memory of time series data.

The nature of time

Let’s talk about time. In March 2020, charts of COVID-19 cases told vastly different stories depending on how they were visualized. Some used linear scales, showing a terrifying vertical wall. Others used log scales, showing a straight line.

Why did the same data produce such different narratives? Time series data is special because it implies causality and momentum. Unlike other variables, time flows in one direction. Your choices of scale, aggregation, and geometry determine whether you reveal a genuine pattern or manufacture a misleading narrative.

Line plots and the continuity illusion

The most fundamental choice is whether to connect the dots. A line plot suggests continuity, implying that a value exists at every moment between your measurements. This works for temperature or stock prices, where the variable has momentum.

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style and font scaling in one call (sns.set would reset the style)
sns.set_theme(style="white", font_scale=1.2)

# Generate synthetic time series with trend and seasonality
np.random.seed(42)
n_points = 365
dates = pd.date_range('2023-01-01', periods=n_points, freq='D')
trend = np.linspace(100, 150, n_points)
seasonal = 10 * np.sin(2 * np.pi * np.arange(n_points) / 365 * 4)  # Quarterly seasonality
noise = np.random.normal(0, 3, n_points)
values = trend + seasonal + noise

df = pd.DataFrame({'date': dates, 'value': values})

# Create line plot
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df['date'], df['value'], linewidth=1.5, color=sns.color_palette()[0])
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Daily Time Series: Line Plot Shows Trend and Seasonality')
ax.grid(True, alpha=0.3)
sns.despine()

Basic line plot showing a time series with trend and seasonality
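
The learning goals above promised moving averages, and this noisy series is a good place to show one. A minimal sketch, reusing the df built above: a centered 30-day rolling mean (the window length is an illustrative choice) smooths away daily noise so the trend and seasonality stand out.

Code
# Centered 30-day moving average; wider windows smooth more
# aggressively but respond more slowly to turning points
df['rolling_30d'] = df['value'].rolling(window=30, center=True).mean()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df['date'], df['value'], linewidth=1, alpha=0.4, label='Daily values')
ax.plot(df['date'], df['rolling_30d'], linewidth=2.5, label='30-day moving average')
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.set_title('Moving Average: Smoothing Reveals the Underlying Trend')
ax.legend()
ax.grid(True, alpha=0.3)
sns.despine()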

But what if your data is discrete? If you plot distinct sales events or email arrivals as a line, you create a false narrative of values existing in the gaps. In those cases, let the silence between points speak.

Code
# Generate sparse discrete event data
np.random.seed(123)
event_dates = pd.to_datetime(['2023-01-15', '2023-03-10', '2023-05-22',
                               '2023-07-08', '2023-09-30', '2023-11-15'])
event_values = np.random.randint(20, 80, len(event_dates))

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Line plot (implies continuity - misleading for discrete events)
axes[0].plot(event_dates, event_values, marker='o', linewidth=2, markersize=8)
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Event Count')
axes[0].set_title('Line Plot: Implies Values Between Events (Misleading)')
axes[0].grid(True, alpha=0.3)

# Scatter plot (appropriate for discrete events)
axes[1].scatter(event_dates, event_values, s=100, alpha=0.7)
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Event Count')
axes[1].set_title('Scatter Plot: Shows Only Observed Events (Honest)')
axes[1].grid(True, alpha=0.3)

for ax in axes:
    sns.despine(ax=ax)

plt.tight_layout()

Line plot vs scatter plot: connecting points implies continuity
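
If you also want to emphasize each event's magnitude, a lollipop (stem-style) rendering is a natural middle ground. Here is a minimal sketch built from vlines plus a scatter layer, reusing the event data above.

Code
# Lollipop view: vertical lines anchor each event to zero, showing
# magnitude clearly without implying values between events
fig, ax = plt.subplots(figsize=(8, 5))
ax.vlines(event_dates, 0, event_values, linewidth=2, alpha=0.7)
ax.scatter(event_dates, event_values, s=80, zorder=3)
ax.set_ylim(bottom=0)
ax.set_xlabel('Date')
ax.set_ylabel('Event Count')
ax.set_title('Lollipop Plot: Discrete Events Anchored to Zero')
ax.grid(True, alpha=0.3)
sns.despine()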

Comparing series: The spaghetti problem

Often you need to compare multiple series. The natural instinct is to overlay them on the same plot, and this works well for two or three series.

But what happens when the count rises? You fall into the spaghetti trap where individual trends get lost in the tangle.

Code
# Generate three related time series
np.random.seed(42)
dates = pd.date_range('2023-01-01', periods=200, freq='D')

series_a = 100 + np.linspace(0, 30, 200) + np.random.normal(0, 5, 200)
series_b = 95 + np.linspace(0, 20, 200) + np.random.normal(0, 4, 200)
series_c = 110 + np.linspace(0, 10, 200) + np.random.normal(0, 6, 200)

df_multi = pd.DataFrame({
    'date': dates,
    'Product A': series_a,
    'Product B': series_b,
    'Product C': series_c
})

# Overlay plot
fig, ax = plt.subplots(figsize=(12, 6))
for column in ['Product A', 'Product B', 'Product C']:
    ax.plot(df_multi['date'], df_multi[column], linewidth=2, label=column, alpha=0.8)

ax.set_xlabel('Date')
ax.set_ylabel('Sales')
ax.set_title('Multiple Time Series: Overlaid Comparison')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
sns.despine()

Multiple time series overlaid with different colors

The solution is small multiples (or faceting). By giving each series its own stage while locking the axes, you preserve both the individual trends and the ability to compare them.

Code
# Generate multiple time series
np.random.seed(42)
n_series = 6
dates = pd.date_range('2023-01-01', periods=150, freq='D')

data_list = []
for i in range(n_series):
    values = 50 + np.random.randn(150).cumsum() + 10 * np.sin(2 * np.pi * np.arange(150) / 30)
    data_list.append(pd.DataFrame({
        'date': dates,
        'value': values,
        'series': f'Region {i+1}'
    }))

df_many = pd.concat(data_list, ignore_index=True)

# Small multiples using seaborn FacetGrid
g = sns.FacetGrid(df_many, col='series', col_wrap=3, height=3, aspect=1.5, sharey=True)
g.map_dataframe(sns.lineplot, x='date', y='value', linewidth=2, color=sns.color_palette()[0])
g.set_axis_labels('Date', 'Value')
g.set_titles('{col_name}')  # series values already read "Region N"
for ax in g.axes.flat:
    ax.grid(True, alpha=0.3)
    sns.despine(ax=ax)

plt.tight_layout()

Small multiples avoid spaghetti plots when comparing many time series

The power of scale: Linear vs Log

Perhaps the most consequential choice in time series visualization is the y-axis scale. Your choice defines the question you are answering.

A linear scale asks “How much did it increase?” A log scale asks “How fast is it growing?” In the example below, the linear scale suggests an explosive crisis at the end. The log scale reveals that the growth rate has been constant the entire time.

Code
# Generate exponential growth data (e.g., epidemic spread)
np.random.seed(42)
days = np.arange(0, 100)
cases = 10 * np.exp(0.05 * days) * (1 + np.random.normal(0, 0.1, len(days)))

df_exp = pd.DataFrame({'day': days, 'cases': cases})

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Linear scale
axes[0].plot(df_exp['day'], df_exp['cases'], linewidth=2, color=sns.color_palette()[0])
axes[0].set_xlabel('Days')
axes[0].set_ylabel('Cases')
axes[0].set_title('Linear Scale: Exponential Growth Looks Explosive')
axes[0].grid(True, alpha=0.3)

# Log scale
axes[1].plot(df_exp['day'], df_exp['cases'], linewidth=2, color=sns.color_palette()[1])
axes[1].set_xlabel('Days')
axes[1].set_ylabel('Cases (log scale)')
axes[1].set_yscale('log')
axes[1].set_title('Log Scale: Exponential Growth Appears Linear')
axes[1].grid(True, alpha=0.3, which='both')

for ax in axes:
    sns.despine(ax=ax)

plt.tight_layout()

The same exponential growth looks different on linear vs. log scales

When to use log scales

Log scales are essential for data spanning orders of magnitude or when percentage changes matter more than absolute units. However, they can downplay absolute magnitude.

A jump from 100 to 1,000 looks the same as 10,000 to 100,000. This equality can be honest or deceptive depending on your question.
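
The arithmetic behind that statement is worth a quick sketch: equal ratios span equal distances in log space, whatever the absolute numbers.

Code
# Both jumps are 10x, so both cover exactly one decade on a log10 axis
print(np.log10(1_000) - np.log10(100))        # 1.0
print(np.log10(100_000) - np.log10(10_000))   # 1.0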

Showing uncertainty

Predicting the future is an exercise in humility. A forecast without an error bar is a lie of precision. Use ribbon plots to visualize the widening cone of uncertainty as time moves forward.

Code
# Generate data with trend
np.random.seed(42)
n = 150
x = np.arange(n)
true_trend = 50 + 0.3 * x
observed = true_trend + np.random.normal(0, 5, n)

# Simple linear forecast
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(x[:100], observed[:100])

# Forecast period
x_future = np.arange(100, 150)
y_pred = slope * x_future + intercept

# Prediction interval for simple linear regression: the standard error
# grows with distance from the training data, so the band widens
residuals = observed[:100] - (slope * x[:100] + intercept)
s = np.sqrt(np.sum(residuals**2) / (100 - 2))  # residual standard error
x_mean = x[:100].mean()
se_pred = s * np.sqrt(1 + 1/100 + (x_future - x_mean)**2 / np.sum((x[:100] - x_mean)**2))
margin = 1.96 * se_pred  # approximate 95% prediction interval

# Plot
fig, ax = plt.subplots(figsize=(12, 6))

# Historical data
ax.plot(x[:100], observed[:100], linewidth=2, label='Historical Data', color=sns.color_palette()[0])

# Forecast with uncertainty
ax.plot(x_future, y_pred, linewidth=2, label='Forecast', color=sns.color_palette()[1], linestyle='--')
ax.fill_between(x_future, y_pred - margin, y_pred + margin,
                alpha=0.3, color=sns.color_palette()[1], label='95% Prediction Interval')

# Actual future (for comparison)
ax.plot(x_future, observed[100:], linewidth=1.5, alpha=0.5, label='Actual (for comparison)',
        color='gray', linestyle=':')

ax.axvline(x=100, color='black', linestyle=':', alpha=0.5, label='Forecast Start')
ax.set_xlabel('Time')
ax.set_ylabel('Value')
ax.set_title('Time Series Forecast with Uncertainty Bands')
ax.legend()
ax.grid(True, alpha=0.3)
sns.despine()

Ribbon plots show uncertainty bands around predictions

The rhythm of time: Heatmaps and cycles

Time often cycles rather than marches. Does your data have a heartbeat? Heatmaps and cycle plots break the linear narrative to reveal patterns like daily lulls, weekend spikes, or seasonal waves.

Code
# Generate synthetic hourly data with daily and weekly patterns
np.random.seed(42)
hours = pd.date_range('2023-01-01', periods=24*7*4, freq='h')  # 4 weeks of hourly data

# Patterns: higher activity during business hours and weekdays
hour_of_day = hours.hour
day_of_week = hours.dayofweek

# Activity pattern
base_activity = 20
hour_effect = 30 * np.exp(-((hour_of_day - 14)**2) / 20)  # Peak at 2 PM
weekday_effect = np.where(day_of_week < 5, 20, -10)  # Weekdays higher
noise = np.random.normal(0, 5, len(hours))

activity = base_activity + hour_effect + weekday_effect + noise

df_hourly = pd.DataFrame({
    'datetime': hours,
    'activity': activity,
    'hour': hour_of_day,
    'day_name': hours.day_name(),
    'week': ((hours.dayofyear - 1) // 7) + 1  # week 1 covers the first 7 days
})

# Take first week for heatmap
df_week = df_hourly[df_hourly['week'] == 1].copy()

# Pivot for heatmap
heatmap_data = df_week.pivot_table(values='activity',
                                     index='hour',
                                     columns='day_name',
                                     aggfunc='mean')

# Reorder columns to start with Monday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
heatmap_data = heatmap_data[[day for day in day_order if day in heatmap_data.columns]]

# Plot heatmap
fig, ax = plt.subplots(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap='YlOrRd', annot=False,
            cbar_kws={'label': 'Activity Level'}, ax=ax)
ax.set_xlabel('Day of Week')
ax.set_ylabel('Hour of Day')
ax.set_title('Temporal Heatmap: Activity by Hour and Day of Week')
plt.tight_layout()

Heatmap reveals daily and weekly patterns in temporal data

A cycle plot offers a complementary view: instead of one long timeline, it overlays each cycle (here, each year) so the seasonal shape can be compared directly across repetitions.

Code
# Generate monthly data with strong annual seasonality
np.random.seed(42)
months = pd.date_range('2020-01-01', periods=48, freq='ME')  # month-end timestamps
month_num = np.tile(np.arange(1, 13), 4)  # 4 years of monthly data

# Seasonal pattern (higher in summer, lower in winter)
seasonal_effect = 20 * np.sin(2 * np.pi * (month_num - 3) / 12)
trend_effect = 0.5 * np.arange(48)
noise = np.random.normal(0, 3, 48)

values = 50 + seasonal_effect + trend_effect + noise

df_seasonal = pd.DataFrame({
    'date': months,
    'value': values,
    'month': month_num,
    'year': months.year,
    'month_name': months.month_name()
})

# Create cycle plot
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Traditional time series
axes[0].plot(df_seasonal['date'], df_seasonal['value'], marker='o', linewidth=2)
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Value')
axes[0].set_title('Traditional Time Series: Seasonality Repeats')
axes[0].grid(True, alpha=0.3)

# Cycle plot
month_names_short = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
for year in df_seasonal['year'].unique():
    year_data = df_seasonal[df_seasonal['year'] == year]
    axes[1].plot(year_data['month'], year_data['value'], marker='o',
                linewidth=2, label=str(year), alpha=0.7)

axes[1].set_xlabel('Month')
axes[1].set_ylabel('Value')
axes[1].set_xticks(range(1, 13))
axes[1].set_xticklabels(month_names_short)
axes[1].set_title('Cycle Plot: Each Year Overlaid to Show Seasonal Pattern')
axes[1].legend(title='Year')
axes[1].grid(True, alpha=0.3)

for ax in axes:
    sns.despine(ax=ax)

plt.tight_layout()

Cycle plot reveals seasonal patterns by separating each cycle

The memory of the past: Autocorrelation

Does the past predict the future? Lag plots visualize a system’s memory by plotting each value against the one before it: the value at time t on one axis and the value at time t+1 on the other. A tight diagonal implies strong memory (autocorrelation), while a scattered cloud implies random noise.

Code
# Generate time series with autocorrelation
np.random.seed(42)
n = 200

# AR(1) process: strong autocorrelation
ar_series = np.zeros(n)
ar_series[0] = np.random.normal(0, 1)
for i in range(1, n):
    ar_series[i] = 0.7 * ar_series[i-1] + np.random.normal(0, 1)

# Random walk: near-perfect autocorrelation at lag 1
random_walk = np.random.normal(0, 1, n).cumsum()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Lag-1 plot for AR(1) series
axes[0].scatter(ar_series[:-1], ar_series[1:], alpha=0.6, s=30)
axes[0].set_xlabel('Value at time t')
axes[0].set_ylabel('Value at time t+1')
axes[0].set_title('Lag-1 Plot: Strong Autocorrelation (AR Process)')
axes[0].plot([-3, 3], [-3, 3], 'r--', alpha=0.5, linewidth=1)
axes[0].grid(True, alpha=0.3)

# Lag-1 plot for random walk
axes[1].scatter(random_walk[:-1], random_walk[1:], alpha=0.6, s=30, color=sns.color_palette()[1])
axes[1].set_xlabel('Value at time t')
axes[1].set_ylabel('Value at time t+1')
axes[1].set_title('Lag-1 Plot: Near-Perfect Autocorrelation (Random Walk)')
axes[1].plot([random_walk.min(), random_walk.max()],
            [random_walk.min(), random_walk.max()], 'r--', alpha=0.5, linewidth=1)
axes[1].grid(True, alpha=0.3)

for ax in axes:
    sns.despine(ax=ax)

plt.tight_layout()

Lag plots reveal autocorrelation structure in time series
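
A lag plot shows only one lag at a time. To summarize memory across many lags at once, here is a short sketch using pandas’ built-in autocorrelation_plot, applied to the AR(1) series generated above; its curve decays geometrically, as expected for an AR process.

Code
from pandas.plotting import autocorrelation_plot

# Correlation of the series with itself at every lag,
# with confidence bands for a white-noise null
fig, ax = plt.subplots(figsize=(10, 4))
autocorrelation_plot(pd.Series(ar_series), ax=ax)
ax.set_title('Autocorrelation Plot: AR(1) Memory Decays with Lag')
sns.despine()
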
Summary

Time series visualization is about making choices that honestly represent temporal patterns: whether to connect points or leave them discrete, whether to overlay series or facet them, whether to use a linear or a log scale, and how to show uncertainty and cyclical structure. Make those choices deliberately and your temporal data tells its true story, not the story you wish it told.