Mastering Circular Visualization in Python with Pycirclize
Chapter 1: Introduction to Circular Visualization
Data visualization is essential for analyzing and conveying complex datasets. While conventional charts such as bar graphs and line plots are commonly employed, circular visualizations provide a distinctive and engaging method for data representation. The Pycirclize library in Python is particularly noteworthy for its capability to generate impressive circular plots.
In the field of data visualization, circular plots have become increasingly popular due to their efficiency in presenting intricate information in an intuitive and aesthetically pleasing format. This article will delve into the features, functionalities, and applications of Pycirclize, equipping you with the knowledge to craft captivating circular visualizations that convey meaningful insights.
Installation
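To begin using Pycirclize, ensure that you have Python version 3.8 or higher installed (newer releases of the library may require a more recent Python, so check the package page if installation fails). You can install Pycirclize via either pip or conda: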
# Install using pip
pip install pycirclize
# Install using conda
conda install -c conda-forge pycirclize
Key Features of Pycirclize
- Circular Layout Design: Pycirclize utilizes a circular layout featuring sectors and tracks, allowing for organized visualization of diverse data types. You can designate sectors for different categories and incorporate multiple tracks within each sector to display various data types. The library offers detailed control over the dimensions, spacing, and aesthetics of sectors and tracks.
- Plotting Data on Tracks: Pycirclize includes a comprehensive array of plotting functions to visualize data on tracks. You can plot lines, points, bars, heatmaps, and more, with intuitive customization options for colors, line styles, and marker shapes. Additionally, you can enhance the clarity of your visualizations with axis ticks, labels, and grids.
- Plotting Links: A standout feature of Pycirclize is its capacity to plot links within or between data points in sectors. Links enable the visualization of relationships or flows among different entities, with flexible customization options for appearance, including color and direction.
- Chord Diagrams: Pycirclize simplifies the creation of chord diagrams from matrix data, effectively illustrating relationships and flows. The library seamlessly converts matrix data into visually appealing chord diagrams, offering customizable color schemes and labels.
- Phylogenetic Trees: For applications in bioinformatics and evolutionary studies, Pycirclize allows for the visualization of phylogenetic trees in a circular format, with options for customizing branch lengths and annotating nodes.
- Genomic Data Visualization: Specialized functions for visualizing genomic data, such as gene positions and sequence alignments, are available in Pycirclize. The library supports data from GFF and GenBank files, enabling informative genomic plots.
- Customization and Flexibility: Pycirclize offers extensive customization features to tailor the appearance of circular visualizations, integrating seamlessly with Matplotlib for enhanced control.
Understanding the Basics of Pycirclize
Pycirclize is a Python package built on the widely-used Matplotlib library. It is inspired by the R package circlize and aims to provide a user-friendly interface for generating various circular visualizations. With Pycirclize, you can create Circos plots, chord diagrams, phylogenetic trees, and more, making it a versatile tool across different fields, including genomics and network analysis.
At its core, Pycirclize employs a circular layout comprising sectors and tracks. Sectors represent distinct categories, while tracks facilitate the plotting of diverse data types. This modular design enables clear organization and visualization of complex datasets.
Designing the Circular Layout
One of the notable strengths of Pycirclize is its flexibility in designing the circular layout. You have detailed control over the size, spacing, and appearance of sectors and tracks. Here's an example of how to set up sectors and tracks:
from pycirclize import Circos

# Sector name -> sector size (in data units)
sectors = {"A": 10, "B": 15, "C": 12, "D": 20, "E": 15}
circos = Circos(sectors, space=5)

for sector in circos.sectors:
    # Label each sector just outside the outermost track
    sector.text(f"Sector: {sector.name}", r=110, size=15)
    # Add three concentric tracks, each with its own radius range
    track1 = sector.add_track((80, 100), r_pad_ratio=0.1)
    track1.axis()
    track2 = sector.add_track((55, 75), r_pad_ratio=0.1)
    track2.axis()
    track3 = sector.add_track((30, 50), r_pad_ratio=0.1)
    track3.axis()

fig = circos.plotfig()
In this example, we define five sectors (A, B, C, D, E) with their respective sizes and leave a gap of 5 degrees between neighboring sectors. We then iterate over each sector and add three tracks, each occupying its own radius range. The r_pad_ratio parameter sets the radial padding inside each track, keeping plotted data away from the track's inner and outer edges, and circos.plotfig() renders the layout as a Matplotlib figure.
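To see how the sector sizes and the space value share the circle, the arithmetic can be sketched in plain Python. This is an illustrative calculation rather than Pycirclize's internal code: it assumes a full 0 to 360 degree circle with one gap after every sector, and the helper name sector_degrees is hypothetical.

```python
def sector_degrees(sectors, space=0.0, start=0.0, end=360.0):
    """Split the arc between start and end among sectors, proportionally to size.

    Hypothetical helper: assumes one gap of `space` degrees after every sector.
    """
    total_size = sum(sectors.values())
    usable = (end - start) - space * len(sectors)
    if usable <= 0:
        raise ValueError("space leaves no room for the sectors")
    return {name: usable * size / total_size for name, size in sectors.items()}


sectors = {"A": 10, "B": 15, "C": 12, "D": 20, "E": 15}
spans = sector_degrees(sectors, space=5)
for name, deg in spans.items():
    print(f"{name}: {deg:.1f} degrees")
```

Under these assumptions the five gaps take 25 degrees in total, and the remaining 335 degrees are divided in proportion to the sector sizes, so sector D (size 20) spans twice the angle of sector A (size 10).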
Plotting Data on Tracks
Pycirclize provides a rich set of functions to visualize various data types on tracks, accommodating lines, points, bars, or heatmaps. Below is an example of plotting lines and points on tracks:
import numpy as np
from pycirclize import Circos

np.random.seed(0)  # Make the random data reproducible

sectors = {"A": 10, "B": 20, "C": 15}
circos = Circos(sectors, space=5)

for sector in circos.sectors:
    track = sector.add_track((80, 100), r_pad_ratio=0.1)
    track.axis()
    track.xticks_by_interval(1)
    vmin, vmax = 0, 10
    # Five evenly spaced points per unit of sector size
    x = np.linspace(track.start, track.end, int(track.size) * 5 + 1)
    y = np.random.randint(vmin, vmax, len(x))
    track.line(x, y)
    track.scatter(x, y)

fig = circos.plotfig()
In this example, three sectors (A, B, C) are created, with a track added to each. Random data points are generated using NumPy and plotted as lines and points.
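How a data value ends up at a particular radius inside a track is easier to see as arithmetic. The sketch below assumes a linear mapping from the value range to the track's radius range, with the r_pad_ratio padding split evenly between the inner and outer edges. The function value_to_radius is a hypothetical helper written for illustration, not part of the Pycirclize API.

```python
def value_to_radius(y, vmin, vmax, r_lim, r_pad_ratio=0.0):
    """Map a data value to a radius inside a track (illustrative assumption).

    The padded plotting band is the track's radius range shrunk by
    r_pad_ratio, with half of the padding on each edge.
    """
    r_min, r_max = r_lim
    pad = (r_max - r_min) * r_pad_ratio / 2
    plot_min, plot_max = r_min + pad, r_max - pad
    return plot_min + (y - vmin) / (vmax - vmin) * (plot_max - plot_min)


for y in (0, 5, 10):
    r = value_to_radius(y, vmin=0, vmax=10, r_lim=(80, 100), r_pad_ratio=0.1)
    print(f"value {y} -> radius {r:.1f}")
```

With a track spanning radii 80 to 100 and a padding ratio of 0.1, this reading leaves a plotting band of 81 to 99, so the smallest and largest values never touch the track border.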
Visualizing Links and Relationships
One of the most powerful features of Pycirclize is its ability to plot links within or between data points in sectors. Links allow you to visualize relationships or flows among different entities. Here’s a sample of how to create links:
from pycirclize import Circos

sectors = {"A": 10, "B": 20, "C": 15}
name2color = {"A": "red", "B": "blue", "C": "green"}
circos = Circos(sectors, space=5)

for sector in circos.sectors:
    # Thin colored outer track that doubles as the sector label
    track = sector.add_track((95, 100))
    track.axis(fc=name2color[sector.name])
    track.text(sector.name, color="white", size=12)
    track.xticks_by_interval(1)

# Each endpoint is a (sector name, start, end) tuple
circos.link(("A", 0, 1), ("A", 7, 8))
circos.link(("A", 1, 2), ("A", 7, 6), color="skyblue")
circos.link(("A", 9, 10), ("B", 4, 3), direction=1, color="tomato")
circos.link(("B", 5, 7), ("C", 6, 8), direction=1, ec="black", lw=1, hatch="//")
circos.link(("B", 18, 16), ("B", 11, 13), r1=90, r2=90, color="violet", ec="red", lw=2, ls="dashed")
circos.link(("C", 1, 3), ("B", 2, 0), direction=1, color="limegreen")
circos.link(("C", 11.5, 14), ("A", 4, 3), direction=2, color="chocolate", ec="black", lw=1, ls="dotted")

fig = circos.plotfig()
This example demonstrates how to define sectors and assign colors, followed by adding tracks and plotting sector names. The circos.link() method is utilized to create links between data points, with various parameters to customize their appearance.
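Each link endpoint is a (sector name, start, end) tuple, and its coordinates are meant to lie inside the named sector. A small pre-check in plain Python, like the one below, can catch out-of-range endpoints before plotting. The helper check_link_region is hypothetical and assumes that every sector starts at 0, as in the example above.

```python
def check_link_region(region, sectors):
    """Return the region unchanged if it fits its sector, otherwise raise."""
    name, start, end = region
    if name not in sectors:
        raise KeyError(f"unknown sector: {name!r}")
    size = sectors[name]
    for pos in (start, end):
        if not 0 <= pos <= size:
            raise ValueError(f"{pos} is outside sector {name!r} (0 to {size})")
    return region


sectors = {"A": 10, "B": 20, "C": 15}
links = [
    (("A", 0, 1), ("A", 7, 8)),
    (("A", 9, 10), ("B", 4, 3)),  # reversed order (start > end) is accepted
    (("C", 11.5, 14), ("A", 4, 3)),
]
checked = [
    (check_link_region(a, sectors), check_link_region(b, sectors))
    for a, b in links
]
print(f"{len(checked)} links passed the range check")
```

Each validated pair can then be handed to circos.link() as its two positional arguments, with styling keywords added as needed.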
Creating Chord Diagrams
Chord diagrams are effective for visualizing relationships and flows. Pycirclize simplifies the creation of these diagrams from matrix data. Here’s an example:
import pandas as pd
from pycirclize import Circos

# From-to matrix: rows are sources, columns are targets
row_names = ["F1", "F2", "F3"]
col_names = ["T1", "T2", "T3", "T4", "T5", "T6"]
matrix_data = [
    [10, 16, 7, 7, 10, 8],
    [4, 9, 10, 12, 12, 7],
    [17, 13, 7, 4, 20, 4],
]
matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

circos = Circos.initialize_from_matrix(
    matrix_df,
    space=5,
    cmap="tab10",
    label_kws=dict(size=12),
    link_kws=dict(ec="black", lw=0.5, direction=1),
)

fig = circos.plotfig()
This example utilizes a matrix DataFrame created with Pandas, initializing the Circos instance from the matrix data. The library automatically determines the sector sizes based on matrix values, generating a chord diagram.
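One plausible reading of "sector sizes based on matrix values" is that each row sector spans its row total and each column sector spans its column total. The plain-Python sketch below computes those totals for the same matrix. It illustrates the idea and is an assumption about the sizing rule, not a copy of the library's code.

```python
row_names = ["F1", "F2", "F3"]
col_names = ["T1", "T2", "T3", "T4", "T5", "T6"]
matrix_data = [
    [10, 16, 7, 7, 10, 8],
    [4, 9, 10, 12, 12, 7],
    [17, 13, 7, 4, 20, 4],
]

# Outgoing total per source (row) and incoming total per target (column)
row_totals = {name: sum(row) for name, row in zip(row_names, matrix_data)}
col_totals = {name: sum(col) for name, col in zip(col_names, zip(*matrix_data))}

# Assumed sector sizes: one sector per row name and per column name
sector_sizes = {**row_totals, **col_totals}
for name, size in sector_sizes.items():
    print(f"{name}: {size}")
```

Because every matrix cell is counted once as outgoing and once as incoming, the row totals and column totals add up to the same grand total, which is why the "from" and "to" halves of such a diagram occupy equal shares of the circle under this rule.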
Visualizing Phylogenetic Trees
Pycirclize also supports the visualization of phylogenetic trees in a circular layout, which is valuable for evolutionary studies. Here's an example of plotting a phylogenetic tree:
from pycirclize import Circos
from pycirclize.utils import load_example_tree_file, ColorCycler
from matplotlib.lines import Line2D

# Load a bundled example tree in Newick format
tree_file = load_example_tree_file("large_example.nwk")
circos, tv = Circos.initialize_from_tree(
    tree_file,
    r_lim=(30, 100),
    leaf_label_size=5,
    line_kws=dict(color="lightgrey", lw=1),
)

# Group name -> species that define the group
group_name2species_list = {
    "Monotremata": ["Tachyglossus_aculeatus", "Ornithorhynchus_anatinus"],
    "Marsupialia": ["Monodelphis_domestica", "Vombatus_ursinus"],
    "Xenarthra": ["Choloepus_didactylus", "Dasypus_novemcinctus"],
    "Afrotheria": ["Trichechus_manatus", "Chrysochloris_asiatica"],
    "Euarchontes": ["Galeopterus_variegatus", "Theropithecus_gelada"],
    "Glires": ["Oryctolagus_cuniculus", "Microtus_oregoni"],
    "Laurasiatheria": ["Talpa_occidentalis", "Mirounga_leonina"],
}

# Assign one color per group and apply it to the matching branches and labels
ColorCycler.set_cmap("tab10")
group_name2color = {name: ColorCycler() for name in group_name2species_list.keys()}
for group_name, species_list in group_name2species_list.items():
    color = group_name2color[group_name]
    tv.set_node_line_props(species_list, color=color, apply_label_color=True)

fig = circos.plotfig()

# Place a legend in the empty center of the plot
_ = circos.ax.legend(
    handles=[Line2D([], [], label=n, color=c) for n, c in group_name2color.items()],
    labelcolor=group_name2color.values(),
    fontsize=6,
    loc="center",
    bbox_to_anchor=(0.5, 0.5),
)
In this example, a phylogenetic tree is loaded from a Newick file, initializing the Circos instance with specified properties. Colors are assigned to groups, and the tree is plotted with a central legend.
Visualizing Genomic Data
Pycirclize provides specialized functions for visualizing genomic data, such as gene positions and features. Below is an example of visualizing genomic features from a GenBank file:
from pycirclize import Circos
from pycirclize.utils import fetch_genbank_by_accid
from pycirclize.parser import Genbank

# Download and parse the GenBank record (requires network access)
gbk_fetch_data = fetch_genbank_by_accid("NC_002483")
gbk = Genbank(gbk_fetch_data)

# Single sector covering the whole sequence
circos = Circos(sectors={gbk.name: gbk.range_size})
circos.text(f"Escherichia coli K-12 plasmid F\n\n{gbk.name}", size=14)
circos.rect(r_lim=(90, 100), fc="lightgrey", ec="none", alpha=0.5)
sector = circos.sectors[0]

# Forward-strand CDS on the outer track
f_cds_track = sector.add_track((95, 100))
f_cds_feats = gbk.extract_features("CDS", target_strand=1)
f_cds_track.genomic_features(f_cds_feats, plotstyle="arrow", fc="salmon", lw=0.5)

# Reverse-strand CDS on the inner track
r_cds_track = sector.add_track((90, 95))
r_cds_feats = gbk.extract_features("CDS", target_strand=-1)
r_cds_track.genomic_features(r_cds_feats, plotstyle="arrow", fc="skyblue", lw=0.5)

# Collect gene names and the midpoint of each CDS for labeling
labels, label_pos_list = [], []
for feat in gbk.extract_features("CDS"):
    start = int(str(feat.location.start))
    end = int(str(feat.location.end))
    label_pos = (start + end) / 2
    gene_name = feat.qualifiers.get("gene", [None])[0]
    if gene_name is not None:
        labels.append(gene_name)
        label_pos_list.append(label_pos)

f_cds_track.xticks(label_pos_list, labels, label_size=6, label_orientation="vertical")
r_cds_track.xticks_by_interval(
    10000, outer=False, label_formatter=lambda v: f"{v/1000:.1f} Kb")

circos.savefig("example02.png")
In this scenario, a GenBank file is fetched and parsed, initializing the Circos instance with the genome size. Tracks for forward and reverse CDS features are added and plotted, along with gene names as labels.
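The label placement and tick formatting used above reduce to two small calculations. They are shown here on made-up coordinates so that they can be run without fetching a GenBank record. The gene names and positions are invented for illustration, and kb_formatter is simply a named version of the lambda passed as label_formatter.

```python
# (gene name, start, end); None marks a CDS without a "gene" qualifier
features = [
    ("geneA", 1200, 2100),
    ("geneB", 15000, 16500),
    (None, 30000, 30900),
]

labels, label_pos_list = [], []
for gene_name, start, end in features:
    if gene_name is None:
        continue  # unnamed features are drawn but not labeled
    labels.append(gene_name)
    label_pos_list.append((start + end) / 2)  # label sits at the CDS midpoint


def kb_formatter(v):
    """Format a base-pair position as kilobases with one decimal."""
    return f"{v / 1000:.1f} Kb"


print(list(zip(labels, label_pos_list)))
print([kb_formatter(v) for v in (0, 10000, 20000)])
```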
Conclusion
Pycirclize stands out as a powerful and adaptable library for generating remarkable circular visualizations in Python. With its user-friendly API, extensive customization options, and support for various plot types, Pycirclize simplifies the transformation of complex datasets into visually engaging and informative plots.
This article has explored the essential features and functionalities of Pycirclize, covering circular layout design, data plotting on tracks, link visualization, chord diagram creation, phylogenetic tree plotting, and genomic data visualization. The provided code snippets demonstrate the ease and flexibility of using Pycirclize for diverse circular visualizations.
Whether you are a data scientist, researcher, or developer working with genomic data, network analysis, or other domains that benefit from circular representations, Pycirclize equips you with the tools to craft compelling visualizations that effectively convey your insights.
By harnessing the capabilities of Pycirclize, you can gain fresh perspectives on your data, uncover hidden patterns, and present your findings in a captivating manner. The library's seamless integration with Matplotlib and its expanding ecosystem of extensions further enhance its capabilities for various visualization needs.
Embark on your journey with Pycirclize today and unlock the full potential of circular visualization in Python!