hansontechsolutions.com

Mastering Circular Visualization in Python with Pycirclize

Chapter 1: Introduction to Circular Visualization

Data visualization is essential for analyzing and conveying complex datasets. Conventional charts such as bar graphs and line plots are the usual choice, but circular visualizations offer a compact alternative that is well suited to showing relationships and layered data. Pycirclize is a Python library built for this purpose: it draws Circos plots, chord diagrams, and circular phylogenetic trees on top of Matplotlib.

In the field of data visualization, circular plots have become increasingly popular due to their efficiency in presenting intricate information in an intuitive and aesthetically pleasing format. This article will delve into the features, functionalities, and applications of Pycirclize, equipping you with the knowledge to craft captivating circular visualizations that convey meaningful insights.

Installation

To begin using Pycirclize, make sure Python 3.8 or higher is installed. You can then install Pycirclize with either pip or conda:

# Install using pip
pip install pycirclize

# Install using conda
conda install -c conda-forge pycirclize
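After installing, it can be useful to confirm that the interpreter meets the version requirement stated above and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper name check_environment is hypothetical and not part of Pycirclize.

```python
import sys
from importlib import metadata


def check_environment(package="pycirclize", min_python=(3, 8)):
    """Return (python_ok, installed_version_or_None) for the given package."""
    # Compare the running interpreter against the minimum version named in the article
    python_ok = sys.version_info >= min_python
    try:
        # Reads the version from the installed distribution's metadata
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    return python_ok, version


python_ok, version = check_environment()
print(f"Python OK: {python_ok}, pycirclize version: {version}")
```

If the reported version is None, the package was most likely installed into a different environment than the one running the script.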

Key Features of Pycirclize

  1. Circular Layout Design: Pycirclize utilizes a circular layout featuring sectors and tracks, allowing for organized visualization of diverse data types. You can designate sectors for different categories and incorporate multiple tracks within each sector to display various data types. The library offers detailed control over the dimensions, spacing, and aesthetics of sectors and tracks.
  2. Plotting Data on Tracks: Pycirclize includes a comprehensive array of plotting functions to visualize data on tracks. You can plot lines, points, bars, heatmaps, and more, with intuitive customization options for colors, line styles, and marker shapes. Additionally, you can enhance the clarity of your visualizations with axis ticks, labels, and grids.
  3. Plotting Links: A standout feature of Pycirclize is its ability to draw links between regions, either within a single sector or across different sectors. Links visualize relationships or flows among entities, and their appearance, including color and direction, can be customized.
  4. Chord Diagrams: Pycirclize simplifies the creation of chord diagrams from matrix data, effectively illustrating relationships and flows. The library seamlessly converts matrix data into visually appealing chord diagrams, offering customizable color schemes and labels.
  5. Phylogenetic Trees: For applications in bioinformatics and evolutionary studies, Pycirclize allows for the visualization of phylogenetic trees in a circular format, with options for customizing branch lengths and annotating nodes.
  6. Genomic Data Visualization: Specialized functions for visualizing genomic data, such as gene positions and sequence alignments, are available in Pycirclize. The library supports data from GFF and GenBank files, enabling informative genomic plots.
  7. Customization and Flexibility: Pycirclize offers extensive customization features to tailor the appearance of circular visualizations, integrating seamlessly with Matplotlib for enhanced control.

Understanding the Basics of Pycirclize

Pycirclize is a Python package built on the widely used Matplotlib library. It is inspired by the R package circlize and aims to provide a user-friendly interface for generating various circular visualizations. With Pycirclize, you can create Circos plots, chord diagrams, phylogenetic trees, and more, making it a versatile tool across different fields, including genomics and network analysis.

At its core, Pycirclize employs a circular layout comprising sectors and tracks. Sectors represent distinct categories, while tracks facilitate the plotting of diverse data types. This modular design enables clear organization and visualization of complex datasets.

Designing the Circular Layout

One of the notable strengths of Pycirclize is its flexibility in designing the circular layout. You have detailed control over the size, spacing, and appearance of sectors and tracks. Here's an example of how to set up sectors and tracks:

from pycirclize import Circos

# Sector names mapped to their sizes
sectors = {"A": 10, "B": 15, "C": 12, "D": 20, "E": 15}
circos = Circos(sectors, space=5)  # 5-degree gap between sectors

for sector in circos.sectors:
    # Label each sector just outside the outermost track
    sector.text(f"Sector: {sector.name}", r=110, size=15)
    # Add three concentric tracks, each given as an (inner, outer) radius range
    track1 = sector.add_track((80, 100), r_pad_ratio=0.1)
    track1.axis()
    track2 = sector.add_track((55, 75), r_pad_ratio=0.1)
    track2.axis()
    track3 = sector.add_track((30, 50), r_pad_ratio=0.1)
    track3.axis()

fig = circos.plotfig()

In this example, we define five sectors (A, B, C, D, E) with their respective sizes and a gap of 5 degrees between adjacent sectors. We then iterate over each sector, label it, and add three tracks, each occupying its own radius range (80 to 100, 55 to 75, and 30 to 50). The r_pad_ratio parameter reserves a fraction of each track's radial height as padding around plotted data, and circos.plotfig() renders the layout and returns the figure.
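To make the relationship between sector sizes, spacing, and angles concrete, here is a small sketch of the underlying arithmetic. It is an illustration only, not Pycirclize's internal code: the function name sector_spans is hypothetical, and it assumes a full 360-degree circle with one gap after every sector.

```python
def sector_spans(sectors, space=0.0, start=0.0, end=360.0):
    """Return {name: (start_deg, end_deg)} with widths proportional to sector size.

    Assumes one gap of `space` degrees follows every sector (full-circle layout).
    """
    total_size = sum(sectors.values())
    # Degrees left for the sectors once all gaps are subtracted
    usable = (end - start) - space * len(sectors)
    if usable <= 0:
        raise ValueError("space is too large for the number of sectors")
    spans = {}
    cursor = start
    for name, size in sectors.items():
        width = usable * size / total_size
        spans[name] = (cursor, cursor + width)
        cursor += width + space
    return spans


sectors = {"A": 10, "B": 15, "C": 12, "D": 20, "E": 15}
spans = sector_spans(sectors, space=5)
for name, (deg_start, deg_end) in spans.items():
    print(f"{name}: {deg_start:6.2f} to {deg_end:6.2f} degrees")
```

With five sectors and 5 degrees of spacing, 335 degrees remain for the sectors themselves, so sector D (size 20 of a total 72) receives the widest arc.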

Plotting Data on Tracks

Pycirclize provides a rich set of functions to visualize various data types on tracks, accommodating lines, points, bars, or heatmaps. Below is an example of plotting lines and points on tracks:

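import numpy as np
from pycirclize import Circos

np.random.seed(0)  # fixed seed so the random data is reproducible

sectors = {"A": 10, "B": 20, "C": 15}
circos = Circos(sectors, space=5)

for sector in circos.sectors:
    track = sector.add_track((80, 100), r_pad_ratio=0.1)
    track.axis()
    track.xticks_by_interval(1)  # one tick per unit along the sector
    vmin, vmax = 0, 10
    # Five sample positions per unit of sector size
    x = np.linspace(track.start, track.end, int(track.size) * 5 + 1)
    y = np.random.randint(vmin, vmax, len(x))
    # Draw the same data as a line and as scatter points
    track.line(x, y)
    track.scatter(x, y)

fig = circos.plotfig()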

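In this example, three sectors (A, B, C) are created, with one track added to each. NumPy generates random integers from 0 to 9 at five evenly spaced positions per unit of sector length, and the same data is drawn twice on the track: once as a line and once as scatter points.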
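A useful way to reason about track plots is to think of each data value being mapped onto a radius inside the track's range. The sketch below shows one plausible version of that mapping; it assumes a linear scale between vmin and vmax and assumes r_pad_ratio trims the same fraction from the inner and outer edge. The function value_to_radius is hypothetical and is not taken from the library.

```python
def value_to_radius(y, r_lim, vmin, vmax, r_pad_ratio=0.0):
    """Map a data value onto a radius inside a track (assumed linear scaling)."""
    r_min, r_max = r_lim
    # Padding removed from both the inner and the outer edge of the track
    pad = (r_max - r_min) * r_pad_ratio
    plot_min, plot_max = r_min + pad, r_max - pad
    # Fraction of the way from vmin to vmax
    frac = (y - vmin) / (vmax - vmin)
    return plot_min + frac * (plot_max - plot_min)


# Track from radius 80 to 100 with 10% padding, values scaled between 0 and 10
for y in (0, 5, 10):
    print(y, value_to_radius(y, (80, 100), vmin=0, vmax=10, r_pad_ratio=0.1))
```

Under these assumptions the data occupies radii 82 to 98, which is why plotted lines do not touch the track border when padding is set.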

Creating Chord Diagrams

Chord diagrams are effective for visualizing relationships and flows. Pycirclize simplifies the creation of these diagrams from matrix data. Here’s an example:

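import pandas as pd
from pycirclize import Circos

# Rows are link sources, columns are link targets
row_names = ["F1", "F2", "F3"]
col_names = ["T1", "T2", "T3", "T4", "T5", "T6"]
matrix_data = [
    [10, 16, 7, 7, 10, 8],
    [4, 9, 10, 12, 12, 7],
    [17, 13, 7, 4, 20, 4],
]
matrix_df = pd.DataFrame(matrix_data, index=row_names, columns=col_names)

# Note: newer Pycirclize releases may expose this constructor as
# Circos.chord_diagram; check the documentation for your installed version.
circos = Circos.initialize_from_matrix(
    matrix_df,
    space=5,
    cmap="tab10",
    label_kws=dict(size=12),
    link_kws=dict(ec="black", lw=0.5, direction=1),
)

fig = circos.plotfig()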

This example builds a Pandas DataFrame whose rows (F1 to F3) and columns (T1 to T6) name the entities to connect, then initializes the Circos instance from it. The library derives each sector's size from the matrix values and draws the links between row and column sectors, producing a chord diagram without any manual layout work.
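To see how sector sizes can follow from matrix values, the sketch below sums the values that touch each name: row sums for the row entities and column sums for the column entities. This is an assumption about the sizing rule made for illustration, and chord_sector_sizes is a hypothetical helper, not a Pycirclize function.

```python
def chord_sector_sizes(row_names, col_names, matrix):
    """Sum the matrix values touching each name (assumed sizing rule)."""
    sizes = {}
    # Each row entity contributes the total of its outgoing values
    for name, row in zip(row_names, matrix):
        sizes[name] = sizes.get(name, 0) + sum(row)
    # Each column entity contributes the total of its incoming values
    for j, name in enumerate(col_names):
        sizes[name] = sizes.get(name, 0) + sum(row[j] for row in matrix)
    return sizes


row_names = ["F1", "F2", "F3"]
col_names = ["T1", "T2", "T3", "T4", "T5", "T6"]
matrix_data = [
    [10, 16, 7, 7, 10, 8],
    [4, 9, 10, 12, 12, 7],
    [17, 13, 7, 4, 20, 4],
]

sizes = chord_sector_sizes(row_names, col_names, matrix_data)
print(sizes)
```

For this matrix, F3 is the largest source sector (65) and T5 the largest target sector (42), and the row totals and column totals both add up to 177.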

Visualizing Phylogenetic Trees

Pycirclize also supports the visualization of phylogenetic trees in a circular layout, which is valuable for evolutionary studies. Here's an example of plotting a phylogenetic tree:

from pycirclize import Circos
from pycirclize.utils import load_example_tree_file, ColorCycler
from matplotlib.lines import Line2D

# Load a bundled example tree in Newick format
tree_file = load_example_tree_file("large_example.nwk")
circos, tv = Circos.initialize_from_tree(
    tree_file,
    r_lim=(30, 100),
    leaf_label_size=5,
    line_kws=dict(color="lightgrey", lw=1),
)

# Taxonomic groups, each identified by two of its member species
group_name2species_list = {
    "Monotremata": ["Tachyglossus_aculeatus", "Ornithorhynchus_anatinus"],
    "Marsupialia": ["Monodelphis_domestica", "Vombatus_ursinus"],
    "Xenarthra": ["Choloepus_didactylus", "Dasypus_novemcinctus"],
    "Afrotheria": ["Trichechus_manatus", "Chrysochloris_asiatica"],
    "Euarchontes": ["Galeopterus_variegatus", "Theropithecus_gelada"],
    "Glires": ["Oryctolagus_cuniculus", "Microtus_oregoni"],
    "Laurasiatheria": ["Talpa_occidentalis", "Mirounga_leonina"],
}

# Assign one color per group from the tab10 colormap
ColorCycler.set_cmap("tab10")
group_name2color = {name: ColorCycler() for name in group_name2species_list.keys()}

# Color the branches and leaf labels belonging to each group
for group_name, species_list in group_name2species_list.items():
    color = group_name2color[group_name]
    tv.set_node_line_props(species_list, color=color, apply_label_color=True)

fig = circos.plotfig()

# Place a legend in the empty center of the plot
_ = circos.ax.legend(
    handles=[Line2D([], [], label=n, color=c) for n, c in group_name2color.items()],
    labelcolor=group_name2color.values(),
    fontsize=6,
    loc="center",
    bbox_to_anchor=(0.5, 0.5),
)

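In this example, a phylogenetic tree is loaded from a Newick file, and Circos.initialize_from_tree returns both the Circos instance and a tree visualizer (tv). Each group is assigned a color from the tab10 colormap, tv.set_node_line_props applies that color to the corresponding branches and leaf labels, and the tree is plotted with a legend in the center.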
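The color assignment step can be pictured as cycling through a fixed palette, one color per group. The sketch below reproduces that idea with the standard library as a stand-in for ColorCycler; it is not the library's implementation, and the helper names assign_group_colors and species_to_color are hypothetical. The hex values are those of Matplotlib's tab10 palette.

```python
from itertools import cycle

# Hex values of Matplotlib's "tab10" qualitative palette
TAB10 = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def assign_group_colors(group_names, palette=TAB10):
    """Give each group the next palette color, wrapping around if needed."""
    colors = cycle(palette)
    return {name: next(colors) for name in group_names}


def species_to_color(group_to_species, group_to_color):
    """Flatten the group mapping into a per-species color lookup."""
    return {
        species: group_to_color[group]
        for group, species_list in group_to_species.items()
        for species in species_list
    }


groups = {
    "Monotremata": ["Tachyglossus_aculeatus", "Ornithorhynchus_anatinus"],
    "Marsupialia": ["Monodelphis_domestica", "Vombatus_ursinus"],
}
group_colors = assign_group_colors(groups)
leaf_colors = species_to_color(groups, group_colors)
print(group_colors)
print(leaf_colors["Vombatus_ursinus"])
```

Keeping the group-to-color mapping in one dictionary, as the article's example does, lets the same colors drive both the tree styling and the legend.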

Visualizing Genomic Data

Pycirclize provides specialized functions for visualizing genomic data, such as gene positions and features. Below is an example of visualizing genomic features from a GenBank file:

from pycirclize import Circos
from pycirclize.utils import fetch_genbank_by_accid
from pycirclize.parser import Genbank

# Download the GenBank record by accession and parse it
gbk_fetch_data = fetch_genbank_by_accid("NC_002483")
gbk = Genbank(gbk_fetch_data)

# One sector spanning the whole sequence
circos = Circos(sectors={gbk.name: gbk.range_size})
circos.text(f"Escherichia coli K-12 plasmid F\n\n{gbk.name}", size=14)
circos.rect(r_lim=(90, 100), fc="lightgrey", ec="none", alpha=0.5)
sector = circos.sectors[0]

# Forward-strand CDS features on the outer track
f_cds_track = sector.add_track((95, 100))
f_cds_feats = gbk.extract_features("CDS", target_strand=1)
f_cds_track.genomic_features(f_cds_feats, plotstyle="arrow", fc="salmon", lw=0.5)

# Reverse-strand CDS features on the inner track
r_cds_track = sector.add_track((90, 95))
r_cds_feats = gbk.extract_features("CDS", target_strand=-1)
r_cds_track.genomic_features(r_cds_feats, plotstyle="arrow", fc="skyblue", lw=0.5)

# Collect gene names and place each label at the midpoint of its feature
labels, label_pos_list = [], []
for feat in gbk.extract_features("CDS"):
    start = int(str(feat.location.start))
    end = int(str(feat.location.end))
    label_pos = (start + end) / 2
    gene_name = feat.qualifiers.get("gene", [None])[0]
    if gene_name is not None:
        labels.append(gene_name)
        label_pos_list.append(label_pos)

f_cds_track.xticks(label_pos_list, labels, label_size=6, label_orientation="vertical")

# Position ticks every 10,000 bp on the inside, labeled in Kb
r_cds_track.xticks_by_interval(
    10000, outer=False, label_formatter=lambda v: f"{v/1000:.1f} Kb"
)

circos.savefig("example02.png")

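In this scenario, the GenBank record NC_002483 is fetched by accession and parsed, and the Circos instance is initialized with a single sector spanning the sequence length. Separate tracks for forward-strand and reverse-strand CDS features are added and drawn as arrows, gene names are placed as labels at the midpoint of each feature, and the result is saved to example02.png.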
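The labeling logic in that example boils down to two small calculations: taking the midpoint of each named feature, and formatting base-pair positions as kilobases. The sketch below isolates both using plain tuples in place of parsed GenBank features. The feature data is made up for illustration, and label_positions and format_kb are hypothetical helper names.

```python
def label_positions(features):
    """Return (labels, midpoints) for features that carry a gene name."""
    labels, positions = [], []
    for gene, start, end in features:
        if gene is None:
            continue  # unnamed features get no label
        labels.append(gene)
        positions.append((start + end) / 2)
    return labels, positions


def format_kb(v):
    """Format a base-pair position as kilobases with one decimal place."""
    return f"{v / 1000:.1f} Kb"


# Hypothetical (gene, start, end) tuples standing in for CDS features
features = [("geneA", 100, 500), (None, 600, 900), ("geneB", 1000, 1600)]
labels, positions = label_positions(features)
print(labels, positions)
print(format_kb(10000))
```

Separating these helpers from the plotting calls makes it easy to check the label positions before any figure is drawn.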

Conclusion

Pycirclize stands out as a powerful and adaptable library for generating remarkable circular visualizations in Python. With its user-friendly API, extensive customization options, and support for various plot types, Pycirclize simplifies the transformation of complex datasets into visually engaging and informative plots.

This article has explored the essential features and functionalities of Pycirclize, covering circular layout design, data plotting on tracks, link visualization, chord diagram creation, phylogenetic tree plotting, and genomic data visualization. The provided code snippets demonstrate the ease and flexibility of using Pycirclize for diverse circular visualizations.

Whether you are a data scientist, researcher, or developer working with genomic data, network analysis, or other domains that benefit from circular representations, Pycirclize equips you with the tools to craft compelling visualizations that effectively convey your insights.

By harnessing the capabilities of Pycirclize, you can gain fresh perspectives on your data, uncover hidden patterns, and present your findings clearly. Because the library is built on Matplotlib, the resulting figures can be further adjusted with standard Matplotlib calls, as the legend in the phylogenetic tree example shows.

Embark on your journey with Pycirclize today and unlock the full potential of circular visualization in Python!
