
Bioinformatics with Python Cookbook

Having a genome sequence is interesting, but we will want to extract features from it: genes, exons, and coding sequences. This type of annotation information is made available in GFF and GTF files. GFF stands for Generic Feature Format. In this recipe, we will see how to parse and analyze GFF files, using the annotation of the Anopheles gambiae genome as an example.
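To make the format concrete: a GFF3 file is tab-separated text in which each feature line has nine columns (seqid, source, type, start, end, score, strand, phase, attributes), and lines starting with # are directives or comments. The following is a minimal standard-library sketch of reading such lines and counting feature types. The helper name parse_gff3_line and the sample lines are made up for illustration, they are not taken from the Anopheles gambiae annotation, and the sketch skips details such as URL-escaped attribute values.

```python
from collections import Counter

# The nine columns of a GFF3 feature line, in order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Return a dict for one feature line, or None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    # Coordinates are 1-based and inclusive in GFF.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Attributes are semicolon-separated key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1)
        for pair in record["attributes"].split(";")
        if "=" in pair
    )
    return record


# Hypothetical sample data, for illustration only.
EXAMPLE = """##gff-version 3
chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1
chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1
chr1\texample\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1
chr1\texample\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1
"""

records = [r for r in map(parse_gff3_line, EXAMPLE.splitlines()) if r is not None]
type_counts = Counter(r["type"] for r in records)
print(type_counts)
```

A dedicated library remains the better choice for real annotation files, since it also handles the parent/child relationships between features.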
We will use the gffutils library to process the annotation file.
If you are not using the notebook, download the annotation file (gambiae.gff3.gz) from our datasets page at https://github.com/tiagoantao/bioinf-python/blob/master/notebooks/Datasets.ipynb and rename it to gambiae.gff.gz. Preferably, use the 02_Genomes/Annotations.ipynb notebook, which is provided in the code bundle of the book.
Let's take a look at the following steps:
Start by creating an annotation database with gffutils based on our GFF file:
import gffutils
import...
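A minimal sketch of this step, not the book's own listing: gffutils.create_db parses a GFF file into an SQLite-backed database, and gffutils.FeatureDB reopens a database that already exists. The helper name open_annotation_db and the database file name ag.db are assumptions made for illustration. The input name gambiae.gff.gz follows the renaming described earlier.

```python
import os


def open_annotation_db(gff_path="gambiae.gff.gz", db_path="ag.db"):
    """Build the gffutils database once, then reuse it on later runs.

    gff_path: the (optionally gzipped) GFF annotation file.
    db_path:  hypothetical name for the SQLite file that gffutils writes.
    """
    # Third-party dependency; imported here so this module loads without it.
    import gffutils

    if os.path.exists(db_path):
        # Parsing a whole-genome annotation is slow, so reopen the existing
        # database instead of rebuilding it.
        return gffutils.FeatureDB(db_path)
    return gffutils.create_db(gff_path, db_path)


# Example usage (requires gffutils and the downloaded annotation file):
# db = open_annotation_db()
# for feature_type in db.featuretypes():
#     print(feature_type, db.count_features_of_type(feature_type))
```

Checking for the database file first is one design choice among several. Passing force=True to create_db would instead rebuild the database on every run.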