Protein feature annotation

The InterProScan 5 pipeline [1] is used to annotate the translations of gene models for each genome. The pipeline scans sequences against InterPro [2] signatures to identify protein families and domains. InterPro signatures are predictive models, provided by the different databases that make up the InterPro consortium. In addition, coiled coils, signal peptides, transmembrane domains, and low complexity regions are annotated with ncoils, SignalP, TMHMM, and seg, respectively.
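As a rough illustration of what the pipeline produces, the sketch below groups protein features by translation from InterProScan-style tab-separated output. The column order is an assumption based on the InterProScan 5 TSV format. The function name, the sample identifiers (`translation_1`, `SIG0001`, and so on) and the sample rows are hypothetical and are not taken from any real genome.

```python
import csv
import io
from collections import defaultdict

# Assumed column order of InterProScan 5 TSV output (an assumption, not
# confirmed by this page). Optional trailing columns may be absent.
COLUMNS = [
    "protein", "md5", "length", "analysis", "signature", "description",
    "start", "end", "score", "status", "date", "interpro", "interpro_desc",
]


def features_by_protein(tsv_text):
    """Group feature hits by protein (translation) identifier."""
    features = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        interpro = record.get("interpro")
        features[record["protein"]].append({
            "analysis": record["analysis"],
            "signature": record["signature"],
            "start": int(record["start"]),
            "end": int(record["end"]),
            # "-" or a missing column means no integrated InterPro entry
            "interpro": interpro if interpro not in (None, "", "-") else None,
        })
    return dict(features)


# Hypothetical sample rows, built with explicit tab separators.
SAMPLE = "\n".join([
    "\t".join(["translation_1", "md5a", "300", "Pfam", "SIG0001", "Example domain",
               "10", "120", "1.0E-20", "T", "01-01-2024", "IPR013483", "Example family"]),
    "\t".join(["translation_1", "md5a", "300", "TMHMM", "TMhelix", "Transmembrane helix",
               "150", "172", "-", "T", "01-01-2024"]),
])

hits = features_by_protein(SAMPLE)
```

Here the second hit has no InterPro entry, which is what one would expect for features such as transmembrane helices that are predicted directly rather than through an InterPro signature.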

The InterPro families and domains are often associated with Gene Ontology (GO) terms and pathways, and this information is loaded when protein features are annotated. For example, the IPR013483 family is linked, by InterPro, to two GO terms and a KEGG pathway reference; consequently, these will be added as cross-references to any translation that is annotated with the IPR013483 family.
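The propagation of cross-references described above can be sketched as a simple lookup. This is a minimal illustration, not the loader actually used. The function name and the translation identifiers are hypothetical, and the cross-reference values for IPR013483 are placeholders that stand in for its two GO terms and one KEGG pathway reference.

```python
def propagate_xrefs(translation_entries, entry_xrefs):
    """Attach each InterPro entry's cross-references to the translations it annotates.

    translation_entries: translation ID -> list of InterPro accessions
    entry_xrefs:         InterPro accession -> list of cross-reference IDs
    """
    result = {}
    for translation, entries in translation_entries.items():
        xrefs = set()
        for entry in entries:
            # Entries with no associated GO terms or pathways contribute nothing.
            xrefs.update(entry_xrefs.get(entry, ()))
        result[translation] = sorted(xrefs)
    return result


# Placeholder cross-references (hypothetical values, not real GO or KEGG IDs).
entry_xrefs = {
    "IPR013483": ["GO:placeholder_1", "GO:placeholder_2", "KEGG:placeholder_pathway"],
}

# Hypothetical translations: one annotated with the family, one without.
translation_entries = {
    "translation_1": ["IPR013483"],
    "translation_2": [],
}

xrefs = propagate_xrefs(translation_entries, entry_xrefs)
```

Under these assumptions, `translation_1` receives all three cross-references, while `translation_2`, which has no InterPro annotation, receives none.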

In the genome browser, protein features are displayed on the 'Splice variants' page (when viewing a gene) and on the 'Protein summary' and 'Domains & features' pages (when viewing a transcript).

References

[1] Jones P et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240
[2] Mitchell A et al. (2014) The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res.
