Protein feature annotation

The translations of the gene models in each genome are annotated with the InterProScan 5 pipeline [1]. The pipeline scans each sequence against InterPro [2] signatures to identify protein families and domains. InterPro signatures are predictive models provided by the member databases of the InterPro consortium. In addition, coiled coils, signal peptides, transmembrane domains, and low-complexity regions are annotated with ncoils, SignalP, TMHMM, and seg, respectively.
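As an illustration of what a single annotation looks like, the following minimal sketch reads one line of InterProScan's tab-separated output into a protein feature record. It assumes the standard TSV column order (sequence identifier, MD5, length, analysis, signature accession, description, start, end, score, status, date, then an optional InterPro accession and description). The ProteinFeature type, the parse_tsv_line function, and every value in the sample line are invented for illustration and are not part of the pipeline described here.

```python
from typing import NamedTuple, Optional


class ProteinFeature(NamedTuple):
    """One signature match on a translation (hypothetical record type)."""
    translation_id: str
    analysis: str          # member database or tool, e.g. "Pfam"
    signature: str         # signature accession within that analysis
    start: int
    end: int
    interpro_id: Optional[str]  # None when the signature is not integrated


def parse_tsv_line(line: str) -> ProteinFeature:
    """Parse one line of tab-separated output, assuming the standard column order."""
    cols = line.rstrip("\n").split("\t")
    # The InterPro accession (12th column) is optional; it may be absent or "-".
    interpro_id = None
    if len(cols) > 11 and cols[11] not in ("", "-"):
        interpro_id = cols[11]
    return ProteinFeature(
        translation_id=cols[0],
        analysis=cols[3],
        signature=cols[4],
        start=int(cols[6]),
        end=int(cols[7]),
        interpro_id=interpro_id,
    )


# Invented sample line: the accessions and values below are placeholders.
sample = "\t".join([
    "translation_A", "0" * 32, "350", "Pfam", "PF99999", "Example domain",
    "10", "120", "1.0E-20", "T", "01-01-2020", "IPR999999", "Example entry",
])
feature = parse_tsv_line(sample)
print(feature)
```

Matches from tools that have no InterPro entry would simply carry interpro_id=None in this sketch.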

InterPro families and domains are often associated with Gene Ontology (GO) terms and pathways, and this information is loaded when protein features are annotated. For example, InterPro links the IPR013483 family to two GO terms and a KEGG pathway reference; these are therefore added as cross-references to every translation that is annotated with the IPR013483 family.
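The propagation of cross-references described above can be sketched as a simple lookup. This is only an illustration under stated assumptions, not the pipeline's actual loading code: the propagate_xrefs function and the translation identifiers are hypothetical, and the GO and KEGG values are placeholders, because the specific terms linked to IPR013483 are not listed here.

```python
from collections import defaultdict

# InterPro entry -> linked cross-references.
# Placeholder values: the real GO terms and KEGG pathway are not given in the text.
INTERPRO_XREFS = {
    "IPR013483": ["GO:<term 1>", "GO:<term 2>", "KEGG:<pathway>"],
}


def propagate_xrefs(features, interpro_xrefs):
    """Attach each InterPro entry's cross-references to the translations it annotates.

    features is an iterable of (translation_id, interpro_id) pairs.
    Returns a dict of translation_id -> sorted list of unique cross-references.
    """
    collected = defaultdict(set)
    for translation_id, interpro_id in features:
        # Entries without linked terms contribute nothing.
        for xref in interpro_xrefs.get(interpro_id, ()):
            collected[translation_id].add(xref)
    return {tid: sorted(xrefs) for tid, xrefs in collected.items()}


# Hypothetical annotations: translation_A matches IPR013483 twice,
# translation_B matches an entry with no linked terms.
features = [
    ("translation_A", "IPR013483"),
    ("translation_A", "IPR013483"),
    ("translation_B", "IPR999999"),
]
result = propagate_xrefs(features, INTERPRO_XREFS)
print(result)
```

Collecting into a set means a translation with several matches to the same entry receives each cross-reference only once.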

In the genome browser, protein features are displayed on the 'Splice variants' page (when viewing a gene) and on the 'Protein summary' and 'Domains & features' pages (when viewing a transcript).

References

  1. Jones P et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240
  2. Mitchell A et al. (2015) The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43:D213-D221