Protein feature annotation

The InterProScan 5 pipeline [1] is used to annotate the translations of gene models for each genome. The pipeline scans sequences against InterPro [2] signatures to identify protein families and domains. InterPro signatures are predictive models, provided by the different databases that make up the InterPro consortium. In addition, coiled coils, signal peptides, transmembrane domains, and low complexity regions are annotated with ncoils, SignalP, TMHMM, and seg, respectively.
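As a rough illustration of what the pipeline's results look like downstream, the sketch below reads InterProScan 5 tab-separated output into simple feature records. It assumes the standard TSV column order (protein accession, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, then optional InterPro accession and description). The `ProteinFeature` record, the `parse_interproscan_tsv` helper, and the sample row (including the `PF00000` signature) are hypothetical and are not part of the annotation pipeline itself.

```python
import io
from typing import List, NamedTuple, Optional


class ProteinFeature(NamedTuple):
    """One signature match on a translation (hypothetical record type)."""
    translation_id: str
    analysis: str          # e.g. Pfam, or another member database
    signature: str         # signature accession within that database
    start: int
    end: int
    interpro_id: Optional[str]


def parse_interproscan_tsv(handle) -> List[ProteinFeature]:
    """Parse InterProScan-style TSV lines into ProteinFeature records."""
    features = []
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # skip blank or truncated lines
        # Column 12 (index 11) holds the InterPro accession when one exists;
        # "-" or an empty string means the signature is not integrated.
        interpro_id = None
        if len(cols) > 11 and cols[11] not in ("", "-"):
            interpro_id = cols[11]
        features.append(
            ProteinFeature(
                translation_id=cols[0],
                analysis=cols[3],
                signature=cols[4],
                start=int(cols[6]),
                end=int(cols[7]),
                interpro_id=interpro_id,
            )
        )
    return features


# Made-up sample row, for illustration only.
sample = "\t".join([
    "translation_1", "md5_placeholder", "350", "Pfam", "PF00000",
    "Hypothetical domain", "10", "120", "1.0E-20", "T", "01-01-2024",
    "IPR013483", "Hypothetical entry description",
]) + "\n"

features = parse_interproscan_tsv(io.StringIO(sample))
```

Each record keeps the source analysis alongside the InterPro accession, so matches from different member databases can be told apart.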

The InterPro families and domains are often associated with Gene Ontology (GO) terms and pathways, and this information is loaded when protein features are annotated. For example, the IPR013483 family is linked, by InterPro, to two GO terms and a KEGG pathway reference; consequently, these will be added as cross-references to any translation that is annotated with the IPR013483 family.
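The propagation described above can be sketched as a simple lookup: every translation annotated with an InterPro entry inherits the cross-references linked to that entry. This is a minimal sketch rather than the actual loading code. The `INTERPRO_XREFS` table, the `propagate_xrefs` function, and the translation names are hypothetical, and the GO and KEGG identifiers are placeholders, not the real terms linked to IPR013483.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical lookup table: InterPro entry -> linked cross-references.
# The identifiers are placeholders, not the real terms for IPR013483.
INTERPRO_XREFS: Dict[str, Dict[str, List[str]]] = {
    "IPR013483": {
        "GO": ["GO:placeholder_a", "GO:placeholder_b"],
        "KEGG": ["kegg_pathway_placeholder"],
    },
}


def propagate_xrefs(
    annotations: Iterable[Tuple[str, str]],
    interpro_xrefs: Dict[str, Dict[str, List[str]]],
) -> Dict[str, Set[Tuple[str, str]]]:
    """Attach each InterPro entry's linked terms to the translations it annotates.

    `annotations` is an iterable of (translation_id, interpro_id) pairs.
    Returns a mapping of translation_id -> set of (database, identifier).
    """
    xrefs: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for translation_id, interpro_id in annotations:
        # Entries with no linked terms contribute nothing.
        for database, identifiers in interpro_xrefs.get(interpro_id, {}).items():
            for identifier in identifiers:
                xrefs[translation_id].add((database, identifier))
    return dict(xrefs)


annotations = [
    ("translation_1", "IPR013483"),
    ("translation_2", "IPR_unlinked_placeholder"),  # no linked terms
]
result = propagate_xrefs(annotations, INTERPRO_XREFS)
```

Using a set per translation means that a term reached through several InterPro entries is recorded only once.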

In the genome browser, protein features are displayed on the 'Splice variants' page (when viewing a gene) and on the 'Protein summary' and 'Domains & features' pages (when viewing a transcript).

References

  1. Jones P et al. (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236-1240
  2. Mitchell A et al. (2014) The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res.