Ensembl Variant Effect Predictor Download and install

Note

To address potential installation challenges with Ensembl VEP dependencies on newer systems, we recommend using Docker or Singularity containers.

These containerised environments let you access all the latest Ensembl VEP features without the hassle of installing dependencies directly on your system. They are also well suited to use on compute clusters.
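
As a rough sketch of the container route, the commands below assume the image is published on Docker Hub as ensemblorg/ensembl-vep. That image name and the vep.sif file name are not given in this section, so check them against the current container documentation before use. The helper only prints each command unless DRY_RUN=0 is set.

```shell
# Assumption: the image name below is not stated in this section.
VEP_IMAGE="ensemblorg/ensembl-vep"

# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Docker: fetch the image, then show the help text from inside the container.
run docker pull "$VEP_IMAGE"
run docker run --rm "$VEP_IMAGE" vep --help

# Singularity: build a local image file (vep.sif, a placeholder name)
# from the same Docker image.
run singularity pull --name vep.sif "docker://$VEP_IMAGE"
```

Run it once as shown to review the commands, then again with DRY_RUN=0 to execute them.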

Download

Download the ensembl-vep package using one of the methods below, then follow the installation instructions.

Using Git

Clone the Git repository

Use git to download the ensembl-vep package:

git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep

Update to a newer version

To update from a previous version:

cd ensembl-vep
git pull
git checkout release/115
perl INSTALL.pl

Use an older version

To use an older version (this example shows how to set up release 87):

cd ensembl-vep
git checkout release/87
perl INSTALL.pl
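
The update and older-version steps differ only in the branch name, so they can be wrapped in one helper. This is a minimal sketch, assuming release branches always follow the release/<number> pattern used in the examples above. The function names release_branch and checkout_release are hypothetical, and the git and perl steps run only when an ensembl-vep clone exists in the current directory.

```shell
# Hypothetical helper: turn a release number into a branch name.
# Assumes branches are named release/<number>, as in the examples above.
release_branch() {
  case $1 in
    ''|*[!0-9]*)
      echo "release must be a number, got '$1'" >&2
      return 1
      ;;
  esac
  echo "release/$1"
}

# Hypothetical helper: switch an existing clone to the given release
# and re-run the installer, mirroring the manual steps.
checkout_release() {
  branch=$(release_branch "$1") || return 1
  if [ -d ensembl-vep/.git ]; then
    (cd ensembl-vep && git pull && git checkout "$branch" && perl INSTALL.pl)
  else
    echo "no ensembl-vep clone here; would check out $branch"
  fi
}

checkout_release 115
```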

Download the Zipped package file

Users without the git utility installed may download a zip file from GitHub, though we would always recommend using git if possible.

curl -L -O https://github.com/Ensembl/ensembl-vep/archive/release/115.zip
unzip 115.zip
cd ensembl-vep-release-115/
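
When scripting the download, the URL and the unpacked directory name can be derived from the release number. This sketch assumes that other releases follow the same naming pattern as the 115 example above. The variable names are illustrative, and nothing is downloaded unless RUN_DOWNLOAD=1 is set.

```shell
release=115   # release number to fetch
zip_url="https://github.com/Ensembl/ensembl-vep/archive/release/${release}.zip"
zip_file="${release}.zip"
src_dir="ensembl-vep-release-${release}"

echo "URL:       $zip_url"
echo "Directory: $src_dir"

# Only touch the network when explicitly requested.
if [ "${RUN_DOWNLOAD:-0}" = 1 ]; then
  curl -L -O "$zip_url"
  unzip "$zip_file"
  cd "$src_dir" || exit 1
fi
```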

Previous versions (ensembl-tools)

Previously, Ensembl VEP was available as part of the ensembl-tools package (see the Ensembl archive site for documentation). The following downloads are available for archival purposes.

What's new?

New in version 115 (September 2025)

Added Ensembl VEP support for annotating structural variants with allele frequencies from gnomAD and clinical significance (CLINSIG) from ClinVar.
Added Ensembl VEP and Ensembl Variation API support for the new ClinVar somatic classifications.
We have enabled support for GENCODE promoters: variants falling within them can now be annotated with details of the promoter.
New plugin (on CLI):

MechPredict

Previous version history - from version 88

New in version 114 (May 2025)

MAVE data has been updated from the latest version of MaveDB, representing a nearly 6.5-fold increase in variants covered (~1.2 million to ~7.7 million).
Added support in the Ensembl VEP install script for the https protocol when downloading FTP files, and for supplying a GitHub token to increase the rate limit.
Allele frequency from NIH AllOfUs study is now available in the web Ensembl VEP.
Plugin support added to REST for:

Paralogues

Plugin data version updated:

dbNSFP (from 4.7c to 4.9c)
LOEUF (from gnomAD v2.1.1 to gnomAD v4.1)

Plugin deprecated:

DisGeNET
Mastermind (Only from REST)

New in version 113 (October 2024)

gnomAD frequency data updated to v4.1 for both genomes and exomes.
Support for GENCODE primary transcript set added. See --gencode_primary and --flag_gencode_primary.
Support added for --mane, --mane_select, and --canonical when GFF/GTF file used as annotation source.
Nextflow Ensembl VEP now supports other input data formats besides VCF. For supported formats, see Data formats.
Plugin support added to REST and Web for:

Plugin support added to Web for:

Paralogues

Plugin support added to REST for:

LOEUF

Plugin data version updated for CADD (v1.6 to v1.7) and dbNSFP (4.5c to 4.7c).

New in version 112 (May 2024)

Enhanced Structural Variant Support:

Added support for CNV:TR
Enabled the use of chromosome synonyms in breakends
Report consequences for each breakend and enable the input of single breakends

New plugins (supported on CLI, Web and REST):

AlphaMissense - annotates missense variants with the pre-computed AlphaMissense pathogenicity scores. AlphaMissense is a deep learning model developed by Google DeepMind that predicts the pathogenicity of single nucleotide missense variants.

New plugins (supported on CLI and Web):

RiboseqORFs - uses a standardized catalog of human Ribo-seq ORFs to re-calculate consequences for variants located in these translated regions

New plugins (supported on CLI):

Paralogues - fetches variants overlapping the genomic coordinates of amino acids aligned between paralogue proteins
AVADA - Automatic VAriant evidence DAtabase is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates
GeneBe - A plugin kindly contributed by the GeneBe team, it retrieves automatic ACMG variant classification data from https://genebe.net/
PhenotypeOrthologous - retrieves phenotype information associated with orthologous genes from model organisms

Plugin support added to REST and Web for:

CADD_SV
CADD scores for Sus scrofa
Dosage Sensitivity
Enformer

New in version 111 (January 2024)

New option --individual_zyg returns a single list of individuals and their zygosity (instead of a separate line of output for each individual and variant combination like in --individual)
Custom annotation has been improved with the following options:

num_records to limit the number of matching records (50 by default)
summary_stats to calculate summary statistics (min, mean, max, count, sum) using annotation scores (not used by default)

New plugin (supported on CLI, REST and web):

OpenTargets - adds locus-to-gene (L2G) scores to predict causal genes at GWAS loci from Open Targets Genetics

New plugin (supported on CLI and REST):

Enformer - adds pre-calculated predictions of variant impact on gene expression

New plugins (supported on CLI):

BayesDel - adds a deleteriousness meta-score combining multiple deleteriousness predictors
DeNovo - identifies de novo variants in a VCF file. This plugin requires a pedigree (.ped) file
SpliceVault - predicts exon-skipping events and activated cryptic splice sites based on the most common mis-splicing events around a splice site
DosageSensitivity - annotates the likelihood of a gene being haploinsufficient or triplosensitive
VARITY - adds pre-calculated pathogenicity scores of rare human missense variants

New in version 110 (July 2023)

New plugins (supported on CLI):

TranscriptAnnotator - an Ensembl VEP plugin that annotates variant-transcript pairs

New plugins (supported on CLI, REST and web):

Geno2MP - adds information from Geno2MP, a web-accessible database of rare variant genotypes linked to phenotypic information
MaveDB - adds information from MaveDB, a database that holds experimentally determined measures of variant effect

New in version 109 (February 2023)

Ensembl VEP Docker image now includes all Ensembl VEP plugins
New plugin (supported on CLI):

GWAS - reports genome-wide association study data from GWAS catalog

Plugins now available in REST and web:

UTRAnnotator - annotates the effect of 5' UTR variant especially for variant creating/disrupting upstream ORFs

Plugins now available in REST:

NMD - predicts if a variant allows transcript to escape nonsense-mediated mRNA decay based on certain rules

Plugin LOEUF replaces Loftool in the web with more recent ‘loss-of-function’ score for variants
Deprecated Plugins:

miRNA - this plugin was fully deprecated in favour of --mirna flag (in web and REST)
ExAC - this plugin was deprecated given that Ensembl VEP cache includes ExAC data as part of gnomAD

SIFT version has been updated from 5.2.2 to 6.2.1 (except for human GRCh37)
PolyPhen-2 version has been updated from 2.2.2 to 2.2.3 (except for human GRCh37)

New in version 108 (October 2022)

New plugin (supported on CLI, REST, and web):

mutfunc - predicts destabilization of protein structure, interaction and other features by a variant (GRCh38 only)

Plugin feature extension:

IntAct - 4 new species are now supported - rat, chicken (red jungle fowl), yeast, and arabidopsis

New in version 107 (July 2022)

New plugin (supported on CLI, REST, and web):

EVE - annotates human variants using the EVE classification method, which is based solely on evolutionary sequences (GRCh38 only)

Plugins now available in REST and web (already available in CLI):

GO - retrieves Gene Ontology terms associated with transcripts/translations
IntAct - annotates human variants which fall in interaction sites, as described in the IntAct database

Plugins now available in web (already available in CLI):

NMD - predicts if a stop_gained variant allows transcript to escape nonsense-mediated mRNA decay based on certain rules

Readthrough transcripts are now removed from cache
Transcripts of biotype ‘artifact’, which are artifactual duplications, are now removed from the cache and are not accessible using the database
gnomAD allele frequencies are now available for exomes and genomes separately through the --af_gnomade and --af_gnomadg options respectively. The --af_gnomad option has the same function as --af_gnomade.
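
To illustrate the separate gnomAD options, the sketch below assembles a command line that requests both exome and genome frequencies. The file names are placeholders, and --input_file and --output_file are general Ensembl VEP options that this section does not list. The command is executed only if a vep script and the input file are present in the current directory.

```shell
input="variants.vcf"        # placeholder input file
output="variants.vep.txt"   # placeholder output file

# --af_gnomade adds gnomAD exome frequencies,
# --af_gnomadg adds gnomAD genome frequencies.
set -- ./vep --cache --input_file "$input" --output_file "$output" \
  --af_gnomade --af_gnomadg
vep_cmd="$*"
echo "$vep_cmd"

# Run only inside an ensembl-vep directory with the input available.
if [ -x ./vep ] && [ -f "$input" ]; then
  "$@"
fi
```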

New in version 106 (April 2022)

New plugins for command line use:

IntAct - annotates human variants which fall in interaction sites, as described in the IntAct database
CAPICE - integrates scores from a machine-learning-based method for prioritizing pathogenic variants (GRCh37 only)

Nextflow pipeline:

A new configurable pipeline is available to run Ensembl VEP efficiently on large-scale VCF files

New in version 105 (December 2021)

3 new Sequence Ontology terms are reported for more detailed splice consequence annotation

splice_donor_5th_base_variant (SO:0001787)
splice_donor_region_variant (SO:0002170)
splice_polypyrimidine_tract_variant (SO:0002169)

New plugins

ClinPred - adds pre-calculated scores from ClinPred which helps identify disease-relevant missense variants
NMD - predicts whether a stop-gained variant will allow a transcript to escape nonsense-mediated decay

Condel scores are no longer available via the Ensembl VEP web interface as they have not been updated since 2014 and newer scores like CADD and REVEL are available

New in version 104 (May 2021)

Human GRCh37 cache files now include dbSNP 154!
--var_synonyms output structure has been altered when used with --json

Ensembl VEP Plugins:

dbNSFP - now supports matching by peptides
SpliceAI - now compares gene symbols to improve score accuracy

New in version 103 (February 2021)

New: Variant Recoder is now available as a web tool
Variant Recoder output is now allele specific

Web Ensembl VEP Options:

Variant Synonyms are now available through the web interface
MasterMind results are available through the REST and web interfaces

Ensembl VEP Options:

--mane : Now provides additional MANE Plus Clinical annotations alongside MANE Select
--mane_select : Returns MANE Select annotations

New in version 102 (November 2020)

Ensembl VEP options:
- --uniprot: Now we report precise Ensembl translation to UniProt isoform mappings.
- --spdi - new: Add genomic SPDI notation.
Web Ensembl VEP options:
- Shifting variants in the 3' direction with --shift_3prime and --shift_genomic is now supported through the web interface.
- SpliceAI - new: SpliceAI pre-calculated scores are available through the web interface.
Ensembl VEP filter options:
- --soft_filter - new: Option to only flag the failing variation in the FILTER column and keep the entries in the output VCF file.

New in version 101 (August 2020)

New options:
- --var_synonyms: Report known synonyms for colocated variants. Must be used with --cache.
Ensembl VEP plugins:
- neXtProt - new: neXtProt retrieves comprehensive human-centric protein-related data for missense variants

New in version 100 (April 2020)

Human GRCh37 variant and phenotype data has been updated with multiple data sets including dbSNP153, ClinVar's 201912 release and COSMIC release 90
The GRCh37 RefSeq transcript set has been updated to NCBI's 1st November 2019 release (initially annotated on GCF_000001405.25)!
New options:
- --shift_3prime: Right aligns all variants relative to their associated transcripts prior to consequence calculation
- --shift_genomic: Right aligns all variants, including intergenic variants, before consequence calculation and updates the Location field
Ensembl VEP plugins:
- SpliceAI - new: SpliceAI is a deep neural network, developed by Illumina, Inc that predicts splice junctions from an arbitrary pre-mRNA transcript sequence.

New in version 99 (January 2020)

Human GRCh38 cache files now contain variants from dbSNP153
New options have been added to REST:
- vcf_string: Ensembl VEP can now provide a VCF-like string representing the input variant
- transcript_version: Add version numbers to Ensembl transcript identifiers
- SpliceRegion: Provides granular predictions of splicing effects (Details)
- LoF: LOFTEE implements a set of filters to predict LoF (loss-of-function) variants. (Details)

New in version 98 (September 2019)

Human GRCh38 cache files now contain variants from dbSNP152
This employs a new clustering strategy which may result in different rsIDs being reported as known variants for some insertions and deletions - for more information see here
--clin_sig_allele has been updated to be used by default
New options:
- --custom_multi_allelic: prevents Ensembl VEP from assuming that comma separated lists in custom annotations are allele specific
MANE attributes are now included within Ensembl VEP cache files, web Ensembl VEP and REST
Ensembl VEP plugins:
- satMutMPRA - new: measures variant effects on gene RNA expression for 21 regulatory elements
Ensembl VEP Installer:

HTSLib v1.9 is now installed by default (previously v1.3.2)
Bio::DB::HTS v2.11 is now installed by default (previously v2.9)
New option 'PLUGINSDIR' allows you to specify the installation directory for plugins

New in version 97 (July 2019)

Allele-specific clinical significance reported (it was previously variant-specific).
New options:
- --clin_sig_allele: report allele specific clinical significance.
- --mane: report if a transcript is the MANE Select.
- --max_sv_size: extend the maximum Structural Variant size Ensembl VEP can process.
- --no_check_variants_order: permit the use of unsorted input files (WARNING - this is slow and requires more memory).
- --overlaps: report the proportion and length of a transcript overlapped by a structural variant in VCF format.
Include the --mane option into the --everything group option.
Update --pick and --pick_order to support MANE Select transcripts.
Check if the input variants are ordered: non ordered variants slow down Ensembl VEP and require more memory.
Skip annotation of complex and long structural variants and display a warning message.
Variant recoder: add an option --vcf_string to return results in VCF format.
Ensembl VEP plugins:
- FunMotifs - new: provide information about overlapping tissue-specific transcription factor motifs.
- Mastermind - new: reports variants that have clinical evidence cited in the medical literature.
- StructuralVariantOverlap - new: provide information from overlapping structural variants.
- G2P - update: now the plugin can be run offline.
- Phenotypes - update: change the format of the data file (from BED to GVF).
Ensembl VEP web tool: the transcript identifiers are now returned with versions unless otherwise specified.
Ensembl VEP installer: tabix-indexed variant cache files are now installed by default.
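
The version 97 notes above say that unordered input slows Ensembl VEP down and needs more memory. One way to avoid that is to sort the file before annotation. The sketch below is an assumption about what a suitable ordering looks like (records grouped by chromosome name, then by numeric position); the function name sort_vcf and the file names are hypothetical.

```shell
# Print the header lines (starting with '#') unchanged, then the records
# sorted by chromosome name (byte order) and numeric position.
# Note: byte order places chromosome "10" before "2".
sort_vcf() {
  tab=$(printf '\t')
  grep '^#' "$1"
  grep -v '^#' "$1" | LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}

# Sort a placeholder input file when it exists.
if [ -f input.vcf ]; then
  sort_vcf input.vcf > input.sorted.vcf
fi
```

If sorting is not practical, the --no_check_variants_order option listed above permits unsorted input at the cost of speed and memory.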

New in version 96 (April 2019)

Add SPDI format for Ensembl VEP (input) and Variant Recoder (input and output).
Update Ensembl VEP cache with gnomAD 2.1 (human).
Update the Docker Ensembl VEP base image to Ubuntu 18.04.
Retire deprecated flags: --gmaf, --maf_1kg, --maf_esp, --maf_exac, --check_alleles, --html, --gvf.
Retire legacy code about the pileup input format, which is no longer supported.
Deprecate the installation flag "--VERSION"
Force numbers to be encoded as numbers in JSON output
Ensembl VEP plugins:
- NearestExonJB - new: find the nearest exon junction boundary to a coding sequence variant.
- Conservation - update: can use BigWig files instead of the Ensembl Compara database.
- dbNSFP - update: support of the dbNSFP data version 4.
- Phenotypes - update: possibility to report the phenotype description(s) and other information.
- PostGAP - update: replace the plugin name POSTGAP to PostGAP.

New in version 95 (January 2019)

The Ensembl VEP parser is now more permissive for the GFF files (ID attribute only required for genes and transcripts)
Add new option --show_ref_allele to include the reference allele in the VEP default output and the tab output formats
Add a warning message when the Ensembl VEP annotations INFO field hasn't been found/recognised in the VCF input file
Ensembl VEP Docker image:
- Reduce the size of the Ensembl VEP Docker image by about 45%.
- Include the linkage disequilibrium script in the Ensembl VEP Docker image, making it possible to run the LD plugin
New Ensembl VEP plugins:

New in version 94 (October 2018)

RefSeq transcript version updated.
Minor updates on the Ensembl VEP web tool interface.
When the input data format is not specified on the command line, Ensembl VEP attempts to detect it. The assumed format is now reported in verbose mode (--verbose).
Ensembl VEP assigned the consequence types TF_binding_site_variant, TFBS_ablation, TFBS_fusion, TFBS_amplification and TFBS_translocation to human and mouse variants which overlapped motif features. These annotations will not be available in VEP caches for human in release 94, so they must be added as a custom annotation.

New in version 93 (July 2018)

Update the JSON output format (allele frequencies) for the Ensembl REST - Ensembl VEP endpoints. See more information.
The new Ensembl release brings more frequency data from gnomAD.
Add the possibility to print the content of the FILTER column (from the VCF custom annotation files) in the output.
Include the Ensembl/ensembl-xs repository in Docker image to speed up the Ensembl VEP container.
Add a new consequence 'extended_intronic_splice_region_variant' in the SpliceRegion Ensembl VEP plugin.

New in version 92 (April 2018)

New Ensembl VEP plugin REVEL (see REVEL plugin).
Get ambiguity code with --ambiguity.
GFF/GTF files with exons assigned to multiple transcripts are now supported.
Improved 1000 Genomes Project frequencies.

New in version 91 (December 2017)

New input format "region" allows REST-style input to Ensembl VEP.
Replace your input variant reference allele with the correct one from the genome with --lookup_ref.
Add version numbers to Ensembl transcripts with --transcript_version.

New in version 90 (August 2017)

gnomAD exomes allele frequencies now available with --af_gnomad, replacing ExAC. gnomAD genomes and ExAC are available via custom annotation.
Ensembl VEP is now available as a Docker image.
RefSeq transcripts in Ensembl VEP cache files are now "corrected" from the reference genome sequence.
Ensembl VEP's algorithm for matching colocated known variants has been overhauled - details.
Change Ensembl VEP's default (5kb) up/downstream distance with --distance. This supersedes the functionality of the UpDownDistance Ensembl VEP plugin.
Feed input directly to Ensembl VEP with --input_data.
Suppress header output with --no_headers.
Detailed installation instructions for Bio::DB::BigFile to access bigWig custom annotation files.

New in version 89 (May 2017)

exclude known variants with unknown (null) alleles with --exclude_null_alleles.
write compressed output with --compress_output.
improved matching of alleles in custom VCF files.
API perldoc documentation added.

New in version 88 (March 2017)

ensembl-vep is now the officially supported version of Ensembl VEP
Documentation updated to reflect switch to ensembl-vep. See the Ensembl archive site for documentation of the obsolete ensembl-tools Ensembl VEP.
The Ensembl VEP script is now named simply vep (formerly variant_effect_predictor.pl or vep.pl)
Directly use tabix-indexed GFF/GTF files as annotation sources
Allele-specific reporting of frequencies (--af and more) and custom VCF annotations
--check_existing now compares alleles by default, disable with --no_check_alleles
Report the highest allele frequency observed in any population from 1000 genomes, ESP or ExAC using --max_af
Get genomic HGVS nomenclature with --hgvsg
Find the gene or transcript with the nearest transcription start site (TSS) to each input variant with --nearest
filter_vep supports field/field comparisons e.g. AFR_AF > #EUR_AF
Exclude predicted (XM and XR) transcripts when using RefSeq or merged cache with --exclude_predicted
Filter transcripts used for annotation with --transcript_filter
pileup input format no longer supported
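
The field/field comparison mentioned in the version 88 notes can be sketched as a filter_vep call. Here the # prefix marks EUR_AF as a field name rather than a literal value, following the example above. The file names are placeholders, and --input_file, --output_file and --filter are filter_vep options not listed in this section, so treat them as assumptions to verify. The command runs only if filter_vep and the input file are present.

```shell
# Keep entries whose African frequency exceeds the European frequency.
filter='AFR_AF > #EUR_AF'

set -- ./filter_vep --input_file variants.vep.txt \
  --output_file variants.filtered.txt --filter "$filter"
filter_cmd="$*"
echo "$filter_cmd"

# Run only inside an ensembl-vep directory with the input available.
if [ -x ./filter_vep ] && [ -f variants.vep.txt ]; then
  "$@"
fi
```

The single quotes matter: they stop the shell from treating > as a redirection and keep the whole expression as one argument.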

Older versions (ensembl-tools) - until version 87

Versions of Ensembl VEP up to and including 87 were released as part of the ensembl-tools package. See download links above.

New in version 87 (December 2016)

Shiny new code available for beta testing!
Some minor speed optimisations
Improve checks for valid chromosome names in input
Haplosaurus beta released - generate whole-transcript haplotype sequences from phased genotype data

New in version 86 (October 2016)

Chromosome synonyms supported when using Ensembl VEP caches; may be loaded manually with --synonyms

New in version 85 (July 2016)

--pick now uses translated length instead of genomic transcript length
Support for epigenomes in regulatory features

New in version 84 (March 2016)

Add tab-delimited output option
Add transcript flags indicating if the transcript is 5'- or 3'-incomplete
Improve annotation of long variants where invariant parts of the alternate allele overlap splice regions

New in version 83 (December 2015)

Speed:
- Basic consequence calculations up to 2x faster than version 82
- HGVS calculations up to 10x faster
- FASTA sequence retrieval implements caching
Add ExAC project frequencies with --af_exac
APPRIS isoform annotations now available with --appris and used by --pick and others to prioritise VEP annotations

New in version 82 (September 2015)

Faster FASTA file access using Bio::DB::HTS/htslib and bgzipped FASTA files
Flag genes with phenotype associations
Some plugins now available for use via the web and REST interfaces

New in version 81 (July 2015)

Plugin registry means plugins can be installed from the Ensembl VEP installer
GFF format now supported by Ensembl VEP's cache converter
Fixes and improvements for sequence retrieval from FASTA files

New in version 80 (May 2015)

Flag added indicating if an overlapping known variant is associated with a phenotype, disease or trait
HGVS notations are now 3'-shifted by default (use --shift_hgvs to force enable/disable)
Source version information added to caches; see output file headers or use --show_cache_info
Get the variant class using --variant_class
CCDS status added to categories used by --pick flag (and others)

New in version 79 (March 2015)

Focus on performance and stability: ~100% faster than version 78 and a new test suite
New guide to getting Ensembl VEP running faster
1000 Genomes Phase 3 data available in GRCh37 cache download (GRCh38 coming soon, see docs to access now)
VCF output has changed slightly to match output from other tools
Impact modifier added for each consequence type

New in version 78 (December 2014)

Customise --pick using --pick_order
Get transcript support level using --tsl

New in version 77 (October 2014)

Get the SO feature type of regulatory features using --regulatory and --biotype

New in version 76 (August 2014)

Ensembl VEP now supports caches from multiple assemblies (--assembly) on the same software version - e.g. human builds GRCh37 and GRCh38
Protein identifiers from UniProt (SWISSPROT, TrEMBL and UniParc) now available using --uniprot
Ensembl VEP can generate JSON output using --json
Two new analysis set options - --gencode_basic and the merged Ensembl/RefSeq cache (--merged)
Non-RefSeq transcripts now excluded by default when using the RefSeq or merged cache; use --all_refseq to include them
Let Ensembl VEP pick one consequence per variant allele using --pick_allele
Allele now included alongside frequency for 1000 Genomes (--af_1kg) and ESP (--af_esp) data
Not strictly script-related, but the Ensembl VEP REST API has come out of beta!

New in version 75 (February 2014)

let Ensembl VEP pick one consequence per variant for you using --pick; includes all transcript-specific data
gene symbol available in RefSeq cache and when using --refseq
Installation and use of RefSeq cache improved - remember to use --refseq with your RefSeq cache!
Added --cache_version option, primarily to aid Ensembl Genomes users.

New in version 74 (December 2013)

retrieve the humDiv PolyPhen prediction instead of humVar using --humdiv
source for gene symbol available with --symbol

New in version 73 (August 2013)

NHLBI-ESP frequencies available in cache (--af_esp)
Pubmed IDs for cited existing variants available in cache (--pubmed)
Convert your cache to use tabix - much faster when retrieving co-located existing variants!
The installer can now update the Ensembl VEP to the latest version and install FASTA files
--hgnc replaced by --symbol for non-human compatibility
HGVS strings are now part URI-escaped to avoid "=" sign clashes
use --allele_number to identify input alleles by their order in the VCF ALT field
use --total_length to give the total length of cDNA, CDS and protein sequences
add data from VCF INFO fields when using custom annotations

New in version 72 (June 2013)

Speed and stability improvements when using forking
Filter Ensembl VEP results using filter_vep.pl

New in version 71 (April 2013)

SIFT predictions now available for Chicken, Cow, Dog, Human, Mouse, Pig, Rat and Zebrafish
View summary statistics for Ensembl VEP runs in [output]_summary.html
Generate HTML output using --html
Support for simple tab-delimited format for input of structural variant data
Cache now contains clinical significance statuses from dbSNP for human variants
NOTE: Ensembl VEP version numbers have now (from release 71) changed to match Ensembl release numbers.

New in version 2.8 (December 2012)

Easily filter out common human variants with --filter_common
1000 Genomes continental population frequencies now stored in cache files

New in version 2.7 (October 2012)

build Ensembl VEP cache files offline from GTF and FASTA files
support for using FASTA files for sequence lookup in HGVS notations in offline/cache modes

New in version 2.6 (July 2012)

support for structural variant consequences
Sequence Ontology (SO) consequence terms now default
script runtime 3-4x faster when using forking
1000 Genomes global MAF available in cache files
improved memory usage

New in version 2.5 (May 2012)

SIFT and PolyPhen predictions now available for RefSeq transcripts
retrieve cell type-specific regulatory consequences
consequences can be retrieved based on a single individual's genotype in a VCF input file
find overlapping structural variants
Condel support removed from main script and moved to a plugin

New in version 2.4 (February 2012)

offline mode and new installer script make it easy to use the Ensembl VEP without the usual dependencies
output columns configurable using the --fields flag
VCF output support expanded, can now carry all fields
output affected exon and intron numbers with --numbers
output overlapping protein domains using --domains
enhanced support for LRGs
plugins now work on variants called as intergenic

New in version 2.3 (December 2011)

add custom annotations from tabix-indexed files (BED, GFF, GTF, VCF, bigWig)
add new functionality to the Ensembl VEP with user-written plugins
filter input on consequence type

New in version 2.2 (September 2011)

SIFT, PolyPhen and Condel predictions and regulatory features now accessible from the cache
support for calling consequences against RefSeq transcripts
variant identifiers (e.g. dbSNP rsIDs) and HGVS notations supported as input format
variants can now be filtered by frequency in HapMap and 1000 genomes populations
script can be used to convert files between formats (Ensembl/VCF/Pileup/HGVS to Ensembl/VCF/Pileup)
large amount of code moved to API modules to ensure consistency between web and script Ensembl VEP
memory usage optimisations
Ensembl VEP script moved to ensembl-tools repo
Added --canonical, --per_gene and --no_intergenic options

New in version 2.1 (June 2011)

ability to use local file cache in place of or alongside connecting to an Ensembl database
significant improvements to speed of script
whole-genome mode now default (no disadvantage for smaller datasets)
improved status output with progress bars
regulatory region consequences now reinstated and improved
modification to output file - Transcript column is now Feature, and is followed by a Feature_type column

New in version 2.0 (April 2011)

support for SIFT, PolyPhen and Condel missense predictions in human
per-allele and compound consequence types
support for Sequence Ontology (SO) and NCBI consequence terms
modified output format
- support for new output fields in Extra column
- header section contains information on database and software versions
- codon change shown in output
- CDS position shown in output
- option to output Ensembl protein identifiers
- option to output HGVS nomenclature for variants
support for gzipped input files
enhanced configuration options, including the ability to read configuration from a file
verbose output now much more useful
whole-genome mode now more stable
finding existing co-located variations now ~5x faster

Requirements

Ensembl VEP requires:

gcc, g++ and make
Perl version 5.22 or above recommended (tested on 5.22, 5.26, 5.32, 5.34, 5.38)
Perl packages:
- Archive::Zip
- DBD::mysql (version <=4.050)
- DBI
See this guide for more information on how to install Perl modules.
Additional libraries can be installed for extra features and enhancements, but they are not required to run Ensembl VEP in most use cases.
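
Before running the installer, it can help to confirm that the listed requirements are met. The following is a minimal sketch: version_ge is a hypothetical helper that compares dotted version numbers, and the module loop only checks that each Perl package loads. It does not verify the DBD::mysql version limit or the presence of gcc, g++ and make.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= $2,
# comparing numerically component by component.
version_ge() {
  v1=$1
  v2=$2
  while [ -n "$v1" ] || [ -n "$v2" ]; do
    a=${v1%%.*}
    b=${v2%%.*}
    [ "${a:-0}" -gt "${b:-0}" ] && return 0
    [ "${a:-0}" -lt "${b:-0}" ] && return 1
    case $v1 in *.*) v1=${v1#*.} ;; *) v1= ;; esac
    case $v2 in *.*) v2=${v2#*.} ;; *) v2= ;; esac
  done
  return 0
}

min_perl="5.22"   # recommended minimum from the list above

if command -v perl >/dev/null 2>&1; then
  perl_version=$(perl -e 'printf "%vd", $^V')
  if version_ge "$perl_version" "$min_perl"; then
    echo "Perl $perl_version: OK"
  else
    echo "Perl $perl_version is older than the recommended $min_perl"
  fi
  # Check that each required Perl package can be loaded.
  for module in Archive::Zip DBD::mysql DBI; do
    if perl -M"$module" -e 1 >/dev/null 2>&1; then
      echo "$module: found"
    else
      echo "$module: missing"
    fi
  done
else
  echo "perl not found on PATH"
fi
```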

Ensembl VEP's INSTALL.pl script will install required components of Ensembl API for you, but Ensembl VEP may also be used with any pre-existing API installations you have, provided their versions match the version of VEP you are using.

Ensembl VEP is available on the following platforms:

Linux (e.g., Ubuntu, Debian, Mint)
macOS
Windows (requires a more involved installation process)

Ensembl VEP is also available as Docker and Singularity images, allowing you to skip the complex installation steps.

Installation

Ensembl VEP's INSTALL.pl makes it easy to set up your environment for using the Ensembl VEP. It will download and configure a minimal set of the Ensembl API for use by the Ensembl VEP, and can also download cache files, FASTA files and plugins.

Run the following, and follow any prompts as they appear:

perl INSTALL.pl

Additional non-essential components and enhancements must be installed manually.

Software components installed

If you already have the latest version of the API installed you do not need to run the installer, although it can be used to simply update your API version (with post-release patches applied), and retrieve cache and FASTA files. The installer downloads the API within the Ensembl VEP directory and will not affect any other Ensembl API installations.

The script will also attempt to install a Perl::XS module, Bio::DB::HTS, for rapid access to bgzipped FASTA files. If this fails, you may add the --NO_HTSLIB flag when running the installer; Ensembl VEP will fall back to using Bio::DB::Fasta for this functionality (more details).

Running the installer

The installer is run on the command line as follows:

 perl INSTALL.pl [options]

Follow the on-screen prompts and note any warnings about files that will be deleted or overwritten.

You should not need to add any options, but configuration of the installer is possible with the flags below. Options can also be set by exporting environment variables prefixed with VEP_ before running the installer (for instance, export VEP_NO_HTSLIB=1 and export VEP_DIR_PLUGINS="/plugins").
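
The environment-variable route might look like the sketch below. It uses the two variables named above, with /plugins kept as the example path from the text, and runs the installer only when INSTALL.pl is present in the current directory.

```shell
# Settings taken from the examples above; adjust the plugin path to your system.
export VEP_NO_HTSLIB=1
export VEP_DIR_PLUGINS="/plugins"

# Show which VEP_ variables the installer will see.
env | grep '^VEP_' | sort

# Run the installer only when it is present in the current directory.
if [ -f INSTALL.pl ]; then
  perl INSTALL.pl
fi
```

Exported variables stay set for the rest of the shell session, so use unset afterwards if later installer runs should not inherit them.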

Flag	Alternate	Description
--ASSEMBLY	-y	Assembly version to use when using `--AUTO`. Most species have only one assembly available on each software release; currently this is only required for human on release 76 onwards.
--AUTO	-a	Run installer without prompts. Use the following options to specify parts to install: a (API + Bio::DB::HTS/htslib); l (Bio::DB::HTS/htslib only); c (cache); f (FASTA); p (plugins; requires the --PLUGINS flag to list the plugins to install). E.g. for API and cache: perl INSTALL.pl --AUTO ac
--CACHE_VERSION [version]		By default the installer will download the latest version of Ensembl VEP caches and FASTA files (currently 115). You can force the script to install a different version, but there is no guarantee that a version of the API will be compatible with a different version of the cache.
--CACHEDIR [dir]	-c	By default the script will install the cache files in the ".vep" subdirectory in your home area. This option configures where cache files are installed. The --dir_cache flag must be passed when running Ensembl VEP if a non-default cache directory is given: ./vep --dir_cache [dir]
--DESTDIR [dir]	-d	By default the script will install the API modules in a subdirectory of the current directory named "Bio". Using this option you can configure where the Bio directory is created. If something other than the default is used, this directory must either be added to your PERL5LIB environment variable when running Ensembl VEP, or included using perl's -I flag: perl -I [dir] vep
--NO_HTSLIB	-l	Don't attempt to install Bio::DB::HTS/htslib
--NO_TEST		Don't run API tests - useful if you know a harmless failure will prevent continuation of the installer
--NO_UPDATE	-n	By default the script will check for new versions or updates of Ensembl VEP. Using this option will skip this check.
--PLUGINS	-g	Comma-separated list of plugins to install when using `--AUTO`. To install all available plugins, use `--PLUGINS all`. To list the available plugins: perl INSTALL.pl -a p --PLUGINS list. To download/install all the available plugins: perl INSTALL.pl -a p --PLUGINS all. To download/install a defined list of plugins, e.g.: perl INSTALL.pl -a p --PLUGINS dbNSFP,CADD,G2P
--PLUGINSDIR [dir]	-r	By default the script will install the plugins files in the "Plugins" subdirectory of the `--CACHEDIR` directory. This option configures where the plugins files are installed. The --dir_plugins flag must be passed when running Ensembl VEP if a non-default plugins directory is given: ./vep --dir_plugins [dir]
--PREFER_BIN	-p	Use this if the installer fails with "out of memory" errors.
--SPECIES	-s	Comma-separated list of species to install when using `--AUTO`. To install the RefSeq cache, add "_refseq" to the species name, e.g. "homo_sapiens_refseq", or "_merged" to install the merged Ensembl/RefSeq cache. Remember to use --refseq or --merged when running the VEP with the relevant cache! Use `all` to install data for all available species.
--USE_HTTPS_PROTO		Download cache and FASTA files using the HTTPS protocol instead of FTP. Useful for networks where the FTP port is blocked by a firewall.
--GITHUBTOKEN		Set token to use for authentication when querying the GitHub API. Authenticated users have an increased rate limit. NOTE: use a token with read-only access.
--QUIET	-q	Don't write any status output when using `--AUTO`.
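
The flags in the table can be combined for an unattended install. The following is a minimal sketch, assuming you want the API, cache and FASTA files for human GRCh38; the `vep_install_cmd` helper is hypothetical (not part of Ensembl VEP) and only prints the command so that it can be reviewed before being run by hand.

```shell
# Hypothetical helper (not part of Ensembl VEP): prints an unattended
# INSTALL.pl command built only from flags listed in the table above.
#   $1 = species, $2 = assembly, $3 = cache directory
vep_install_cmd() {
  printf 'perl INSTALL.pl --AUTO acf --SPECIES %s --ASSEMBLY %s --CACHEDIR %s --QUIET\n' \
    "$1" "$2" "$3"
}

# Print the command for the default cache location ($HOME/.vep).
vep_install_cmd homo_sapiens GRCh38 "$HOME/.vep"
```

The sketch keeps the default cache directory because, as noted in the table, a non-default `--CACHEDIR` must later be matched by `--dir_cache` when running Ensembl VEP.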

Additional components

INSTALL.pl will set up the minimum requirements for Ensembl VEP. Some features and enhancements, however, require the installation of additional components. Most are perl modules that are easily installed using cpanm; see this guide for more information on how to install perl modules.

Typically, you will use cpanm to install modules locally in your home directories; this shows how to set up a path for perl modules and install one there:

mkdir -p $HOME/cpanm
export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5
cpanm -l $HOME/cpanm Set::IntervalTree

To make the change to PERL5LIB permanent, it is recommended to add the export line to your $HOME/.bashrc or $HOME/.profile.
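
One way to add the export line without duplicating it on repeated runs is sketched below. The `persist_perl5lib` function name is hypothetical, and the sketch assumes the same `$HOME/cpanm` location used in the commands above.

```shell
# Hypothetical helper: append the PERL5LIB export line to a shell startup
# file only if that exact line is not already present.
#   $1 = startup file, e.g. "$HOME/.bashrc" or "$HOME/.profile"
persist_perl5lib() {
  # Single quotes keep $PERL5LIB and $HOME unexpanded in the written line.
  line='export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5'
  touch "$1"
  # -F: fixed string, -x: whole-line match, -q: quiet
  grep -Fxq "$line" "$1" || printf '%s\n' "$line" >> "$1"
}

# Example (uncomment to apply to your own startup file):
# persist_perl5lib "$HOME/.bashrc"
```

Open a new terminal (or source the file) afterwards so the updated PERL5LIB takes effect.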

Additional features
- JSON - required to produce JSON format output
- Set::IntervalTree - used to find overlaps between entities in coordinate space. Required to use --nearest
- Bio::DB::BigFile - required to use bigWig format custom annotation files. See Bio::DB::BigFile instructions.
Speed enhancements - these modules can improve Ensembl VEP runtime
- PerlIO::gzip - marginal gains in compressed file parsing as used by the Ensembl VEP cache
- ensembl-xs - provides pre-compiled replacements for frequently used routines in Ensembl VEP. Requires manual installation, see README for details
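
To see which of these optional modules are already visible to your perl, a quick check along the following lines may help. This is a sketch only: `has_perl_module` is a hypothetical helper, it uses whichever `perl` is first on your PATH together with your current PERL5LIB, and ensembl-xs is left out because it requires manual installation.

```shell
# Hypothetical helper: succeeds if the given Perl module can be loaded.
has_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on the optional modules named in the list above.
for module in JSON Set::IntervalTree Bio::DB::BigFile PerlIO::gzip; do
  if has_perl_module "$module"; then
    echo "$module: installed"
  else
    echo "$module: missing"
  fi
done
```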

Bio::DB::BigFile

In order for Ensembl VEP to be able to access bigWig format custom annotation files, the Bio::DB::BigFile perl module is required. Installation involves downloading and compiling the kent source tree. The current version of the kent source tree does not work correctly with Bio::DB::BigFile, so it is necessary to install an archive version known to work (v335).

Download and unpack the kent source tree

wget https://github.com/ucscGenomeBrowser/kent/archive/v335_base.tar.gz
tar xzf v335_base.tar.gz

Set up some environment variables; these are required only temporarily for this installation process

export KENT_SRC=$PWD/kent-335_base/src
export MACHTYPE=$(uname -m)
export CFLAGS="-fPIC"
export MYSQLINC=`mysql_config --include | sed -e 's/^-I//g'`
export MYSQLLIBS=`mysql_config --libs`

Modify kent build parameters

cd $KENT_SRC/lib
echo 'CFLAGS="-fPIC"' > ../inc/localEnvironment.mk

Build kent source
```
make clean && make
cd ../jkOwnLib
make clean && make
```
If either of these steps fails, you may have some missing dependencies. Known common missing dependencies are libpng and libssl; these may be installed, for example, with apt-get on Ubuntu. If you do not have sudo access you may have to ask your sysadmin to install any missing dependencies.
```
sudo apt-get install libpng-dev libssl-dev
```
On macOS you may use brew; the openssl libraries also need to be symbolically linked to a different path:
```
brew install libpng openssl
cd /usr/local/include
ln -s ../opt/openssl/include/openssl .
cd -
```
On some systems (e.g. macOS), a compiled file is placed in a path that Bio::DB::BigFile cannot find. You can correct this with:
```
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/
```
We'll now use cpanm to install the perl module for Bio::DB::BigFile itself. See above for guidance on this. In this example we're going to install the module to a path within your home directory. In order to do this we must modify the paths that perl looks in to find modules by adding to the PERL5LIB environment variable. To make this change permanent you must add the export line to your $HOME/.bashrc or $HOME/.profile.
```
mkdir -p $HOME/cpanm
export PERL5LIB=$PERL5LIB:$HOME/cpanm/lib/perl5
cpanm -l $HOME/cpanm Bio::DB::BigFile
```
If you are prompted for the path to the kent source tree, that means something didn't go right in the compilation above. Double check that $KENT_SRC/lib/jkweb.a exists and is not found instead at e.g. $KENT_SRC/lib/x86_64/jkweb.a. You may copy or link the file (and the other files in that directory) to the former path.
```
ln -s $KENT_SRC/lib/x86_64/* $KENT_SRC/lib/
```
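To find out which of the two situations applies before linking anything, a small check such as the following can be used. It is a sketch: `locate_jkweb` is a hypothetical helper, and it assumes the build placed the library either directly in `lib/` or in a single machine-specific subdirectory of `lib/` (such as `x86_64`, as in the command above).
```shell
# Hypothetical helper: report where jkweb.a ended up after the kent build.
#   $1 = kent "src" directory (the value of KENT_SRC above)
# Returns 0 if the file is where Bio::DB::BigFile expects it, 1 if it is
# in a subdirectory and needs linking, 2 if it was not found at all.
locate_jkweb() {
  if [ -f "$1/lib/jkweb.a" ]; then
    echo "ok: $1/lib/jkweb.a"
    return 0
  fi
  for candidate in "$1"/lib/*/jkweb.a; do
    if [ -f "$candidate" ]; then
      echo "needs linking: $candidate"
      return 1
    fi
  done
  echo "not built: no jkweb.a under $1/lib"
  return 2
}

# Example (run after the build steps above):
# locate_jkweb "$KENT_SRC"
```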
You should now be able to successfully run the appropriate test in the Ensembl VEP package:
```
perl -Imodules t/AnnotationSource_File_BigWig.t
```

Using Ensembl VEP in macOS

Installing Ensembl VEP on macOS is slightly trickier than on Linux-based systems, and requires additional dependencies.
These instructions will guide you through the setup of Perlbrew, Homebrew, MySQL and other dependencies that will allow for a clean installation of Ensembl VEP on your macOS system.

These instructions have been tested on macOS High Sierra (10.13) and macOS Sierra (10.12).
Older versions may require additional tweaks; however, we shall endeavour to keep these instructions up to date for future versions of macOS.

Installation issues with M-series Macs

We advise using the Docker or Singularity images for Ensembl VEP if you are having issues installing Ensembl VEP in Apple Silicon (M1 upwards) Macs.

Prerequisite Setup

List of prerequisites: Xcode, GCC, Perlbrew, Cpanm, Homebrew, mysql, DBI, DBD::mysql (version <=4.050)
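
Before working through the steps below, it can be useful to see which of the command-line prerequisites are already present. The following sketch uses a hypothetical `missing_tools` helper; it only checks commands on your PATH, so the Perl modules (DBI, DBD::mysql) and Xcode itself are not covered.

```shell
# Hypothetical helper: print each named command that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Command-line tools from the prerequisite list above.
# Any name printed here still needs to be installed.
missing_tools gcc perlbrew cpanm brew mysql
```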

Xcode and GCC

Ensembl VEP requires Xcode and GCC for installation purposes. Fortunately, recent versions of macOS will look for (and attempt to install if required) both of these when you run the following command:

gcc -v

Perlbrew

We recommend using Perlbrew to install a new version of Perl on your Mac, to avoid interfering with the vendor Perl. This can be done with the following command:

curl -L https://install.perlbrew.pl | bash

echo 'source $HOME/perl5/perlbrew/etc/bashrc' >> ~/.bash_profile

At this point, PLEASE RESTART YOUR TERMINAL WINDOW to allow for the perlbrew changes to take effect.

We recommend installing Perl version 5.26.2 to run Ensembl VEP, and installing cpanm to handle the installation of perl modules.
These steps can be completed with the commands:

perlbrew install -j 5 --as 5.26.2 --thread --64all -Duseshrplib perl-5.26.2 --notest
perlbrew switch 5.26.2
perlbrew install-cpanm

Homebrew

This package management system for macOS makes the installation of the next prerequisite (i.e. xz) easier.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

xz

Ensembl VEP requires the installation of xz, a data-compression utility. The easiest way to install the xz package is through homebrew:

brew install xz

MySQL

In order to connect to the Ensembl databases, a collection of MySQL-related dependencies is required. Fortunately, these can be installed neatly with Homebrew and Cpanm:

brew install mysql
cpanm DBI
cpanm DBD::mysql@4.050

Installing BioPerl

On some versions of macOS, the Ensembl VEP installer fails to cleanly install BioPerl, so a manual install will prevent issues:

curl -O https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.924.tar.gz
tar zxvf BioPerl-1.6.924.tar.gz
echo 'export PERL5LIB=${PERL5LIB}:##PATH_TO##/bioperl-1.6.924' >> ~/.bash_profile

where ##PATH_TO##/bioperl-1.6.924 refers to the location of the newly unzipped BioPerl directory.

Final Dependencies

Installing the following Perl modules with cpanm will allow for full Ensembl VEP functionality:

cpanm Test::Differences Test::Exception Test::Perl::Critic Archive::Zip PadWalker Error Devel::Cycle Role::Tiny::With Module::Build LWP List::MoreUtils

export DYLD_LIBRARY_PATH=/usr/local/mysql/lib/:$DYLD_LIBRARY_PATH

Installing Ensembl VEP

And that should be that! You should now be able to install Ensembl VEP using the installer:

git clone https://github.com/ensembl/ensembl-vep
cd ensembl-vep
perl INSTALL.pl --NO_TEST

Using Ensembl VEP in Windows

Ensembl VEP was developed as a command-line tool, and as a Perl script its natural environment is a Linux system. However, there are several ways you can use Ensembl VEP on a Windows machine.

You may also consider using Ensembl VEP's web or REST interfaces.

Virtual machines

Using a virtual machine you can run a virtual Linux system in a window on your machine. There are two ways to do this:

Use the Ensembl virtual machine image
Use Docker

Perl

If Perl is installed on Windows, Ensembl VEP can be set up. However, this may require installation of dependent modules. We recommend using Docker to run Ensembl VEP on Windows.

Check Perl is installed
Download and unpack the zip of the ensembl-vep package
Open a Command Prompt (search for Command Prompt in the Start Menu)
Navigate to the directory where you unpacked the Ensembl VEP package, e.g.
```
cd Downloads/ensembl-vep-release-115
```
Run INSTALL.pl with --NO_HTSLIB and --NO_TEST; you will see some warnings about the "which" command not being available (these will also appear when running Ensembl VEP and can be ignored).
```
perl INSTALL.pl --NO_HTSLIB --NO_TEST
```

Docker

Docker allows running applications in virtualised containers. The Ensembl VEP Docker image is available from DockerHub: Ensembl VEP in DockerHub

After installing Docker, download the Ensembl VEP Docker image:

docker pull ensemblorg/ensembl-vep

To download cache files and other data with Ensembl VEP Docker, we recommend mounting a directory from your local (host) machine to the /data folder in the Docker container. For instance:

mkdir $HOME/vep_data
docker run -t -i -v $HOME/vep_data:/data ensemblorg/ensembl-vep

In the example above, data in $HOME/vep_data will be accessible by both the local machine and Ensembl VEP Docker. The Ensembl VEP API, plugins maintained by Ensembl VEP and their dependencies (e.g. Perl APIs, Bio::DB::HTS, htslib, ...) are already installed in the image.

Read/Write access from the container

On some distributions (e.g. CentOS, Fedora, Red Hat Enterprise Linux), the Docker daemon requires root privileges (i.e. you need to prefix the command with sudo), which might cause read/write issues on the mounted volume.

One solution is to use the option :Z within the Docker -v option (only from docker 1.7.0):

sudo docker run -t -i -v $HOME/vep_data:/data:Z ensemblorg/ensembl-vep

Another solution is to change the read/write access of the mounted volume ($HOME/vep_data):

chmod -R a+rwx $HOME/vep_data

Cache and FASTA files installation

You can run the INSTALL.pl script to install the cache and FASTA files:

docker run -t -i -v $HOME/vep_data:/data ensemblorg/ensembl-vep INSTALL.pl

You will be asked to install cache data. Type the comma-separated numbers for the species/assembly of interest and press enter. Your data will download and unpack; this may take a while.
If you wish to retrieve HGVS annotations, please download the FASTA files for your species. To do this, at the next prompt type 0 and press enter.

The above process may also be performed in one command; for example, to set up the cache and corresponding FASTA for human GRCh38:

docker run -t -i -v $HOME/vep_data:/data ensemblorg/ensembl-vep INSTALL.pl -a cf -s homo_sapiens -y GRCh38

The installer downloads Ensembl VEP data to the mounted directory (e.g., $HOME/vep_data). The downloaded data will be automatically detected as long as its folder is mounted when running VEP:

docker run -v $HOME/vep_data:/data ensemblorg/ensembl-vep vep -i examples/homo_sapiens_GRCh38.vcf --cache

Running Ensembl VEP with data from local folder

Here is an example of running Ensembl VEP with data from the folder $HOME/vep_data on the local machine (provided that the cache has been downloaded to that folder):

docker run -v $HOME/vep_data:/data ensemblorg/ensembl-vep \
  vep --cache --offline --format vcf --vcf --force_overwrite \
      --input_file input/my_input.vcf \
      --output_file output/my_output.vcf \
      --custom file=custom/my_extra_data.bed,short_name=BED_DATA,format=bed,type=exact,coords=1 \
      --plugin NMD

Please avoid using absolute paths to data as the paths inside the container differ from your local machine.
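
If your scripts hold absolute host paths, they can be converted to the relative form used in the example above. This is a minimal sketch with a hypothetical `to_container_path` helper; it assumes, as in the example, that relative paths given to vep resolve against the mounted folder.

```shell
# Hypothetical helper: turn a host path inside the mounted directory into
# the relative path to pass to vep inside the container.
#   $1 = mounted host directory (no trailing slash)
#   $2 = host path to a file inside that directory
to_container_path() {
  case "$2" in
    "$1"/*) printf '%s\n' "${2#"$1"/}" ;;
    *) echo "error: $2 is not inside $1" >&2; return 1 ;;
  esac
}

# Prints: input/my_input.vcf
to_container_path /home/user/vep_data /home/user/vep_data/input/my_input.vcf
```

Files outside the mounted directory are rejected rather than rewritten, since the container cannot see them.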

Update from a previous version

Update your Docker container
```
docker pull ensemblorg/ensembl-vep
```

Update your cache

# Install the new cache through the Ensembl VEP INSTALL.pl script (see "Cache and FASTA files installation" section above)
docker run -t -i -v $HOME/vep_data:/data ensemblorg/ensembl-vep INSTALL.pl -a c

# Or install the cache manually
cd $HOME/vep_data
curl -O https://ftp.ensembl.org/pub/release-115/variation/indexed_vep_cache/homo_sapiens_vep_115_GRCh38.tar.gz
tar xzf homo_sapiens_vep_115_GRCh38.tar.gz

Singularity

Due to root requirements for the Docker daemon, using the Docker container for Ensembl VEP is not always possible for HPC users. Singularity, an alternative containerisation tool, does not assume that you have a system where you are the root user. This has led to increased popularity in HPC contexts due to increased access rights flexibility.

After installing Singularity, Ensembl VEP may be used with Singularity based on the VEP Docker image from DockerHub:

singularity pull --name vep.sif docker://ensemblorg/ensembl-vep

The following is a brief example showing how to use a directory on your local (host) machine to store cache data for VEP.

mkdir $HOME/vep_data
singularity exec vep.sif vep --dir $HOME/vep_data --help

The Ensembl VEP API, plugins and their dependencies (e.g. Perl APIs, Bio::DB::HTS, htslib, ...) are already installed in the image.

Cache and FASTA files installation

You can run the INSTALL.pl script to install the Cache data and FASTA files. For example, to set up the cache and corresponding FASTA for human GRCh38 in your local folder $HOME/vep_data:

singularity exec vep.sif INSTALL.pl -c $HOME/vep_data -a cf -s homo_sapiens -y GRCh38

The installer downloads data to the specified directory (e.g., $HOME/vep_data). When running Ensembl VEP via Singularity, point to this directory using --dir:

singularity exec vep.sif vep --dir $HOME/vep_data -i examples/homo_sapiens_GRCh38.vcf --cache

Running Ensembl VEP with data from local folder

Here is an example of running Ensembl VEP with data from the folder $HOME/vep_data on the local machine (provided that the cache has been downloaded to that folder):

singularity exec vep.sif \
  vep --dir $HOME/vep_data \
      --cache --offline --format vcf --vcf --force_overwrite \
      --input_file input/my_input.vcf \
      --output_file output/my_output.vcf \
      --custom file=custom/my_extra_data.bed,short_name=BED_DATA,format=bed,type=exact,coords=1 \
      --plugin NMD

Update from a previous version

Update your Singularity image

singularity pull --name vep.sif docker://ensemblorg/ensembl-vep

Update your cache

# Install the new cache through the VEP INSTALL.pl script (see "Cache and FASTA files installation" section above)
singularity exec vep.sif INSTALL.pl -c $HOME/vep_data -a c

# Or install the cache manually
cd $HOME/vep_data
curl -O https://ftp.ensembl.org/pub/release-115/variation/indexed_vep_cache/homo_sapiens_vep_115_GRCh38.tar.gz
tar xzf homo_sapiens_vep_115_GRCh38.tar.gz

Nextflow

We offer a Nextflow Ensembl VEP pipeline that aims to run Ensembl VEP using simple parallelisation. The pipeline is deployable on an individual Linux machine or on computing clusters running LSF, SLURM or other workload managers.

The process can be summarised briefly by the following steps:

Splitting the input data into multiple files using a given number of bins
Running Ensembl VEP on the split files in parallel
Merging Ensembl VEP outputs into a single file
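
To illustrate the first of these steps, the sketch below splits an uncompressed VCF into a given number of bins while repeating the header in every output file. It is not the pipeline's own implementation, which may work differently; the `split_vcf` function and the output naming scheme are hypothetical.

```shell
# Hypothetical sketch of the "split" step: distribute the records of an
# uncompressed VCF over N files round-robin, repeating the header in each.
#   $1 = input VCF, $2 = number of bins, $3 = output prefix
# Writes <prefix>.0.vcf, <prefix>.1.vcf, ...
split_vcf() {
  awk -v bins="$2" -v prefix="$3" '
    /^#/ { header = header $0 "\n"; next }   # collect header lines
    {
      out = prefix "." (n % bins) ".vcf"
      # Write the header once, the first time each output file is used.
      if (!(out in seen)) { printf "%s", header > out; seen[out] = 1 }
      print > out
      n++
    }
  ' "$1"
}

# Example (uncomment to split input.vcf into 4 bins):
# split_vcf input.vcf 4 chunk
```

Each resulting file is a valid input for a separate Ensembl VEP run. Note that round-robin splitting interleaves records, so a merge step would need to restore the original order.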

To run the pipeline on a system with Nextflow installed, you will need to prepare a vep.ini config file. Here are some example commands to run the Nextflow Ensembl VEP pipeline:

# Run Nextflow Ensembl VEP using a local Ensembl VEP installation
# NB: Nextflow automatically downloads the GitHub repository
nextflow run Ensembl/ensembl-vep -r main \
  --input input.vcf \
  --vep_config vep.ini

# Run latest Ensembl VEP version using Docker
nextflow run Ensembl/ensembl-vep -r main \
  -profile docker \
  --input input.vcf \
  --vep_config vep.ini

# Run Ensembl VEP 115.0 using Docker
nextflow run Ensembl/ensembl-vep -r main \
  -profile docker \
  --input input.vcf \
  --vep_config vep.ini \
  --vep_version 115.0

# Run Ensembl VEP 115.0 using SLURM and Singularity
nextflow run Ensembl/ensembl-vep -r main \
  -profile slurm,singularity \
  --input input.vcf \
  --vep_config vep.ini \
  --vep_version 115.0

For a full list of supported profiles, as well as more instructions on setting up and running the pipeline, please refer to the Nextflow Ensembl VEP instructions.