How to run PHANS-C

We offer three ways to get and run PHANS-C.

  1. Amazon AWS AMI
  2. Docker Image
  3. Local Installation

Amazon AWS AMI

Start the instance from an AMI

Spin up an EC2 instance using the Amazon AMI PHANS-C v1.0.2 (ami-09bbe25d9d7608ea5), hosted in us-west-2. We recommend the c4.4xlarge instance type.

You will need to ssh into the instance with the user ID “ubuntu” (as opposed to ec2-user, etc.):

$ ssh -i "your-ec2-key.pem" ubuntu@##.###.###.##

Activate the appropriate conda environment as follows:

$ conda activate phansc
  • AWS users will find the tutorial folder located in the ubuntu user’s home directory.

Docker Image

Start the Docker container from the image:

On a machine with Docker already installed:

First download and expand the PHANS-C_Tutorial Folder, then do this:

$ docker run -v /Local/path/to/work/folder:/Tutorial -it rocaplab/phansc

The '-v' option designates a directory that is shared between the host machine and the Docker container.

The '-it' option starts the container with an interactive shell.

Starting Docker this way uses the latest PHANS-C image available on Docker Hub at rocaplab/phansc.

Any changes inside the container’s /Tutorial directory will instantly be reflected in the host’s /Local/path/to/work/folder directory, and vice versa. In this way, a directory can be shared between the host and the container.

  • A note on running from a Windows OS: it is recommended to specify the path in quotes, for example, -v "C:\Local\path\to\work\folder":/Tutorial.

This will drop you into a command prompt inside a CentOS-based container (similar to a virtual machine). You can then navigate to the tutorial folder:

cd /Tutorial
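
As far as we know, Docker expects the host side of the '-v' option to be an absolute path; a relative path is interpreted as a named volume instead. The following is a minimal sketch of resolving a work folder to an absolute path and printing the resulting command for inspection before running it. The folder name phansc_work is a hypothetical placeholder, not something PHANS-C requires.

```shell
# Hypothetical work folder; substitute the folder holding the expanded tutorial.
work_dir="${TMPDIR:-/tmp}/phansc_work"
mkdir -p "$work_dir"

# Resolve to an absolute path (pwd -P also resolves symlinks).
abs_dir="$(cd "$work_dir" && pwd -P)"

# Print the command rather than running it, so it can be checked first.
# The host path is quoted in case it contains spaces.
docker_cmd="docker run -v \"${abs_dir}\":/Tutorial -it rocaplab/phansc"
echo "$docker_cmd"
```

Once the printed command looks right, copy it into your terminal to start the container.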

Local Installation

The PHANS-C Pipeline scripts can be found in our Bitbucket repository and are written in Python. However, the Pipeline depends on several other open source software packages, written in multiple languages, each with its own dependencies and requirements. The PHANS-C scripts themselves should be installed somewhere in your PATH and made executable, so that they can be called from anywhere. The following are some very brief tips that may help you install dependencies locally.
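
For the PATH step mentioned above, here is a minimal sketch of copying scripts into a bin directory, making them executable, and adding that directory to PATH. All directory and file names in it (demo_repo, demo_bin, example_script.sh) are placeholders created in a temporary folder so the sketch can be run safely; they are not actual PHANS-C names. In practice you would substitute your clone of the repository and a directory such as $HOME/bin.

```shell
# Placeholder locations inside a temporary folder.
workdir="$(mktemp -d)"
repo_dir="$workdir/demo_repo"   # stands in for your clone of the repository
bin_dir="$workdir/demo_bin"     # stands in for a directory such as $HOME/bin
mkdir -p "$repo_dir" "$bin_dir"

# A tiny stand-in script; the real pipeline scripts are Python files.
printf '#!/bin/sh\necho "script ran"\n' > "$repo_dir/example_script.sh"

# Copy the scripts, make them executable, and put the directory on PATH.
cp "$repo_dir"/*.sh "$bin_dir"/
chmod +x "$bin_dir"/*.sh
export PATH="$bin_dir:$PATH"

# The script can now be called by name from any directory.
example_script.sh
```

The export line only affects the current shell session. To make the change permanent, add it to your shell startup file (for example ~/.bashrc).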

Build with all dependencies using .yml file

You can build a conda environment using the provided .yml file.

conda env create -f phans-c_environment.yml

Alternatively, you can add packages individually as described below.

Biopython

Biopython can be found through the Biopython website. The PHANS-C Pipeline was developed with version 1.68.

There are many ways to install Biopython, including MacPorts, Homebrew, Pip, and Bioconda. We installed ours with Bioconda.

conda install python
conda install biopython

Muscle

The PHANS-C Pipeline was created using Muscle version 3.8.31. The latest version of Muscle as well as older versions can be found at the Muscle website. We installed ours with Bioconda.

conda install muscle

RAxML

RAxML and the RAxML Evolutionary Placement Algorithm are located in the same executable. The PHANS-C Pipeline was created using RAxML version 8.2.8 and has been tested successfully with RAxML version 8.2.11. A typical install of RAxML may install several different binaries, and which one is best may depend on your architecture. Try raxmlHPC-PTHREADS-AVX first, as this is the fastest and is multi-core capable. If that doesn’t work, experiment with some of the other options. The most recent version of RAxML can be found in the Exelixis Lab GitHub repository or on the Exelixis Lab homepage. We installed ours with Bioconda.

conda install raxml
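
To see which RAxML binary your install actually provides, something like the following sketch may help. It walks a list of candidate names in order of preference and reports the first one found on your PATH. The helper name first_available is our own invention for illustration, and the candidate list is an assumption; the exact set of binaries varies between installs, so adjust it to match yours.

```shell
# Print the first name from the argument list that is found on PATH.
# Returns non-zero if none of the names is available.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Preference order: the multi-core AVX build first, then fallbacks.
raxml_bin="$(first_available raxmlHPC-PTHREADS-AVX raxmlHPC-PTHREADS-SSE3 \
    raxmlHPC-PTHREADS raxmlHPC)" || raxml_bin=""
echo "RAxML binary: ${raxml_bin:-none found}"
```

Finding a binary on the PATH does not guarantee that it runs on your CPU, so it is still worth testing the reported binary before relying on it.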

proteinModelSelection.pl

A Perl script, proteinModelSelection.pl, for identifying the best amino acid (AA) substitution model for tree generation was written by the RAxML author, Alexis Stamatakis, and can be found at the RAxML site. You may have to replace the shebang (“#!”) line at the top of the script to point to your Perl location. You may also have to change the RAxML executable name in the script to match the one on your system. Remember to make the script executable after you put it into your PATH.
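
The shebang replacement can be done by hand in a text editor, or with a short command sequence along these lines. This is only a sketch: it operates on a temporary stand-in file rather than the real proteinModelSelection.pl, and the fallback location /usr/bin/perl is an assumption about your system.

```shell
# Stand-in file; in practice, point this at your copy of proteinModelSelection.pl.
script="$(mktemp)"
printf '#!/some/other/perl\nprint "hello";\n' > "$script"

# Locate perl on this system, falling back to /usr/bin/perl (an assumption).
perl_path="$(command -v perl || echo /usr/bin/perl)"

# Rewrite only line 1 (the shebang). Writing to a new file and moving it back
# avoids the differences between GNU and BSD 'sed -i'.
sed "1s|^#!.*|#!${perl_path}|" "$script" > "$script.new" && mv "$script.new" "$script"

# Make the script executable and show the new first line.
chmod +x "$script"
head -n 1 "$script"
```

Only the first line is changed; the rest of the script is left untouched.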

BLAST

We installed BLAST with Bioconda.

conda install blast

PaPaRa

PaPaRa 2 can be found through the Exelixis Lab homepage or through Simon Berger’s GitHub repository. The PHANS-C Pipeline uses PaPaRa version 2.5.

We had a lot of trouble getting PaPaRa working on macOS High Sierra (10.13). The precompiled version of PaPaRa offered by the Exelixis Lab did not work for us. Compiling manually did not work at first because of Boost compatibility problems. Installing Boost manually, as well as installing it with Homebrew and MacPorts, failed. We were finally able to get PaPaRa to compile by installing a particular version of Boost with conda. The steps below are what worked for us; your mileage may vary.

# Assumes you have Bioconda installed and an environment set up and activated
conda install boost=1.64
cd /path/to/papara/source/directory
ln -s /path/to/your/conda/installation/your_env/include/boost/ ./boost
sh build_papara2.sh
cp ./papara /path/to/your/bin/dir

The PaPaRa precompiled binary worked on our Ubuntu Linux 18.04 system, but bear in mind that it is not compiled for multiple threads. For multithreading, you must compile your own.

Bokeh

Bokeh can be found on the Bokeh website. We installed ours with Bioconda.

conda install bokeh

Pandas

Pandas can be found on the Pandas website. We installed ours with Bioconda.

conda install pandas

Access the tutorial files:

The Tutorial folder is included in the repository.