=====================================================
Instructions on how to run Metagenome Orchestra
(Orchestra or Mago for short) within a Docker container.

Orchestra homepage: http://mago.fe.uni-lj.si
=====================================================


-------------------------
IMPORTING ORCHESTRA IMAGE
-------------------------

Before the first run, import the Orchestra image into your Docker
system by performing the following steps.

1. Download the Orchestra image from the Orchestra web site
      http://mago.fe.uni-lj.si/mago_V2_2_docker.gz
   and save it in a local directory of your choice. The file can be
   downloaded with your favourite web browser, or by executing the
   following command in a Linux terminal opened in the target
   download directory:
      wget http://mago.fe.uni-lj.si/mago_V2_2_docker.gz

2. Import the downloaded Orchestra image into your Docker system by
   opening a Linux terminal in the directory with the downloaded
   image and executing:
      sudo docker load -i mago_V2_2_docker.gz
   The process takes a while, and during this time your terminal may
   appear to be frozen. Messages about loading layers are printed,
   but not immediately. After the import is completed, Docker prints
   the message:
      Loaded image: mago:2.2

3. Verify that Orchestra has been successfully imported by executing:
      sudo docker image ls
   The list of images should contain an entry with "mago" in the
   first column (REPOSITORY) and "2.2" in the second column (TAG).

Orchestra is now ready to be executed. Skip to the sections
"RUNNING ORCHESTRA DEMO PIPELINE" or "RUNNING YOUR OWN ORCHESTRA
PIPELINE" below to run it.


==============================================================

If the above procedure did not succeed, Docker is probably not
properly installed or configured.

1. Check whether Docker is installed by executing the following
   command in a Linux terminal:
      docker --version
   If Docker is present, it responds by printing its version.
   Otherwise, please search the internet for instructions on how to
   install Docker on your Linux distribution (it is fairly easy).

Also make sure that the Docker daemon is running by executing the
following command in a terminal:
      ps aux | grep docker
which should return at least one line with Docker's daemon processes:
dockerd and/or docker-containerd. Note that the process name must end
with "d". A line with plain "docker" without the trailing "d" may
also appear; this line does not count.

If no daemon process is listed, your Docker daemon is probably not
configured to start automatically when the operating system boots.
Please refer to internet resources, e.g.
      https://docs.docker.com/install/linux/linux-postinstall/#configure-docker-to-start-on-boot
for instructions on how to configure Docker's auto start. Often (but
not always) one of the following commands resolves the issue:
      sudo systemctl enable docker
or
      sudo chkconfig docker on

Some Linux distributions also require that users who handle Docker
containers be members of the "docker" group. On many (but not all)
Linux distributions a user is placed in group "docker" by one of the
following commands, where "user_name" is your Linux user name:
      sudo usermod -aG docker user_name
or
      sudo useradd -g docker user_name
The first command adds an existing user to the group; the second
creates a new user whose primary group is "docker". Either command
succeeds only if your Linux system defines group "docker". Otherwise,
this step is probably not needed.

After making any of the above changes, it may be necessary to log out
and log in again, or even to restart the system, for the changes to
become effective.


==============================================================

It is possible that Docker is correctly installed, but image
importing terminates with the error "no space left on device". There
are at least two different reasons for this.

1. Your Linux setup does not have enough disk space on the /var or
   /tmp partitions. Some Linux setups divide the available disk space
   into several partitions or subvolumes.
   In such setups, it is fairly likely that one of these
   partitions/subvolumes runs out of disk space even though there is
   plenty of free space on the disk as a whole. The available space
   can be checked by executing:
      df -h
   Depending on the system, the output may be fairly long, with the
   majority of entries not relevant for the present discussion. File
   system "/" is always present. It should have at least 30 GB of
   free space to be on the safe side. If there are separate entries
   for the /var and /tmp partitions, each of these should also have
   at least 20 GB of free space. Otherwise, importing the Orchestra
   image (or, for that matter, any Docker image of a comparable size)
   cannot be completed without reconfiguring your disk partitioning
   scheme.

2. The second reason for the error "no space left on device" is that
   your Docker setup reserves too little space for a default new
   image. This should be fairly unlikely in modern Docker setups. If
   this is the case, please search the internet for instructions on
   "How to increase default Docker image size". Please note that some
   of the solutions may destroy your existing Docker images and
   containers, in which case you will need to pull them again from
   their respective repositories.


-------------------------------
RUNNING ORCHESTRA DEMO PIPELINE
-------------------------------

To test the Orchestra setup and to give an idea of how Orchestra
works, we prepared a simple demo setup. It can be downloaded from:
      http://mago.fe.uni-lj.si/mago_demo_V2_2.zip
The file can be downloaded with your favourite web browser, or by
executing the following command in a Linux terminal opened in the
target download directory:
      wget http://mago.fe.uni-lj.si/mago_demo_V2_2.zip

Save the file to a directory of your preference. In the following it
is assumed that this is directory /home/user_name, and consequently
the downloaded file is referred to as
/home/user_name/mago_demo_V2_2.zip.
When following the instructions below, please adjust the commands to
reflect the actual path on your system.

Open a Linux terminal in the directory where the downloaded file is
located. Execute the command:
      unzip mago_demo_V2_2.zip
This unpacks the contents of the file to directory:
      /home/user_name/mago_demo_V2_2

Directory /home/user_name/mago_demo_V2_2 contains four files with DNA
sequences (*.fastq), a directory with reference genomes
(FastANI_ref), and the configuration file demo_docker.txt for
executing Orchestra within a Docker container (there are also two
other configuration files, for the Orchestra Singularity container
and for the Orchestra VirtualBox virtual machine).

Start Orchestra processing by executing the following one-line(!)
command:
      sudo docker run -it -v /home/user_name/mago_demo_V2_2:/data mago:2.2 /data/demo_docker.txt

This command maps physical directory /home/user_name/mago_demo_V2_2
into Docker's internal directory /data. E.g. Docker sees physical
file
      /home/user_name/mago_demo_V2_2/Rsphaeroides.frag_1.fastq
as file
      /data/Rsphaeroides.frag_1.fastq

Orchestra should start processing. File demo_docker.txt instructs
Orchestra to output the results of processing to directory
      /data/demo_docker_out
(as you can see by examining demo_docker.txt with a text editor),
which is mapped to physical directory
      /home/user_name/mago_demo_V2_2/demo_docker_out
Newly created files should start appearing in this directory as
Orchestra runs.


-----------------------------------
RUNNING YOUR OWN ORCHESTRA PIPELINE
-----------------------------------

1. Create a directory at a preferred location, typically in your home
   directory. For example:
      mkdir /home/user_name/test
   where "user_name" is your Linux user name. Keep in mind that later
   on this directory will be mapped into Docker's internal file
   system at location /data.

2. Copy or move your fastq files into the directory created above.
   Alternatively, create symbolic links to your files in it.

3. If you already have a scaffold file that is associated with your
   input reads, and you intend to use it instead of building it as a
   part of Orchestra processing, then put it in this directory as
   well (copy, move or symlink).

4. Download the configuration template from
      http://mago.fe.uni-lj.si/config_template_V2_2.txt
   into the above directory. Optionally rename it to something
   meaningful (e.g. config_test.txt, the name assumed below).

5. Set the parameters within config_test.txt according to your
   preferences. Recall that Orchestra will access your files at its
   internal path /data. Therefore, if your input R1 reads file is
      /home/user_name/test/test_R1.fastq
   then Orchestra will access it by filename
      /data/test_R1.fastq
   so this is the name that you should specify for it within the
   configuration file. The same applies to the R2 reads file (and to
   the scaffold file, if used). In the same way, set Orchestra's
   output directory to e.g. /data/out. Files that Orchestra produces
   during its processing will then appear on your Linux system in
   directory /home/user_name/test/out.

   For example, specify the following parameters in config_test.txt
   (but substitute actual names):
      out_directory_root = /data/out
      input_R1_reads_file = /data/test_R1.fastq
      input_R2_reads_file = /data/test_R2.fastq
   (if scaffold file is used)
      input_scaffold_file = /data/scaffold.fasta

   Set other parameters as well. At a minimum, you need to specify
   the steps to be performed (a selection of binners, ...).

6. Open a Linux terminal and run Orchestra by issuing the following
   one-line(!) command:
      sudo docker run -it -v /home/user_name/test:/data mago:2.2 /data/config_test.txt
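
The path mapping described in step 5 can be sketched as a small shell
helper. This is only an illustration and is not part of Orchestra: the
function name to_container_path and its argument order are
hypothetical, and /data is assumed as the mount point because that is
the one used in the docker run commands above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Orchestra): translate a host path
# that lies inside the mapped directory into the path that is visible
# inside the container.
# Usage: to_container_path HOST_DIR HOST_PATH [CONTAINER_DIR]
to_container_path() {
    tcp_host_dir=${1%/}            # drop one trailing slash, if present
    tcp_host_path=$2
    tcp_container_dir=${3:-/data}  # assumed mount point, as in "-v ...:/data"
    case "$tcp_host_path" in
        "$tcp_host_dir")
            # The mapped directory itself corresponds to the mount point.
            printf '%s\n' "$tcp_container_dir"
            ;;
        "$tcp_host_dir"/*)
            # Replace the host directory prefix with the container directory.
            printf '%s\n' "$tcp_container_dir/${tcp_host_path#"$tcp_host_dir"/}"
            ;;
        *)
            # Files outside the mapped directory are not visible to Orchestra.
            echo "error: $tcp_host_path is not inside $tcp_host_dir" >&2
            return 1
            ;;
    esac
}

# Print the configuration values for the example directory from step 1.
mapped_dir=/home/user_name/test
echo "out_directory_root = $(to_container_path "$mapped_dir" "$mapped_dir/out")"
echo "input_R1_reads_file = $(to_container_path "$mapped_dir" "$mapped_dir/test_R1.fastq")"
echo "input_R2_reads_file = $(to_container_path "$mapped_dir" "$mapped_dir/test_R2.fastq")"
```

With the example directory, the three echo commands print the same
values as listed in step 5. A path outside the mapped directory makes
the helper fail, which is a reminder that such a file, or the target of
a symbolic link pointing outside the directory, cannot be reached from
within the container.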