========================================================================
Metagenome Orchestra V2_2b 2020_03_08 Change Log
========================================================================

1. BUGFIX: Prokka stopped working in V2_2a due to an outdated dependency. Prokka has been upgraded from version 1.14.0 to 1.14.6.
2. UPGRADE: Several software packages have been upgraded to the newer versions available in BioConda and other repositories. For example, MaxBin has been upgraded from version 2.2.6 to 2.2.7.

========================================================================
Metagenome Orchestra V2_2a 2019_08_12 Change Log
========================================================================

1. BUGFIX: Fixed several issues that were discovered while testing Orchestra on HPC systems.
2. NEW FEATURE: It is now possible to skip the DasTool analysis and feed bins from the binners directly to CheckM and the subsequent analysis steps (ezTree, Prokka-Roary, FastANI). To do so, leave DasTool disabled (i.e. do not set parameter include_DasTool = yes). If DasTool is enabled, the pipeline proceeds as in version V2.1.
3. MINOR: The execution order of ezTree, Prokka+Roary and FastANI has been changed. FastANI is now executed first, then Prokka+Roary, and ezTree last. The new order reflects the relative execution speeds of these programs: faster programs complete first, so that a user can start working with as many results as possible as early as possible.
4. NEW FEATURE: Bins can be collected based on a CheckM lineage workflow report. The following new parameters are introduced for this step:
      collect_CheckM_filtered_bins
      copy_CheckM_filtered_bins
      filter_CheckM_lineage_completeness_min
      filter_CheckM_lineage_contamination_max
      filter_CheckM_lineage_completeness_max
      filter_CheckM_lineage_contamination_min
5. NEW FEATURE: A special FastANI analysis can be run on the bins collected by CheckM.
   No external references need to be provided to FastANI for this step, since the resulting bins serve as both queries and references in this multi-query, multi-reference FastANI run. The following two new parameters are introduced for this step:
      filter_CheckM_FastANI_analysis
      params_CheckM_FastANI_analysis
6. NEW FEATURE: MaxBin can now use either its own abundance file or the one produced by the BBMap method (from the pileup file). This is controlled by the new parameter MaxBin_own_abundance_file. Concoct can now use either MaxBin's abundance file or the BBMap one, as controlled by the new parameter Concoct_abundance_from_MaxBin.

========================================================================
Metagenome Orchestra V2_1 2019_06_05 Change Log
========================================================================

1. IMPROVEMENT: The Concoct step now produces bins' fasta patterns.
2. BUGFIX: Previously, Orchestra did not check the status of some minor steps, such as the generation of TSV files. Although no failures of these steps have been observed so far, they are now properly monitored, and any future failures will be reported for post-mortem inspection.
3. NEW FEATURE: An upper limit can be imposed on the memory that Orchestra is allowed to consume. Two new configuration parameters are introduced for this purpose: memory_limit_type and manual_memory_limit_GB.
4. NEW FEATURE: An arbitrary number of external scaffolds can be processed.

========================================================================
Metagenome Orchestra V2_0 2019_05_22 Change Log
========================================================================

1. NEW FEATURE: Prokka annotation of DasTool bins.
2. NEW FEATURE: Roary building of pan genomes from DasTool bins and Prokka annotations.
3. NEW FEATURE: FastANI alignment-free computation of whole-genome Average Nucleotide Identity between genomes.
4. IMPROVEMENT: Orchestra no longer insists that a config file contains all sections. Only sections in which a user actually specifies some parameters need to be present. Section [Global] must always be present, since at least some of its parameters are mandatory.
5. UPGRADE: FastP upgraded from 0.19.7 to 0.20.0.
6. BUGFIX: Previously, Orchestra did not notice if FastP had been run in the past with option FastP_filter = Yes and the option was later changed to FastP_filter = No. In such cases the affected pipeline steps were not rerun, since their existing results were not considered outdated.
7. BUGFIX: Previously, Orchestra considered available FastP results obsolete if option FastP_filter was changed, even if no other FastP parameter was changed. This caused an unnecessary FastP rerun.
8. IMPROVEMENT: Previously, upon a pipeline rerun, only successfully completed steps were skipped. Failed steps were always repeated even when the relevant circumstances had not changed, so the same step was doomed to fail again. Now, previously failed steps are re-executed only if circumstances change (e.g. their parameters are changed).
9. IMPROVEMENT: More informative messages about the reasons why a certain subprocess is rerun or skipped.
10. IMPROVEMENT: More informative messages about unfulfilled interdependencies between subprocesses.
11. BUGFIX: When Orchestra compared the new and old parameters of a processing step to detect changes, it sometimes stored the new parameters even if the associated subprocess was not (re)executed (e.g. because dependencies were not fulfilled). As a result, wrong stored parameters became associated with the results already present on disk.
12. MINOR: The subprocess exit code is written to the screen only if it is non-zero. Some other messages are shortened to make the screen less cluttered.
13. IMPROVEMENT: More comprehensive inspection of subprocess results and better detection of their failures.
14. BUGFIX: Orchestra did not detect changes of Bwa Index parameters, and consequently did not consider the respective results obsolete when these parameters changed.
15. NEW FEATURE: Orchestra can now delete some intermediate files during pipeline processing to save disk space.
16. MINOR: When joining R1 and R2 files, Orchestra renders a visual cue of the ongoing progress.
17. NEW FEATURE: Orchestra allows several assemblers to be executed in a single pipeline run. In addition, an external scaffold file can be included along with the ones generated by the built-in assemblers.
18. NEW FEATURE: Orchestra now offers three methods for SAM file generation (Bwa, Bowtie2 and BBMap). Previously, only the Bwa method was available. Orchestra then executes the Cartesian product of analyses: all enabled assemblers (plus any externally provided scaffold) combined with all enabled SAM generation methods. This makes it easy to compare the performance of these algorithms.
19. NEW FEATURE: Orchestra now preserves several sets of intermediate results, such as scaffold indices, SAM and BAM files. It differentiates between them based on their origin (such as the selected assembler and the method of building the SAM file). This makes comparative re-executions of the pipeline much faster. For example, during the first run, a user chooses IDBA_UD as the assembler and Bowtie2 as the SAM file creation method. On the second run, the user tries IDBA_UD in combination with BBMap. On the third run, the combination MetaSPAdes + Bwa is selected. If the user later returns to the combination IDBA_UD + Bowtie2 (or any other previous one), there is no need to regenerate the scaffold and its indices or the SAM and BAM files, since all previously generated versions of these files are preserved and taken into account.
    This holds only if the parameters for building these files do not change. Otherwise, the affected files are regenerated from scratch.
20. IMPROVEMENT: Redesigned output directory structure for easier navigation. This is especially important now that Orchestra potentially stores several sets of files from consecutive runs.
21. IMPROVEMENT: Simplified and improved conversion of SAM files to sorted BAM files. The process now takes fewer steps and is more flexible (for example, option "-F 4" was forced in Orchestra V1.x, whereas now it is under the control of the operator).
22. IMPROVEMENT: Abundance files are created independently of the MaxBin step. Previously, MaxBin created the abundance files, for which it needed to create its own version of the SAM file. This resulted in an unnecessary second creation of the SAM file, which wasted a noticeable amount of time. In addition, if MaxBin is re-executed (e.g. due to changes of its parameters), only the binning part is actually re-executed. Previously, MaxBin's own SAM file generation was re-executed as well, which wasted a lot of processing time.
23. IMPROVEMENT: The MaxBin step no longer needs conversion of fastq reads into fasta format. Consequently, the conversion can be skipped if no other Orchestra step needs it either.
24. BUGFIX: All scaffold2bin.tsv files, which link the results of binners to DasTool, have the same name in Mago V1.x, which may confuse DasTool. In Mago V2.x, each TSV file has a different name, which is guaranteed to be unique within an isolated pipeline run.
25. IMPROVEMENT: Concoct no longer relies on MaxBin to obtain the abundance file, which has several advantages.
    A. MaxBin does not have to run if its results are not needed. Previously, MaxBin always had to run if Concoct was enabled.
    B. If MaxBin fails, Concoct in Mago V1.x considers this a failure of a dependency, and consequently does not run.
       In Mago V2.x, Concoct runs independently of MaxBin, so it can proceed even if MaxBin fails.
    C. If a user changes the parameters of MaxBin, Concoct does not need to be re-run any more. Previously, such a change made Concoct results outdated, since MaxBin was considered a dependency.
26. IMPROVEMENT: The Concoct step no longer regards conversion of input fastq reads to fasta format as a dependency. Consequently, the conversion can be skipped if no other Orchestra step needs it either.
27. BUGFIX: Orchestra did not select the proper coverage file for BinSanity binners if parameter BinSanity_profile_transform was not specified in the config file.
28. IMPROVEMENT: New solution to the BinSanity problem of not being able to deal with whitespace characters in contig names. Previously, Orchestra built a dedicated dictionary to translate simplified names back to the original ones. Now, every scaffold file is inspected, and if it contains whitespace in contig names, the whitespace characters are replaced with underscores. This preserves the full information content of the original names while avoiding the complications of dedicated dictionaries and name translations.
29. IMPROVEMENT: Similarly to the previous point, Prokka cannot deal with names longer than 37 characters. Consequently, Orchestra inspects contig names and trims them if necessary.
30. NEW FEATURE: BinSanity_lc for lower memory consumption. Both BinSanity incarnations (plain and workflow) that were offered in Mago V1.x are fairly memory demanding and can quickly exhaust the available memory of even a powerful computer, which caused two of the five binners in Mago V1.x to fail frequently.

========================================================================
Metagenome Orchestra V1_2 2019_04_15 Change Log
========================================================================

1. NEW FEATURE: Integration of FastP for inspection and filtering of input sequences.
2. UPGRADE: BinSanity upgraded to version 0.2.8.
3. BUGFIX: In certain circumstances, Orchestra did not properly detect that a step it depended on had not completed successfully, and it tried to run further steps although the intermediate results were not ready.
4. BUGFIX: Orchestra failed to detect that no binner was enabled if DasTool was not enabled in the configuration file but was forced to run by other steps (e.g. ezTree).
5. IMPROVEMENT: Better tracking of the status of subprocesses by combining monitoring of exit codes with inspection of their STDOUT and STDERR streams.
6. Improvements and corrections of documentation.

========================================================================
Metagenome Orchestra V1_1 2019_02_25 Change Log
========================================================================

1. BUGFIX: ezTree did not correctly generate the NWK file.
2. UPGRADE: MaxBin upgraded from version 2.2.4 to 2.2.6.
3. UPGRADE: Some other included software packages upgraded to their latest versions.
4. IMPROVEMENT: The pipeline now detects ezTree's outcome "Cannot find any PFAM families that exist once and only once in all genomes." and reports it as such instead of reporting a generic abnormal termination.
5. IMPROVEMENT: The pipeline now renders on screen the number of DasTool bins that fulfill the user's selection criteria for inclusion in the ezTree analysis (according to the results of the CheckM lineage workflow).
6. IMPROVEMENT: Previously, the CheckM lineage workflow had to be run whenever ezTree was enabled. Now, ezTree forces the CheckM lineage workflow to run only if a user filters the bins to be passed to ezTree with parameters ezTree_CheckM_lineage_completeness_min, ezTree_CheckM_lineage_contamination_max, ezTree_CheckM_lineage_completeness_max or ezTree_CheckM_lineage_contamination_min, i.e. if at least one of these parameters is set in the config file.
7. IMPROVEMENT: The pipeline now renders the number of generated bins (after running individual binners and DasTool) on the screen for easier monitoring of progress while Orchestra is running.
8. IMPROVEMENT: Results of individual binners are now fed to DasTool only if a binner produces at least one bin. Previously, results were passed to DasTool as long as a binner completed its execution normally, even if it did not produce any bins.
9. IMPROVEMENT: Clearer and more specific reporting of errors in the configuration file.
A. IMPROVEMENT: Start and end times as well as the duration of execution of subprocesses (assemblers, binners, ...) are now rendered on the screen for easier monitoring of progress while Orchestra is running.
B. IMPROVEMENT: More graceful handling of a keyboard interrupt (Ctrl-C) while external subprocesses are running.
C. Expanded and clarified documentation.
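
The CheckM-based bin collection introduced in V2_2a (item 4) can be illustrated with a minimal Python sketch. It is not Orchestra's actual code: the function names, the in-memory report format (bin name mapped to completeness and contamination percentages) and the inclusive treatment of the limits are assumptions made for illustration. Only the four filter_CheckM_lineage_* parameter names are taken from this change log.

```python
import operator

# Parameter names come from the change log. The comparison direction follows
# from the _min/_max suffix; treating the bounds as inclusive is an assumption.
_FILTERS = [
    ("filter_CheckM_lineage_completeness_min", "completeness", operator.ge),
    ("filter_CheckM_lineage_completeness_max", "completeness", operator.le),
    ("filter_CheckM_lineage_contamination_min", "contamination", operator.ge),
    ("filter_CheckM_lineage_contamination_max", "contamination", operator.le),
]


def passes_checkm_filter(completeness, contamination, params):
    """Return True if a bin satisfies every limit present in params.

    params maps parameter names to numeric limits; a missing key is
    interpreted as "no limit" (hypothetical behaviour).
    """
    values = {"completeness": completeness, "contamination": contamination}
    return all(
        compare(values[metric], params[name])
        for name, metric, compare in _FILTERS
        if name in params
    )


def collect_filtered_bins(report, params):
    """Return the names of bins in report that pass all configured limits.

    report is a hypothetical dict: bin name -> (completeness, contamination).
    """
    return [
        name
        for name, (completeness, contamination) in report.items()
        if passes_checkm_filter(completeness, contamination, params)
    ]
```

For example, with a minimum completeness of 90 and a maximum contamination of 5, a bin reported at 95.2 % completeness and 1.3 % contamination would be collected, whereas a bin at 91.0 % completeness and 12.4 % contamination would not.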
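
The contig name handling described in V2_0 (items 28 and 29) can be sketched in a similar way. The helper names below are hypothetical, and plain truncation is an assumption: the change log only states that whitespace is replaced with underscores and that names longer than 37 characters are trimmed, not how the trimming is done.

```python
import re

PROKKA_MAX_NAME_LEN = 37  # limit stated in the change log


def sanitize_contig_name(name, max_len=PROKKA_MAX_NAME_LEN):
    """Replace every whitespace character with '_' and cut to max_len.

    Plain truncation is an assumption of this sketch; it does not guard
    against two long names becoming identical after trimming.
    """
    cleaned = re.sub(r"\s", "_", name.strip())
    return cleaned[:max_len]


def sanitize_fasta_lines(lines):
    """Yield FASTA lines with sanitized header names; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + sanitize_contig_name(line[1:]) + "\n"
        else:
            yield line
```

A header such as ">NODE 1 length 500" would become ">NODE_1_length_500", while sequence lines are left untouched.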