========================================================================
Metagenome Orchestra V2_2b 2020_03_08 Change Log
========================================================================


1. BUGFIX:  Prokka stopped working in V2_2a due to an outdated dependency.
            Prokka has been upgraded from version 1.14.0 to 1.14.6.

2. UPGRADE: Various software packages have been upgraded to newer versions
            available in Bioconda and other repositories. For example,
            MaxBin has been upgraded from version 2.2.6 to 2.2.7.


========================================================================
Metagenome Orchestra V2_2a 2019_08_12 Change Log
========================================================================


1. BUGFIX:      Fixes for several issues that were discovered while
                testing Orchestra on HPC systems.

2. NEW FEATURE: It is now possible to skip the DasTool analysis and feed
                bins from the binners directly to CheckM and the further
                analysis steps (ezTree, Prokka+Roary, FastANI).
                To do so, leave DasTool disabled (i.e. do not set
                parameter include_DasTool = yes).
                If DasTool is enabled, the pipeline proceeds as
                in version V2_1.

3. MINOR:       The execution order of ezTree, Prokka+Roary and
                FastANI has been changed. FastANI now runs first,
                followed by Prokka+Roary, with ezTree last.
                The new order reflects the relative execution speeds
                of these programs: faster programs complete first,
                so that a user can start examining as many results
                as possible as early as possible.

4. NEW FEATURE: Bins can now be collected based on a CheckM
                lineage workflow report.
                This step introduces the following new
                parameters:
                collect_CheckM_filtered_bins
                copy_CheckM_filtered_bins
                filter_CheckM_lineage_completeness_min
                filter_CheckM_lineage_contamination_max
                filter_CheckM_lineage_completeness_max
                filter_CheckM_lineage_contamination_min

5. NEW FEATURE: A special FastANI analysis can now be run on
                the bins collected by CheckM. No external
                references need to be provided to FastANI
                for this step, since all collected bins
                serve both as queries and as references
                in this multi-query, multi-reference
                FastANI run. This step introduces two
                new parameters:
                filter_CheckM_FastANI_analysis
                params_CheckM_FastANI_analysis

6. NEW FEATURE: MaxBin can now use either its own abundance
                file or the one produced by the BBMap method
                (from the pileup file), as controlled by the
                new parameter MaxBin_own_abundance_file.
                Similarly, Concoct can now use either MaxBin's
                abundance file or the BBMap one, as controlled
                by the new parameter Concoct_abundance_from_MaxBin.
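The bin collection in item 4 above amounts to a threshold filter on the completeness and contamination values of each bin. The following Python sketch illustrates that idea only and is not Orchestra's code. It assumes the values (in percent) have already been parsed from the CheckM lineage workflow report into a dictionary keyed by bin name, that the bounds are inclusive, and that an unset parameter imposes no bound; the function collect_bins and the example values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def collect_bins(
    stats: Dict[str, Tuple[float, float]],
    completeness_min: Optional[float] = None,
    completeness_max: Optional[float] = None,
    contamination_min: Optional[float] = None,
    contamination_max: Optional[float] = None,
) -> List[str]:
    """Return the names of bins whose CheckM values lie within the given bounds.

    Hypothetical sketch: `stats` maps bin name -> (completeness, contamination),
    bounds are inclusive, and a bound set to None is not applied.
    """

    def within(value: float, low: Optional[float], high: Optional[float]) -> bool:
        # An unset bound never excludes a bin.
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
        return True

    selected = []
    for name, (completeness, contamination) in sorted(stats.items()):
        if within(completeness, completeness_min, completeness_max) and within(
            contamination, contamination_min, contamination_max
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Made-up example values, not real CheckM output.
    example = {"bin.1": (97.5, 1.2), "bin.2": (62.0, 3.4), "bin.3": (91.0, 12.8)}
    # Analogous to filter_CheckM_lineage_completeness_min = 90
    # and filter_CheckM_lineage_contamination_max = 5.
    print(collect_bins(example, completeness_min=90, contamination_max=5))
```

With these example values only bin.1 passes: bin.2 is not complete enough and bin.3 is too contaminated.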


========================================================================
Metagenome Orchestra V2_1 2019_06_05 Change Log
========================================================================


1. IMPROVEMENT: Concoct step now produces bins' fasta patterns.

2. BUGFIX:      Previously, Orchestra did not check the status of
                some minor steps, such as the generation of TSV files.
                Although no failures of these steps have been
                observed so far, they are now monitored, and any
                future failures will be reported for post-mortem
                inspection.

3. NEW FEATURE: An upper limit can now be imposed on the memory
                that Orchestra is allowed to consume.
                Two new configuration parameters are introduced
                for this purpose: memory_limit_type and
                manual_memory_limit_GB.

4. NEW FEATURE: Orchestra can now process an arbitrary number
                of external scaffold files.


========================================================================
Metagenome Orchestra V2_0 2019_05_22 Change Log
========================================================================


 1. NEW FEATURE: Prokka annotation of DasTool bins.

 2. NEW FEATURE: Roary building of pan genomes from DasTool
                 bins and Prokka annotations.

 3. NEW FEATURE: FastANI alignment-free computation of
                 whole-genome Average Nucleotide Identity (ANI)
                 between genomes.

 4. IMPROVEMENT: Orchestra no longer requires a config file to
                 contain all sections. Only the sections in which
                 a user actually sets some parameters need to be
                 present.
                 Section [Global] must always be present, since
                 some of its parameters are mandatory.

 5. UPGRADE:     FastP upgraded from 0.19.7 to 0.20.0

 6. BUGFIX:      Previously, Orchestra did not notice if FastP
                 had at some point been run with option
                 FastP_filter = Yes and the option was later
                 changed to FastP_filter = No. In such cases,
                 the affected pipeline steps were not rerun,
                 since their existing results were not
                 considered outdated.

 7. BUGFIX:      Previously, Orchestra considered existing FastP
                 results obsolete whenever option FastP_filter was
                 changed, even if no other FastP parameter had
                 changed. This caused an unnecessary FastP rerun.

 8. IMPROVEMENT: Previously, when the pipeline was rerun, only
                 successfully completed steps were skipped.
                 Failed steps were always repeated, even if the
                 relevant circumstances had not changed, so the
                 same step was doomed to fail again. Now,
                 previously failed steps are re-executed only if
                 the circumstances change (e.g. if their
                 parameters are changed).

 9. IMPROVEMENT: More informative messages about reasons why a
                 certain subprocess is rerun or skipped.

10. IMPROVEMENT: More informative messages about unfulfilled
                 interdependencies between subprocesses.

11. BUGFIX:      When Orchestra compared the new and old parameters
                 of a processing step to detect changes, it
                 sometimes stored the new parameters even if the
                 associated subprocess was not (re)executed
                 (e.g. because its dependencies were not fulfilled).
                 As a result, the results already present on disk
                 became associated with the wrong stored parameters.

12. MINOR:       A subprocess exit code is written to the screen
                 only if it is non-zero. Some other messages have
                 been shortened to reduce screen clutter.

13. IMPROVEMENT: More comprehensive inspection of subprocess
                 results and better detection of subprocess failures.

14. BUGFIX:      Orchestra did not detect changes to the Bwa Index
                 parameters, and consequently did not consider
                 the respective results obsolete when these
                 parameters changed.

15. NEW FEATURE: Orchestra can now delete some intermediate
                 files during pipeline processing to save
                 disk space.

16. MINOR:       When joining R1 and R2 files, Orchestra now
                 displays a visual progress indicator.

17. NEW FEATURE: Orchestra can now execute several assemblers in
                 a single pipeline run.
                 In addition, an external scaffold file can be
                 included along with those generated by the
                 built-in assemblers.

18. NEW FEATURE: Orchestra now offers three methods for SAM
                 file generation (Bwa, Bowtie2 and BBMap);
                 previously, only the Bwa method was available.
                 Orchestra runs the analyses for the Cartesian
                 product of all enabled assemblers (plus any
                 externally provided scaffold) and all enabled
                 SAM generation methods.
                 This makes it easy to compare the performance
                 of these algorithms.

19. NEW FEATURE: Orchestra now preserves several sets of
                 intermediate results, such as scaffold indices,
                 SAM and BAM files. It distinguishes between
                 them by their origin (e.g. the selected
                 assembler and the SAM file generation method).
                 This makes comparative re-executions of the
                 pipeline much faster. For example, in the
                 first run a user chooses IDBA_UD as the
                 assembler and Bowtie2 as the SAM file
                 generation method. In the second run, the user
                 tries IDBA_UD in combination with BBMap, and
                 in the third run MetaSPAdes + Bwa. If the user
                 later returns to IDBA_UD + Bowtie2 (or any
                 other previous combination), the scaffold, its
                 indices, and the SAM and BAM files do not need
                 to be regenerated, since all previously
                 generated versions of these files are preserved
                 and reused. This holds only if the parameters
                 for building these files have not changed;
                 otherwise, the affected files are regenerated
                 from scratch.

20. IMPROVEMENT: Redesigned output directory structure
                 for easier navigation. This is especially
                 important now that Orchestra potentially stores
                 several sets of files from consecutive runs.

21. IMPROVEMENT: Simplified and improved conversion of SAM files
                 to sorted BAM files. The process now takes fewer
                 steps and is more flexible (for example, option
                 "-F 4" was forced in Orchestra V1.x, whereas now
                 it is under the operator's control).

22. IMPROVEMENT: Abundance files are now created independently of
                 the MaxBin step. Previously, MaxBin created the
                 abundance files itself, for which it needed to
                 create its own version of the SAM file. The SAM
                 file was therefore created twice, which wasted
                 a noticeable amount of time. In addition, if
                 MaxBin is re-executed (e.g. due to changes to
                 its parameters), only the binning part is
                 actually re-executed. Previously, MaxBin's own
                 SAM file generation was re-executed as well,
                 which wasted a lot of processing time.

23. IMPROVEMENT: The MaxBin step no longer needs the conversion of
                 fastq reads into fasta format. Consequently, the
                 conversion can be skipped if no other Orchestra
                 step needs it.

24. BUGFIX:      In Mago V1.x, all the files that link the binners'
                 results to DasTool were named scaffold2bin.tsv,
                 which could confuse DasTool. In Mago V2.x, each
                 TSV file has a different name, which is guaranteed
                 to be unique within an isolated pipeline run.

25. IMPROVEMENT: Concoct no longer relies on MaxBin to obtain the
                 abundance file, which has several advantages:

                 A. MaxBin does not have to run if its results are
                    not needed. Previously, MaxBin always had to run
                    if Concoct was enabled.

                 B. In Mago V1.x, Concoct considered a MaxBin failure
                    to be a failure of a dependency, and consequently
                    did not run. In Mago V2.x, Concoct runs
                    independently of MaxBin, so it can proceed even
                    if MaxBin fails.

                 C. If a user changes MaxBin parameters, Concoct
                    no longer needs to be rerun; previously, such a
                    change made Concoct results outdated, since
                    MaxBin was considered a dependency.

26. IMPROVEMENT: The Concoct step no longer regards the conversion of
                 input fastq reads to fasta format as a dependency.
                 Consequently, the conversion can be skipped if no
                 other Orchestra step needs it.

27. BUGFIX:      Orchestra did not select the proper coverage file for
                 the BinSanity binners if parameter
                 BinSanity_profile_transform was not specified in
                 the config file.

28. IMPROVEMENT: New solution to BinSanity's inability to handle
                 whitespace characters in contig names.
                 Previously, Orchestra built a dedicated dictionary
                 to translate simplified names back to the original
                 ones. Now, every scaffold file is inspected, and any
                 whitespace in contig names is replaced with
                 underscores. This preserves all the information in
                 the original names while avoiding the complications
                 of dedicated dictionaries and name translations.

29. IMPROVEMENT: Similarly to the previous item, Prokka cannot handle
                 contig names longer than 37 characters. Orchestra
                 therefore inspects contig names and trims them
                 if necessary.

30. NEW FEATURE: BinSanity_lc for lower memory consumption. Both
                 BinSanity variants offered in Mago V1.x (plain and
                 workflow) are fairly memory-demanding and can quickly
                 exhaust the available memory even of a powerful
                 computer, which is why two of the five binners in
                 Mago V1.x frequently failed.
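Items 28 and 29 above describe two contig-name adjustments: whitespace is replaced with underscores, and names are trimmed to Prokka's 37-character limit. The Python sketch below illustrates both on FASTA header lines. It is not Orchestra's implementation: it assumes each whitespace character becomes one underscore and that the whole header text after ">" counts as the contig name, and it applies both adjustments in one pass for brevity, whereas Orchestra may apply them at different stages. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Prokka's contig-name length limit as stated in item 29.
PROKKA_MAX_NAME_LEN = 37


def sanitize_contig_name(name: str, max_len: int = PROKKA_MAX_NAME_LEN) -> str:
    """Replace each whitespace character with '_' and trim to max_len characters."""
    return re.sub(r"\s", "_", name)[:max_len]


def sanitize_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines, rewriting only header lines (those starting with '>')."""
    for line in lines:
        if line.startswith(">"):
            # Strip the line ending before sanitizing, then restore a plain newline.
            ending = "\n" if line.endswith("\n") else ""
            name = line[1:].rstrip("\r\n")
            yield ">" + sanitize_contig_name(name) + ending
        else:
            # Sequence lines are passed through unchanged.
            yield line


if __name__ == "__main__":
    fasta = [">contig 1 length=1523\n", "ACGTACGT\n"]
    print("".join(sanitize_fasta_headers(fasta)), end="")
```

Trimming can make two distinct long names identical; the source does not say how Orchestra handles such collisions, and this sketch does not detect them.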


========================================================================
Metagenome Orchestra V1_2 2019_04_15 Change Log
========================================================================


1. NEW FEATURE: Integration of FastP for inspecting and
                filtering input sequences.

2. UPGRADE: BinSanity upgraded to version 0.2.8.

3. BUGFIX: In certain circumstances, Orchestra did not properly
           detect that a step on which other steps depend had not
           completed successfully, and it tried to run further steps
           although the intermediate results were not ready.

4. BUGFIX: Orchestra failed to detect that no binner was enabled
           if DasTool was not enabled in the configuration file but
           was forced to run by other steps (e.g. ezTree).

5. IMPROVEMENT: Better tracking of subprocess status by
                combining monitoring of exit codes with
                inspection of their STDOUT and STDERR streams.

6. Improvements and corrections of documentation.


========================================================================
Metagenome Orchestra V1_1 2019_02_25 Change Log
========================================================================


1. BUGFIX: ezTree did not correctly generate NWK file.

2. UPGRADE: MaxBin upgraded from version 2.2.4 to 2.2.6.

3. UPGRADE: Some other included software packages upgraded to
            their latest versions.

4. IMPROVEMENT: The pipeline now detects the ezTree outcome
                "Cannot find any PFAM families that exist
                once and only once in all genomes."
                and reports it explicitly instead of reporting
                a generic abnormal termination.

5. IMPROVEMENT: The pipeline now displays on the screen the number of
                DasTool bins that fulfil the user's selection criteria
                for inclusion in the ezTree analysis (according to the
                results of the CheckM lineage workflow).

6. IMPROVEMENT: Previously, the CheckM lineage workflow had to be run
                whenever ezTree was enabled. Now, ezTree forces the
                CheckM lineage workflow to run only if a user filters
                the bins to be input to ezTree with the parameters
                ezTree_CheckM_lineage_completeness_min,
                ezTree_CheckM_lineage_contamination_max,
                ezTree_CheckM_lineage_completeness_max or
                ezTree_CheckM_lineage_contamination_min;
                i.e. if at least one of these parameters is set
                in the config file.

7. IMPROVEMENT: The pipeline now displays the number of generated bins
                (after running the individual binners and DasTool)
                on the screen for easier monitoring of progress
                while Orchestra is running.

8. IMPROVEMENT: Results of individual binners are now fed to DasTool
                only if the binner produces at least one bin.
                Previously, results were passed to DasTool as long as
                a binner completed its execution normally, even if
                it did not produce any bins.

9. IMPROVEMENT: Clearer and more specific reporting of errors in
                the configuration file.

A. IMPROVEMENT: The start and end times, as well as the execution
                duration, of subprocesses (assemblers, binners, ...)
                are now displayed on the screen for easier monitoring
                of progress while Orchestra is running.

B. IMPROVEMENT: More graceful handling of keyboard interrupts (Ctrl-C)
                while external subprocesses are running.

C. Expanded and clarified documentation.
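Item 6 above makes the CheckM lineage workflow conditional on whether any of the four ezTree filter parameters is set. A minimal Python sketch of that check, using the standard configparser module, follows. The config section name "ezTree" is a hypothetical placeholder (the actual section holding these parameters is not stated here), and treating a missing section or an empty value as "not set" is an assumption of the sketch.

```python
import configparser

# The four ezTree filter parameters listed in item 6.
EZTREE_CHECKM_FILTERS = (
    "ezTree_CheckM_lineage_completeness_min",
    "ezTree_CheckM_lineage_contamination_max",
    "ezTree_CheckM_lineage_completeness_max",
    "ezTree_CheckM_lineage_contamination_min",
)


def eztree_requires_checkm(config: configparser.ConfigParser, section: str) -> bool:
    """Return True if at least one ezTree CheckM filter parameter has a value.

    Hypothetical sketch: a missing section or an empty value counts as "not set".
    configparser matches option names case-insensitively by default.
    """
    return any(
        config.get(section, key, fallback="").strip() for key in EZTREE_CHECKM_FILTERS
    )


if __name__ == "__main__":
    cfg = configparser.ConfigParser()
    cfg.read_string("[ezTree]\nezTree_CheckM_lineage_completeness_min = 90\n")
    print(eztree_requires_checkm(cfg, "ezTree"))
```

Because the check only asks whether some filter is present, the CheckM lineage workflow can be skipped entirely when ezTree runs on all bins unfiltered.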