Files present in PARANOiD

This is an overview of the files and directories that come with PARANOiD.

  1. bin

  2. dockerfiles

  3. docs

  4. modules

  5. build_docker.sh

  6. featuretypes-from-gtfgff.awk

  7. LICENSE

  8. LICENSE.pybam

  9. main.nf

  10. nextflow.config

  11. PARANOiD-deprecated-DSL1.nf

  12. pull_images.sh

  13. README.md

bin

Directory that mainly consists of custom scripts that are needed for several PARANOiD steps. This directory is only necessary if no containers are used to execute PARANOiD. Typically there is no need for users to interact with files in this directory.

dockerfiles

Directory that contains Dockerfiles from which container images can be built if necessary. Images built from these Dockerfiles can be used to create containers for every step executed by PARANOiD (except PureCLIP). Typically there is no need for users to interact with files in this directory.

docs

Directory that contains files necessary to build and display this documentation. Typically there is no need for users to interact with files in this directory.

modules

Directory that contains all Nextflow modules included by PARANOiD. These modules are collections of processes that can be included in a Nextflow workflow. Each process describes the implementation of a specific step, together with its required and optional inputs and the outputs it generates. Typically there is no need for users to interact with files in this directory.

build_docker.sh

Shell script that can be used to automatically build images from all Dockerfiles in the dockerfiles directory and upload them to Docker Hub. Typically there is no need for users to interact with this file.

featuretypes-from-gtfgff.awk

Short awk script that can be used to list all feature types described in a GTF or GFF file. This can be useful for the RNA subtype analysis, as it requires the exact subtype names. Usage can be found here.
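For illustration, a minimal Python sketch of the same idea is shown below. It is not a port of the actual awk script, whose exact output may differ. It assumes that the feature type is the third of the nine tab-separated GTF/GFF columns, and it additionally counts how often each type occurs. The function name feature_types and the file name used in the usage line are hypothetical.

```python
import sys
from collections import Counter


def feature_types(lines):
    """Count feature types (third column) in GTF/GFF lines.

    Assumption: the feature type is the third of the nine tab-separated
    GTF/GFF columns. Comment lines ('#') and lines with fewer than nine
    columns (e.g. blank lines or FASTA sequence in GFF3) are skipped.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python feature_types.py annotation.gtf
    with open(sys.argv[1]) as handle:
        for ftype, n in sorted(feature_types(handle).items()):
            print(f"{ftype}\t{n}")
```

Each printed name is a candidate for the exact subtype name the RNA subtype analysis expects.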

LICENSE

MIT license. In short, PARANOiD can be used however you please: you can copy, modify and publish this software or parts of it, as long as the copyright notice and the license text are included.

LICENSE.pybam

Apache license that applies only to pybam, which is used to generate cross-link pileups from BAM files after the alignment. The Apache license allows you to use and modify the software as much as you want, as long as you include a copy of the license and add prominent notices to all modified files stating that they were changed.

main.nf

The Nextflow script to run when starting a PARANOiD analysis. It uses the processes described in the modules directory and connects them in the right order and with the correct logic to form the pipeline.

nextflow.config

Config file that is automatically used by PARANOiD (provided it is in the same directory as the main.nf script). It consists of three parts:

Parameters

A list of all parameters that can be used when running PARANOiD, together with their default values. Users can adapt the default values to better suit their needs.

Profiles

Describes the use of container engines and the distribution of jobs on clusters. The specifications should work on most systems, but they may need to be adapted if errors related to the profiles arise.

Resource allocations

Describes the computational resources required to run each process. The current requirements are chosen to work for most datasets and may be more than some datasets need; depending on the size of the read file and the reference, they might in some cases even be too low. They can (and in some cases should) be adapted if the system in use does not provide the resources currently required, the highest of which are 8 cores and 100 GB RAM.

If PARANOiD is executed on a local computer with fewer resources available than required, open nextflow.config in a text editor and lower the relevant resource requirements. Lowering the requirements can also increase computing speed, as more processes are then allowed to run in parallel. The most relevant processes are 'build_index_STAR|mapping_STAR', as they require the largest amount of resources. In the config file, the relevant entry looks like this:

withName: 'build_index_STAR|mapping_STAR' {
    cpus = 8
    memory = '100 GB'
    container = 'docker://pbarth/star:1.0'
}

To change the number of required cores, change the number after cpus =. For example, to lower it to 4 cores, the line should read cpus = 4. To change the required memory, change the value after memory =. For example, to lower it to 50 GB, the line should read memory = '50 GB'.
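As a rough illustration of why lowering cpus and memory can let more processes run in parallel, the Python sketch below estimates how many tasks with identical requests fit on one machine at the same time. It is a simplified model (a task only starts if both its CPU and its memory request still fit), not Nextflow's actual scheduler. The function name parallel_tasks and the machine size are hypothetical.

```python
def parallel_tasks(host_cpus, host_mem_gb, task_cpus, task_mem_gb):
    """Estimate how many identical tasks fit on one machine at once.

    Simplified model (assumption, not Nextflow's scheduler): a task can only
    start if both its CPU and its memory request still fit into what is free.
    Returns 0 if a single task does not fit at all.
    """
    if task_cpus <= 0 or task_mem_gb <= 0:
        raise ValueError("task requests must be positive")
    return min(host_cpus // task_cpus, host_mem_gb // task_mem_gb)


if __name__ == "__main__":
    # Hypothetical local machine with 16 cores and 64 GB RAM.
    for cpus, mem in [(8, 100), (4, 50), (4, 30)]:
        n = parallel_tasks(16, 64, cpus, mem)
        print(f"cpus = {cpus}, memory = '{mem} GB' -> {n} task(s) at once")
```

On this hypothetical 16-core, 64 GB machine, the default request of 8 cores and 100 GB does not fit at all, 4 cores and 50 GB allows one task, and 4 cores and 30 GB allows two. Lowering a request does not reduce the memory a tool actually needs, so values that are too low can still make a step fail.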

PARANOiD-deprecated-DSL1.nf

An older version of PARANOiD that uses DSL1 instead of the newer DSL2. It should not be used, as it is deprecated and will not receive any further updates.

pull_images.sh

Shell script that can be used to download all images that PARANOiD uses to build containers into a specific directory. It can be used as preparation if PARANOiD is to be run without an internet connection. Additional information on how to run the script can be found here.

README.md

README displayed on GitHub. Typically there is no need for users to interact with this file.