Snakemake TIPS

  1. Similar to GNU make, you specify the targets in terms of a pseudo-rule at the top (commonly called all).
  2. Input & output files can contain multiple named wildcards.
  3. Snakemake resolves the dependencies between rules by building a DAG (Directed Acyclic Graph) of jobs.
  4. Of course, the input file might have to be generated by another rule with different
    wildcards. Multiple wildcards in one filename can cause ambiguity.
  5. Note that shell commands in Snakemake use the bash shell in strict mode by default.
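As a quick sketch of what strict mode means in practice: Snakemake prefixes shell commands with roughly set -euo pipefail, so unset variables and a failure anywhere in a pipeline (not just in the last command) abort the job.

```shell
# Snakemake runs shell commands roughly as: bash -c 'set -euo pipefail; <your command>'
set -euo pipefail

# With pipefail, a pipeline fails if ANY element fails,
# even though the last command (true) succeeds:
if false | true; then
  echo "pipeline succeeded"
else
  echo "pipeline failed"
fi
```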

Modularization

Includes: another Snakefile with all its rules can be included into the current one:
include: "path/to/other/snakefile"
Sub-Workflows: in addition to including the rules of another workflow, Snakemake allows you to depend on the output of other workflows as sub-workflows:

subworkflow otherworkflow:
    workdir:
        "../path/to/otherworkflow"
    snakefile:
        "../path/to/otherworkflow/Snakefile"
    configfile:
        "path/to/custom_configfile.yaml"

rule a:
    input:
        otherworkflow("test.txt")
    output: ...
    shell:  ... 

Arguments

  • Cluster:
    • --cluster CMD, -c CMD: Execute each job by submitting it with the given cluster command (e.g. qsub).
  • Execution:
    • --dryrun, -n: Do not execute anything, and display what would be done.
      --dryrun --quiet: Just print a summary of the DAG of jobs.
    • --forcerun [TARGET [TARGET ...]], -R [TARGET [TARGET ...]]:
      Force the re-execution or creation of the given rules or files.
    • --keep-going, -k: Go on with independent jobs if a job fails.
  • Utilities
    • --rulegraph: Print the dependency graph of rules in the dot language; each rule is displayed once.
      Usage: snakemake [common arguments] --rulegraph | dot -Tpng > output.png (or dot -Tpdf > output.pdf)
      Notice: the "Building DAG of jobs..." line should be deleted from the output to make sure the flow chart is generated correctly. There is also a functionally similar argument, --dag, which generates the DAG with one node per job (i.e. per sample); in common scenarios that is redundant, in my opinion.
    • --list, -l / --list-target-rules, --lt: Show the available rules / available target rules in the given Snakefile, taking the given config info into account.
    • --unlock: Remove a lock on the working directory. [default: false]
    • --delete-all-output / --delete-temp-output:
      Remove all output files / all temporary files generated by the workflow.
      Use together with --dryrun to list the files without actually deleting anything.
      Does not recurse into subworkflows; write-protected files are not removed.
  • Output
    • --quiet, -q: Do not output any progress or rule information.
    • --printshellcmds, -p: Print out the shell commands that will be executed. [default: false]
  • Behavior:
    • --latency-wait SECONDS, --output-wait, -w: Wait the given number of seconds for the output files of a job to appear after the job has finished, before considering it failed. This option is used internally to handle filesystem latency in cluster environments. [default: 5]
    • --restart-times N: Number of times to restart failing jobs. [default: 0]
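Putting several of these flags together, a sketch of a typical cluster invocation; the qsub submit string, the job count, and the wait time are assumptions about one particular setup, not Snakemake defaults. The command is echoed rather than executed here:

```shell
# Print (rather than run) a typical cluster invocation.
# Assumption: a qsub-based scheduler; {threads} is filled in per job by Snakemake.
echo snakemake -p -k \
  --restart-times 2 \
  --latency-wait 60 \
  --jobs 100 \
  --cluster "qsub -l nodes=1:ppn={threads}"
```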

Rules

  • run: a rule can run some Python code instead of a shell command
  • Protected and Temporary Files
output:
  #A protected file is write-protected after the rule that
  #produces it completes, to guard against accidental deletion or overwriting.
  protected("path/to/outputfile")
  #An output file marked as temp is deleted 
  #after all rules that use it as an input are completed:
  temp("path/to/outputfile")
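For instance, a sketch combining both flags in one pipeline (the file names and the bwa/samtools commands are illustrative assumptions):

```snakemake
rule align:
    input: "reads/{sample}.fastq"
    output: temp("mapped/{sample}.sam")        # bulky intermediate; deleted once no rule needs it
    shell: "bwa mem ref.fa {input} > {output}"

rule sam_to_bam:
    input: "mapped/{sample}.sam"
    output: protected("mapped/{sample}.bam")   # final result; write-protected after the rule completes
    shell: "samtools view -b {input} > {output}"
```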
  • Ignoring timestamps
    For determining whether output files have to be re-created, Snakemake checks whether the file modification date (i.e. the timestamp) of any input file of the same job is newer than the timestamp of the output file.
rule NAME:
  input:
    #marking an input file as ancient makes Snakemake assume it is older than the output files
    ancient("path/to/inputfile")
  • Flag files
rule mytask:
  #touches (i.e. creates or updates) the file mytask.done after mycommand has finished successfully.
  output: touch("mytask.done")
  • Log-Files
    log: "logs/abc.{dataset}.log"
    Multiple log files are supported: log: log1="logs/abc.log", log2="logs/xyz.log"
    Can be used as input for other rules/are not deleted upon error.
    You may always use 2> {log} to redirect standard error to a file (here, the log file) on Linux-based systems.
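Putting the log directive together with that redirection, a sketch of a rule that captures stderr in its log file (mycommand and the file names are hypothetical):

```snakemake
rule abc:
    input: "data/{dataset}.txt"
    output: "results/{dataset}.out"
    log: "logs/abc.{dataset}.log"
    # stdout goes to the output file, stderr (2>) goes to the log
    shell: "mycommand {input} > {output} 2> {log}"
```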
  • Dynamic Files (note that dynamic() is deprecated in recent Snakemake versions in favor of checkpoints)
rule cluster:
    input: "afile.csv"
    output: dynamic("{clusterid}.cluster.csv")
    run: ...
#The number of output files is unknown before the rule is executed.
#Snakemake determines the input files for the rule all after the rule cluster has been executed,
#and then dynamically inserts jobs of the rule plot into the DAG to create the desired plots.
rule all:
  input: dynamic("{clusterid}.cluster.plot.pdf")
rule plot:
  input: "{clusterid}.cluster.csv"
  output: "{clusterid}.cluster.plot.pdf"

Useless INFO

  1. The name of a rule is optional and can be left out, creating an anonymous rule.