Getting started

Prerequisites

Hardware

scprocess is designed to run on Linux-based high-performance computing (HPC) clusters but can also be used on powerful standalone workstations.

Ambient RNA correction with CellBender¹ and the integration step with RAPIDS-singlecell² both require access to a GPU with CUDA support.

Software

scprocess requires Conda. If you do not already have Conda installed, refer to the Conda user guide for detailed installation instructions.

If you plan to use CellBender for ambient RNA correction, you will also need Apptainer. For guidance on installing Apptainer, see the installation instructions.
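
Before installing, it can help to confirm which prerequisites are already available. The following is a minimal sketch, not part of scprocess: check_tool is a hypothetical helper, and using nvidia-smi as a proxy for a working CUDA-capable GPU is an assumption that only holds where the NVIDIA driver tools are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# conda is always required; apptainer and nvidia-smi only matter
# for CellBender and the GPU-based steps.
for tool in conda apptainer nvidia-smi; do
  check_tool "$tool" || true
done
```

A "missing" line for apptainer or nvidia-smi is only a problem if you plan to run the GPU-dependent steps.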

Installation

Clone the repository:

 git clone https://github.com/marusakod/scprocess.git

Create a Conda environment:

Navigate into the scprocess directory and create a Conda environment. Choose the appropriate command based on your operating environment (local machine, SLURM, or LSF cluster).
local:

conda env create -n scprocess -f envs/scprocess_local.yaml

SLURM:

conda env create -n scprocess -f envs/scprocess_slurm.yaml

LSF:

conda env create -n scprocess -f envs/scprocess_lsf.yaml

If you are using LSF or SLURM, remember to also review the Cluster setup section below.
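
If you are unsure which environment file applies, the choice can be sketched as a small check for the scheduler's submission command. This is an illustration only: pick_env_file is a hypothetical helper, and it assumes that finding sbatch (SLURM) or bsub (LSF) on your PATH means you intend to use that scheduler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the environment file that matches the
# scheduler found on PATH (sbatch for SLURM, bsub for LSF).
pick_env_file() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "envs/scprocess_slurm.yaml"
  elif command -v bsub >/dev/null 2>&1; then
    echo "envs/scprocess_lsf.yaml"
  else
    echo "envs/scprocess_local.yaml"
  fi
}

# Print (rather than run) the command so it can be reviewed first.
echo "conda env create -n scprocess -f $(pick_env_file)"
```

The sketch prints the command instead of running it, so you can confirm the selected file before creating the environment.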
Add scprocess to your PATH:

Open your ~/.bashrc file and add the following line:
```
export PATH=/PATH/TO/YOUR/FOLDER/scprocess:${PATH}
```
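
If you prefer not to edit ~/.bashrc by hand, the line can be appended from the shell. The sketch below is one possible approach; add_line_once is a hypothetical helper that appends a line only when that exact line is not already present, so running it twice does not duplicate the entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact
# line is not already in it.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `add_line_once ~/.bashrc 'export PATH=/PATH/TO/YOUR/FOLDER/scprocess:${PATH}'` adds the PATH line; the single quotes keep ${PATH} unexpanded so that it is evaluated each time ~/.bashrc is loaded. The same helper could be used for the SCPROCESS_DATA_DIR line described in the data directory setup.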
Verify the installation by:
- Reloading your ~/.bashrc file:
```
source ~/.bashrc
```
- Activating the scprocess Conda environment and checking for help messages:
```
conda activate scprocess
scprocess
```

scprocess data directory setup

scprocess requires a dedicated directory to store all necessary data, such as reference genomes.

Create the data directory:
```
mkdir /path/to/scdata
```

Add the following line to your ~/.bashrc file:

export SCPROCESS_DATA_DIR=/path/to/scdata

Create a configuration file scprocess_setup.yaml in the $SCPROCESS_DATA_DIR directory you just created, with the contents as follows:
```
user:
  local_cores: 8
ref_txomes:
  tenx:
  - name: human_2024 
  - name: mouse_2024 
```
This tells the setup process to download and prepare the most recent pre-built human and mouse reference transcriptomes from 10x Genomics. Specify local_cores if you are running scprocess locally. If you are running scprocess on a cluster, use profile instead (see the Cluster setup section below). For more information on how to structure scprocess_setup.yaml, see the Reference section.
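
The configuration file can also be written from the shell. The following is a minimal sketch under the assumption that you want exactly the example contents shown above; write_setup_yaml is a hypothetical helper and refuses to overwrite an existing file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the example scprocess_setup.yaml into
# the given directory, without overwriting an existing file.
write_setup_yaml() {
  local dir="$1"
  local target="$dir/scprocess_setup.yaml"
  if [ -e "$target" ]; then
    echo "already exists: $target"
    return 1
  fi
  mkdir -p "$dir"
  cat > "$target" <<'EOF'
user:
  local_cores: 8
ref_txomes:
  tenx:
  - name: human_2024
  - name: mouse_2024
EOF
  echo "wrote: $target"
}
```

Calling `write_setup_yaml "$SCPROCESS_DATA_DIR"` would create the file in the data directory; adjust local_cores, or replace it with profile, to match your machine or cluster.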
Save some space by removing the reference transcriptome used for the tutorial

The Quick start tutorial section demonstrates how to run scprocess on example mouse datasets. So that users can follow the tutorial, scprocess setup automatically downloads the mouse_2024 reference transcriptome. To remove it after running the tutorial, use:
```
# ${VAR:?} aborts the command if SCPROCESS_DATA_DIR is unset or empty
rm -rf "${SCPROCESS_DATA_DIR:?}/reference_transcriptomes/mouse_2024"
rm -rf "${SCPROCESS_DATA_DIR:?}/alevin_fry_home/mouse_2024"

# Optionally update index_parameters.csv, which lists all genomes available
# in $SCPROCESS_DATA_DIR, by dropping the mouse_2024 row
awk -F',' '$1 != "mouse_2024"' "$SCPROCESS_DATA_DIR/index_parameters.csv" > "$SCPROCESS_DATA_DIR/temp.csv" && mv "$SCPROCESS_DATA_DIR/temp.csv" "$SCPROCESS_DATA_DIR/index_parameters.csv"
```
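
To see what the awk filter does before running it on the real file, it can be tried on a toy CSV. The file below is only a stand-in: its column names and values are made up, and the only property taken from the command above is that the genome name sits in the first comma-separated field.

```shell
#!/usr/bin/env bash
# Toy stand-in for index_parameters.csv; column names are invented.
demo_csv="$(mktemp)"
printf '%s\n' 'name,col2' 'human_2024,a' 'mouse_2024,b' > "$demo_csv"

# Same filter as above: keep every row whose first field is not mouse_2024.
filtered="$(awk -F',' '$1 != "mouse_2024"' "$demo_csv")"
printf '%s\n' "$filtered"

rm -f "$demo_csv"
```

The header row and the human_2024 row are kept, while the mouse_2024 row is dropped.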
Finish setting up the scprocess data directory:

To download all required data and index the reference transcriptomes, use the scprocess setup command. The first time you run scprocess setup, you need to pass the -c/--rangerurl flag with a valid download link for Cell Ranger (v9.0.0 or higher), available on the 10x Genomics Cell Ranger download & installation page:
```
scprocess setup -c "https://cf.10xgenomics.com/releases/cell-exp/cellranger-10.0.0.tar.gz..." 
```
Note that scprocess only requires barcode whitelists from Cell Ranger, therefore the full Cell Ranger installation will not be retained after the setup process.

Once the initial setup is complete, you do not need to provide the Cell Ranger link again. For example, if you modify the scprocess_setup.yaml file to add additional reference genomes, simply run scprocess setup.
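
Because scprocess setup relies on both the SCPROCESS_DATA_DIR variable and the scprocess_setup.yaml file described above, a quick check before running it can catch a missing piece early. This is a sketch only; check_data_dir is a hypothetical helper and does not validate the contents of the YAML file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the data directory exists and
# contains scprocess_setup.yaml. Uses $SCPROCESS_DATA_DIR when no
# argument is given.
check_data_dir() {
  local dir="${1:-${SCPROCESS_DATA_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "SCPROCESS_DATA_DIR is not set"
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir"
    return 1
  fi
  if [ ! -f "$dir/scprocess_setup.yaml" ]; then
    echo "missing: $dir/scprocess_setup.yaml"
    return 1
  fi
  echo "ok: $dir"
}
```

Running `check_data_dir && scprocess setup` would then only start the setup when the directory and configuration file are in place.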

Cluster setup

scprocess is intended to be used on a cluster with a job scheduler such as SLURM or LSF (although it will also work without a job scheduler). To set up a job scheduler in Snakemake, it is common to define a configuration profile with cluster settings e.g. resource allocation. scprocess comes with two predefined configuration profiles stored in the profiles directory: profiles/slurm_default and profiles/lsf_default for SLURM and LSF respectively.

To use scprocess with a job scheduler, you need to add a profile parameter to your scprocess_setup.yaml file:

SLURM:

user:
  profile: slurm_default

LSF:

user:
  profile: lsf_default

If you want to make a profile that is specific to your cluster, we recommend that you make a copy of one of the default profile folders, e.g. to profiles/slurm_my_cluster, then edit the corresponding config.yaml file. Once you are happy with it, edit the scprocess_setup.yaml file to point to this profile as before, e.g.:

user:
  profile: slurm_my_cluster

scprocess setup and scprocess run will then run in cluster mode with the specifications in this profile.
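
Copying a default profile, as recommended above, could look like the following sketch. copy_profile is a hypothetical helper; it refuses to overwrite an existing destination so that an already customised profile is not lost.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a profile folder to a new name without
# overwriting an existing destination.
copy_profile() {
  local src="$1" dst="$2"
  if [ ! -d "$src" ]; then
    echo "source profile not found: $src"
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "destination already exists: $dst"
    return 1
  fi
  cp -r "$src" "$dst"
  echo "copied: $src -> $dst"
}
```

From inside the scprocess directory, `copy_profile profiles/slurm_default profiles/slurm_my_cluster` would create the new folder, after which profiles/slurm_my_cluster/config.yaml can be edited.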

¹ Stephen J Fleming, Mark D Chaffin, Alessandro Arduini, Amer-Denis Akkad, Eric Banks, John C Marioni, Anthony A Philippakis, Patrick T Ellinor, and Mehrtash Babadi. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat. Methods, 20(9):1323–1335, September 2023.
² Corey Nolet, Avantika Lal, Rajesh Ilango, Taurean Dyer, Rajiv Movva, John Zedlewski, and Johnny Israeli. Accelerating single-cell genomic analysis with GPUs. bioRxiv, May 2022.