Getting started
Prerequisites
Hardware
scprocess is designed to run on Linux-based high-performance computing (HPC) clusters but can also be used on powerful standalone workstations.
Ambient RNA correction with CellBender [1] and the integration step with RAPIDS-singlecell [2] both require access to a GPU with CUDA support.
Software
scprocess requires Conda. If you do not already have Conda installed, refer to the Conda user guide for detailed installation instructions.
If you plan to use CellBender for ambient RNA correction, you will also need Apptainer. For guidance on installing Apptainer, see the installation instructions.
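Before installing, it can help to confirm which of the prerequisites are already visible on your PATH. The following is a minimal sketch, not part of scprocess: check_tool is a hypothetical helper defined here, and the presence of nvidia-smi only suggests that an NVIDIA driver is installed, not that CUDA is configured correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
# Returns 0 if the command is found and 1 otherwise.
check_tool() {
  local tool="$1" note="$2"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
    return 0
  fi
  echo "missing: $tool ($note)"
  return 1
}

# "|| true" keeps the script going so that every tool is reported.
check_tool conda "required" || true
check_tool apptainer "only needed for CellBender" || true
check_tool nvidia-smi "only needed for the GPU-based steps" || true
```

A "missing" line for apptainer or nvidia-smi is not a problem if you do not plan to use CellBender or the GPU-based integration step.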
Installation
- Clone the repository:
- Create a Conda environment:
  Navigate into the scprocess directory and create a Conda environment. Choose the appropriate command based on your operating environment (local machine, SLURM, or LSF cluster).
  If you are using LSF or SLURM, remember to also review the Cluster setup section below.
- Add scprocess to your PATH.
  Open your ~/.bashrc file and add the following line:
  Verify the installation by:
  - Reloading your ~/.bashrc file:
  - Activating the scprocess Conda environment and checking for help messages:
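The PATH step above can be rehearsed safely in a scratch directory. This is a sketch under stated assumptions: SCPROCESS_REPO is a hypothetical variable standing in for wherever you cloned the repository, and the stub script is a placeholder created only so the lookup has something to find. It is not the real scprocess entry point, and nothing here touches your actual ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical clone location; in practice this is your own clone of scprocess.
SCPROCESS_REPO="$(mktemp -d)/scprocess"
mkdir -p "$SCPROCESS_REPO"

# Stand-in executable so that the lookup below succeeds.
# The real repository provides its own executable; this stub is not it.
printf '#!/usr/bin/env bash\necho "stub"\n' > "$SCPROCESS_REPO/scprocess"
chmod +x "$SCPROCESS_REPO/scprocess"

# The kind of line that would go into ~/.bashrc (with your real path).
export PATH="$SCPROCESS_REPO:$PATH"

# Once the shell configuration is reloaded, the command should resolve.
RESOLVED="$(command -v scprocess)"
echo "scprocess resolves to: $RESOLVED"
```

If command -v prints nothing in a real installation, the directory added to PATH probably does not contain the executable, or ~/.bashrc has not been reloaded yet.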
scprocess data directory setup
scprocess requires a dedicated directory to store all necessary data, such as reference genomes.
- Create the data directory:
- Add the following line to your .bashrc file:
- Create a configuration file scprocess_setup.yaml in the $SCPROCESS_DATA_DIR directory you just created, with the contents as follows:
  This will ask the setup process to download and prepare the most recent pre-built human and mouse reference transcriptomes from 10x Genomics.
  local_cores should be specified if you're running scprocess locally. If you're running scprocess on a cluster, use profile instead (see the Cluster setup section below). For more information on how to structure scprocess_setup.yaml, see the Reference section.
  Save some space by removing the reference transcriptome used for the tutorial
  The Quick start tutorial section demonstrates how to run scprocess on example mouse datasets. So that users can follow the tutorial, scprocess setup automatically downloads the mouse_2024 reference transcriptome. If you would like to remove it (after running the tutorial), use:
      rm -rf $SCPROCESS_DATA_DIR/reference_transcriptomes/mouse_2024
      rm -rf $SCPROCESS_DATA_DIR/alevin_fry_home/mouse_2024
      # optionally modify the index_parameters.csv file where all genomes available in $SCPROCESS_DATA_DIR are listed
      awk -F',' '$1 != "mouse_2024"' $SCPROCESS_DATA_DIR/index_parameters.csv > $SCPROCESS_DATA_DIR/temp.csv && mv $SCPROCESS_DATA_DIR/temp.csv $SCPROCESS_DATA_DIR/index_parameters.csv
- Finish setting up the scprocess data directory:
  To download all required data and index the reference transcriptomes, use the scprocess setup command. The first time you run scprocess setup, you need to specify the -c/--rangerurl flag and provide a valid download link for Cell Ranger (v9.0.0 or higher), available on the 10x Genomics Cell Ranger download & installation page:
  Note that scprocess only requires barcode whitelists from Cell Ranger, so the full Cell Ranger installation will not be retained after the setup process.
  Once the initial setup is complete, you do not need to provide the Cell Ranger link again; i.e., if you modify the scprocess_setup.yaml file to add additional reference genomes, simply run scprocess setup.
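To see what the optional cleanup of the tutorial genome does before running it on real data, the commands can be tried on a throwaway directory. This is only a sketch: SCPROCESS_DATA_DIR points at a temporary directory here, and the contents of index_parameters.csv (including the column names) are made up for illustration. The only assumption carried over from the cleanup commands is that the genome name is the first comma-separated field.

```shell
#!/usr/bin/env bash
# Throwaway stand-in for the real data directory.
SCPROCESS_DATA_DIR="$(mktemp -d)"
export SCPROCESS_DATA_DIR

# Mimic the two per-genome folders named in the cleanup commands.
mkdir -p "$SCPROCESS_DATA_DIR/reference_transcriptomes/mouse_2024" \
         "$SCPROCESS_DATA_DIR/alevin_fry_home/mouse_2024"

# Made-up index file; the real file has its own columns.
cat > "$SCPROCESS_DATA_DIR/index_parameters.csv" <<'EOF'
genome_name,example_column
human_2024,a
mouse_2024,b
EOF

# Remove the tutorial genome's folders.
rm -rf "$SCPROCESS_DATA_DIR/reference_transcriptomes/mouse_2024"
rm -rf "$SCPROCESS_DATA_DIR/alevin_fry_home/mouse_2024"

# Keep every row whose first field is not "mouse_2024", then replace the file.
awk -F',' '$1 != "mouse_2024"' "$SCPROCESS_DATA_DIR/index_parameters.csv" \
  > "$SCPROCESS_DATA_DIR/temp.csv" \
  && mv "$SCPROCESS_DATA_DIR/temp.csv" "$SCPROCESS_DATA_DIR/index_parameters.csv"

cat "$SCPROCESS_DATA_DIR/index_parameters.csv"
```

Writing to temp.csv first and renaming afterwards avoids truncating index_parameters.csv while awk is still reading it.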
Cluster setup
scprocess is intended to be used on a cluster with a job scheduler such as SLURM or LSF (although it will also work without a job scheduler). To set up a job scheduler in Snakemake, it is common to define a configuration profile with cluster settings e.g. resource allocation. scprocess comes with two predefined configuration profiles stored in the profiles directory: profiles/slurm_default and profiles/lsf_default for SLURM and LSF respectively.
To use scprocess with a job scheduler, you need to add a profile parameter to your scprocess_setup.yaml file:
If you want to make a profile that is specific to your cluster, we recommend that you make a copy of one of the default profile folders, e.g. to profiles/slurm_my_cluster, then edit the corresponding config.yaml file. Once you are happy with it, edit the scprocess_setup.yaml file to point to this profile like before, e.g.
scprocess setup and scprocess run will then run in cluster mode with the specifications in this profile.
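Copying a default profile, as recommended above, might look like the following. This is a sketch run against a mock profiles tree in a scratch directory: WORKDIR is a hypothetical variable, and the placeholder config.yaml stands in for the file that ships with scprocess. In a real clone you would run only the cp step from inside the scprocess directory.

```shell
#!/usr/bin/env bash
# Mock layout in a scratch directory so the sketch runs anywhere.
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/profiles/slurm_default"
# Placeholder content; the real config.yaml comes with scprocess.
echo "# placeholder cluster settings" > "$WORKDIR/profiles/slurm_default/config.yaml"

# Copy the default profile to a cluster-specific one; the default stays untouched.
cp -r "$WORKDIR/profiles/slurm_default" "$WORKDIR/profiles/slurm_my_cluster"

# The copy's config.yaml is the file to edit for your cluster.
ls "$WORKDIR/profiles/slurm_my_cluster"
```

Working on a copy keeps the default profile available as a reference when cluster-specific settings need to be compared or reverted.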
1. Stephen J Fleming, Mark D Chaffin, Alessandro Arduini, Amer-Denis Akkad, Eric Banks, John C Marioni, Anthony A Philippakis, Patrick T Ellinor, and Mehrtash Babadi. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat. Methods, 20(9):1323–1335, September 2023.
2. Corey Nolet, Avantika Lal, Rajesh Ilango, Taurean Dyer, Rajiv Movva, John Zedlewski, and Johnny Israeli. Accelerating single-cell genomic analysis with GPUs. bioRxiv, May 2022.