Tutorial - Convert multiple participants in parallel
Motivation
Instead of manually converting one participant after the other, you may want to speed up the process. There are many ways to do this, and using GNU parallel is one of them. GNU parallel provides an intuitive and concise syntax, making it user-friendly even for those with limited programming experience, just like dcm2bids 😄. By using multiple cores simultaneously, GNU parallel significantly speeds up the conversion process, saving time and resources. In short, GNU parallel lets us convert our data quickly and easily, with minimal effort and maximum productivity.
Prerequisites
Before proceeding with this tutorial, there are a few things you need to have in place:
- Be familiar with dcm2bids or, at least, have followed the First steps tutorial;
- Have a dcm2bids config file ready or know how to make one;
- Have more than one participant's data to convert;
- Have each participant's DICOM files organized into separate directories or archives. Since version 3.1.0, dcm2bids accepts either compressed archives or directories as input, so it doesn't matter which you use.
Setup
dcm2bids and GNU parallel must be installed.

If you have not installed dcm2bids yet, now is the time to go to the installation page and install dcm2bids with its dependencies. This tutorial does not cover the installation and assumes dcm2bids is properly installed.

GNU parallel may already be installed on your computer. If you can't run the command parallel, you can download it from their website. Note that if you installed dcm2bids in a conda environment, you can also install parallel in it through the conda-forge channel. Once your env is activated, run conda install -c conda-forge parallel to install it.
Verify dcm2bids and parallel versions
First things first, let's make sure our software is usable.
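A minimal check, assuming both tools are on your PATH and support the --version flag, might look like this:

```shell
# Loop over both tools; print the version if the executable is found, warn otherwise
for tool in dcm2bids parallel; do
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" --version
  else
    echo "WARNING: $tool not found on PATH" >&2
  fi
done
```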
If you don't see a similar output, it is likely an installation issue, or the software was not added to your system's PATH. Adding the executables to your PATH allows you to run dcm2bids commands without specifying their full path. If you are using a virtual env or conda env, make sure it is activated.
Create scaffold
We will first use the dcm2bids_scaffold command to create basic BIDS files and directories. It is based on the material provided by the BIDS starter kit. This ensures we have a valid BIDS structure to start with.
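For instance, the scaffold can be created with a command along these lines (the output directory name here is a hypothetical example):

```shell
out_dir="bids_project"          # hypothetical name for the new dataset directory
dcm2bids_scaffold -o "$out_dir" # creates README, dataset_description.json, code/, sourcedata/, ...
ls "$out_dir"                   # inspect the skeleton that was created
```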
Populate the sourcedata directory
This step is optional, but it makes things easier when all the data are within the same directory. The sourcedata directory is meant to contain your DICOM files. It doesn't mean you have to duplicate your files there, but it is convenient to symlink them there. That being said, feel free to leave your DICOM directories wherever they are and use that location as the input to your dcm2bids command.
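As a sketch, a symlink can be created like so (the source location is hypothetical):

```shell
dicom_dir="$HOME/data/punk_proj"        # hypothetical path to your DICOM archives
ln -s "$dicom_dir" sourcedata/punk_proj # symlink instead of copying the data
ls -lh sourcedata/                      # the link and its target should be listed
```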
Now I can access all the punk subjects from within sourcedata, as sourcedata/punk_proj/ points to its target.
Get your config file ready and test it
You can either run dcm2bids_helper to help build your config file or import one if you already have one. The config file is necessary for specifying the conversion parameters and mapping the metadata from DICOM to BIDS format. Because this tutorial is about parallel, I simply copied a config file I created for my data to code/config_dcm2bids_t1w.json. This config file aims to BIDSify and deface the T1w images found for each participant.
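A minimal sketch of such a config, assuming a T1w series whose SeriesDescription matches *mprage* and using dcm2bids' post_op mechanism to run pydeface (the criteria value and series naming are placeholders, not the tutorial's actual file):

```json
{
  "post_op": [
    {
      "cmd": "pydeface --outfile dst_file src_file",
      "datatype": "anat",
      "suffix": ["T1w"]
    }
  ],
  "descriptions": [
    {
      "datatype": "anat",
      "suffix": "T1w",
      "criteria": {
        "SeriesDescription": "*mprage*"
      }
    }
  ]
}
```

The src_file and dst_file placeholders are substituted by dcm2bids with the input and output file paths when post_op commands run.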
Make sure that your config file runs successfully on at least one participant before moving on to parallelizing.
In my case, dcm2bids --auto_extract_entities -c code/config_dcm2bids_t1w.json -d sourcedata/punk_proj/PUNK041.tar.bz2 -p 041 ran without any problem.
Running parallel
Running pydeface on a single participant takes quite a long time. Instead of converting participants serially, as with a for loop, parallel can be used to run as many at once as your machine can handle.
From a single subject to several at once
If you have never heard of parallel, here's how the maintainers describe the tool:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
Understanding how parallel works
In order to use parallel, we have to give it a list of the subjects we want to convert. You can generate this list by hand, in a text file, or through a first command that you pipe to parallel.
Here's a basic example that lists all the punk_proj participants and runs echo on each of them.
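A sketch of that first test, with the directory layout assumed from the earlier steps; {} is replaced by one input line per job:

```shell
project="sourcedata/punk_proj"   # assumed location from the scaffold step
# Each archive name piped in becomes one echo job
ls "$project" | parallel echo "Converting {}"
```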
However, if you want to do something with the files, you have to be more specific; otherwise the program won't find them, because the relative path is not part of the input. That said, keep in mind that the bare filenames are still worth having: they contain a really important piece of information, namely the participant ID, which we will eventually extract.
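For example, piping bare filenames to a command that actually opens the files fails, since each job runs from the current directory (a sketch; the archive layout is assumed):

```shell
project="sourcedata/punk_proj"   # assumed location
# Each job only receives "PUNK041.tar.bz2" (no path), so tar cannot open the file
ls "$project" | parallel tar -tf {}
```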
You can solve this by simply adding the path to the ls command (e.g., ls sourcedata/punk_proj/*) or by using parallel's ::: as the input source.
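With :::, the shell expands the glob, so each job receives a full relative path (same assumed layout):

```shell
src="sourcedata/punk_proj"   # assumed project directory
# ::: passes command-line arguments as parallel's input; {} now holds the full path
parallel echo "Will convert {}" ::: "$src"/*
```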
Extracting participant ID with parallel
Depending on how standardized your participants' directory names are, you may have to spend a little time figuring out the best way to extract the participant ID from the directory name. This means you might have to read the parallel help pages and dig through examples to find your use case.
If you are lucky, all the names are already standardized and BIDS-compliant.
In my case, I can use the --plus flag directly in parallel to extract the alphanumeric pattern I want to keep, either with {/..} (basename with the extensions removed) or with a perl expression that performs string replacements. Another common case, if you want only the digits from the file names (or compressed archive names), would be to use {//[^0-9]/}.
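A sketch using --plus, assuming archive names such as PUNK041.tar.bz2: {/..} strips both the directory and the double extension, leaving PUNK041.

```shell
src="sourcedata/punk_proj"   # assumed project directory
# {} -> full path, {/..} -> basename without its two extensions (e.g. PUNK041)
parallel --plus echo "path: {} -> participant: {/..}" ::: "$src"/*.tar.bz2
```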
Building the dcm2bids command with parallel
Once we know how to extract the participant ID, all we have left to do is build the command that parallel will run. One easy way to build our command is to use the --dry-run flag.
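A sketch of such a dry-run (the archive naming is an assumption, and {/..} yields e.g. PUNK041; adapt it if you only want the digits for -p):

```shell
src="sourcedata/punk_proj"   # assumed project directory
# --dry-run prints each fully-built command instead of executing it
parallel --plus --dry-run \
  "dcm2bids --auto_extract_entities -c code/config_dcm2bids_t1w.json -d {} -p {/..}" \
  ::: "$src"/*.tar.bz2
```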
Launching parallel
Once you are sure that the dry-run matches what you would like to run, you simply have to remove the --dry-run flag and go for a walk, since the wait may be long, especially if pydeface has to run.
If you want to see what is happening, you can add the --verbose flag to the parallel command, so you will see which jobs are currently running.
Parallel will try to use as many cores as it can by default. If you need to limit the number of jobs run in parallel, you can do so by using the --jobs <number> option, where <number> is the number of cores parallel is allowed to use concurrently.
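Putting it together, under the same assumptions about the archive names (--jobs 4 is just an example limit):

```shell
src="sourcedata/punk_proj"   # assumed project directory
# Same command without --dry-run; show progress and cap concurrency at 4 jobs
parallel --plus --verbose --jobs 4 \
  "dcm2bids --auto_extract_entities -c code/config_dcm2bids_t1w.json -d {} -p {/..}" \
  ::: "$src"/*.tar.bz2
```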
Verifying the logs
Once all the participants have been converted, it is a good idea to analyze the dcm2bids logs inside tmp_dcm2bids/log/. They all follow the same pattern, so it is easy to grep for specific error or warning messages.
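For instance, errors and warnings can be pulled out of all logs at once (the .log extension is an assumption about how the log files are named):

```shell
# Case-insensitive search for problems across all conversion logs
grep -iE "error|warning" tmp_dcm2bids/log/*.log
```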
Created: 2023-09-13