Neuronal simulators and analysis tools
Posted: Thu Oct 03, 2019 9:11 am
by Thierry Nieus
When running a Python script through Slurm you may hit an error such as: ModuleNotFoundError: No module named 'mymodule'. Such an error might occur because the script you want to run is copied to a Slurm-specific directory of another node and run from there, so Python no longer finds the modules that sit next to the original script. The fix is to append the original folder to the module search path:
Code: Select all
import sys,os
sys.path.append(os.getcwd()) # append the path of the original folder to the system folders
import mymodule # now the module can be retrieved
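Note that os.getcwd() points to the folder the job was submitted from only because, by default, Slurm keeps the submission directory as the working directory. Slurm also exports that folder explicitly in the SLURM_SUBMIT_DIR environment variable, so a more explicit variant (a sketch, assuming 'mymodule' lives in the submission folder) is:
Code: Select all
import sys, os
# SLURM_SUBMIT_DIR is set by sbatch to the folder the job was submitted from;
# fall back to the current working directory when running outside Slurm
sys.path.append(os.environ.get('SLURM_SUBMIT_DIR', os.getcwd()))
import mymodule  # now the module can be retrieved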
Code: Select all
'''
Prototypical example that generates all scripts needed to launch an analysis on a set of files.
The procedure batch (below) will generate a set of files in the folder <<path_slurm_scripts_laptop>> of your computer.
For any problems/doubts on this, please write to: thierry.nieus@unimi.it
'''
import os  # module for calls to the operating system

# change the following variables according to your run
# from here <----
path_scripts_laptop = '/home/myuser/scripts/'  # path to the scripts on your laptop
path_scripts_cluster = '/home/users/name.surname/my_project/'  # path to the scripts on the cluster
path_data_cluster = '/home/users/name.surname/data/'  # path to the folder of your data
account = 'my_account'  # name of the project you registered
sTIME = '06:00:00'  # how long the process is supposed to run (keep it slightly higher than the max time you expect)
ram_memory = 8000  # RAM allocated to the process, 8000 MB here
# to here ---->

name_process = 'first_batch'  # name of the process: a fancy, easy name to remember what you are doing
fname_execall = 'execall.sh'  # file to manually execute the batch process (e.g. on a shell "sh execall.sh")


def core_procedure(file_to_process, count, path_pyslurm_local):
    '''
    file_to_process      file name to process
    count                file number to process
    path_pyslurm_local   path of the .py and .slurm files
    note:
        The file "main_file.py" (below) should contain all variables and procedures to let "my_procedure" properly run on the cluster.
        This can normally be tested in advance on your computer before moving to the cluster.
    '''
    # names of the python and the slurm files
    fnPY = os.path.join(path_scripts_laptop, path_pyslurm_local, 'run_%d.py' % count)  # the python file to run
    fnSLURM = fnPY.replace('.py', '.slurm')  # the slurm file to run
    fnPYcluster = fnPY.replace(path_scripts_laptop, path_scripts_cluster)  # the full path of the python file to run on the cluster
    # generate the Python file to execute
    f = open(fnPY, 'w')
    f.write('import sys,os\n')
    f.write('sys.path.append(os.getcwd())\n')  # the folder of main_file.py is appended to the module search path (see my other post)
    f.write('exec(open(\'main_file.py\').read())\n')  # load the main_file containing the procedures to launch
    f.write('my_procedure(\'%s\')\n' % file_to_process)
    f.close()
    # generate the SLURM file to execute
    f = open(fnSLURM, 'w')
    f.write('#!/bin/bash \n')
    # *.out - the log files keep track of the outcome of the runs
    f.write('#SBATCH -o %s/%s%s.%s.%s.out \n' % (path_scripts_cluster, path_pyslurm_local, 'out.out', '%j', '%N'))
    f.write('#SBATCH -D %s/%s \n' % (path_scripts_cluster, path_pyslurm_local))
    f.write('#SBATCH -J %s \n' % name_process)
    f.write('#SBATCH --get-user-env \n')
    f.write('#SBATCH -p light\n')
    f.write('#SBATCH --nodes=1\n')
    f.write('#SBATCH -c 1\n')
    f.write('#SBATCH --mem-per-cpu %d\n' % ram_memory)
    f.write('#SBATCH --account=%s\n' % account)
    f.write('#SBATCH --time=%s \n' % sTIME)
    f.write('module load python3/intel/2019 \n')
    f.write('cd %s \n' % path_scripts_cluster)
    f.write('python3 /gpfs%s \n' % fnPYcluster)
    f.write('seff $SLURM_JOBID \n')
    f.close()
    return fnSLURM


def batch(file_list_name, path_pyslurm_local='slurm/batch1/'):
    '''
    file_list_name       text file listing the data files to process, one per line
    path_pyslurm_local   subfolder of path_scripts_laptop to move to the cluster
    '''
    # load the list of files to process
    g = open(file_list_name, 'r')
    file_list = g.read().splitlines()
    g.close()
    num_processes = len(file_list)  # number of files/processes to execute
    path_slurm_scripts_laptop = os.path.join(path_scripts_laptop, path_pyslurm_local)  # execall.sh will be in the folder of the PY and SLURM files
    if not os.path.exists(path_slurm_scripts_laptop):
        os.makedirs(path_slurm_scripts_laptop)
    fname_execall_fullpath = os.path.join(path_slurm_scripts_laptop, fname_execall)
    g = open(fname_execall_fullpath, 'w')
    for count in range(num_processes):
        file2process = os.path.join(path_data_cluster, file_list[count])
        fnSLURM = core_procedure(file2process, count, path_pyslurm_local)
        g.write('sbatch %s\n' % fnSLURM.split(os.sep)[-1])  # os.sep yields the path separator: / (linux), \ (windows)
    g.close()
    # the files are now ready
    print('')
    print('Everything is ready now to be executed!')
    print('')
    print('1. copy the folder << %s >> and its content' % path_pyslurm_local.split('/')[0])
    print('   to: %s' % path_scripts_cluster)
    print('2. go to the folder %s' % os.path.join(path_scripts_cluster, path_pyslurm_local))
    print('   run the command: sh %s' % fname_execall)
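To make the pipeline concrete, here is a minimal sketch of what "main_file.py" could contain. my_procedure, the numpy dependency, and the .npy file format are hypothetical placeholders for your own analysis, not part of the original post:
Code: Select all
# main_file.py - hypothetical sketch of the file loaded by each generated run_<N>.py
import numpy as np  # assumption: the recordings are stored as numpy arrays

def my_procedure(file_to_process):
    '''Analyze a single recording and save the result next to the input file.'''
    data = np.load(file_to_process)   # load one recording
    result = data.mean(axis=0)        # placeholder for the real analysis step
    np.save(file_to_process.replace('.npy', '_result.npy'), result)
With main_file.py placed in path_scripts_cluster, calling batch('file_list.txt') on your laptop (where file_list.txt lists one data file per line) generates the run_<N>.py and run_<N>.slurm files together with execall.sh; copying the generated folder to the cluster and running sh execall.sh then submits one job per file, as printed at the end of batch.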
silvia.casarotto wrote: Mon Dec 20, 2021 1:48 pm
Hello! I would like to explore the possibility of running a specific analysis using the INDACO facility.
I have a large set (about 400 recordings) of electroencephalographic (EEG) potentials evoked by transcranial magnetic stimulation (TMS). These data, which have been collected from about 60 EEG electrodes, have been used to compute a synthetic index of brain complexity (PCI) that has found useful clinical applications.
In order to possibly simplify the experimental setup, I would like to evaluate whether PCI can also be successfully computed from a smaller number of electrodes.
However, the performance of the pre-processing steps (e.g., independent component analysis, ICA) required to reduce common artifacts of ocular and muscular origin before the computation of PCI may vary with the number of channels. Thus, I need to re-run the same analysis previously performed on 60 channels also on 32 and 19 channels. In particular, I would like to apply the same ICA decomposition, i.e. runica as implemented in the EEGLAB software package (https://sccn.ucsd.edu/eeglab/index.php), which runs within the Matlab® environment. Then, for each dataset, I would like to perform a manual selection of artifact-contaminated independent components by visual inspection of their time course and scalp topography.
I kindly ask whether this project can be carried out using the INDACO facilities, and how to implement this analysis pipeline.
Thank you in advance and best regards,
Silvia Casarotto
Dept. Biomedical and Clinical Sciences "L. Sacco"