Short instructions for the Jyväskylä clusters puck (frontend puck.it.jyu.fi) and oberon (frontend oberon.it.jyu.fi)
Version 0.2

    
Batch queue system Slurm
========================
 see  https://research.csc.fi/documents/48467/85840/taito_user_guide.pdf
 (general CSC computing information page https://docs.csc.fi/computing/overview)

 - A job needs a batch script file
   Sample batch scripts:
   
A single core job
-------------------------------------
#!/bin/bash
#SBATCH -J testjob
#SBATCH -o testoutput
#SBATCH -n 1
## -n 1 is the same as --ntasks=1, so one of the two is enough
echo "Current working directory is `pwd`"
echo "Running on `hostname`"
### load modules that your program needs 
## module add puck_...
### run program "myprogram"
## myprogram 

## test command:
uptime 
-------------------------------------

A 48 core job, 2 nodes in puck using OpenMPI
--------------------------------------
#!/bin/bash
#SBATCH -J testjob
#SBATCH -o testoutput
#SBATCH -e erroroutput
#SBATCH -n 48  
## reserves 48 cores = 2 nodes of 24 cores each; on puck this is the same as #SBATCH -N 2
## load modules the program needs 
## module "puck_openmpi" tells communication to go via fast 10.0.40.xx network 
module add puck_openmpi
## don't give mpirun the number of tasks: it knows you reserved 48 cores and runs as "mpirun -np 48"
# test command is hostname:
mpirun hostname	
---------------------------------------
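
The node count in the 48 core example follows from the 24 cores per node
stated in this guide. A minimal sketch of that arithmetic; nodes_needed is a
hypothetical helper name, not a Slurm command:

```shell
#!/bin/bash
# nodes_needed is a hypothetical helper, not part of Slurm.
# It divides the core count by the cores per node and rounds up.
nodes_needed() {
    local cores=$1 per_node=${2:-24}   # 24 cores per node, as on puck
    echo $(( (cores + per_node - 1) / per_node ))
}

nodes_needed 48      # prints 2
nodes_needed 50      # prints 3
```

A request such as -n 50 would spill onto a third node, so core counts that
are multiples of 24 fit the nodes best.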

=============================
 - Contents of a script file     
     lines beginning with #SBATCH  are Slurm instructions 
     lines beginning with ## or ### are comments
     
       Description of some slurm instructions:
       ----------------------------------------
     #!/bin/bash        : run the job commands in bash shell
             (this is done only when the job actually starts execution, not before)
     #SBATCH -J mytest        : job name is "mytest"
     #SBATCH -o out           : output to file "out" in the working directory
     #SBATCH -e errorout      : error messages to file "errorout" in the working directory
     #SBATCH -n 1             : serial job, reserve a single core, same as --ntasks=1
     #SBATCH -n 4             : parallel job, reserve 4 cores, same as --ntasks=4
     #SBATCH -N 1             : reserve one node (24 cores), same as   --nodes=1 
     #SBATCH --time=45:00     : optional; the job may run for at most 45 minutes (it is killed if it has not finished by then)
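
The --time value 45:00 is read as minutes:seconds. The sbatch documentation
lists the accepted forms as minutes, minutes:seconds, hours:minutes:seconds,
days-hours, days-hours:minutes and days-hours:minutes:seconds. As a rough
illustration of how these forms are read, the sketch below converts a limit to
seconds; slurm_time_to_seconds is a hypothetical helper, not a Slurm tool:

```shell
#!/bin/bash
# slurm_time_to_seconds is a hypothetical helper, not part of Slurm.
# It converts a --time value to seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 h=0 m=0 s=0 a b c
    if [[ "$spec" == *-* ]]; then
        # with a day count the rest is hours[:minutes[:seconds]]
        days=${spec%%-*}
        IFS=: read -r h m s <<< "${spec#*-}"
    else
        IFS=: read -r a b c <<< "$spec"
        if [[ -n "$c" ]]; then h=$a; m=$b; s=$c     # hours:minutes:seconds
        elif [[ -n "$b" ]]; then m=$a; s=$b         # minutes:seconds
        else m=$a                                   # minutes only
        fi
    fi
    # 10# forces base 10 so that fields like "08" are not taken as octal
    echo $(( 10#${days:-0}*86400 + 10#${h:-0}*3600 + 10#${m:-0}*60 + 10#${s:-0} ))
}

slurm_time_to_seconds 45:00         # prints 2700
slurm_time_to_seconds 3-00:00:00    # prints 259200
```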

- Submit the job to the queue. The following command submits the job described in script file "slurm.job":
     sbatch slurm.job
   In return you get a job number, "jobid".
- Monitor current queue situation, "R" means running, "PD" means pending
     squeue
 - Graphical view of the queue
     sview
 - View the list of free/occupied nodes and their queue assignments
     sinfo
     sinfo -l    ; more information
     sinfo -a    ; view all queues, also ones you have no right to submit to
     
 - Job queues: just one (old test queue is no longer available)
     normal        default queue, max 3-day jobs (sinfo shows TIMELIMIT 3-00:00:00)
     

- Cancel the job :
    scancel jobid
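
The submit, monitor and cancel steps can be strung together in a small script.
This is only a sketch: it assumes the script file is called slurm.job, and
parse_jobid is a hypothetical helper that relies on sbatch answering with
"Submitted batch job <number>".

```shell
#!/bin/bash
# parse_jobid is a hypothetical helper: it pulls the number out of
# the usual sbatch reply "Submitted batch job 12345".
parse_jobid() {
    awk '/^Submitted batch job/ { print $4 }' <<< "$1"
}

# Only touch the queue when Slurm and the script file are present.
if command -v sbatch >/dev/null 2>&1 && [ -f slurm.job ]; then
    reply=$(sbatch slurm.job)
    jobid=$(parse_jobid "$reply")
    echo "jobid is $jobid"
    squeue -j "$jobid"        # show only this job (R = running, PD = pending)
    ## scancel "$jobid"       # uncomment to cancel the job
fi
```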


Environment modules
===================
 Modules set up the correct environment for your job, including all paths to executables and libraries.
 A module can set up an environment for using a specific version of a program.
 All but the system libraries and compilers are loaded as modules.

 Modules made specifically for puck are named
   puck_name

  some module commands:
    module avail           : list of modules 
    module load puck_gpaw  : load the default GPAW environment, specifically made for puck
    module add puck_gpaw   : same thing
    module list            : list of modules that are currently loaded
    module purge           : unload all modules 

 Example: the command
        gcc 
     invokes the system gcc compiler, which is usually quite old.  
     To use GCC version 6.1.0, do
         module add puck_gcc/6.1.0
     after which the compiler and its libraries are visible:
         gcc --version
      returns
         gcc (GCC) 6.1.0
	 Copyright (C) 2016 Free Software Foundation, Inc.
	 This is free software; see the source for copying conditions.  There is NO
	 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
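
 In a batch script it can be worth checking that the module really put the
 expected compiler first in the path. A small sketch, assuming the banner looks
 like the one printed by GCC 6.1.0; banner_version is a hypothetical helper
 that reads the last word of the first line:

```shell
#!/bin/bash
# banner_version is a hypothetical helper: it prints the last word of the
# first line of a "--version" banner, e.g. "gcc (GCC) 6.1.0" -> "6.1.0".
banner_version() {
    awk 'NR == 1 { print $NF }' <<< "$1"
}

wanted=6.1.0
## module add puck_gcc/$wanted     # uncomment on puck
if command -v gcc >/dev/null 2>&1; then
    found=$(banner_version "$(gcc --version)")
    if [ "$found" != "$wanted" ]; then
        echo "warning: gcc is version $found, expected $wanted" >&2
    fi
fi
```

 Banners that end in something other than the version number (a build date,
 for instance) would need a different pattern.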
        
        
 Some programs are licensed to a certain group of users.  
   

Things not to do
================
 - Don't run jobs on the login node, except very short data analysis/plotting
 - Don't run jobs interactively, run only through the queue system Slurm
 - Don't run jobs in your home directory /home/$user,  use your work directory /n/work00/$user
 - Don't reserve more resources than you really use
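
 The home directory rule can be followed by letting each job create and enter
 its own directory under the work area. A sketch, assuming the work area path
 /n/work00/$USER; prepare_rundir is a hypothetical helper, and SLURM_JOB_ID is
 set by Slurm only inside a running job:

```shell
#!/bin/bash
# prepare_rundir is a hypothetical helper: it creates one directory per job
# under the given work area and prints its path.
prepare_rundir() {
    local root=$1
    local dir="$root/run_${SLURM_JOB_ID:-manual}"   # "manual" outside Slurm
    mkdir -p "$dir" && echo "$dir"
}

## In a batch script on puck (assumed usage):
## rundir=$(prepare_rundir "/n/work00/$USER") && cd "$rundir"
## myprogram
```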

Multinode parallel jobs
=======================
 - Always use network 10.0.40.0/24; for OpenMPI jobs load the local OpenMPI module (puck_openmpi on puck)

A good idea is to 
=================
	
 - Monitor that your job is actually running.
    * Use "squeue"
    * If in doubt, see what nodes the job uses and log in to one of them using ssh.
      Try "top" to see if your processes consume 100% of the cpu:
       ssh compute-0-10   (if the job runs on node 10)
       top
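
 This check can be partly scripted. A sketch with assumptions: busy_lines is a
 hypothetical helper that keeps only the lines of "ps -o pid,pcpu,comm" output
 whose CPU share is at least a given percentage, and the job number 12345 and
 the node name are placeholders.

```shell
#!/bin/bash
# busy_lines is a hypothetical helper: it reads "pid pcpu comm" lines on
# stdin, skips the header line, and prints the lines whose %CPU (column 2)
# is at least the percentage given as $1.
busy_lines() {
    awk -v min="$1" 'NR > 1 && $2 + 0 >= min + 0'
}

## On the front end: which nodes does job 12345 use?
## squeue -j 12345 -h -o %N
## Then list your busy processes on one of those nodes:
## ssh compute-0-10 ps -u "$USER" -o pid,pcpu,comm | busy_lines 90
## Note: ps reports CPU use averaged over the lifetime of the process,
## so a job that has only just started may read differently from "top".
```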