Matluster - Matlab on the Cluster
programming research matlab cluster ETH Zurich
We all know it: Matlab is not the cleanest programming language, yet many of us use it, probably because it is quite efficient for prototyping new ideas. Once you have convinced yourself that your new idea works, you would normally like to submit your work to a scientific conference or journal. For this you probably need to perform more extensive experiments. In machine learning you might have to average over different seeds or choose the regularization weight via cross-validation. Running the algorithm for all the different configurations can become quite time-consuming. Wouldn't it be great if you could use a cluster for this? This post documents my best practices for running Matlab scripts on a cluster and describes a set of scripts, called matluster, that I have developed to simplify this task.
For this post, let's assume you have developed a new machine learning algorithm. You
want to evaluate the algorithm on several datasets for different seeds, as your
algorithm involves some randomization. I generally split up the workflow into three Matlab scripts,
whose filenames should be fairly self-explanatory:
A preparation script that generates a shell script to submit jobs for all the different configurations.
The main script, which calls the machine learning algorithm and performs the evaluation, e.g. computes a test error.
The collection script, which loads all the intermediate results and compiles them into a summary and/or generates plots.
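To make the first step more concrete, here is a minimal sketch of what a generated submission script could look like. It is not the actual matluster output: the dataset names, the seeds, and the `SUBMIT` and `MCR_ROOT` variables are assumptions for illustration. `SUBMIT` stands in for your cluster's submission command and defaults to `echo`, so running the script off-cluster only prints the job lines. The sketch also assumes the main script was compiled with the Matlab compiler, whose generated run script expects the Matlab runtime location as its first argument.

```shell
#!/bin/sh
# Hypothetical configurations; replace with your own datasets and seeds.
DATASETS="iris wine digits"
SEEDS="1 2"

# Assumption: SUBMIT is the cluster's submission command. It defaults to
# echo, so off-cluster the script only prints what would be submitted.
SUBMIT="${SUBMIT:-echo}"

# Assumption: placeholder location of the Matlab runtime.
MCR_ROOT="${MCR_ROOT:-/path/to/mcr}"

# Emit one job per (dataset, seed) configuration.
submit_all() {
    for dataset in $DATASETS; do
        for seed in $SEEDS; do
            # Arguments reach a compiled Matlab program as strings,
            # so the main script has to convert the seed to a number.
            $SUBMIT ./run_main.sh "$MCR_ROOT" "$dataset" "$seed"
        done
    done
}

submit_all
```

With the defaults this prints six lines, one per configuration. Setting `SUBMIT` to the submission command of your queuing system (with whatever options it needs) turns the same loop into real submissions.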
I always compile the main script, which explains the strange call
./run_main.sh. I use variations of the
brutus_compile.sh script to
perform the compilation. Here we assume that a variable
set by the queuing system points to a local scratch folder.
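The scratch-folder idea can be sketched as a small wrapper: run the job inside the scratch folder, then copy whatever it wrote back to a permanent results folder. This is only an illustration under stated assumptions. `SCRATCH_DIR` is a placeholder name, not the variable your queuing system actually sets, and `run_in_scratch` is a hypothetical helper that is not part of matluster.

```shell
#!/bin/sh
# run_in_scratch RESULTS_DIR COMMAND [ARGS...]
# Runs COMMAND with the scratch folder as working directory, then copies
# everything found there to RESULTS_DIR.
run_in_scratch() {
    results_dir="$1"
    shift

    # Assumption: SCRATCH_DIR is a placeholder for the variable set by the
    # queuing system. Fall back to a temporary folder for local testing.
    scratch="${SCRATCH_DIR:-$(mktemp -d)}"
    mkdir -p "$scratch" "$results_dir"

    # Run the job in a subshell so the caller's working directory is kept.
    ( cd "$scratch" && "$@" ) || return 1

    # Copy the contents of the scratch folder back to permanent storage.
    cp -R "$scratch"/. "$results_dir"/
}
```

Because the command runs from inside the scratch folder, it should be given with an absolute path, for example `run_in_scratch "$HOME/results/iris_1" "$PWD/run_main.sh" "$MCR_ROOT" iris 1`.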
The example code here might also help to understand the matluster scripts a bit better.