Below is a (by no means comprehensive) list of tools and resources that could be useful for transitioning to more open and reproducible science, and specifically for switching from MATLAB to Python.
Quick links:
Sharing Code and Data
Git/GitHub
BIDS format
Computing Resources
Learning More
Preregistration and Registered Reports
Python
Jupyter Notebooks
Miscellaneous Links
Sharing Code and Data
- If you have a bunch of Python scripts and you’re not sure how to share them, this tutorial will guide you through the process of turning them into a shareable Python package.
- Docker – Enables you to create a container with the environment needed to run your code (system settings, dependencies, etc.) so that anyone with Docker installed can run your code and get the same results, regardless of their system. You can then share your Docker container publicly on Docker Hub.
- Singularity – Similar to Docker, but can be run in environments (like the CBU) that don’t support Docker due to security concerns.
- Neurodocker – Creates a Docker container for you, with all the main neuroimaging software. This can also be used as a basis for creating the specific Docker container for your code.
- ReproZip – Since Docker containers can very quickly become huge, this helps reduce the size of the container by packaging only the relevant components.
- Binder, repo2docker – Tools to build a container from a repository
- DataLad – Dataset portal and versioning system (like GitHub for data), with links to datasets hosted on other websites.
- Open Science Framework – A free research project management platform, including version control and data sharing (for smaller projects, such as behavioural studies or statistical maps).
- Neurovault – A public repository where you can place unthresholded statistical maps.
- OpenNEURO – A free platform for sharing neuroimaging data in BIDS format. If you agree to make the dataset public within 18 months, you can use free cloud computing to run various workflows such as fMRIPrep and MRIQC.
- Code Ocean – A platform for sharing code in multiple programming languages (including MATLAB). The computing itself is cloud-based, so that users don’t have to install anything locally.
- NITRC – Has links to open datasets and tools, as well as a cloud-computing environment
- figshare – Cloud solutions to store, share and manage all possible research outputs (and you can get a DOI for the outputs).
Git/GitHub
Git is a version control system you can use on your own computer to keep track of changes to your code. GitHub is a platform for sharing code that uses Git for version control. GitHub’s own help pages are very detailed, but below are a few useful resources:
- A user-friendly introduction to GitHub with presentations and exercises.
- Useful solutions for fixing mistakes can be found here and here.
- Instructions on how to remove sensitive data from your Git repository.
- Tips for writing good commit messages.
BIDS format
BIDS is a standardised way of organising neuroimaging data. Formatting data in BIDS makes it easier to share, and enables use of tools that expect data to be in BIDS format.
- The BIDS Starter Kit helps learn how to organise data in BIDS format.
- Dcm2Bids is a useful tool for converting raw data (dicoms) to BIDS format. Within the CBU, cbu2bids is based on Dcm2Bids, but it sets specific parameters for running on the CBU imaging system.
- Once the data is in BIDS, it can be read into a Python data structure using PyBIDS.
- BIDS Apps are various tools you can use to analyse data in BIDS format, including fMRIPrep, a preprocessing pipeline, MRIQC for data quality control, and BIDSonym for de-identification of anatomical data.
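To give a feel for what "organising data in BIDS" means in practice, here is a minimal sketch that builds the relative path of a functional run from its entities (subject, session, task, run). The helper name `bids_func_path` and the example labels are hypothetical, and the sketch covers only a small part of the naming scheme. The BIDS specification and the BIDS validator remain the authority on what counts as a valid dataset.

```python
from pathlib import PurePosixPath


def bids_func_path(subject, task, session=None, run=None):
    """Build a relative BIDS-style path for a functional (BOLD) run.

    Simplified illustration: only the sub, ses, task and run entities
    are handled, and labels are not validated.
    """
    folders = [f"sub-{subject}"]
    entities = [f"sub-{subject}"]
    if session is not None:
        folders.append(f"ses-{session}")
        entities.append(f"ses-{session}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    filename = "_".join(entities) + "_bold.nii.gz"
    return PurePosixPath(*folders, "func", filename)


# Hypothetical labels, chosen only for illustration
example_path = bids_func_path("01", "rest", session="01", run="1")
print(example_path)
# sub-01/ses-01/func/sub-01_ses-01_task-rest_run-1_bold.nii.gz
```

Because every file follows the same pattern, tools such as PyBIDS and the BIDS Apps can find the data they need without any study-specific configuration.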
Computing Resources
- A few options for cloud computing – AWS (Amazon Web Services), NITRC, Open Science Grid (less generic but free).
- AWS is costly, but has research credits for specific types of research projects
- Several open datasets, such as the Human Connectome Project, are in S3 storage, which can be accessed through AWS. EasyHCP is a Python package to help access and handle HCP data on AWS.
Learning More
- Neurohackademy – A highly recommended summer school on open neuroimaging. The course website includes most of the course materials, including videos of presentations.
- Online book on reproducible research
- Neurostars – A forum to ask questions about neuroimaging analysis, and specifically about many of the tools listed on this page.
- ReproNim – An initiative to enhance reproducibility in neuroimaging computation, including online modules to learn about reproducibility and training events.
- Software Carpentry and Data Carpentry – Free online lessons and workshops to teach researchers computing skills and data management.
Preregistration and Registered Reports
Preregistration has been suggested as an important way to combat publication bias, p-hacking and analytic flexibility without preventing exploratory research. Below are some useful resources:
- Some background
- A ‘how to’
- The easiest platform for preregistration is aspredicted
- A more thorough platform for new studies is the Open Science Framework with an example here
- A useful template for preregistration of fMRI studies
- Many journals now accept ‘registered reports’ – in other words, you submit the preregistered plan and paper outline, which is provisionally accepted solely on the basis of the methods, not the results. Some resources here
Python
Useful packages for neuroimaging:
Numpy | MATLAB-like arrays in Python |
Scipy | Various tools for scientific computing in Python, including the Scipy library itself (providing numerical functions), pandas for data structures and scikit-learn for machine learning. Scipy also includes a good stats package (scipy.stats) and memory-efficient sparse arrays (scipy.sparse). |
statsmodels | R-like model estimation |
Matplotlib | Widely-used plotting library. A useful feature is the plotting gallery, where you can visually search for the type of plot you’re looking for and see the code that generates it. For higher-level visualisation, seaborn makes it easier to create pretty and informative statistical plots (see gallery). |
itertools | Part of the Python standard library, useful for efficient iteration |
PyMVPA | Multivariate Pattern Analysis tools (with many types of classifiers and support for searchlight analysis) |
Nipy | Both a library called Nipy, for analysis of neuroimaging data, and a collection of projects devoted to analysis of neuroimaging in Python (including Nipype and Nibabel listed below). |
Nilearn | Machine learning for neuroimaging, as well as convenient data manipulation and nice plotting functions for neuroimaging data. |
Nipype | Enables running algorithms from many different neuroimaging packages in one workflow, using a uniform interface. Porcupine is a graphical interface for building Nipype-based pipelines. |
Nibabel | Reading and writing of common neuroimaging file formats (including nifti) |
PySurfer | A library for visualizing cortical surface representations of neuroimaging data, primarily for use with FreeSurfer. Instructions to facilitate installation with Python 3.6 can be found here. |
PyBIDS | A library for interacting with datasets in BIDS format |
pydeface | A tool for defacing nifti images. |
Keras | A deep learning library |
pymatbridge | A Python interface for calling MATLAB. There’s also an IPython extension called matlab_magic for using pymatbridge in an IPython notebook. |
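As a small illustration of the itertools entry in the table above, the sketch below uses `itertools.product` to replace the nested loops one might write in MATLAB when iterating over every subject and run. The subject labels, run numbers and job-name format are made up for the example.

```python
from itertools import product

# Hypothetical subject labels and run numbers
subjects = ["01", "02"]
runs = [1, 2, 3]

# product() yields every (subject, run) pair lazily, so no nested loops
# and no intermediate list of pairs are needed.
jobs = [f"sub-{subject}_run-{run}" for subject, run in product(subjects, runs)]

print(len(jobs))  # 6
print(jobs[0])    # sub-01_run-1
print(jobs[-1])   # sub-02_run-3
```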
Documentation and Tutorials:
- An interactive Python tutorial to help get started with Python
- An online book on data science in Python
- A scikit-learn tutorial can be found here
- Beginners guide to Nipype
- A Keras tutorial can be found here, and an example of running it on MRI data can be found here.
- Useful documentation for strings and regular expressions – A description of the new formatting of strings in Python 3.6, a tool to help build and test regular expressions, and an interactive tutorial on regular expressions in Python.
- Python Tips and Tricks
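The string formatting introduced in Python 3.6 (f-strings) and regular expressions often go together when tidying up file names. The following is a minimal sketch using only the standard library. The filename pattern `subj<N>_sess<N>.mat` and the helper `describe` are hypothetical, invented for this example.

```python
import re

# Hypothetical legacy naming scheme: subj<digits>_sess<digits>.mat
LEGACY_NAME = re.compile(r"subj(?P<subject>\d+)_sess(?P<session>\d+)\.mat")


def describe(filename):
    """Parse a legacy filename and return a readable description."""
    match = LEGACY_NAME.fullmatch(filename)
    if match is None:
        # !r inside an f-string inserts the repr(), quotes included
        raise ValueError(f"Unrecognised filename: {filename!r}")
    subject = int(match["subject"])
    session = int(match["session"])
    # :02d zero-pads the subject number to two digits
    return f"subject {subject:02d}, session {session}"


print(describe("subj7_sess2.mat"))  # subject 07, session 2
```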
Switching from MATLAB to Python
- A translation of MATLAB commands and logic to numpy arrays can be found here
- A Migration guide from MATLAB to Python, by Enthought
- A style guide for Python code
- Using scipy.io you can load and save MATLAB mat files.
- Spyder is a convenient MATLAB-like development environment (with an editor, a console and debugging tools). Another editor worth checking out is Visual Studio Code.
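A few of the differences that most often trip up people moving from MATLAB to NumPy, together with a round trip through a mat file using scipy.io, can be sketched as follows. This assumes NumPy and SciPy are installed. The file name `example.mat` and the variable name `A` are arbitrary choices for the example.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io import loadmat, savemat

# MATLAB: A = [1 2 3; 4 5 6];
A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# MATLAB: A(1, 2). NumPy indices start at 0 rather than 1.
first_row_second_col = A[0, 1]

# MATLAB: A .* A (elementwise). In NumPy, * is already elementwise.
elementwise = A * A

# MATLAB: A * A' (matrix product). In NumPy, use the @ operator.
matrix_product = A @ A.T

# Save to a mat file and load it back; loadmat returns a dict that maps
# variable names to arrays (plus a few metadata keys such as __header__).
with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = str(Path(tmp_dir) / "example.mat")
    savemat(mat_path, {"A": A})
    loaded = loadmat(mat_path)

A_roundtrip = loaded["A"]
print(np.array_equal(A_roundtrip, A))  # True
```

Note that loadmat returns arrays with at least two dimensions, mirroring MATLAB, so a one-dimensional NumPy array will not come back with its original shape.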
Jupyter Notebooks
Jupyter notebooks, a part of Project Jupyter, are documents that combine text, live code and visualisation of results. Using JupyterLab (which is replacing Jupyter Notebook), you can write blocks of code within the document and run them interactively, viewing the results within the same document. These documents are especially useful for sharing code and for starting out with Python, as you can try things and immediately see their results. A few useful tidbits:
- A demonstration of how to use JupyterLab
- Jupyter Notebooks aren’t only for Python – you can also combine R within the same document (and you’ll have access to the variables created in Python), as well as shell, by adding the relevant kernels.
- The text parts of the notebook are written in markdown (formatted text, a bit like html, but simpler). A useful guide with tips for markdown can be found here
- A Jupyter Notebook debugger
- Colaboratory is a Jupyter notebook environment that runs on the cloud and doesn’t require any setup. You can use it to open Jupyter notebooks directly from GitHub by replacing “http://github.com/” with “http://colab.research.google.com/github/” in the URL of the notebook.
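The URL substitution described for Colaboratory can be sketched as a small helper. The function name `github_to_colab` and the example notebook URL are hypothetical. As an assumption beyond the text above, the sketch keeps whichever scheme (http or https) the original URL uses.

```python
from urllib.parse import urlsplit, urlunsplit


def github_to_colab(notebook_url):
    """Rewrite a github.com notebook URL so that it opens in Colaboratory."""
    parts = urlsplit(notebook_url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {notebook_url!r}")
    # Swap the host and prefix the path with /github, keeping the rest
    return urlunsplit(
        (
            parts.scheme,
            "colab.research.google.com",
            "/github" + parts.path,
            parts.query,
            parts.fragment,
        )
    )


# Hypothetical notebook location, for illustration only
print(github_to_colab("http://github.com/example-user/example-repo/blob/main/demo.ipynb"))
# http://colab.research.google.com/github/example-user/example-repo/blob/main/demo.ipynb
```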
Miscellaneous Links
Below are some useful links for neuroimaging, not particularly related to open science.
- Online machine learning book
- Online deep learning book
- A beginner’s guide to convolutional neural nets.
- NiftyNet – A platform for convolutional neural nets in medical image analysis.
- Stanford course on convolutional neural nets for visual recognition
- Visualisation of the performance of different clustering methods (in scikit-learn) on various artificial data
- brms – Bayesian modeling package for R
- The Virtual Brain – A brain data simulator
- CANlab – collection of tools (primarily MATLAB-based) for interactive neuroimaging analysis.
- Mindboggle – Software that combines FreeSurfer and ANTs to optimise analysis of structural data.