Resources
Contents
- Information technologies for life science labs
- Modern lab notebooks
- Time management
- Resource management
- Community Support
- Scientific literature
- Machine learning/Deep learning
- Bioinformatic resources
- R, RStudio and tidyverse resources
- How to select the appropriate chart type
- Colors and fonts
- Markdown
- Citations and bibliography
Information technologies for life science labs
Google Drive An excellent and convenient collection of tools (documents, images, spreadsheets, presentations) with lots of storage space. This is cloud-based so teams can work on documents simultaneously.
Zotero It is really important to develop your library of bibliographic references, but it can certainly be challenging to keep all the references organized. Also, it is time-consuming to manually enter all the details of each reference. There are many good tools out there; we tend to use Zotero as it is well integrated with Google Docs (for text files).
Google Colab A promising cloud-based approach for a team to program together. It doesn’t currently have great support for \({\tt R}\) but perhaps in the near future(?). It is a good option for \({\tt Python}\) currently.
Google Cloud Whereas Google Drive is for storing files and developing documents, Google Cloud provides as much CPU, GPU, and TPU power as you want or can afford. There are other options including \({\tt AWS}\) from Amazon and \({\tt Azure}\) from Microsoft, amongst others.
Note that RStudio Cloud is a kind of restricted combination of Google Drive, Colab and Cloud. It suffices for the teething stage while learning bioinformatics and biodata science.
Slack We will use this non-stop in the course. There are alternatives, including open source efforts, but for various reasons we choose \({\tt Slack}\) for all of our laboratory-related work. Any of these tools is far, far superior to email (yikes).
Git The course will touch on Git, a very important piece of software that provides version control. Git tracks all the files in a project as they are modified by different members of a team. It allows you to see the entire history of all modifications to any file and to “roll back” the files if necessary. Once your project is committed to Git and pushed to a remote host, it’s basically impossible to lose it. Our entire course is maintained under Git and you have full access to it through GitHub (see next entry).
GitHub GitHub is a cloud resource providing facilities to house and share code persistently and consistently. It is tightly associated with Git. There are many alternatives. We use both GitHub and Bitbucket.
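To make the ideas of "history" and "roll back" concrete, here is a minimal sketch in \({\tt Python}\) that drives the Git command line. It assumes Git is installed and on your path; the file name, commit messages, and user identity are hypothetical and exist only for this demonstration. It commits two versions of a file in a temporary folder, lists the history, and then restores the first version.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    cmd = [
        "git",
        "-c", "user.name=Demo User",          # hypothetical identity for the demo
        "-c", "user.email=demo@example.org",
        "-c", "commit.gpgsign=false",         # avoid signing prompts in the demo
        *args,
    ]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout


def demo_history_and_rollback():
    """Commit two versions of a file, list the history, restore the first version."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        notes = repo / "notes.txt"            # hypothetical file name
        git(repo, "init", "-q")

        notes.write_text("first draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-q", "-m", "Add first draft")

        notes.write_text("second draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-q", "-m", "Revise draft")

        # One line per commit, newest first.
        history = git(repo, "log", "--oneline").splitlines()

        # "Roll back": restore the file as it was one commit ago.
        git(repo, "checkout", "HEAD~1", "--", "notes.txt")
        return history, notes.read_text()


if __name__ == "__main__":
    if shutil.which("git"):
        history, restored = demo_history_and_rollback()
        print("\n".join(history))
        print("restored content:", restored.strip())
    else:
        print("git is not installed; skipping the demonstration")
```

In day-to-day work you would type the same commands (`git add`, `git commit`, `git log`, `git checkout`) directly in a terminal or use the Git pane in RStudio; the script only wraps them so the whole round trip can be run safely in a throwaway folder.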
Modern lab notebooks
These are interesting short articles on different approaches towards keeping lab notebooks, and the pros and cons of electronic laboratory notebooks (ELNs) versus traditional paper notebooks. E Pain (2019) Science and R Kwok (2018) Nature.
I personally use a tool called Goodnotes. Others in my lab use Evernote. Although I haven’t used it, I hear very good things about Notion. This is software that really integrates many tools into one coherent framework. If everyone in a group is using the software, it is very easy to share notes, images, calendars, and other common information. As it is cloud based, the information is available anywhere to all members of the group.
Time management
There are many software packages for designing, executing and managing large projects that involve many people. Most of these packages are built for companies. However, several are particularly useful for modern life science labs.
Trello This is a project planning tool that is quite handy. It is not life science specific and could be used for almost any type of project or event that is complicated to plan. Some members of my lab use it.
Labscrum This is really more of a technique than software per se for conducting academic scientific research. It is a way of organizing a group, and provides a method to assist in tracking the progress of projects and researchers. Our group does not use this, but it looks interesting.
Toggl This is an app that allows you to manage your time. Basically, you can keep track of how long you spend on different tasks during the day. Many private consultants use this type of tool to record how long they spend on a client’s project for billing purposes. It is also useful if you have trouble managing your study and research time.
Resource management
Especially with wet labs, a consistent effort is needed to order, track and invoice reagents, kits, and other expendables. Moreover, there is a need to precisely and accurately handle samples. For instance, as we process samples donated by cancer patients, we have a moral obligation to ensure that the material is not lost or mislabeled. Our lab certainly finds it challenging to maintain a “chain of custody”. We have experience with several software systems.
Quartzy and Benchling are both nice software systems, each with their pros and cons.
Snipe-It We recently installed this system as a way to generate QR- (or classic bar-) codes. Basically, you can print out QR code stickers and associate each QR code with one item (reagent, buffer, sample, computer, eraser) in your lab. You need only a cell phone with a camera. You can purchase cheap special sticker paper that survives autoclaves and -80 freezers.
Community Support
Finding online communities to answer questions as they arise is super important.
Stack Overflow This is an amazingly useful website that you will come across many times when asking questions, especially about computers and IT issues. It is community driven, meaning that people voluntarily answer each other’s questions and provide advice on everything from the very mundane to the most technical challenges.
Biostars If your question is bioinformatic in nature (e.g., what software to use? how does a particular piece of software work? how do I design my experiment optimally?), the Biostars website and community is excellent.
RStudio Community The place for questions specific to operating RStudio or its products (incl. Shiny, blogdown). Not generally the place for assistance with coding in \({\tt R}\).
Scientific literature
It is difficult to keep up to date with new publications in your area(s) of interest. In addition to standard Google searches, I use some of the following strategies each day.
NCBI’s PubMed Provides access to almost all scientific publications. Note that Western has subscriptions to many (non-open-access) journals. You need to connect to the VPN (instructions here).
The NCBI has a tool where you can receive daily updates (by email) of all abstracts containing specific keywords. Each morning I trawl through these emails for relevant papers.
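If you prefer to script a similar keyword search yourself, the sketch below may help. It does not use the email alert tool described above; instead it builds a query for NCBI's public E-utilities `esearch` endpoint, restricted to records added in the last few days. The function names, the example keywords, and the choice to search the title and abstract fields are my own assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(keywords, days=1, max_results=20):
    """Build a URL that lists PubMed records matching all keywords,
    limited to records added within the last `days` days."""
    term = " AND ".join(f"{kw}[Title/Abstract]" for kw in keywords)
    params = {
        "db": "pubmed",
        "term": term,
        "reldate": days,       # only the last `days` days
        "datetype": "edat",    # date the record entered the database
        "retmax": max_results,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def fetch_pubmed_ids(url):
    """Download the search result and return the list of PubMed IDs.
    Needs a network connection; not run automatically here."""
    with urlopen(url) as response:
        result = json.load(response)
    return result["esearchresult"]["idlist"]


if __name__ == "__main__":
    # Hypothetical keywords, chosen only as an example.
    print(pubmed_search_url(["fungal", "microbiome"], days=2))
```

Calling `fetch_pubmed_ids` on the resulting URL returns identifiers that can be pasted into PubMed. NCBI asks that scripted access be kept to a modest request rate, so a once-a-day query like this is the intended scale.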
bioRxiv preprints It is becoming increasingly common to submit manuscripts before publication to preprint servers such as \({\tt bioRxiv}\). In fact, my group finds that there is quite often a year’s delay between when a preliminary (unrefereed) version of a manuscript appears as a preprint and when it appears in a journal in its final version. A problem with preprint servers is that it is like drinking from a fire hose. It’s hard to zero in on the papers you want to see.
Twitter specifically #academictwitter In addition to being a platform that allows questionable right wing politicians to tweet questionable policy decisions, Twitter cannot be ignored as a valuable tool for following your scientific community. I spend a considerable number of micro-breaks following links in tweets to interesting-looking papers. Also, I find that Twitter allows you to expand out a bit and see groups/papers you might otherwise not run into.
I highly recommend that each student open a Twitter account for their “professional” purposes and follow members of their community. This includes journals and preprint servers, which regularly tweet new papers. I often rely on journals to tweet new papers rather than going to the journal’s webpage or having them send me an email (yikes).
Google Scholar Not surprisingly, this is a very powerful search engine for academic groups and publications. I recommend that every student establish a profile at Google Scholar.
Otherwise, I greatly rely on my group to post interesting articles to our \({\tt Slack}\) workspace and to give mini-presentations of potentially interesting papers in our lab meetings.
Machine learning/Deep learning
FASTAI An incredible course that teaches deep learning in a very accessible manner for non-mathematicians. Prof. Jeremy Howard’s teaching approach is unique and effective.
PLAIDML If you like Macs and you want to use your AMD GPU to apply deep learning, this is the place to start.
Keras for R is built on Python-Keras, which in turn is built on TensorFlow, all within the comfort of RStudio.
Towards Data Science is part of my daily regimen; it is a nice, accessible way to keep up on developments in data science.
Bioinformatic resources
National Center for Biotechnology Information (NCBI) A huge resource from the USA that houses a vast spectrum of biological data and datasets, and bioinformatic tools.
European Molecular Biology Laboratory (EMBL) and EMBL-European Bioinformatics Institute (EBI) are the European analogs to the NCBI with similar mandates and scope.
Joint Genome Institute (JGI) This is a US Department of Energy initiative, based at Lawrence Berkeley National Laboratory, that has really excelled recently in many different domains. My lab often uses tools such as MycoCosm for fungal (microbiome) studies.
There are many other databases, often with very specific mandates. For example, many organisms have their own database, such as the Saccharomyces Genome Database (SGD) for Saccharomyces cerevisiae, and many human diseases have specialized resources. These are two important cancer resources:
The Cancer Genome Atlas (TCGA) program. TCGA, which is USA based, contains high-throughput profiles for several large patient cohorts across different cancer types. The availability of multi-modal high-throughput profiles collected in a uniform manner across these cohorts opens up many analysis opportunities.
International Cancer Genome Consortium (ICGC) is similar to TCGA but is an international consortium covering more types of cancer.
R, RStudio and tidyverse resources
There is now likely an uncountably infinite number (well, perhaps not that many, but …) of resources for learning R, RStudio and associated packages and extensions.
The R Language This is the main website for the core R language. There is a tonne of useful information including manuals, books and other technical information. You can download (for free) R from this site (if you would like to set up R independently of RStudio Cloud).
The R Manual If you would like to learn the basics of R in a more traditional approach (compared to the data science focus used in this course), this manual is a good place to start.
R and RStudio cheat sheets: A large collection of simple cheat sheets for RStudio, ggplot2, and other R-related things. RStudio also has an excellent series of webinars covering almost all aspects of R and RStudio. I highly recommend these easy-to-follow videos.
Bioconductor is a collection of packages and tools for the biosciences. It is an open source effort. Although not strictly limited to R, Bioconductor is certainly well integrated with R.
Introverse is an R package that provides alternative help and instruction for base R and the \({\tt tidyverse}\).
DataCamp has excellent R tutorials where lectures are intermixed with hands-on programming exercises similar to the \({\tt LearnR}\) modules you will experience in this course. You might consider trying the \({\tt tidyverse}\) tutorial that is very high quality in my opinion. There are many competitors to DataCamp, some of which are free. DataCamp allows for limited free access.
Stat 545: Dr. Jenny Bryan at the University of British Columbia has an entire introductory course in R, visualization, and data analysis online.
How to select the appropriate chart type
I find the collections below very useful: when I want to make a plot, I typically search through hundreds of images to find the most suitable/exciting. Thank you to A Heiss for compiling this selection.
Here are some of the best:
- The Data Visualisation Catalogue: Descriptions, explanations, examples, and tools for creating 60 different types of visualizations.
- The Data Viz Project: Descriptions and examples for 150 different types of visualizations. Also allows you to search by data shape and chart function (comparison, correlation, distribution, geographical, part to whole, trend over time, etc.).
- The Chartmaker Directory: Examples of how to create 51 different types of visualizations in 31 different software packages, including Excel, Tableau, and R.
- R Graph Catalog: R code for 124 ggplot graphs.
- Emery’s Essentials: Descriptions and examples of 26 different chart types.
Colors and fonts
- ColorBrewer: Sequential, diverging, and qualitative color palettes that take accessibility into account.
- Google Fonts: Huge collection of free, well-made fonts.
- The Ultimate Collection of Google Font Pairings: A list of great, well-designed font pairings from all those fonts hosted by Google (for when you’re looking for good contrasting or complementary fonts).
Markdown
The slides and website for this course were developed in Markdown, specifically a dialect designed for R (\({\tt RMarkdown}\)). Fluency in Markdown is a useful skill to have as more and more tools and information migrate to the cloud.
- The Plain Person’s Guide to Plain Text Social Science: A comprehensive explanation and tutorial about why you should write data-based reports in Markdown.
- Markdown tutorial: An interactive tutorial to practice using Markdown.
- Markdown cheatsheet: Useful one-page reminder of Markdown syntax.
Citations and bibliography
You can download a BibTeX file of all the non-web-based readings in the course.
You can open the file in BibDesk on macOS, JabRef on Windows, or Zotero or Mendeley online. I personally use Zotero with Google Docs.
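A BibTeX file is plain text, so you can also peek inside it without a reference manager. The sketch below is only an illustration: the two sample entries are placeholders I made up (they are not course readings), and the function name is hypothetical. It shows what an entry looks like and pulls out each entry's type and citation key with a simple regular expression.

```python
import re

# Placeholder entries for illustration only; not real references.
SAMPLE_BIB = """
@article{doe2020example,
  author  = {Doe, Jane},
  title   = {A Placeholder Article Title},
  journal = {Journal of Placeholder Studies},
  year    = {2020}
}

@book{roe2018sample,
  author    = {Roe, Richard},
  title     = {A Placeholder Book Title},
  publisher = {Placeholder Press},
  year      = {2018}
}
"""

# Matches the start of an entry such as "@article{doe2020example,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def list_entries(bibtex_text):
    """Return (entry_type, citation_key) pairs in the order they appear."""
    return [(kind.lower(), key) for kind, key in ENTRY_START.findall(bibtex_text)]


if __name__ == "__main__":
    for kind, key in list_entries(SAMPLE_BIB):
        print(f"{key}: {kind}")
```

This is a quick inspection aid rather than a full BibTeX parser; for anything beyond listing keys, a reference manager such as Zotero is the safer choice.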