Chapter 5 Research practices & resources

5.1 Data management and coding

The Gaynor Lab is committed to principles of reproducible science. Our primary coding language is R, although you are encouraged to explore other tools that better fit your needs (and teach the rest of us!).

All data cleaning and analysis should be conducted in scripts. There will be a learning curve at first, and you’ll be tempted to open up your data files and manipulate them in Excel, but the time investment in learning how to wrangle data in R will pay off in the long run.
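As a rough illustration of what script-based cleaning can look like, here is a minimal base R sketch. The data frame, the column names, and the `-999` missing-value code are all hypothetical and made up for this example; in a real project the raw data would be read from a file that is never edited by hand.

```r
# Hypothetical raw data, built in-line so the sketch is self-contained.
# In a real project this would come from read.csv() on an untouched raw file.
raw <- data.frame(
  site       = c("A", "A", "B", "B", "C"),
  species    = c("Waterbuck ", "waterbuck", "Impala", "impala", " IMPALA"),
  group_size = c("4", "7", "-999", "12", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every cleaning step lives in code, so it can be re-run
clean_observations <- function(df) {
  # Standardize species names: trim stray whitespace and use lower case
  df$species <- tolower(trimws(df$species))
  # Convert counts from text to integer
  df$group_size <- as.integer(df$group_size)
  # Treat the (assumed) -999 placeholder as a missing value
  df$group_size[df$group_size %in% -999L] <- NA_integer_
  df
}

obs <- clean_observations(raw)
print(obs)
```

Because each step is written down, anyone can re-run the script on the raw file and get the same cleaned data, which is not true of manual edits in a spreadsheet.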

Code should be complete and well-documented, including information in a README about what each file does and the workflow to run the code.

5.1.1 GitHub for project management

We use GitHub for managing research projects. We have a lab GitHub organization, which you should join upon entry to the lab. You may create repositories under your personal GitHub account for regular thesis research work, but all repositories should be forked to the organization upon project completion or hand-off.

In general, one repository should correspond to one thesis chapter/publication, although there are cases where lumping or splitting repositories makes more sense. Create your project repository as soon as you begin managing your data.

You are encouraged to commit your code and push it to GitHub regularly, ideally on a daily basis. If it works for you, treat your commit messages and GitHub issues as your lab notebook. This may feel vulnerable, especially for those learning to code, but this is a judgment-free zone and there is no better way to learn than to code collaboratively. The use of GitHub makes it very easy to share feedback with other lab members on coding and analysis.

By the time of completion, each project should have an easily found README text file that helps others navigate and use your work and gives contact information for the authors (plus any data creators and use restrictions, if the data are proprietary). Ideally, the README should also include links to publications and presentations from the work.

5.1.2 Data management

Data used in support of your projects should be:

Saved in appropriate, non-proprietary format with accompanying metadata
Archived either publicly (e.g., in the GitHub repo or another public archive, like Borealis) or, if the data are proprietary, as a ‘snapshot’ version of the data used in the project, saved in a private repository accessible to lab members
Linked and briefly described in the project README.
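One possible way to meet the first point is sketched below: the data are written to CSV (a plain-text, non-proprietary format) alongside a small data dictionary that serves as metadata. The file names, folder, columns, and descriptions are hypothetical, and the sketch writes to a temporary folder so it can be run anywhere; in a project you would write to a folder inside the repository.

```r
# Hypothetical cleaned data set
obs <- data.frame(
  site       = c("A", "B"),
  species    = c("waterbuck", "impala"),
  group_size = c(4L, 12L),
  stringsAsFactors = FALSE
)

# Temporary folder for this sketch; in a project, use a folder in the repo
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Save the data as CSV rather than a proprietary spreadsheet format
data_path <- file.path(out_dir, "observations.csv")
write.csv(obs, data_path, row.names = FALSE)

# Accompanying metadata: one row per column of the data file
metadata <- data.frame(
  column      = names(obs),
  description = c("Site identifier",
                  "Species common name, lower case",
                  "Number of individuals in the group"),
  units       = c(NA, NA, "individuals"),
  stringsAsFactors = FALSE
)
meta_path <- file.path(out_dir, "observations_metadata.csv")
write.csv(metadata, meta_path, row.names = FALSE)

# Read the data back in to confirm the file round-trips
check <- read.csv(data_path, stringsAsFactors = FALSE)
```

A data dictionary like this is only one form of metadata; a fuller record would also note who collected the data, when, where, and how.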

The average life expectancy of a hard drive is less than the duration of most graduate programs. Thus it is critical to ensure your data and work are backed up regularly. You may have personal backup solutions (e.g., through Dropbox, Google Drive, etc.), but the lab has dedicated storage space on a university server that is backed up in multiple locations. Your data should be backed up there. [To be added: information on how to do this.]

5.1.3 Resources for learning R

From the Faylab Lab Manual

Welcome to R! There are so many learning resources out there that it can feel a little overwhelming to choose! To get you set up and started, Happy Git and GitHub for the useR is a great resource, and was created by the same authors as the now-legendary stat545.com course. Their resources are extremely comprehensive and they have a fantastic Intro R course, especially for those who will be using R for doing statistics.

Right now, we REALLY like this short intro course by @juliesquid & @allisonhorst. They teach a lot of the workflow and tools around using R right from the get go, which we think is more helpful than knowing how to do all the things. The (excellent) materials are thoughtfully put together, link to a ton of other great resources, and just like the online #rstats community in general, are super supportive of new learners. We also really like the Teacup Giraffes R and statistics materials from Desirée De Leon and Hasse Walum.

If you want a book to work from/through, R for Data Science is highly recommended. (The book is free online.)

The RStudio Education team have assembled a phenomenal array of courses, tutorials, and other materials for learning R, and for many types of data analyses and modeling using R. These are top notch and so thoughtfully put together. There is something for learners at all levels.

A plug for the online R community. Follow the hashtag #rstats, & also check out accounts @RLadiesGlobal & @R4dsCommunity. The weekly #TidyTuesday social coding project is also a great way to practice your growing R skills. Have fun!

The R4dsCommunity Slack is a brilliant resource for getting help with your R questions and finding tutorials. They also hold online Office Hours where you can get help with R from a real human.

The RStudio Community is a great go-to.

Learn R from within R with interactive sessions using swirlstats.com.

The Carpentries R lessons are also a fab resource.

5.1.4 Resources for spatial data in R

Here is an easy-to-use guide for working with spatial data in R: Geocomputation with R

Also, check out this new book on Spatial Statistics in R

5.2 Writing and reading practices

5.2.1 Writing process

Everybody has their own writing process, and we won’t micromanage each other, but there are a few principles and practices that will greatly facilitate the collaborative writing process.

The Gaynor Lab is a safe environment for sharing drafts of in-progress material. You should share drafts of materials in an imperfect state! That said, don’t waste your labmates’ time with unreadable material.

Before you commit to writing a chapter or manuscript, you should share an outline with Kaitlyn and other coauthors. This outline should include a breakdown of the topic of each paragraph, and ideally drafts (even hand-drawn!) of the key figures. We will then discuss the outline and decide what elements or analyses are still needed to finalize the story. You can expect Kaitlyn to provide general feedback on what would be the best story we can write with the data we have, or what data we may need to add. It is always easier to finalize the contents and story in the outline process than once the manuscript is already written.

All co-authors should have an opportunity to weigh in at the outline and draft stages, and must provide approval on the final manuscript prior to submission.

Kaitlyn and your co-authors will provide guidance on target journals and the submission process, including writing cover letters and addressing reviewer feedback.

Oftentimes, when sharing your draft, your readers will leave you feedback as comments on the document. For substantive comments, you should ideally handle them as follows:

Do your best to address the comment by revising the document (if you agree)
Respond to the comment and let them know how you addressed it (or why you didn’t address it, if you disagree)
Let the person who left the original comment resolve/delete it when they review the next draft

This is helpful when people (especially Kaitlyn or your coauthors!) revisit the draft later on and want to see how their feedback was addressed, like a “response to reviewers.”

5.2.2 Reference management

Find a reference management system that works for you, if you don’t already have one, and start using it early in your time in the Gaynor Lab to organize papers as you read them.

We will likely use Zotero for shared group writing projects, as it is free and easy to use, but you may find another system you prefer for your own research.

It is a good idea to develop a system for staying on top of the relevant literature in your field and subfields, which may include the following practices:

Set Google Scholar alerts for keywords or authors
Create an RSS feed for journals in your field
Subscribe to e-mail updates for journal tables of contents

5.2.3 Writing checklist

Here are some things to check for in your writing. Please cross-check this list when sharing a near-finished draft of a document (paper, proposal, etc.) for lab review, or Kaitlyn will send you to this list!

Don’t use the word “this” without qualifying what you are talking about (or Kaitlyn will comment, “this WHAT?”)
Limit use of adverbs and “filler” words
Use active voice, rather than passive voice
Be consistent in your use of tense

5.3 Hypotheses and Predictions

Written by Kaitlyn

Developing good scientific hypotheses and predictions is one of the most important aspects of doing good science, and is also one of the most challenging. There is a lot of guidance out there about how to do this well, and here I am sharing one way to approach it. If it works for you, great, but it’s certainly not the only way to think about hypotheses and predictions!

5.3.1 Definitions

A good scientific hypothesis arises from ecological theory. A good hypothesis is mechanistic at its core: it not only articulates an expected ecological pattern, but provides an explanation for why you expect to see that pattern (sometimes this explanation comes in a separate sentence, rather than in the “hypothesis” itself; but it should be articulated somewhere in your introduction or proposal, and lead right into your hypothesis). A good hypothesis is generalizable across systems, and is not specific to a particular place or taxon. A good hypothesis is interesting. A good hypothesis will lead to good predictions (see below).

Example: [You set up some background on how predator hunting mode and prey anti-predator defenses interact to determine the spatial responses of prey to predation risk - clearly explain the “why” behind the hypothesis.] In this study, I will test the hypothesis that ambush predators generate predictable patterns of risk, which then shape spatial patterns of proactive prey anti-predator behavior.

A good prediction states what you would expect to see (often in a given study system), based on your hypothesis. A good prediction is clearly testable—you can measure and quantify the relevant variables, and use analytical approaches to quantify relationships among them. A good prediction is feasible to test.

Example: I predict that waterbuck will have larger group sizes and be more vigilant in areas with higher lion predation risk.

You may also find that you want to articulate multiple hypotheses and predictions in a single study/chapter, so you have room to clearly articulate the “why” behind your hypothesis, and the step-by-step assessment of your predictions. Sometimes they build on one another.

Example:

H1: Ambush predators generate predictable patterns of risk, associated with cover that they use to surprise their prey.
P1: Lion kill sites are more likely to be in areas with taller grass.
H2: Gregarious prey species rely on conspecifics for the detection and dilution of predators.
P2: Waterbuck group size will be higher in areas with higher modeled lion risk.
H3: Prey use vigilance to detect predators, and therefore they adjust their vigilance in response to their perceived predation risk.
P3: Individual waterbuck will have higher vigilance in areas with higher modeled lion risk, and in smaller groups, and there will be an interaction between lion risk and group size on vigilance.
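To make the link between a prediction and an analysis concrete, here is one rough sketch of how P3 might be written as a model in R. Everything in it is an assumption for illustration: the data are simulated (not real observations), the variable names and effect sizes are made up, and an ordinary linear model is used only for simplicity. A real analysis of vigilance would likely call for a different model structure.

```r
set.seed(1)
n <- 200

# Simulated toy data: NOT real observations
sim <- data.frame(
  lion_risk  = runif(n),                         # modeled lion risk, 0 to 1
  group_size = sample(1:20, n, replace = TRUE)   # individuals per group
)

# Made-up effect sizes, used only to generate the toy response variable
sim$vigilance <- 0.2 +
  0.3   * sim$lion_risk -
  0.005 * sim$group_size -
  0.01  * sim$lion_risk * sim$group_size +
  rnorm(n, sd = 0.05)

# P3 as a model formula: `lion_risk * group_size` expands to both main
# effects plus their interaction (lion_risk:group_size)
fit <- lm(vigilance ~ lion_risk * group_size, data = sim)

# One coefficient per element of the prediction
print(summary(fit)$coefficients)
```

Each part of P3 maps onto one term: the `lion_risk` coefficient (higher vigilance with higher risk), the `group_size` coefficient (higher vigilance in smaller groups), and the `lion_risk:group_size` coefficient (the interaction). Writing the formula out before collecting data is a quick check that the prediction is specific enough to test.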

Note that you may end up with a single hypothesis generating multiple predictions, or multiple hypotheses feeding into a single prediction. In the latter case, it isn’t ideal if there are multiple possible explanations for a given phenomenon, as testing the prediction will not provide clear evidence for a given hypothesis. However, you may have cases where a certain prediction will only hold IF multiple hypotheses are true—then you CAN gather evidence for multiple hypotheses at once by testing a single prediction.

5.3.2 Developing interesting and feasible projects

The process of articulating hypotheses and predictions will allow you to ask yourself the following questions: Is my hypothesis interesting? Is my prediction feasible to test?

Sometimes people start with a hypothesis like this: “I hypothesize that waterbuck avoid areas with higher lion activity.” By my definitions above, this would be a prediction, not a hypothesis. If you were to articulate the corresponding general mechanistic hypothesis, you may find it isn’t all that interesting: “Prey adjust their behavior to avoid being eaten by predators.”

Sometimes people end the process without getting all the way to testable predictions: “I hypothesize that waterbuck trade off foraging opportunities with predation risk, and this trade-off varies by individual.” But if you actually get to the testable prediction (maybe something like, “Waterbuck reduce their avoidance of areas with high lion activity when those areas also have higher-quality forage, and this interaction is strongest for animals with a higher body condition index”), you may realize that you don’t have a way to easily quantify forage quality throughout the study area at the relevant scale, or to measure the body condition of animals. (Or maybe this gets you brainstorming about how you might be able to do so!)

Ideally, you are aiming for something that is both interesting and feasible. Sometimes, you may end up with some “plan B” ideas: perhaps not the most interesting, but clearly feasible. And sometimes, you may end up with “high risk, high reward” ideas: very interesting, but perhaps not totally feasible. You should probably have a mix of these types of ideas in your early drafts of your PhD proposal.

A note: Good statistical hypotheses are not the same thing as good scientific hypotheses. A good statistical null hypothesis is specific, and “boring” (it would be interesting to reject). At its core, it proposes something about a population parameter (e.g., “The impala habitat selection beta coefficient does not vary with modeled lion activity.”). An “alternative hypothesis” in the realm of statistics would probably make for a good “prediction” as defined above.

5.4 Figures

If you’re looking for vector silhouettes of an organism, check out PhyloPic! The silhouettes are available for reuse; check the license and attribution requirements listed for each image.

5.5 Funding

Apply regularly for research funding! Not only will it potentially bring in money to support additional data collection, but it will also help you refine your ideas and develop important skills (grant-writing, budgeting, and science communication).

Here are some grants to consider (initials represent Gaynor Lab members who have received them). Please add to the list, and share opportunities with the rest of the lab via Slack!

Rufford Foundation (KG)
Idea Wild (KG)
Explorers Club (KG)
National Geographic grants
BRITE internship (BRC internal grant; JG)
Animal Behavior Society student grants (KG)

5.6 Writing peer reviews

Peer review is an important part of science, and you will be given opportunities to jointly peer review manuscripts for scientific journals during your time in the Gaynor Lab. Writing peer reviews can help you to become a more thoughtful, engaged, and informed scientist, and lead to improvements in your own work while giving back to the scientific community and shedding light on the peer review process from the other side.

Some good resources for peer review include: