Learning Objectives:
This assignment is motivated by a desire to better understand the phenotypic variation among bananas and the sources of that variation. From a learning-objective perspective, this project will demonstrate your ability to i) manage data, ii) perform a variety of statistical tests, iii) interpret the output of those tests, iv) plot beautiful, meaningful figures and infographics, and v) present your findings to an interested audience. The required format for this assignment is a narrated slide deck, which is described in a later section of this document.
Project motivation / Research Questions
Bananas (Musa spp.) are consumed worldwide and are the fourth most important food crop in the world, after rice, wheat and maize (Prabha & Kumar 2015). Humans have been cultivating bananas for at least 8,000 years (and possibly over 10,000 years)! Bananas are grown in over 150 countries, with over 100 million metric tonnes of fruit produced each year. The bananas most commonly exported internationally are the “Cavendish” cultivars. These belong to the AAA genome group, which includes cultivars that have three sets of chromosomes inherited from the wild (ancestral) species Musa acuminata. The bananas develop from flowers that grow on the stems of this large, herbaceous (non-woody) plant. Bananas make up ~9% of the >6.5 billion dollars of fresh fruit imported into Canada (Agriculture and Agri-Food Canada, 2017), and each Canadian consumes ~15.70 kg of bananas per year on average. Given their ubiquity, bananas give us, as biologists and statisticians, a great opportunity to explore questions that matter from both biological and economic perspectives.
In this project, you will use your own individualized banana dataset and analyze it using the most appropriate statistical techniques. The questions you will address are as follows:
1) How can we efficiently describe a banana’s phenotype? Does the phenotype of bananas depend on where or how they are bought?
2) Are there differences in the characteristics of groups of bananas that have come to Canada from different countries or through different companies? How much variation is there between and within these groups?
3) What is the relationship between a banana’s length and its width? Does it differ between bananas in the inner row and those in the outer row?
4) Do the conditions under which bananas are stored affect how much they ripen over the course of a week?
How Will I Do This Assignment?
You should begin by visiting the course MyLS page to retrieve your individualized dataset (in CSV format). It contains a subset of the complete data collected by you and your classmates. I created each subset (using R, of course) by simple random sampling, so every student has a unique dataset (i.e. everyone will have a slightly different set of data). To further individualize each assignment, the response and explanatory variables you will analyze depend on your birthday and your student ID. Please consult the tables under each question to determine which variables you will be using.
Question 1: How can we efficiently describe a banana’s phenotype? Does the phenotype of bananas depend on where or how they were bought?
1. Using your individualized dataset, perform a Principal Component Analysis using the following variables: i) Outer curve length, ii) Inner curve length, iii) Pedicel length, iv) Banana circumference, v) Pedicel circumference, and vi) Number of sides.
2. Report in a table the loadings and % variance explained by the first 3 principal components. Depending on your birth month (see the table below), describe ONE of the principal components by interpreting its loadings.
3. Using the Principal Component scores from your chosen component as the response variable, determine statistically whether the mean (or median) scores differ between bananas bought in Kitchener-Waterloo (K-W) and those bought outside K-W. First determine, for each of the two samples, whether the scores are normally distributed. If they are not, apply an appropriate transformation and determine whether the transformed data are normally distributed. Also test whether the variances of the two samples are equal, and use this information to choose the most appropriate parametric or non-parametric test for comparing the means (or medians).
4. Calculate the appropriate effect-size statistic (and its 95% CI) for this difference.
5. Perform a two-sample power analysis to determine the current power of your test. Also determine the minimum sample size needed to detect a difference of this magnitude with power = 0.8.
6. Using the graphing functions of R, create a simple boxplot that summarizes the comparison made in step 3 above.
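Step 1 amounts to an eigendecomposition of the correlation matrix of the six standardized traits. The course works in R, where `prcomp(x, scale. = TRUE)` performs this computation. The Python/NumPy sketch below only illustrates the arithmetic. It assumes a numeric table with one row per banana and the six traits as columns. Standardizing first is an assumption, but it is the usual choice when traits are on different scales (centimetres vs. a count of sides). The function name `pca_on_correlation` and the toy values are hypothetical, not course data.

```python
import numpy as np


def pca_on_correlation(data):
    """PCA on standardized traits (rows = bananas, columns = traits).

    Returns (loadings, percent_variance, scores), ordered from the
    component explaining the most variance to the least.
    """
    x = np.asarray(data, dtype=float)
    # Standardize each trait: subtract its mean, divide by its sample SD.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Covariance of standardized data = correlation matrix of the traits.
    corr = np.cov(z, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    loadings = eigenvectors[:, order]  # column j holds the loadings of PC j+1
    percent_variance = 100 * eigenvalues / eigenvalues.sum()
    # Scores: each banana's coordinates on the principal components.
    scores = z @ loadings
    return loadings, percent_variance, scores


# Made-up measurements: outer curve, inner curve, pedicel length,
# banana circumference, pedicel circumference (cm), number of sides.
toy = [
    [22.0, 18.5, 3.1, 12.4, 4.0, 5],
    [20.5, 17.0, 2.8, 11.9, 3.8, 4],
    [23.4, 19.8, 3.5, 13.1, 4.3, 5],
    [19.8, 16.2, 2.5, 11.5, 3.6, 4],
    [21.7, 18.0, 3.0, 12.8, 4.1, 6],
    [24.1, 20.3, 3.7, 13.5, 4.4, 5],
    [20.9, 17.6, 2.9, 12.0, 3.9, 4],
    [22.8, 19.1, 3.3, 12.6, 4.2, 5],
]
loadings, pct, scores = pca_on_correlation(toy)
print("Loadings of PC1-PC3:\n", np.round(loadings[:, :3], 3))
print("% variance explained:", np.round(pct[:3], 1))
```

The sign of each eigenvector is arbitrary, and R may flip it too. Interpret a component from the relative signs and sizes of its loadings, not from whether they happen to be positive or negative.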
Question 1 | Variable used to describe banana “size”
Birthday in January, February, March or April | Principal Component 1
Birthday in May, June, July or August | Principal Component 2
Birthday in September, October, November or December | Principal Component 3
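Steps 4 and 5 of Question 1 reduce to a few formulas when the comparison is parametric. This Python sketch makes three assumptions. The effect size is Cohen's d with a pooled standard deviation. Its 95% CI uses the common large-sample standard error. Power uses a normal approximation to the two-sample t-test. R's `power.t.test()` uses the exact noncentral t distribution, so it typically reports a slightly larger sample size. If a non-parametric test was chosen in step 3, a rank-based effect size would be more appropriate. The function names and example scores are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance


def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled)


def cohens_d_ci(d, n1, n2, level=0.95):
    """Approximate CI for d, using its large-sample standard error."""
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def approx_power(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(d) * sqrt(n1 * n2 / (n1 + n2)) - z_crit)


def n_per_group(d, power=0.8, alpha=0.05):
    """Smallest equal group size reaching `power` (normal approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_crit + z_power) ** 2 / d ** 2)


# Made-up PC scores for bananas bought in K-W and outside K-W.
kw = [0.8, 1.2, -0.3, 0.5, 1.0, 0.2, 0.9]
outside = [-0.4, 0.1, -1.1, -0.6, 0.3, -0.9, -0.2, -0.5]
d = cohens_d(kw, outside)
print("d =", round(d, 2), "95% CI:", cohens_d_ci(d, len(kw), len(outside)))
print("current power:", round(approx_power(d, len(kw), len(outside)), 2))
print("n per group for power 0.8:", n_per_group(d))
```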
Question 2: Are there differences in the characteristics of groups of bananas that have come to Canada from different countries or through different companies? How much variation is there between and within these groups?
1. Consult the tables below to determine your specific response and explanatory variables.
2. Using a one-way fixed-effects ANOVA (or a Kruskal-Wallis test), compare the means (or medians) of your chosen variable among groups. If the test is significant, use post-hoc multiple-comparison methods to determine exactly which groups differ from which.
3. Produce a boxplot illustrating the data for each group. If necessary, indicate on your graph which means or medians (if any) differ significantly, using connecting letters.
4. Using a one-way random-effects ANOVA, determine how much of the variance in your chosen variable can be attributed to between-group variation and how much to within-group variation.
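For step 4, the among-group and within-group variance components can be estimated from the ordinary one-way ANOVA mean squares (the method-of-moments estimator). This Python sketch assumes that estimator. The effective group size `n0` adjusts for unequal group sizes, and a negative among-group estimate is set to zero. A REML fit, such as `lmer()` from R's lme4 package, may give somewhat different values when groups are unbalanced. The function name and example groups are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """One-way random-effects ANOVA via method-of-moments estimates.

    `groups` maps a group label (e.g. a country or importer) to a list of
    measurements. Returns (among-group variance, within-group variance,
    percent of total variance that lies among groups).
    """
    samples = [list(v) for v in groups.values()]
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    group_means = [mean(s) for s in samples]
    grand = mean(x for s in samples for x in s)
    ss_among = sum(n * (m - grand) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, group_means) for x in s)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)  # ms_among / ms_within is the usual F
    # Effective group size n0; equals n when every group has n observations.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    # A negative among-group estimate is conventionally truncated to zero.
    var_among = max((ms_among - ms_within) / n0, 0.0)
    var_within = ms_within
    return var_among, var_within, 100 * var_among / (var_among + var_within)


# Made-up pedicel lengths (cm) for three hypothetical countries of origin.
example = {
    "Country A": [2.9, 3.1, 3.4, 3.0],
    "Country B": [3.6, 3.8, 3.5],
    "Country C": [2.7, 2.9, 3.0, 2.8, 3.1],
}
among, within, pct = variance_components(example)
print(f"among: {among:.4f}, within: {within:.4f}, {pct:.1f}% among groups")
```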
Question 2 (Part 1) | Response variable that describes the banana’s “characteristic”
Born on the 1st through 6th of the month | Pedicel Length
Born on the 7th through 12th of the month | Pedicel Circumference
Born on the 13th through 18th of the month | USDA Ripeness Index Value (Start)
Born on the 19th through 25th of the month | Wright Ripeness Index Value (Start)
Born on the 26th through 31st of the month | Banana Circumference
Question 2 (Part 2) | “Explanatory” variable that describes the banana’s origin/importer
Last digit in your student ID number is odd | Country of Banana Origin
Last digit in your student ID number is even | Banana Import Company
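Your Question 2 variable may fail the ANOVA assumptions. In that case, the Kruskal-Wallis option in step 2 ranks all observations together and asks whether the groups' mean ranks differ. This Python sketch computes the tie-corrected H statistic. H is compared against a chi-square distribution with k - 1 degrees of freedom, and R's `kruskal.test()` reports that p-value directly. The function names and example data are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    labels = [g for g, vals in groups.items() for _ in vals]
    values = [x for vals in groups.values() for x in vals]
    n = len(values)
    rank_sums = defaultdict(float)
    for label, rank in zip(labels, average_ranks(values)):
        rank_sums[label] += rank
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(vals) for g, vals in groups.items()
    ) - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (n^3 - n), where t is each tie group's size.
    tie_sizes = defaultdict(int)
    for x in values:
        tie_sizes[x] += 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Made-up USDA ripeness index values for three hypothetical importers.
example = {
    "Importer A": [3, 4, 4, 5, 3],
    "Importer B": [5, 6, 5, 6],
    "Importer C": [2, 3, 3, 4, 2, 3],
}
h, df = kruskal_wallis(example)
print(f"H = {h:.3f} on {df} df")
```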
Question 3: What is the relationship between a banana’s length and its width?
1. Consult the tables below to determine your specific X and Y variables.
2. Calculate the appropriate correlation coefficient for the relationship between your two variables, and determine whether it is significantly different from zero.
3. Be sure to include any information relevant to your analysis that influenced your choice of statistical test (e.g., tests of normality or homogeneity of variance, and any transformations performed).
4. Produce a scatterplot with a regression line displaying the relationship between your two variables.
5. Calculate the slope and intercept of your regression line, and determine whether these two parameters are significantly different from zero.
Question 3 (X variable - width) | “X” Variable
Birthday in January, February, March or April | Banana Diameter
Birthday in May, June, July or August | Pulp Diameter
Birthday in September, October, November or December | Banana Circumference
Question 3 (Y variable - length) | “Y” Variable
Next-to-last digit in your student ID number is odd | Outer Curve Length
Next-to-last digit in your student ID number is even | Inner Curve Length
Question 4: Do the conditions under which bananas are stored affect how much they ripen over the course of a week?
1. Perform a two-way ANOVA to determine whether the light treatment and/or the spacing treatment has a significant effect on the change in ripeness index (end index value – start index value).
2. Report and interpret the results in your ANOVA table.
3. Create a plot that, in your opinion, best depicts the most important results of your experiment, indicating the location of any statistically significant differences revealed by your statistical tests.
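For a balanced design (the same number of bananas in every light × spacing combination), the two-way ANOVA table in step 1 can be computed directly from cell, row and column means. This Python sketch assumes that balance and uses hypothetical level names ("dark"/"light", "apart"/"together") and made-up changes in ripeness index. With unequal cell sizes the sums of squares depend on the type used (R's `aov()` reports sequential, Type I, sums of squares), so rely on R's output in that case. Each F ratio is then compared with an F distribution on (effect df, residual df) degrees of freedom to obtain a p-value.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Two-way ANOVA table (SS, df, F) for a balanced, fully crossed design.

    `cells` maps (light_level, spacing_level) -> list of responses, e.g. the
    change in ripeness index (end - start), with equal replicates per cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(
        len(v) != n for v in cells.values()
    ):
        raise ValueError("design must be fully crossed and balanced")
    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}
    ss = {
        "light": n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values()),
        "spacing": n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values()),
        "light:spacing": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
    }
    df = {"light": len(a_levels) - 1, "spacing": len(b_levels) - 1}
    df["light:spacing"] = df["light"] * df["spacing"]
    df_error = len(cells) * (n - 1)
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    ms_error = ss_error / df_error
    table = {
        term: {"SS": ss[term], "df": df[term], "F": ss[term] / df[term] / ms_error}
        for term in ss
    }
    table["residuals"] = {"SS": ss_error, "df": df_error, "F": None}
    return table


# Made-up changes in ripeness index, two bananas per treatment combination.
example = {
    ("dark", "apart"): [0.5, 1.5],
    ("dark", "together"): [1.5, 2.5],
    ("light", "apart"): [2.5, 3.5],
    ("light", "together"): [4.5, 5.5],
}
for term, row in two_way_anova(example).items():
    print(term, row)
```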