Statistical analysis handout, first half
Statistical tests using Statview
This packet describes how to do a number of basic statistical tests using the statistical program
Statview. In describing how to do each test, we’ve also described the conditions under which
the test is appropriate, what data assumptions need to be tested before running the test, and how
to interpret the output. In many cases, graphical formats for displaying your data are suggested.
Although this was originally written as a tutorial for both Statview and statistics, it’s now been
reformatted to function more as a reference manual. To that end, we’ve added a Table of
Contents so that you can quickly find descriptions of how to do various tests or data
manipulations. However, the descriptions of how to do the tests are still formatted in tutorial
style, with example data sets sometimes continuing from one test description to the next.
Table of Contents
page
Getting started with Statview…………………………………………………………………… 2
Data Formats and Manipulations………………………………………………………………. 6
Key to Statistical Tests…………………………………………………………………………. 9
Unreplicated Data: Chi-square tests
  Goodness-of-fit test………………………………………………………………………… 11
  Test of Independence……………………………………………………………………….. 11
Replicated Data: Comparisons of means (or medians) tests
  Testing the Assumptions
    Normality………………………………………………………………………………. 13
    Homogeneity of Variances – 2 groups…………………………………………………. 14
    Homogeneity of Variances – 3 or more groups………………………………………… 15
    Transformations (when the data assumptions aren’t met)……………………………… 16
  Independent Variable(s) all Categorical
    Two-sample t-test……………………………………………………………………….. 17
    Paired t-test……………………………………………………………………………... 18
    Mann-Whitney U test………………………………………………………………….... 19
    One-way ANOVA and post-hoc comparisons (Fisher PLSD test)…………………….. 20
    Two-way ANOVA…………………………………………………………………….... 22
  Independent Variable(s) all Continuous
    Correlation (no causation implied)……………………………………………………… 24
    Regression (causation implied)……………………………………………………...….. 26
  Independent Variables both Categorical and Continuous – ANCOVA…………………….. 28
Statistics with Statview - 2
Getting Started with Statview
Statview is a program that is easy to learn yet very powerful. The main advantage of Statview
is that it allows you to VIEW your data in a dynamic graphical format, so that you can better
visualize the patterns in your data and then provide statistical support for those patterns.
This brief guide will get you started with the program, but for more detailed explanations refer to
the manuals: the Using Statview guide and the Statview Reference. These will be available in the
computer lab for your use. This guide also assumes that you know the basics of using Macintosh
software, such as double-clicking, dragging, etc. If not, consult your instructor or go through the
Macintosh tutorial on the computer.
To get started, find the icon for Statview on your computer desktop and then double-click on it.
Once the program finishes loading there should be a box displaying the logo for Statview.
• Start a new data file by pulling down the File menu and choosing New. This will create a new
untitled data file as shown below.
[Figure: a new, untitled dataset window, with the attribute pane at the top and the cells below
where you enter data]
Now we want to enter some data to analyze. For example, suppose you are part of a biodiversity
survey in the Galapagos Islands and you are studying marine iguanas. After visiting a couple of
islands you think that there were more iguanas on island A than on island B. To examine this
hypothesis, you decide to quantify the population densities of the iguanas on each island. You
lay out 20 quadrats (100 m² each) on each island (A and B), and count the number of iguanas in each
quadrat. You enter your data into Statview in the format shown on the next page.
To enter the data yourself:
• Start by clicking on the cell labeled Input Column and enter the name Island A. Then press
Enter or Return.
• Then do the same for the second column, naming it Island B.
• Next, notice the attribute pane. The top five rows of the dataset tell you the type, source,
class, format, and decimal places for each variable. You can change each of these attributes
directly: just click on the cell you want to change and hold down the mouse button, then
scroll down to the setting you want and release the mouse button.
• For example, change the attribute setting for Decimal places from 3 to 2 for the data from
both islands A and B so that it is displayed as in the figure on the next page.
• Start entering the data for each population by clicking on the first cell in the spreadsheet,
entering the number, and hitting Return (or Enter on the numeric keypad) to move to the next
cell. You can also use the arrow keys to move among cells.
• To view the summary statistics (mean, standard deviation, etc.) for the density data in each
column, pull down the attribute pane control on the far right of the spreadsheet in the
scroll bar. The further down you pull it, the more summary statistics it will reveal in the
attribute pane above your data.
• You can show or hide as many of the rows in the attribute pane as you desire by double-clicking
or by clicking and dragging the attribute pane control button on the vertical scroll bar.
• As you can see, the mean density of the marine iguana population on island A (13.35) is
smaller than the one on island B (15.55). However, are these means significantly different?
Let’s examine the data in more detail and then use some statistical tests to determine whether
the two population densities are significantly different.
A quick view of your data in graphical format
To get a quick view of what your data look like:
• Pull down the Analyze menu and choose New view. A new window will appear as shown
below.
[Figure: a new View window, with the variables browser at the top and the analysis browser below it]

• In the analysis browser, click on the arrow in front of the heading Cell Plots and a variety of
subheadings will appear. The arrow will now point down.
• Outline all three choices: Point, Line, and Bar chart, and then click Create Analysis.
• In the dialog box that appears, click on the checkbox for Show error bars, click on the button
for 95% confidence interval, and then click OK. Three boxes will then appear in the window
on the right, one for each type of graph.
• Then click on the variable browser icon in the top right of the window. A list of the variables
will then appear in the variable browser window.
• In the variable browser window, click on the name of the data column that you want to view
(Island A) and click Add. The data for that column will then be displayed in the graphs as
shown on the next page.
• In the variable browser, click on the name of the second data column that you want to
analyze (Island B) and click Add. The data for that column will also be displayed in each
graph.
[Figures: three views of the same data, each with error bars showing the 95% confidence
interval around the mean for Island A and Island B: a Cell Point Chart and a Cell Line Chart
(y-axis: Cell Mean, from 12.5 to 17) and a Cell Bar Chart (y-axis: Cell Mean, from 0 to 18)]

• Now that you have seen how to create three different ways to view the means of your data,
take a look at what each of them shows.
• It is apparent that the mean population density of iguanas appears to be higher on Island B,
but the 95% confidence interval around the mean for island B is also wider, meaning we have
less confidence in the estimate of population density for this island.
• Also, notice that the 95% confidence intervals for the mean population density of iguanas on
the two islands do not overlap each other. This suggests that the means are significantly
different. However, we will have to use a statistical test to confirm this.
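As a sketch of what those error bars represent, the interval can be computed by hand. The Python code below is not part of Statview, and it uses the normal quantile 1.96 as a simplifying assumption; Statview uses the t distribution, so its intervals are slightly wider for samples as small as 20.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(values):
    """Approximate 95% confidence interval for a sample mean,
    using the normal quantile 1.96 (a rough stand-in for the
    t quantile Statview uses)."""
    m = mean(values)
    half = 1.96 * stdev(values) / sqrt(len(values))
    return (m - half, m + half)

# Illustration with the first five Island A densities from p. 6
lo, hi = ci95([12, 13, 10, 11, 12])
```

The interval is centered on the sample mean, and its width shrinks as the sample size grows, which is why a more variable sample (Island B) gets a wider bar.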
DATA FORMATS AND MANIPULATIONS
For most tests you do in Statview there is a required format for your data. For unreplicated data
(for a Chi-square test of independence), you need only enter the total counts for each group in a
matrix in the dataset window (see p. 8). For replicated data, you have two format options:
“stacked” or “compact.” In the stacked format, all the values of your dependent variable are in a
single column, with the groups stacked on one another. You then create a second (and possibly a
third and fourth) column that indicates which group each dependent value belongs to.
Alternatively, in the compact format, the values of your dependent variable are initially
entered in separate columns for each group (as you did in the iguana example, p. 3). But then
you “compact” them through a procedure that tells Statview that these are all measurements of
the same dependent variable categorized by independent variables you’ve designated. In fact,
these two formats are essentially the same to Statview, but they look rather different to you. The
differences are illustrated in the examples below using the first five measurements of iguana
density from islands A and B.
Stacked Format              Compacted Format

Island   Iguana Density     Island A   Island B
A        12                 12         15
A        13                 13         13
A        10                 10         16
A        11                 11         10
A        12                 12         11
B        15
B        13
B        16
B        10
B        11
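For readers who think in code, the relationship between the two formats can be sketched in Python. The dictionary below is a hypothetical in-memory stand-in for the dataset, not a Statview structure.

```python
# Compact format: one column of densities per island
compact = {"Island A": [12, 13, 10, 11, 12],
           "Island B": [15, 13, 16, 10, 11]}

# Stacked format: a single Density column plus a grouping column,
# built by stacking the groups on one another
stacked = [(island, density)
           for island, column in compact.items()
           for density in column]
```

Both structures hold exactly the same ten measurements; only the bookkeeping differs, which is the point made above about the two formats being essentially the same to Statview.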
Compacting with a single categorical variable (as shown above at right)
• First, close any View windows you have open.
• Select the two columns of data by dragging across the column headings to highlight them.
• Then click the Compact button in the upper left of the screen and a dialog box will appear
asking you to name the new compact variable. Name the compact variable Density. Then
click on the button down below for More Choices.
• You now want to designate the categories for the independent variable that will be called
Island. So click on New, and type the name Island into the first field for category name.
• Then name each of the categories for the variable. Type Island A in the Group Label field
and click on Add. The name Island A will then appear in the list below. Then type Island B
in the Group label field and click Add. The name Island B will then appear in the list below.
Click on the Done button.
• The category name Island will then appear in the box on the right indicating that these
categories will be designated to each column. Then click the Compact button and the table
on the next page will appear.
Compacting with two categorical variables
If you have two categorical variables, the compacting procedure is a little more complicated.
Say you’ve measured the bill size of males and females in two species of birds. To run an
analysis of both of these factors simultaneously (see two-way ANOVA), you will need to
compact all four columns hierarchically (sex within species, species within bill size), as
described below.
• To compact all four columns hierarchically, first enter the data into four separate columns.
Before compacting, test the assumption of normality for each column and the assumption
of homogeneity of variances for all four columns (see Testing Assumptions about your
Data, p. 11).
• Next, outline the four columns and click on Compact to make a compact variable. A dialog
box will appear.
• In the box, type “Bill Size” as the name under which to group all four columns. You are
starting at the top of the hierarchy, and this name should describe what the dependent
variable measures. Then click on the button down below for More Choices.
• Now you want to designate the categories for each independent variable. You will be
moving down the hierarchy as you work. Click on New, and type the name “Species” into
the first field for category name.
• Then name each of the categories for the “Species” variable. Type “Sp. A” in the Group
Label field and click on Add. The name “Sp. A” will then appear in the list below. Then
type “Sp. B” in the Group Label field and click Add, and the name “Sp. B” will appear in
the list below. Click on the Done button. The category name “Species” will then appear in
the box on the right.
• Now you want to designate the categories of the variable “Sex,” so click on New, and type
the name “Sex” into the first field for category name. Then type “female” in the Group
Label field and click Add; the name “female” will then appear in the list below. Then type
“male” in the Group Label field and click Add, and the name “male” will appear in the list
below. Click on the Done button. The category name “Sex” will then appear in the box on
the right.
• Now click on the Compact button in the lower right of the dialog box and the data should be
grouped as shown above with the appropriate compact variables.
As with a single categorical variable, data with two categorical variables can also be stacked.
Enter the data with two columns for the two factors (Species and Sex) and one column for the
dependent variable (Bill Size) as shown below.
As you can see, setting up your data this way takes up more space. That is why your data are
more “compact” when using the compact variable format. There are advantages and
disadvantages to using either method that will become apparent as you work with Statview.
To set up your data in stacked format, first enter the data into four separate columns and test
your assumptions of normality and homogeneity of variances (see Testing Assumptions about
your Data, p. 11). Then cut and paste your data into one column, and set up your other two
columns to designate what the data in each row represent. Make sure to change the Data type to
String instead of Real so that you can type the names of the variables into the first two columns.
KEY TO STATISTICAL TESTS
Remember that these decisions apply only to your independent (causal) variable(s)!! Your
dependent (response) variables are always continuous and aren't used in choosing among
alternatives here.
Unreplicated data…………………………………………………………..Chi-square tests
1 causal variable………………………………………………………. Goodness-of-fit
2 causal variables (their interaction)……………………………..Test of independence
Replicated data
Categorical causal variable(s)….……………………………… Comparisons of means
1 variable, 1 group……………………………………………….…. 1 sample t-test
1 variable, 2 groups …………………………………………………2 sample t-test
paired t-test
(non-parametric) Mann-Whitney U test
1 variable, >2 groups……………………………………………Fisher's PLSD test
2 variables, ≥ 2 groups.……………………………………………………ANOVA
Continuous causal variable(s)……………………………………………… Regression
Categorical and continuous causal variables...………………………………ANCOVA
Brief descriptions of statistical tests covered in this packet
Chi square tests - no true replication -- only one count (usually) per group
Goodness of fit test - single variable; how well some expected distribution fits your data
Test of independence - two (or more) variables; do they act independently or do
changes in one variable affect the response to the other?
t-test - compares means of only two groups, samples unrelated except by treatment group
Mann-Whitney U test - this is a NON-PARAMETRIC alternative to a t-test (normality and
homogeneity of variance assumptions are not required, but it is not as powerful as a t-test).
paired t-test - compares two groups, but where samples are paired in some way
(before/after in same subject, a sample from each group measured at the same time, on
the same day . . .) and variation among pairs is likely to swamp any differences within
pairs.
Fisher's PLSD test - compares means of more than two groups with comparisons between
each possible pair (pairwise comparisons test)
ANOVA (=ANalysis Of VAriance) - compares means from 2 or more groups, as does Fisher's
test, but here we are usually more interested in the overall effects of the causal variables
rather than differences in particular group means. Example: caterpillars fed artificial
diets with 1) no added chemicals, 2) only nicotine added, 3) only caffeine added, or 4)
both nicotine and caffeine added. We are likely to be interested in 1) the effect of
nicotine, 2) the effect of caffeine, and, likely, 3) whether adding caffeine changes the
caterpillars' response to nicotine (i.e., is there an interaction between the two variables?),
rather than, e.g., whether caterpillars on caffeine only diets differ from caterpillars on
nicotine only diets.
Regression - your causal variable(s) is continuous - plot X vs. Y. You usually want to know
whether there is an influence of your continuous X variable on the value of Y (slope).
You also might be interested in whether Y has a value different from 0 when X is 0
(intercept).
ANCOVA (=ANalysis of COVAriance) - you have both continuous and categorical causal
variables. Basically this analyzes whether two regression lines differ in slope
(=interaction) or in intercept (=overall categorical variable effect). You can also find out if
there is an overall effect of the continuous variable (= overall slope with data from the two
lines combined).
UNREPLICATED DATA: CHI-SQUARE TESTS
Unreplicated data are measurements or counts such that you have only one value in each
category. For example, say that you counted the number of Douglas fir trees and the number of
bigleaf maple trees in a large site. Since you have only two numbers, one for Douglas fir and
one for maple, you can’t take a mean—you have unreplicated data. You can test unreplicated
data using a Chi-square test. This handout covers two types of Chi-square tests: the goodness-of-fit test (compares the distribution of your data with some hypothesized expected distribution) and
the test of independence (tests whether two variables have independent effects on the distribution
of your data).
Goodness-of-fit test
A goodness-of-fit test is one type of test that you can't get Statview to do for you. Fortunately, it
is very easy to do by hand. You will compare your observed counts in categories with expected
counts in the same categories. If the two distributions are very similar, you would expect to see
no statistical difference. If the distributions are rather different, however, you might expect a
significant difference. The test simply involves calculating a chi-square value (see formula
below) and then asking Statview to calculate the P-value associated with that chi-square value
(see below). To do this, Statview requires both the chi-square value and the degrees of freedom.
The degrees of freedom are always the number of categories - 1. So, if you have 4 categories,
you have 3 degrees of freedom. The P-value is the probability of getting a discrepancy between
observed and expected counts at least as large as yours if the two distributions were really the
same. Therefore, if P < 0.05, we will call the counts in the categories significantly different
from expected. This tells you that whatever hypothesis generated the expected distribution is
probably not correct.
In a chi-square goodness-of-fit test, you must come up with an expected number in each category
to which you will compare your observed data. This is sometimes the trickiest part. It may be
that you would expect equal numbers in each category, or the expected numbers may depend on
the abundance of substrate types or other factor in the environment. This is where your
ecological knowledge comes in.
Once you have determined the proper expected numbers, the rest is easy. Simply plug the
numbers in the observed and expected categories into the following formula:
X² = Σ [(Obs − Exp)² / Exp]     (summed over all categories)

For example, if you have two categories and observe 10 in one and 30 in the other, and expect
equal numbers in each, the calculation would be

(10 − 20)²/20 + (30 − 20)²/20 = 5 + 5 = 10,   with 1 degree of freedom (2 categories − 1).
To get the P-value from Statview, open Formula under the Manage menu and click on the arrow
in front of Probabilities in the lower left corner of the dialog box. Double-click on
ProbChiSquare(?, 1), and then type your X2 value in place of the question mark. Enter your
degrees of freedom if different from the default 1. Then click OK. The value 1 − P will be
repeated in all the rows of a new column in your dataset window. Subtract this number from 1 to
get your P-value. Again, if the P-value is less than 0.05, the distributions are significantly different.
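The hand calculation above is easy to script. Here is a minimal Python sketch, independent of Statview; for the 1-degree-of-freedom case, the chi-square P-value has a closed form via the complementary error function, so no statistics package is needed.

```python
import math

def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (Obs - Exp)^2 / Exp."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The worked example above: observe 10 and 30, expect 20 and 20
x2 = chi_square_gof([10, 30], [20, 20])

# P-value for 1 degree of freedom (2 categories - 1)
p = math.erfc(math.sqrt(x2 / 2))
```

With more than 1 degree of freedom you would use the ProbChiSquare formula in Statview as described above, or a chi-square distribution function from a statistics library.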
Chi-square test of independence.
This is a test that Statview runs. In a Chi-square test of independence, you are asking whether
two kinds of categories are related to each other in their effects on the numbers you count. For
example, if you want to know whether bees prefer to visit blue or white flowers, their visitation
rate might also be influenced by whether the flower has a fragrance. So you could set up a series
of flowers in all combinations: blue fragrant, white fragrant, blue scentless, and white scentless.
Count visits to each flower type and test whether the two factors, color and scent, are
independent of each other. This is usually set up in what is known as a contingency table as
shown below. The entries in the contingency table are the number of bee visits to each kind of
flower.
                       Flower scent
                    fragrant    scentless
Flower    blue         48          32
color     white        24          12
To run a Chi-square test of independence in Statview, you need to enter the data in the dataset
window just as you see it in the contingency table: two columns each with two rows. Be sure the
data in each column are labeled as integer and continuous. Label the columns so that these
labels will appear in your output. Then follow these steps to run the test of independence.
• Open a new View window and click open the Contingency Table arrow. Click on
Summary Table, Observed Frequencies, and Expected Values, then click on "Create
Analysis."
• The output will include the analyses as well as the original data table and the expected
values for each cell assuming the two factors are independent. If the Chi Square P-value is
< 0.05, it means that the counts associated with one factor are influenced by the level of the
other factor, i.e., the two factors are not independent in their effects on the dependent
variable.
• The output from these three selections for our flower data is shown below:
Summary Table for Rows, Columns

Num. Missing               0
DF                         1
Chi Square                 .469
Chi Square P-Value         .4936
G-Squared                  .473
G-Squared P-Value          .4914
Contingency Coef.          .063
Phi                        .064
Cty. Cor. Chi Square       .228
Cty. Cor. P-Value          .6328
Fisher's Exact P-Value     .5404

Observed Frequencies for Rows, Columns

          Column 1   Column 2   Totals
Row 1        32         48        80
Row 2        12         24        36
Totals       44         72       116

Expected Values for Rows, Columns

          Column 1   Column 2   Totals
Row 1      30.345     49.655     80.000
Row 2      13.655     22.345     36.000
Totals     44.000     72.000    116.000

• So, in this case, fragrance and color have independent effects on bee visitation. Bees prefer
blue flowers to white (chi-square goodness of fit, X² = 16.69, P < 0.0001), and bees prefer
fragrant over scentless flowers (chi-square goodness of fit, X² = 6.76, P = 0.0093), but
being fragrant doesn't help a blue flower any more than it helps a white flower.
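If you want to check Statview's arithmetic, the expected values and chi-square statistic for this contingency table can be reproduced in a few lines of Python. This sketch is written for the 2 × 2 case only and does not reproduce the continuity-corrected or G-squared variants in the output.

```python
# Observed bee visits: rows = flower color (blue, white)
obs = [[48, 32],
       [24, 12]]

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
grand = sum(row_totals)

# Expected count in each cell = (row total x column total) / grand total
exp = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

# Chi-square statistic, summed over all four cells
x2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
         for i in range(2) for j in range(2))

df = (2 - 1) * (2 - 1)  # (rows - 1) x (columns - 1) = 1
```

The statistic comes out near the .469 in the Summary Table, and with 1 degree of freedom the P-value is far above 0.05, matching the conclusion that color and scent act independently.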
REPLICATED DATA: COMPARISON OF MEANS (OR MEDIANS) TESTS
If you have collected data from numerous individuals or plots within each group such that you
could take a mean of the values in your groups, you have replicated data. If you want to test to
see whether those means differ, you should try to use parametric tests. These are powerful tests
(likely to find differences among groups if they are there), but they require your data to satisfy
two assumptions: 1) the data WITHIN EACH GROUP must be distributed approximately
normally, and 2) variances BETWEEN OR AMONG GROUPS must be homogeneous. If
your raw data don’t satisfy those assumptions, you should first try to transform the raw data to
make it satisfy the assumptions (see p. ). If none of the suggested transformations work, you
have to use a less powerful non-parametric test. This guide covers only one of these non-parametric tests, the Mann-Whitney U test, which compares the medians of two groups. There
are several other non-parametric tests available; if you need them, consult a statistics book.
TESTING ASSUMPTIONS ABOUT YOUR DATA
Test for Normality – Test each group of data separately!
Before you conduct any parametric tests you need to check that the data values come from an
"approximately normal" distribution. To do this, you can compare the frequency distribution of
your data values with those of a normalized version of these values. If the data are
approximately normal, then the distributions should be very similar. First, make a visual
comparison of the distribution of your data with a calculated normal distribution, and then
conduct a Normality test that will provide you with a statistic that determines whether your data
are significantly different from normal.
Using our example with iguana population density (see p. 3), the first thing you want to do is see
whether the data for each population are approximately normally distributed.
To conduct the Normality test, go back to your original spreadsheet with your data; i.e., under
the Window menu choose Untitled data set #1.
• Select the data values in column “Island A” (drag down over the data to outline it).
• From the Edit menu select Copy (or ⌘-C on the keyboard). From the File menu select Open,
and open the Normality test from the Dataset templates folder. A new window should appear
titled Normality Test.
• Click the first cell below the attribute pane in the column named Actual, and then paste your
data into this column by choosing Paste from the Edit menu (or ⌘-V on the keyboard). A
formula will then automatically compute Ideal Normal values from a normal distribution with
the same mean and standard deviation as the data you pasted into the column labeled Actual.
• Next, we use the K-S Normality Test to test the hypothesis that your Actual data values have
the same distribution as the computed Ideal Normal values.
• From the Analyze menu, select QC Analyses / K-S Normality Test. The Assign variables
dialog box appears and automatically assigns the Measurement variable to the proper slot.
By clicking on the arrow in the variable browser in front of the Measurement variable, you
reveal the Category for measurement variable (Categ...).
• Click on the variable Categ... and drag it over into the Actual/Ideal measurements slot.
• Click OK. A new view shows a Kolmogorov-Smirnov (K-S) table (note P-value at bottom)
and two histograms, the first with your data and a normal curve, and the second with the
calculated normal curve. If the result of the K-S test is significant (i.e., P<0.05), then the
distribution of your data is significantly different from normal. If your data are not normal,
you should inspect them for outliers which can have a strong effect on this test. Remove the
extreme outliers and try again. If this does not work, then you must either transform your
data so that it is normally distributed, or use a non-parametric test. Both of these options are
discussed later.
• After examining the distribution of your data and comparing it with an expected normal
distribution, you should find that the data for both populations are not significantly different
from normal (i.e., p > 0.05). The results for population A are shown below.
Kolmogorov-Smirnov Test for Measurement
Grouping Variable: Category for Measurement

DF                      2
Count, Actual           20
Count, Ideal Normal     20
Maximum Difference      .200
Chi Square              1.600
P-Value                 .8987

[Figure: Histogram, Split By: Category for Measurement, Cell: Actual — the Island A data
(Measurement values 9 to 17 on the x-axis, Count 0 to 8 on the y-axis)]

You should repeat these steps to test for normality in the data from Island B.
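For the curious, the heart of the K-S comparison — the maximum difference between your data's distribution and a fitted normal distribution — can be sketched in Python. This reproduces only the distance itself (the "Maximum Difference" line in the output), not the chi-square approximation Statview's template uses to turn it into a P-value.

```python
from statistics import NormalDist, mean, stdev

def ks_normal_distance(values):
    """Maximum gap between the empirical CDF of the data and the CDF
    of a normal distribution with the same mean and standard deviation."""
    nd = NormalDist(mean(values), stdev(values))
    xs = sorted(values)
    n = len(xs)
    # The empirical CDF is a step function, so the largest gap occurs
    # just before or just after one of the data points.
    return max(max(abs((i + 1) / n - nd.cdf(x)),
                   abs(i / n - nd.cdf(x)))
               for i, x in enumerate(xs))

# Illustration with a small, roughly symmetric sample
d = ks_normal_distance([9, 10, 11, 12, 13, 14, 15])
```

A small distance (relative to the sample size) means the data are close to normal, which is what the non-significant P-value (.8987) for Island A indicates.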
Test for homogeneity of variances -- TWO GROUPS
Before you run any parametric tests to compare means you also have to determine whether the
variances of the groups that you are comparing are reasonably similar. When you are comparing
the variances of two groups, you use the F-test. When you have three or more groups, you will
use Bartlett's test, described in the next section.
The comparison of variances involves looking at the ratio of the variances of the two data sets in
question. In our example, we will compare the variances of the Iguana densities for Islands A
and B. Your data can be in either a two column format or in a single data column with another
column with code variables identifying to which group each number belongs.
For example, if your density data for the iguana populations are in separate columns, compact
the two columns (see p. 6) to group the two columns under one variable name (e.g., Density).
Then
• Pull down the Analyze menu and choose New view (or Untitled View #1)
• In the analysis browser, double-click on Unpaired Comparisons.
• A dialog box will appear. Click on the check mark for unpaired t-test to turn it off, and then
click on the box for an F-test below. Then click OK.
• In the variable browser, select the continuous variable “Density,” and then click Add.
• Then click on the arrow in front of the continuous variable “Density” to reveal the compact
variable name “Islands.” Click on the compact variable name and then click Add.
• A table showing the results of the F-test will be displayed along with a table showing means,
etc. as seen below.
F Test for Density
Grouping Variable: Islands
Hypothesized Ratio = 1

                     Num. DF   Den. DF   F-Value   P-Value   Var. Ratio
Island A, Island B      19        19       .298      .0113      .298

Group Info for Density
Grouping Variable: Islands

           Count    Mean     Variance   Std. Dev.   Std. Err.
Island A     20    13.350      2.239      1.496       .335
Island B     20    15.550      7.524      2.743       .613
The P-value for the F-test is < 0.05; therefore the variances of the two groups are significantly
different. You can now either transform the data to try to make the variances homogeneous, or
you can run a test that does not require homogeneity of variances (i.e., a non-parametric test like
the Mann-Whitney U test—see p. 15).
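The F statistic here is just the ratio of the two sample variances, which is easy to verify by hand. A Python sketch follows; it uses only the first five densities per island shown on p. 6 for illustration, so the ratio it prints differs from the full 20-quadrat value (.298) in the table above.

```python
import statistics

def variance_ratio(group1, group2):
    """F statistic for comparing two variances: s1^2 / s2^2."""
    return statistics.variance(group1) / statistics.variance(group2)

# First five densities from each island (p. 6), for illustration only
ratio = variance_ratio([12, 13, 10, 11, 12], [15, 13, 16, 10, 11])
```

Getting the P-value for the ratio requires the F distribution, which Statview provides; the point of the sketch is only that the test statistic is nothing more exotic than a ratio of variances.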
Test for homogeneity of variances – THREE OR MORE GROUPS
To continue the example using the iguana population density data, let’s add 16 density values
from a third island, Island C. Enter these data into your spreadsheet in a new column and then
make a new compact variable for the three groups; i.e., you should expand your original two
columns first, then create a new compact variable.
Density (100 m²), Island C:
15 13 10 14 12 12 13 13 14 14 11 14 15 12 15 16
Of course, the first thing we want to do is make sure the data for each population are
approximately normally distributed (see p. 11). Then you want to make sure the variances
among the three (or more) groups are homogeneous. This is done with Bartlett’s test.
Bartlett’s test requires that you enter the group names, counts, means, and standard deviations
into an analysis table. First, open that analysis table:
• From the File menu, select Open and then from the Dataset templates folder select Compute
Bartlett’s test. Click Open. The following window should appear, and there are columns for
the Group names, Group counts, Group means, and standard deviations that you need to fill
in with the appropriate values.
• Go back to your original data set and copy the needed information from each column of
summary statistics and paste it into the appropriate columns in the Compute Bartlett’s test
file. This will be easiest if you drag across the columns of data and copy multiple cells
simultaneously from your data set, then use Paste transposed under the Edit menu when
pasting your data into the correct cells in the Compute Bartlett’s test file.
• Once all of the cells for each of the first four columns on the left are completed, the table will
automatically calculate Bartlett’s test, and in the window on the right you will see an F-value
and a P-value for the test.
Example: Using the data presented above for the three iguana populations on different islands,
you should get the following result:
• Since your variances are not homogeneous (P<0.05), you may not proceed directly to
ANOVA comparisons of means (see below). You have two choices of what to do. You
can either transform your data to attempt to make the variances homogeneous, or you can
run a test that does not require homogeneity of variances (e.g., Welch's
test for three or more groups—not covered in this guide).
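For readers working outside Statview, Bartlett's statistic can be computed directly from the same group counts and variances you would otherwise paste into the template. A minimal sketch in Python; note that only the Island C data appear above, so the Island A and B variances used here are hypothetical placeholders:

```python
import math
from statistics import variance

def bartlett(counts, variances):
    """Bartlett's test statistic for homogeneity of k group variances.

    Classically this statistic is compared to a chi-square distribution
    with k-1 degrees of freedom. (Statview's template reports an F-value
    and P-value; how it converts the statistic is not documented here.)
    """
    k = len(counts)
    N = sum(counts)
    # Pooled variance across all groups
    sp2 = sum((n - 1) * s2 for n, s2 in zip(counts, variances)) / (N - k)
    num = (N - k) * math.log(sp2) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(counts, variances))
    # Correction factor
    corr = 1 + (sum(1 / (n - 1) for n in counts) - 1 / (N - k)) / (3 * (k - 1))
    return num / corr

# Island C densities from the table above
island_c = [15, 13, 10, 14, 12, 12, 13, 13, 14, 14, 11, 14, 15, 12, 15, 16]
# Island A and B counts/variances below are hypothetical placeholders
stat = bartlett([20, 20, 16], [0.9, 4.8, variance(island_c)])
```

The larger the statistic, the stronger the evidence that the group variances differ; when all variances are equal the statistic is zero.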
TRANSFORMATIONS
Transformations of data are often used to make the variances homogeneous or to make
distributions more normal. Both homogeneous variances and normal distributions are
assumptions of parametric tests. If the distributions are not approximately normal or the
variances are not homogeneous, a transformation such as common log, natural log, or square
root often helps.
To transform your data:
• Go back to the spreadsheet of your data (choose it under the Window menu).
• Pull down the Manage menu and select Formula. A box will open that displays your
variables on the left. Since your original data columns are now part of a compact variable,
they are now named Island A Density, and Island B Density in the box on the left. You can
transform your data columns the way they are, or you could also uncompact the columns by
clicking on the Density heading and then clicking on the Expand button.
• Select the variable you want to transform by double-clicking on it. The variable will be
displayed in the Formula variable definition box on the right.
• Now you can transform the variable by 1) using the calculator below, 2) choosing functions
from the box on the lower left, or 3) typing the transformation in the Formula variable
definition box.
• Click on the Attributes button and specify the new variable name (e.g., log A), the type and
class of data, and the decimal places required.
• Click OK and then click the Compute button, and Statview will do the transformation on each
data point and put the results in the specified column.
Try using each of the above transformations to see if you can achieve homogeneity among the
density data for the iguana populations on Islands A and B. Conduct a series of homogeneity of
variance F-tests using the transformed data. If the variances are now homogeneous, go on with
the parametric comparison of means test using the transformed data. If the variances are still too
different, you can do a Mann-Whitney U test (two groups only!) on the original data (see p. 14).
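The transform-then-retest loop above can also be sketched outside Statview. The data here are made-up example values, not the iguana densities (a Python sketch):

```python
import math
from statistics import variance

def var_ratio(x, y):
    """Variance-ratio F statistic: larger sample variance over the smaller."""
    vx, vy = variance(x), variance(y)
    return max(vx, vy) / min(vx, vy)

# Hypothetical raw data for two groups (not the iguana densities)
a = [12.1, 14.8, 13.5, 16.2, 15.0, 13.9]
b = [22.4, 31.7, 26.9, 41.3, 28.0, 35.5]

# Try each transformation and recompute the variance-ratio F;
# the transformation giving a ratio closest to 1 is the best candidate.
for name, f in [("log10", math.log10), ("ln", math.log), ("sqrt", math.sqrt)]:
    ta = [f(x) for x in a]
    tb = [f(x) for x in b]
    print(name, round(var_ratio(ta, tb), 2))
```

An F ratio near 1 means the two transformed variances are similar; the P-value would then come from comparing the ratio to an F distribution with the appropriate degrees of freedom.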
A special case of transformation should also be mentioned here. Whenever your data are
percents, they will generally not be normally distributed. To make percent data normal, you
should do an arcsine-square root transformation of the proportions (percents/100). Both of these
functions are in the menu in the dialog box. The composite function to be put into the Formula
variable definition box would look like:
arcsin(sqrt(percent/100))
Then proceed with comparisons of means on the transformed data.
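For reference, the same composite function written outside Statview (a Python sketch; the percent values are hypothetical):

```python
import math

def arcsine_sqrt(percent):
    """Arcsine-square-root transform: arcsin(sqrt(p/100)).

    Maps percentages (0-100) onto 0 to pi/2 radians."""
    return math.asin(math.sqrt(percent / 100))

survival = [12.0, 45.0, 88.0, 100.0]   # hypothetical percent data
transformed = [arcsine_sqrt(p) for p in survival]
```

The transform stretches out values near 0% and 100%, where percent data are most compressed, which is why it tends to make such data more nearly normal.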
INDEPENDENT VARIABLE(S ALL) CATEGORICAL
Independent categorical variables define the categories or groups within which the dependent
variable is measured (e.g., island A/island B, male/female, species 1/species 2…). Categorical
variables can also be constructed from technically continuous variables (e.g., distance, size, time,
fertilizer level…) by dividing up the range of the continuous variable (e.g., near/far,
small/medium/large, early/late, low/high…). To choose the appropriate statistical test for your
data, you need to assess how many of which kind(s) of INDEPENDENT variables you have.
Two-sample t-test
This test compares the means from two groups, such as the density data for the two different
iguana populations. As in the example for the homogeneity of variances F-test, your data can be
in either a two-column format, or in a stacked format with a second column that contains code
variables identifying the group to which each number belongs.
The following example assumes your log density data for the iguana populations are part of a
compact variable. To run a two-sample t-test on the data:
• Pull down the Analyze menu and choose New view.
• In the analysis browser, select Unpaired comparisons and click Create Analysis.
• Click OK to accept the default analysis parameters.
• In the variable browser, select the continuous variable (e.g., log Density), and then click Add.
• Then click on the arrow in front of the continuous variable to reveal the compact variable
name. Click on the compact variable name and then click Add.
• A table showing the results of the t-test on the log Density data will be displayed.
• The output consists of a table showing the mean difference between the two groups, the
degrees of freedom of the test, the t-value, and the P-value. In this example P<0.05 so the
two means are significantly different. The next table shows the summary statistics for each
group.
Unpaired t-test for log Density
Grouping Variable: Islands
Hypothesized Difference = 0

                     Mean Diff.   DF   t-Value   P-Value
Island A, Island B        -.062   38    -2.958     .0053

Group Info for log Density
Grouping Variable: Islands

           Count    Mean   Variance   Std. Dev.   Std. Err.
Island A      20   1.123       .003        .051        .011
Island B      20   1.185       .006        .079        .018
So, this statistical test provides strong support for your original hypothesis that the iguana
densities differ between the two islands.
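As a check on the output table, the equal-variance t statistic can be recomputed from the summary statistics alone. Using the rounded values in the Group Info table gives t of roughly -2.92, slightly different from Statview's -2.958 only because the means and variances are rounded to three decimals (a Python sketch):

```python
import math

def t_from_summary(n1, m1, v1, n2, m2, v2):
    """Pooled (equal-variance) two-sample t from counts, means, variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df    # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))       # std. error of the difference
    return (m1 - m2) / se, df

# Rounded summary values from the Group Info table above
t, df = t_from_summary(20, 1.123, 0.003, 20, 1.185, 0.006)
```

The degrees of freedom (38) match the table exactly; the t-value is reproduced to within rounding error.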
