Instructions
In this assignment, you will have the opportunity to review concepts from your previous coursework in statistics. Two highly relevant concepts that carry over from fundamental to more intermediate statistics courses include an understanding of probability and the mechanics of running statistical tests using commonly used software.
Two types of software commonly used in business and research are Microsoft Excel and SPSS. There are many other tools available, but the former is highly common at home and work, and the latter is the tool of choice for quantitative analysis in graduate programs. In this assignment, you will be asked to run an analysis in Excel and replicate it in SPSS.
This assignment contains two datasets, one in an Excel file and the other in an SPSS data file. The dataset contains three variables: age measured in years, engagement (average score of a survey on a five-point Likert scale), and extroversion (average score of a survey on a five-point Likert scale). The higher the Likert scale numbers, the more engaged and/or extroverted the person is. Once you have completed your analysis, please upload your manuscript with
1. narrative explanations of the findings,2. the Excel file containing the results of your analysis,3. and the SPSS output file also providing the results of your analysis.
In this assignment, please calculate the following in both Excel and SPSS:
- Calculate descriptive statistics for each of the three variables:
- Mean, median, mode, standard deviation and interpret them in narrative form.
- Pearson Product Moment Correlations between all three variables. Interpret these findings.
For this assignment, upload the three files identified above: (1) the narrative, (2) the Excel file, and (3) the SPSS output file.
Length: 1-2 pages not including output.
References: Include a minimum of 3 resources (remember that software used to generate research findings are considered to be resources)
Week 1
Age | Engagement | Extroversion |
43 | 2.75 | 4 |
34 | 4.25 | 3 |
55 | 5 | 4.25 |
50 | 3.5 | 3.5 |
45 | 3 | 5 |
58 | 2 | 2.75 |
48 | 5 | 5 |
54 | 4.5 | 3.25 |
48 | 2.25 | 2.25 |
50 | 3 | 4.25 |
37 | 2 | 2 |
29 | 4 | 2.75 |
51 | 4.25 | 2.75 |
50 | 3.25 | 2.5 |
47 | 5 | 4.25 |
27 | 3 | 5 |
48 | 4.5 | 4 |
35 | 5 | 3.5 |
31 | 5 | 4 |
27 | 4.25 | 2 |
52 | 4 | 4.5 |
31 | 4 | 4 |
41 | 2.75 | 4.75 |
47 | 2 | 5 |
45 | 3 | 3.25 |
54 | 3.5 | 4.25 |
34 | 2 | 4 |
25 | 5 | 5 |
53 | 4 | 2.75 |
29 | 2.75 | 4.25 |
,
Week 1 Assignment 2: Excel and SPSS Analysis
School of Business, univ.
BUS-7106
Running Head: WEEK 1 ASSIGNMENT 2
1
Introduction
In this assignment analysis of data sets in both Excel and SPSS are performed. In both software packages I perform descriptive statistics the mean, mode, median, standard deviation, and the Pearson Product Moment Correlations between all three variables. In addition, I generate histograms, charts, tables, and other useful images to illustrate other relationships between variables and to provide a clearer understanding of the meaning of descriptive statistics in the real-world. The definitions and rationale behind each statistical test of central tendency and relationships between variables are discussed in this paper as well. As a scholar practitioner specializing in statistical analysis and finance-economic quantitative analysis is imperative I can explain the concepts and apply them in research. The data uses three variables, age, engagement, and extroversion.
Central Tendency Excel
Central tendency refers to the central value within a probability distribution, also known as the central position, and there are three measurements used to obtain a central tendency which are the mean, median, and mode (Weiers, 2010). Colloquially the three measurement types are sometimes called averages as well, while only the arithmetic mean takes an average as the summation of values divided by number of individual values (Weiers, 2010). The median is the middle number, and the mode is the highest occurring individual number (Weiers, 2010). The arithmetic mean is susceptible to outliers than the media, but otherwise the most representative of the population parameter (Laerd, 2018). If the distribution curve is normal or near normal, it is not highly skewed, but if it is highly skewed to the left or right, it is significantly skewed then the median is a more accurate measure than the arithmetic mean (Weiers, 2010). I am not a fan of Excel as it is prone to bugs where symbols and number disappear in the cells when performing calculations; SPSS is far more robust, easy to use, and efficient in my experience. Sometimes when I enter Mean = = Mean( Excel fails to find the appropriate test for central tendency. I calculated some data below but then stopped and moved on to performing more tests in SPSS as it is more user friendly. Standard deviation measures the spread of numbers via the symbol sigma and it is calculated as the square root of the variance (Weiers, 2010). The variance is the average of the squared distances (differences), from the mean value (Weiers, 2010). Variance is represented by sigma squared, and this is why the S.D. is the square root of the variance to measure the spread as a non-squared value (Weiers, 2010).
In Excel, the average age in the first column is 42.6 years old as calculated via Mean =
=Average(number 1…. number z). The median is 46 years old, via Median = Median(number1… number z), and the mode is 50 years old, via Mode = Mode(number1… number z), but one can eyeball the mode easily. The standard deviation is 9.826825655. When I recalculate the mean via the standard deviation mechanism the mean changes to 43.06896552 and the median is rounded up to 47 while the mode is unchanged at 50. When I calculate by hand on my calculator, I also get the average of 43.0689655172. Excel is prone to missing a cell when trying to capture all of the cells in a given calculation in my experience. The assignment says to Calculate descriptive statistics for each of the three variables: Mean, median, mode, standard deviation and interpret them in narrative form. However, these are three different kinds of variables so we cannot calculate central tendency between all three variables simultaneously and have meaningful results. Below is a pivot table, charts, and histograms with more detailed information on the three variables the Excel data:
WEEK 1 ASSIGNMENT 2
10
Row Labels |
Average of Engagement |
Sum of Extroversion |
Mode = |
50 |
|
Median = |
47 |
|
Mean = |
43.06896552 |
|
STDEV = |
9.826825655 |
|
55 |
5 |
4.25 |
35 |
5 |
3.5 |
25 |
5 |
5 |
31 |
4.5 |
8 |
51 |
4.25 |
2.75 |
29 |
4 |
2.75 |
54 |
4 |
7.5 |
53 |
4 |
2.75 |
52 |
4 |
4.5 |
48 |
3.916666667 |
11.25 |
27 |
3.625 |
7 |
47 |
3.5 |
9.25 |
50 |
3.25 |
10.25 |
34 |
3.125 |
7 |
45 |
3 |
8.25 |
29+A7A5:A3A2:A3 1 |
2.75 |
4.25 |
43 |
2.75 |
4 |
41 |
2.75 |
4.75 |
37 |
2 |
2 |
58 |
2 |
2.75 |
Grand Total |
7.599876211 |
111.75 |
'Engagement' by 'Age'
60
50
40
30
20
10
0
0
5
10
15
20
Age
25
30
35
40
'Engagement', 'Extroversion' by 'Age'
60
50
40
30
20
10
0
0
5
10
15
20
Age
25
30
35
40
Engagement Extroversion
Engagement
Central Tendency SPSS
SPSS may not be as useful to simulate statistics as MATLAB, or Stata, but it is more useful than Excel.
Descriptive Statistics
N |
Minimum |
Maximum |
Mean |
Std. Deviation |
|
Age |
30 |
25.00 |
58.00 |
42.6000 |
9.99172 |
Extroversion |
30 |
2.00 |
5.00 |
3.7250 |
.94766 |
Engagement |
30 |
2.00 |
5.00 |
3.6167 |
1.03544 |
Valid N (listwise) |
30 |
SPSS is easier to navigate than Excel, more efficient in its calculations via one push button outputs, and it displays above multiple data points in one convenient table.
Pearson Product Moment Correlation
he Pearson Product Moment Correlation holds four inherent assumptions in order to be properly applied which are:
1.) The variables must two in number, and continuous.
2.) There must be a linear relationship between variables; Pearson actually tests the degree of linearity between variables to put it more accurately.
3.) There cannot exist significant outliers.
Age
Statistic
Engagement
27.00 Mean
95% Confid
ence In
terval Lower Bo
3.6250
und -4.3164
for Mean
Upper Bo
nd 11.5664
5% Trimme
d Mean
.
4.) The normal distribution must be approximated (Weiers 2010).
Below is how not to apply the Pearson Correlation, as per the assignment wording: Pearson Product Moment Correlations between all three variables. Interpret these findings. Pearson is represented by r and takes on a value between -1 and 1, where 0 means there is not association, a negative value means a negative relationship and any positive number means there is a positive
correlation (Laerd,
Age
Pearson
Age Engagement
1 -.079
Extroversion
-.064
2018).
Correlation Sig. (2-tailed) N
Engagement Pearson
.676
30 30
-.079 1
.737
30
.122
Correlation Sig. (2-tailed) N
Extroversion Pearson
.676
30
-.064
.522
30
.122
30
1
Std. Error
.62500
Correlation Sig. (2-tailed)
N
.737 .522
30 30
u
30
Median Variance
Std. Deviation Minimum
Maximum
3.6250
.781
.88388
3.00
4.25
Range |
1.25 |
||||
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
29.00 |
Mean |
3.3750 |
.62500 |
||
95% Confidence Interval Lower Bound |
-4.5664 |
||||
for Mean |
Upper Bound |
11.3164 |
|||
5% Trimmed Mean |
. |
||||
Median |
3.3750 |
||||
Variance |
.781 |
||||
Std. Deviation |
.88388 |
||||
Minimum |
2.75 |
||||
Maximum |
4.00 |
||||
Range |
1.25 |
||||
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
31.00 |
Mean |
4.5000 |
.50000 |
||
95% Confidence Interval Lower Bound |
-1.8531 |
||||
for Mean |
Upper Bound |
10.8531 |
|||
5% Trimmed Mean |
. |
||||
Median |
4.5000 |
||||
Variance |
.500 |
||||
Std. Deviation |
.70711 |
||||
Minimum |
4.00 |
||||
Maximum |
5.00 |
||||
Range |
1.00 |
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
34.00 |
Mean |
3.1250 |
1.12500 |
||
95% Confidence Interval Lower Bound |
-11.1695 |
||||
for Mean |
Upper Bound |
17.4195 |
|||
5% Trimmed Mean |
. |
||||
Median |
3.1250 |
||||
Variance |
2.531 |
||||
Std. Deviation |
1.59099 |
||||
Minimum |
2.00 |
||||
Maximum |
4.25 |
||||
Range |
2.25 |
||||
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
45.00 |
Mean |
3.0000 |
.00000 |
||
95% Confidence Interval Lower Bound |
3.0000 |
||||
for Mean |
Upper Bound |
3.0000 |
|||
5% Trimmed Mean |
3.0000 |
||||
Median |
3.0000 |
||||
Variance |
.000 |
||||
Std. Deviation |
.00000 |
||||
Minimum |
3.00 |
||||
Maximum |
3.00 |
||||
Range |
.00 |
||||
Interquartile Range |
.00 |
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
47.00 |
Mean |
3.5000 |
1.50000 |
||
95% Confidence Interval Lower Bound |
-15.5593 |
||||
for Mean |
Upper Bound |
22.5593 |
|||
5% Trimmed Mean |
. |
||||
Median |
3.5000 |
||||
Variance |
4.500 |
||||
Std. Deviation |
2.12132 |
||||
Minimum |
2.00 |
||||
Maximum |
5.00 |
||||
Range |
3.00 |
||||
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
|||
48.00 |
Mean |
3.9167 |
.84574 |
||
95% Confidence Interval Lower Bound |
.2777 |
||||
for Mean |
Upper Bound |
7.5556 |
|||
5% Trimmed Mean |
. |
||||
Median |
4.5000 |
||||
Variance |
2.146 |
||||
Std. Deviation |
1.46487 |
||||
Minimum |
2.25 |
||||
Maximum |
5.00 |
||||
Range |
2.75 |
||||
Interquartile Range |
. |
||||
Skewness |
-1.508 |
1.225 |
Kurtosis |
. |
. |
|||
50.00 |
Mean |
3.2500 |
.14434 |
||
95% Confidence Interval Lower Bound |
2.6290 |
||||
for Mean |
Upper Bound |
3.8710 |
|||
5% Trimmed Mean |
. |
||||
Median |
3.2500 |
||||
Variance |
.063 |
||||
Std. Deviation |
.25000 |
||||
Minimum |
3.00 |
||||
Maximum |
3.50 |
||||
Range |
.50 |
||||
Interquartile Range |
. |
||||
Skewness |
.000 |
1.225 |
|||
Kurtosis |
. |
. |
|||
54.00 |
Mean |
4.0000 |
.50000 |
||
95% Confidence Interval Lower Bound |
-2.3531 |
||||
for Mean |
Upper Bound |
10.3531 |
|||
5% Trimmed Mean |
. |
||||
Median |
4.0000 |
||||
Variance |
.500 |
||||
Std. Deviation |
.70711 |
||||
Minimum |
3.50 |
||||
Maximum |
4.50 |
||||
Range |
1.00 |
||||
Interquartile Range |
. |
||||
Skewness |
. |
. |
|||
Kurtosis |
. |
. |
The data above looks at the relationship between age and engagement as a dependent variable. Engagement is constant within the same age group, so boxplots were included but other output was omitted.
Conclusion
There was not multiple correlation coefficient performed in this paper, which could have been used, but a standard central tendency test of all three variables is not possible, nor is a Pearson Product Moment Correlation for three variables at once appropriate. Performing these tests one at a time for each variable is tedious and time-consuming. For our COVID research we could apply Pearson to two variables at a time if the test for linearity is met, and we already know we are using continuous variables in some cases where only 0 and 1 are being used.
References
Laerd, (2018). Measures of Central Tendency. Retrieved from: