Published: 18.04.2021
When analysing data, for example, the marks achieved by students for a piece of coursework, it is possible to use both descriptive and inferential statistics in your analysis of their marks.
Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population. In statistics, the range is the spread of your data from the lowest to the highest value in the distribution. It is the simplest measure of variability.
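To make the descriptive side concrete, here is a minimal sketch using Python's standard `statistics` module. It summarizes a set of coursework marks with the mean, median, sample standard deviation and the range (highest minus lowest value). The marks are hypothetical, and the helper name `describe` is our own.

```python
import statistics


def describe(marks):
    """Return common descriptive statistics for a list of numeric marks."""
    return {
        "n": len(marks),
        "mean": statistics.mean(marks),
        "median": statistics.median(marks),
        # Sample standard deviation (n - 1 in the denominator).
        "stdev": statistics.stdev(marks),
        # Range: highest value minus lowest value, the simplest measure of variability.
        "range": max(marks) - min(marks),
    }


# Hypothetical coursework marks out of 100, for illustration only.
marks = [55, 62, 68, 70, 71, 74, 78, 81, 85, 92]
summary = describe(marks)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

These numbers describe only this set of marks. They say nothing, on their own, about students outside the sample.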
By Malcolm J. Brandenburg, Derald E. Wentzien, Riza C. Bautista, Agashi P. Nwogbaga, Rebecca G. Miller and Paul E.
Undergraduate data science research projects form an integral component of the Wesley College science and mathematics curriculum.
In this chapter, we provide examples of hypothesis testing in which statistical methods are coupled with methodologies that use interpolating polynomials, probability and the expected-value concept.
Wesley College is a minority-serving, primarily undergraduate liberal-arts institution.
Its STEM (science, technology, engineering and mathematics) fields include a robust federally and state-sponsored directed research program [1, 2]. In this program, students receive individual mentoring on diverse projects from a full-time STEM faculty member. All incoming freshmen are immersed in research through a specially designed quantitative reasoning mathematics core course, a first-year seminar course and a frontiers in science core course [1]. Projects in all level-1 STEM core courses provide an opportunity to build a base of knowledge for interacting with and manipulating data.
These courses also introduce students to modern computing techniques and platforms. At the other end of the Wesley core-curriculum spectrum, the advanced undergraduate STEM research requirements reflect the breadth and rigor necessary to prepare students for possible future postgraduate programs. For analyzing data in experiential research projects, descriptive and inferential statistics are major components.
To help students with poor mathematical ability and to further enhance their general thinking skills, in our remedial mathematics courses, we provide a foundation in algebraic concepts, problem-solving skills, basic quantitative reasoning and simple simulations.
Our institution also provides a plethora of student academic support services that include an early alert system, peer and professionally trained tutoring services and writing center support.
Single and multiparameter linear free energy relationships (LFERs) help chemists evaluate the multiple kinds of transition-state molecular interactions associated with variations in compound structure [3]. Chemical kinetics measurements are interpreted by correlating experimental reaction rates (k) or equilibrium data with their thermodynamics.
The computationally challenging stoichiometric analysis elucidates metabolic pathways by analyzing the effect of physicochemical, environmental and biological factors on the overall chemical network structure. All of these determinations are important in the design of chemical processes for petrochemical, pharmaceutical and agricultural building blocks.
In this section, using results from our undergraduate directed research program in chemistry, we outline examples that combine statistical descriptors with inferential tests of hypotheses about the regression coefficients in the LFERs commonly used to study solvolysis reactions. To understand the reaction mechanisms, we use multiple regression correlation analyses based on the one- and two-term Grunwald-Winstein equations.
To avoid multicollinearity, the chosen solvents must have widely varying nucleophilicity (N) and solvent-ionizing power (Y) values [3, 4]. To study solvent nucleophilic attack at an sp2 carbonyl carbon, we completed detailed Grunwald-Winstein analyses. Fluorine is a much poorer leaving group than chlorine; hence, for carbonyl-containing molecules, we proposed a bimolecular tetrahedral transition state (TS) with a rate-determining addition step within an addition-elimination (A-E) pathway, as opposed to a bimolecular concerted associative SN2 process with a pentacoordinate TS.
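The one- and two-term Grunwald-Winstein equations are usually written log(k/k0) = mY + c and log(k/k0) = lN + mY + c. Here k and k0 are the rate constants in a given solvent and in the reference solvent (80% ethanol), l and m are the sensitivities to N and Y, and c is a constant term. The sketch below fits the two-term form by ordinary least squares with NumPy. It reports standard errors and t statistics for l, m and c, which are the quantities a hypothesis test on the regression coefficients would use, and it prints the N-Y correlation as a quick multicollinearity check. All N, Y and rate values are hypothetical, generated from l = 1.5, m = 0.5, c = 0.1 plus small fixed perturbations. The helper name `fit_two_term_gw` is our own.

```python
import numpy as np


def fit_two_term_gw(N, Y, log_k_rel):
    """Least-squares fit of log(k/k0) = l*N + m*Y + c.

    Returns (coefficients [l, m, c], standard errors, t statistics, R^2).
    """
    N = np.asarray(N, dtype=float)
    Y = np.asarray(Y, dtype=float)
    y = np.asarray(log_k_rel, dtype=float)
    X = np.column_stack([N, Y, np.ones_like(N)])  # columns: N, Y, intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))
    t_stats = coef / se                           # compare with t on n - p degrees of freedom
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, se, t_stats, r2


# Hypothetical solvent parameters and rates (not measured data).
N = np.array([0.4, 0.2, 0.0, -0.2, -0.5, -1.2, -2.0, -3.9])
Y = np.array([-2.5, 2.0, -1.2, 3.0, 0.2, 1.0, 5.0, 2.8])
noise = np.array([0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.02])
log_k_rel = 1.5 * N + 0.5 * Y + 0.1 + noise  # generated with l = 1.5, m = 0.5, c = 0.1

# A strong N-Y correlation would signal the multicollinearity the text warns about.
print(f"corr(N, Y) = {np.corrcoef(N, Y)[0, 1]:.2f}")
coef, se, t_stats, r2 = fit_two_term_gw(N, Y, log_k_rel)
for name, b, s, t in zip(["l", "m", "c"], coef, se, t_stats):
    print(f"{name} = {b:.3f} ± {s:.3f} (t = {t:.1f})")
print(f"R^2 = {r2:.4f}")
```

Fixing the one-term form amounts to dropping the N column from `X`. Comparing the two fits' R² and coefficient t values is one way to judge whether the lN term is needed.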
We experimentally measured the solvolytic rates for PhCOF in 37 solvent systems and analyzed them with the two-term Grunwald-Winstein equation. For PhCOCl, on the other hand, we proposed an SN1 process with significant solvation (the l component) of the developing aryl acylium ion. This suggests that the A-E pathway is prevalent.
In addition, there were three solvents in which there was no clear demarcation of the changeover region. The observed rate trend is primarily due to more efficient ground-state stabilization of PhCOF. PhCOCN is an ecologically important chemical defensive secretion of polydesmoid millipedes, and cyanide is a synthetically useful, highly active leaving group. Since the leaving group is involved in the rate-determining step of any SN2 process, we became skeptical of the associative SN2 proposal and decided to reinvestigate the PhCOCN analysis.
Using the Lee data, we constructed Arrhenius plots; for all of the Arrhenius plots, the R² values ranged from 0. Separate Grunwald-Winstein analyses were carried out for the seven highly ionizing aqueous 2,2,2-trifluoroethanol (TFE) mixtures and for the common solvents. These observations are very reasonable, as the cyanide group has a greater inductive effect and, in addition, the cyanide anion is a weak conjugate base.
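An Arrhenius plot graphs ln k against 1/T. Because ln k = ln A - Ea/(RT), the slope is -Ea/R, and R² measures how close the plot is to a straight line. The sketch below fits such a plot with `numpy.polyfit` and recovers the activation energy. The temperatures and rate constants are hypothetical, generated from Ea = 60 kJ/mol and A = 1e8 s⁻¹ with small perturbations, and are not the Lee data. The helper name `arrhenius_fit` is our own.

```python
import numpy as np

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def arrhenius_fit(T_kelvin, k):
    """Fit ln k = ln A - Ea/(R T) by linear least squares on an Arrhenius plot."""
    x = 1.0 / np.asarray(T_kelvin, dtype=float)  # abscissa: 1/T
    y = np.log(np.asarray(k, dtype=float))       # ordinate: ln k
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = slope * x + intercept
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    Ea = -slope * R_GAS        # activation energy, J/mol
    A = np.exp(intercept)      # pre-exponential factor, same units as k
    return Ea, A, r2


# Hypothetical rate constants (s^-1) with small multiplicative perturbations.
T = np.array([283.15, 293.15, 298.15, 308.15, 318.15])
k = 1e8 * np.exp(-60000 / (R_GAS * T)) * np.array([1.02, 0.98, 1.01, 0.99, 1.00])
Ea, A, r2 = arrhenius_fit(T, k)
print(f"Ea = {Ea / 1000:.1f} kJ/mol, A = {A:.2e} s^-1, R^2 = {r2:.4f}")
```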
Complete historical data time series are needed to create effective mathematical models. If a reasonable estimate for the missing value can be determined, the data series can then be used for future analysis.
In this section, we present a methodology for generating a reasonable estimate of missing or inaccurate values when two important conditions hold: (1) a similar data series with complete information is available, and (2) a pattern or trend is observable.
The extent of the northern polar ice cap increases until it reaches its annual maximum in mid-March and then decreases until it reaches its annual minimum in mid-September. Unfortunately, the data set is missing values for some days.
The extent of the northern polar ice cap in January of three years is used as an example. Complete daily January data are available for two of the years, while the January data for the third year are missing the value for January 25. Figure 2 presents line graphs of the daily ice extent for January of the three years. Because complete time series are available for two of the years, the first condition is met. The line graphs also indicate that the extent of the polar ice cap increases during January, so the second condition is met.
An interpolating polynomial is used to estimate the missing value for the extent of the polar ice cap on January 25; polynomials of higher degree could also be used. To test the approach, the January 25 values are removed from the two complete data series, and estimates are prepared using polynomials of degree 1 and, separately, polynomials of degree 3.
The estimated values are compared with the actual values for the two complete years, and the degree of the polynomial that gives the closest estimates for January 25 is the degree used to estimate the missing January 25 value. Using a polynomial of degree 1 creates a system of two equations in two unknowns: for each year, one known value before and one known value after the missing value are used to set up the system.
To simplify the calculations, January 24 is recorded as time period 1, January 25 as time period 2 and January 26 as time period 3. The time period and sea ice extent for each year were recorded in Excel.
The coefficients a_i can be found by solving the system of equations using substitution, elimination or matrices. Here, a TI graphing calculator and matrices were used to solve the system.
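For the degree-1 case, the two equations are a_0 + a_1·1 = y_1 and a_0 + a_1·3 = y_3, using time periods 1 and 3 from the numbering above. The estimate is then p(2) = a_0 + 2a_1. The sketch below solves that system with `numpy.linalg.solve` instead of a TI calculator. The extent values are hypothetical placeholders in million km², not the data used in the study, and the helper name `linear_interpolate_missing` is our own.

```python
import numpy as np


def linear_interpolate_missing(t_known, y_known, t_missing):
    """Estimate a missing value with a degree-1 interpolating polynomial.

    Solves a0 + a1*t = y for the two known points, then evaluates p(t_missing).
    """
    A = np.array([[1.0, t] for t in t_known])                  # rows [1, t]
    a = np.linalg.solve(A, np.asarray(y_known, dtype=float))   # coefficients a0, a1
    return a[0] + a[1] * t_missing, a


# Hypothetical extents (million km^2) for Jan 24 (t = 1) and Jan 26 (t = 3).
estimate, coeffs = linear_interpolate_missing([1, 3], [13.20, 13.30], 2)
print(f"a0 = {coeffs[0]:.3f}, a1 = {coeffs[1]:.3f}, estimate for t = 2: {estimate:.3f}")
```

With the missing period midway between the two known ones, the degree-1 estimate is simply the average of the neighbouring values.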
The estimate for January 25, is: 12 , , The estimate for January 25, is: 13 , , The absolute values of the deviations between the actual and estimated values were calculated in Excel. Using a polynomial of degree 3 creates a system of four equations in four unknowns: two known values before and two known values after the missing value are used to set up the system. To simplify the calculations, January 23 is recorded as time period 1, January 24 as time period 2, January 25 as time period 3, January 26 as time period 4 and January 27 as time period 5.
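The degree-3 case works the same way with a 4×4 Vandermonde system a_0 + a_1t + a_2t² + a_3t³ = y at t = 1, 2, 4, 5, evaluated at t = 3. Below is a minimal NumPy sketch of that system. The extent values are hypothetical (million km²), and `cubic_interpolate_missing` is our own name.

```python
import numpy as np


def cubic_interpolate_missing(t_known, y_known, t_missing):
    """Estimate a missing value with a degree-3 interpolating polynomial.

    Builds the 4x4 system a0 + a1*t + a2*t^2 + a3*t^3 = y and solves for a0..a3.
    """
    V = np.vander(np.asarray(t_known, dtype=float), 4, increasing=True)  # rows [1, t, t^2, t^3]
    a = np.linalg.solve(V, np.asarray(y_known, dtype=float))
    powers = t_missing ** np.arange(4)  # [1, t, t^2, t^3] at the missing period
    return float(powers @ a), a


# Hypothetical extents (million km^2) for Jan 23, 24, 26, 27 (t = 1, 2, 4, 5).
estimate, coeffs = cubic_interpolate_missing([1, 2, 4, 5], [13.10, 13.18, 13.30, 13.33], 3)
print(f"estimate for t = 3: {estimate:.4f}")
```

For these equally spaced periods, the solution reduces to the closed form (-y_1 + 4y_2 + 4y_4 - y_5)/6. That makes a handy check on the calculator result.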
The mean of the absolute deviations for polynomials of degree 1 and the mean of the absolute deviations for polynomials of degree 3 were calculated in Excel.
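The comparison step can be sketched as follows. For each polynomial degree, take the mean of |actual - estimated| over the two complete years, and keep the degree with the smaller value. The actual and estimated extents below are hypothetical, and `mean_absolute_deviation` is our own helper, standing in for the Excel calculation.

```python
def mean_absolute_deviation(actual, estimated):
    """Mean of |actual - estimated| over paired values."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - e) for a, e in zip(actual, estimated)) / len(actual)


# Hypothetical actual and estimated January 25 extents (million km^2) for the two complete years.
actual = [13.24, 13.27]
estimates = {
    1: [13.20, 13.32],  # degree-1 estimates
    3: [13.23, 13.28],  # degree-3 estimates
}
mad = {degree: mean_absolute_deviation(actual, est) for degree, est in estimates.items()}
best_degree = min(mad, key=mad.get)  # degree with the smallest mean absolute deviation
print(mad, "-> use degree", best_degree)
```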
The polynomial of degree 3 provided the smaller mean absolute deviation. Therefore, a third-degree polynomial is used to generate the estimate for the sea ice extent on January 25 of the incomplete year. Figure 3 shows the extent of the sea ice in January, with the estimate for January 25.
An unprecedented outbreak of Ebola occurred, predominantly in West Africa. Statistics, through dynamic modeling, played a crucial role in clinical data collection and management. The lessons learned and the resulting statistical advances continue to inform the response to current and subsequent pandemics.
We used statistical curve fitting with both exponential and polynomial functions and validated the models using nonlinear regression and R² analysis. The data for this project run through October 31. The Ebola data were used to create epidemiological models to predict the possible course of a West Africa-type Ebola outbreak.
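As a rough illustration of the exponential fitting and R² validation, the sketch below fits deaths ≈ a·e^(bt) by least squares on ln(deaths) and then computes R² on the original scale. This log-linear fit is a simpler stand-in for the nonlinear regression mentioned above; `scipy.optimize.curve_fit` would be one way to do a true nonlinear fit. The weekly counts are hypothetical, not WHO data, and `fit_exponential` is our own name.

```python
import numpy as np


def fit_exponential(weeks, deaths):
    """Fit deaths ≈ a * exp(b * week) by linear least squares on ln(deaths).

    This log-linear fit is a simplification of nonlinear regression; R^2 is
    reported on the original (untransformed) scale.
    """
    t = np.asarray(weeks, dtype=float)
    y = np.asarray(deaths, dtype=float)
    b, ln_a = np.polyfit(t, np.log(y), 1)  # slope and intercept of ln(y) versus t
    a = np.exp(ln_a)
    y_hat = a * np.exp(b * t)
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return a, b, r2


# Hypothetical weekly cumulative deaths: 20*exp(0.15 t) with a +/-4% wobble.
weeks = np.arange(1, 13)
deaths = np.round(20 * np.exp(0.15 * weeks) * np.where(weeks % 2 == 0, 1.04, 0.96))
a, b, r2 = fit_exponential(weeks, deaths)
print(f"a = {a:.1f}, b = {b:.3f} per week, R^2 = {r2:.3f}")
print(f"projection for week 35: {a * np.exp(b * 35):.0f}")
```

Projecting far beyond the fitted weeks assumes the growth rate stays constant, so such projections should be read as scenarios rather than forecasts.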
As of October 31, the WHO reported Ebola cases and deaths in Liberia, Sierra Leone and Guinea, as well as in Nigeria (20 cases, eight deaths), the United States (four cases, one death), Mali (one case, one death) and Spain (one case, zero deaths).
The dotted curve in Figure 4 shows the actual observed deaths, while the solid line shows the number of deaths given by the fitted model. As shown in Figure 4, the growth of the Guinea deaths is exponential. A comparison of the actual data with the projected data shows that the two are similar but not identical (Table 2). The model also projects the number of deaths by week 35, the week of November 23. Unlike the Guinea deaths, the Liberian deaths are modeled using a polynomial function (Figure 5).
The model is not exact, but it is close enough to project the number of deaths in Liberia by week 35 (Table 3). An exponential function was not used because the actual growth was initially too slow to match exponential growth.
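One way to check that a polynomial suits slower early growth better than an exponential is to fit both and compare R². The sketch below does this with `numpy.polyfit` on hypothetical counts that follow a quadratic trend. The exponential is fitted on ln(deaths) as a simplification, and the data are not the Liberian figures.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination on the original scale."""
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


# Hypothetical weekly cumulative deaths with slow early growth (not WHO data).
t = np.arange(1, 13, dtype=float)
deaths = 2 * t**2 + 3 * t + 10 + np.where(t % 2 == 0, 3.0, -3.0)

# Degree-2 polynomial fit.
poly = np.polyfit(t, deaths, 2)
r2_poly = r_squared(deaths, np.polyval(poly, t))

# Exponential fit via least squares on ln(deaths), for comparison.
b, ln_a = np.polyfit(t, np.log(deaths), 1)
r2_exp = r_squared(deaths, np.exp(ln_a) * np.exp(b * t))

print(f"polynomial R^2 = {r2_poly:.4f}, exponential R^2 = {r2_exp:.4f}")
print(f"polynomial projection for week 35: {np.polyval(poly, 35):.0f}")
```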
The field of statistics deals with collecting, exploring and presenting large amounts of data to discover underlying patterns and trends. Statistics is applied every day in many areas of our lives, such as business, industry, medicine and government, to support informed decisions in the presence of uncertainty and variation. For example, factory managers use information from a statistical quality control unit to determine whether the length or weight of their products is within established standards.
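The factory example can be sketched as a simplified control check. It computes mean ± 3 standard deviations from an in-control reference sample and flags new measurements outside those limits. This individual-value check is a simplification: full Shewhart charts typically use subgroup means and tabulated control-chart constants. The lengths are hypothetical, and the function names are our own.

```python
import statistics


def control_limits(reference):
    """Return (lower, upper) 3-sigma limits from an in-control reference sample."""
    centre = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return centre - 3 * sigma, centre + 3 * sigma


def out_of_control(measurements, limits):
    """Return the measurements that fall outside the control limits."""
    lower, upper = limits
    return [x for x in measurements if not lower <= x <= upper]


# Hypothetical product lengths in mm.
reference = [100.1, 99.9, 100.0, 100.2, 99.8, 100.0, 100.1, 99.9]
limits = control_limits(reference)
flagged = out_of_control([100.05, 99.95, 100.6, 99.3], limits)
print(f"limits: {limits[0]:.3f} to {limits[1]:.3f} mm; flagged: {flagged}")
```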
Inferential statistics is used to analyse the results and draw conclusions. Experts have described inferential statistics as the mathematics and logic of how this generalization from sample to population can be made (Kolawole). These procedures can be used to estimate the likelihood that the collected data occurred by chance and to draw conclusions about the larger population from which the samples were collected. In other words, inferential statistics reason from the sample to the population: they estimate the probability of population characteristics from the characteristics of a sample and help assess the strength of the relationship between independent (causal) variables and dependent (effect) variables.
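The idea of estimating "the likelihood that the collected data occurred by chance" can be made concrete with an exact permutation test. The test asks how often a random relabelling of the pooled marks produces a difference in group means at least as large as the one observed. This is one inferential technique among many, chosen here because it needs only the standard library. The marks are hypothetical, and `permutation_test` is our own name.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes and returns (observed difference, p-value).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabellings at least as extreme as the observed one (small float tolerance).
        if abs(mean(a) - mean(b)) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Hypothetical coursework marks for two teaching groups.
group_a = [72, 75, 78, 80, 83]
group_b = [60, 64, 66, 70, 71]
observed, p_value = permutation_test(group_a, group_b)
print(f"difference in means = {observed:.1f}, exact p = {p_value:.4f}")
```

Here only 2 of the 252 possible relabellings are as extreme as the observed split. A difference this large would therefore be unlikely if group membership had no effect on the marks.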
It can be hard to tell whether a piece of research relies on descriptive or inferential statistics, because people often lack knowledge of these two branches of statistics. As the name suggests, descriptive statistics describe the data that have been collected. Inferential statistics, on the other hand, are used to generalise about the population based on samples.
Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization. My understanding was that descriptive statistics quantitatively described features of a data sample, while inferential statistics made inferences about the populations from which samples were drawn. However, the Wikipedia page for statistical inference states: "For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling." The "for the most part" has made me think I perhaps don't properly understand these concepts.
When analysing data, such as the marks achieved by students for a piece of coursework, it is possible to use both descriptive and inferential statistics in your analysis of their marks. Typically, in most research conducted on groups of people, you will use both descriptive and inferential statistics to analyse your results and draw conclusions. So what are descriptive and inferential statistics? And what are their differences? Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made.
The descriptive statistics obtained from a sample allow the researcher to make an inference (a rational conclusion) about the population through inferential statistics.
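A confidence interval for a population mean shows this step from a sample statistic to a population inference. The sample mean and standard deviation are descriptive, while the interval x̄ ± t·s/√n is a statement about the population. The sketch below assumes a hypothetical sample of 10 marks and uses 2.262, the tabulated two-sided 95% critical value of t with 9 degrees of freedom. The function name is our own.

```python
import math
import statistics


def mean_confidence_interval(sample, t_crit):
    """Two-sided confidence interval for a population mean: xbar ± t * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return xbar - t_crit * se, xbar + t_crit * se


# Hypothetical sample of 10 coursework marks.
marks = [55, 62, 68, 70, 71, 74, 78, 81, 85, 92]
# 2.262: tabulated two-sided 95% critical value of t with n - 1 = 9 degrees of freedom.
lower, upper = mean_confidence_interval(marks, t_crit=2.262)
print(f"sample mean = {statistics.mean(marks):.1f}; "
      f"95% CI for the population mean: ({lower:.1f}, {upper:.1f})")
```

For a different sample size, the critical value must be looked up again for n - 1 degrees of freedom.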