# Define descriptive statistics and list the various descriptive measures


What is the central limit theorem? For practical purposes, the main idea of the central limit theorem (CLT) is that the average of a sample of observations drawn from a population with any shape of distribution is approximately normally distributed, provided certain conditions are met.

In theoretical statistics there are several versions of the central limit theorem, depending on how these conditions are specified.

These conditions concern the types of assumptions made about the distribution of the parent population (the population from which the sample is drawn) and the actual sampling procedure. One of the simplest versions of the theorem says that for a random sample of size n (say, n larger than 30) from an infinite population with finite standard deviation, the standardized sample mean converges to a standard normal distribution or, equivalently, the sample mean approaches a normal distribution with mean equal to the population mean and standard deviation equal to the standard deviation of the population divided by the square root of the sample size n.

In applications of the central limit theorem to practical problems in statistical inference, however, statisticians are more interested in how closely the approximate distribution of the sample mean follows a normal distribution for finite sample sizes, than the limiting distribution itself.
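How closely the sample mean follows a normal distribution for a finite sample size can be seen directly by simulation. The sketch below uses only Python's standard library; the exponential population with mean 1 and the sample and repetition counts are arbitrary illustrative choices, not anything prescribed by the theorem.

```python
import random
import statistics

random.seed(42)

# Parent population: exponential with mean 1 (skewed, clearly non-normal).
n = 40
num_samples = 5000
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# CLT prediction: the sample mean is approximately normal with
# mean = population mean (1.0) and sd = population sd / sqrt(n).
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a normal distribution, about 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(
    abs(m - mean_of_means) < sd_of_means for m in sample_means
) / num_samples
```

Even though the parent population is strongly skewed, the distribution of the 5000 sample means is centred near 1.0 with spread near 1/√40, and roughly two thirds of them fall within one standard deviation of their centre, as a normal distribution predicts.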


Sufficiently close agreement with a normal distribution allows statisticians to use normal theory for making inferences about population parameters such as the mean using the sample mean, irrespective of the actual form of the parent population.

It is well known that, whatever the parent population is, the standardized variable (the sample mean minus the population mean, divided by the population standard deviation over the square root of n) has a distribution with mean 0 and standard deviation 1 under random sampling. Moreover, if the parent population is normal, then this standardized variable is distributed exactly as a standard normal variable for any positive integer n.

It is generally not possible to state conditions under which the approximation given by the central limit theorem works and what sample sizes are needed before the approximation becomes good enough.

As a general guideline, statisticians have used the prescription that if the parent distribution is symmetric and relatively short-tailed, then the sample mean reaches approximate normality for smaller samples than if the parent population is skewed or long-tailed.

In this lesson, we will study the behavior of the mean of samples of different sizes drawn from a variety of parent populations. Examining sampling distributions of sample means computed from samples of different sizes drawn from a variety of distributions allows us to gain some insight into the behavior of the sample mean under those specific conditions, as well as to examine the validity of the guidelines mentioned above for using the central limit theorem in practice.

Under certain conditions, in large samples, the sampling distribution of the sample mean can be approximated by a normal distribution. The sample size needed for the approximation to be adequate depends strongly on the shape of the parent distribution.

Symmetry or lack thereof is particularly important.

For a symmetric parent distribution, even one whose shape is very different from that of a normal distribution, an adequate approximation can be obtained with relatively small samples. For symmetric, short-tailed parent distributions, the sample mean reaches approximate normality for smaller samples than if the parent population is skewed and long-tailed.
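This guideline can be checked by simulation. The sketch below (standard library only; the sample size n = 10 and the repetition count are arbitrary choices) estimates the skewness of the sampling distribution of the mean for a symmetric parent (uniform) and a skewed parent (exponential):

```python
import random
import statistics

random.seed(0)

def skewness(xs):
    # Sample skewness: mean cubed deviation divided by cubed standard deviation.
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3

def sample_mean_skew(draw, n, reps=4000):
    # Skewness of the sampling distribution of the mean for samples of size n.
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    return skewness(means)

n = 10
sym = sample_mean_skew(lambda: random.uniform(0, 1), n)     # symmetric parent
skw = sample_mean_skew(lambda: random.expovariate(1.0), n)  # skewed parent
```

At the same small sample size, the means from the symmetric parent show skewness near 0 (already close to normal shape), while the means from the skewed parent remain noticeably right-skewed.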

In some extreme cases very large samples are needed, and for distributions without first and second moments the central limit theorem does not apply at all.

Many problems in analyzing data involve describing how variables are related. The simplest of all models describing the relationship between two variables is a linear, or straight-line, model. A more elegant and conventional method is that of "least squares", which finds the line minimizing the sum of squared vertical distances between the observed points and the fitted line.

Know that there is a simple connection between the numerical coefficients in the regression equation and the slope and intercept of the regression line. Know that a single summary statistic, like a correlation coefficient, does not tell the whole story.
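That connection can be made concrete: the least-squares slope equals the correlation coefficient times the ratio of the standard deviations, and the intercept follows from the means, since the fitted line passes through the point of averages. A minimal sketch, with a small made-up dataset:

```python
import statistics

# Small made-up dataset for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5       # correlation coefficient
slope = r * (syy / sxx) ** 0.5     # equals sxy / sxx
intercept = my - slope * mx        # line passes through (x-bar, y-bar)
```

Here r is close to 1, but as the lesson warns, r alone does not tell the whole story; a scatter plot of x against y is still needed to see whether a straight line is sensible.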

A scatter plot is an essential complement to examining the relationship between the two variables.

## Analysis of Variance

The tests we have learned up to this point allow us to test hypotheses that examine the difference between only two means.

ANOVA does this by examining the ratio of variability between two conditions and variability within each condition. For example, say we give a drug that we believe will improve memory to a group of people and give a placebo to another group of people. We might measure memory performance by the number of words recalled from a list we ask everyone to memorize.

A t-test would compare the likelihood of observing the difference in the mean number of words recalled for each group. An ANOVA test, on the other hand, would compare the variability that we observe between the two conditions to the variability observed within each condition.
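The drug/placebo example can be worked through numerically. The scores below are invented for illustration; the sketch computes the between-groups and within-groups variability, forms the F ratio, and checks the standard fact that with exactly two groups the ANOVA F statistic equals the square of the pooled two-sample t statistic.

```python
import statistics

# Invented memory scores: words recalled by a drug group and a placebo group.
drug = [12, 15, 14, 16, 13, 15]
placebo = [10, 11, 9, 12, 10, 11]
groups = [drug, placebo]

grand_mean = statistics.fmean(drug + placebo)

# Between-groups variability: how far each group mean sits from the grand mean.
ss_between = sum(
    len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
)
# Within-groups variability: how far scores sit from their own group mean.
ss_within = sum(
    (score - statistics.fmean(g)) ** 2 for g in groups for score in g
)

df_between = len(groups) - 1
df_within = sum(len(g) for g in groups) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

# Pooled two-sample t statistic; with exactly two groups, F = t**2.
pooled_var = ss_within / df_within
t_stat = (statistics.fmean(drug) - statistics.fmean(placebo)) / (
    pooled_var * (1 / len(drug) + 1 / len(placebo))
) ** 0.5
```

A large F ratio means the variability between the two conditions is much bigger than the variability within each condition, which is evidence against the hypothesis of equal means.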

Recall that we measure variability as the sum of the squared differences of each score from the mean.

## Exponential Density Function

An important class of decision problems under uncertainty concerns the time between events.

For example, the chance that the length of time to the next breakdown of a machine does not exceed a certain time, such as the chance that the copying machine in your office will not break down during this week. The exponential distribution gives the distribution of time between independent events occurring at a constant rate.
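To make the copier example concrete, here is a minimal sketch (the rate of 0.2 breakdowns per week is made up) that computes the chance of a breakdown within one week from the exponential model and checks it by simulation:

```python
import math
import random

random.seed(7)

# Hypothetical rate: 0.2 copier breakdowns per week, so times between
# breakdowns are exponential with rate lam.
lam = 0.2
t = 1.0  # one week

# Exact chance the next breakdown occurs within t weeks: 1 - exp(-lam * t).
p_exact = 1 - math.exp(-lam * t)

# Check by simulating many times-to-next-breakdown.
trials = 100_000
hits = sum(random.expovariate(lam) <= t for _ in range(trials))
p_sim = hits / trials
```

Both the closed-form probability and the simulated frequency come out near 0.18, so at this rate there is roughly an 18% chance of a breakdown this week.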

Its density function is f(t) = λe^(−λt) for t ≥ 0, where λ is the constant rate at which events occur. Applications include probabilistic assessment of the time between arrivals of patients at the emergency room of a hospital and of the arrivals of ships at a particular port.

These notes are meant to provide a general overview of how to input data in Excel and Stata and how to perform basic data analysis by looking at some descriptive statistics using both programs.

Excel. To open Excel in Windows, go to Start – Programs – Microsoft Office – Excel.

## Background

When it opens you will see a blank worksheet, which consists of alphabetically titled columns and numbered rows.
