Inferential Statistics: include one or more of the inferential statistical procedures that you learned about in this module (that is, a t-test or ANOVA)

    For this module of your SLP, you can use an article you used in a previous module’s SLP paper, or choose a different one. The article must be no more than 5 years old. The article must include one or more of the inferential statistical procedures that you learned about in this module (that is, a t-test or ANOVA).

    Begin by providing the reference for the article, in proper format.

    Write an introductory paragraph that includes a reminder of what your topic is.

    Introduce and briefly describe the study in one paragraph.

    Then identify the following:

    Null and alternative hypothesis

    Sampling procedures

    Independent and Dependent Variable/s

    Alpha level

    Outcome (significant results, or failure to reject the null hypothesis)

    What 2 questions would you like to ask the researcher about the results?

    If you were designing your own study about this topic, what would your independent variable be?

    What would your dependent variable be?

    What would you expect to find? For example: males will be more likely to exercise than females … something related to your own topic.

    ASSIGNMENT EXPECTATIONS: Please read before completing assignments.

    Copy the actual assignment from this page onto the cover page of your paper (do this for all papers in all courses).
    Assignment should be approximately 2 pages in length (double-spaced).
    Please use major sections corresponding to the major points of the assignment, and where appropriate use sub-sections (with headings).
    Remember to write in a scientific manner (try to avoid using the first person except when describing a relevant personal experience).
    Quoted material should not exceed 10% of the total paper (since the focus of these assignments is on independent thinking and critical analysis). Use your own words and build on the ideas of others.
    When material is copied verbatim from external sources, it MUST be properly cited. This means that material copied verbatim must be enclosed in quotes and the reference should be cited either within the text or with a footnote.
    Use of peer-reviewed articles is required.
    Credible professional sources are used (for example, government agencies, nonprofit organizations, academic institutions, scholarly journals). Wikipedia is not acceptable.

    WHAT IS STATISTICAL INFERENCE?

    What is a FACT? It is something we know through the direct evidence of our senses. When we do basic descriptive statistics, we are directly counting and measuring. It is a fact that the mean height of a particular class of sixth-grade boys is (let's say) 4'11" and it's a fact that the standard deviation of the heights in that class is 2.5". We can verify this directly.

    How does an INFERENCE differ from a fact? In making an inference, we are assuming that something is true: we draw a conclusion based on evidence or signs that it has occurred or is true, not on a direct observation.

    To differentiate with an example from “real life”:

    I arrive at one of my live, real-time Introductory Psychology classes looking very grim. I am carrying a stack of tests with me.

    What kinds of conclusions are the students going to draw?

    Now, can they be absolutely sure about their conclusions?

    How about an alternative hypothesis? For example, on my way to school I rear-ended someone, ruined my little car and my insurance rates. Or maybe I just had a big fight with the man I am dating.

    It could be that the tests were very poor, or it could be that I am having a hard time in some other aspect of my life. There is no way to tell until we move from the inferential to the descriptive. And of course, I may not actually tell my students what is going on, if it isn’t the test that is bumming me out. So, how can they really know for certain?

    Inductive reasoning involves putting our evidence together to support a claim about a phenomenon. Inductive reasoning is evaluated in terms of its strength. One can provide a strong inductive argument and still be wrong (this is not true of what we call deductive reasoning, but that is a story for a different day).

    STATISTICAL INFERENCE involves using procedures based on samples or bits of information used to make statements about some broader set of circumstances.

    So in induction and statistical inference, we use little bits of evidence, information, and clues to make statements about phenomena. We reason from the specific to the general; however, all conclusions we arrive at are potentially wrong. Therefore, we refer to them as CONDITIONAL CONCLUSIONS. We cannot say that the conclusions are absolutely correct.

    CONDITIONAL CONCLUSION: Any conclusion based on induction cannot be stated as being absolutely correct. There is always a probability that an inference is wrong.

    What we have been working up to over the past 6 weeks is the essential problem:

    We generally do research on some aspect of a population. What is a population? It’s all those individuals, observations or measurements that share a common characteristic (which is probably of interest to us).

    We want to be able to generalize about some feature in a population but we are unable to study that population in its entirety. How can we use the little bit of data that we are able to gather to take an educated guess about this aspect of the population?

    That’s a general definition that fits most instances. However, in the health and behavioral sciences we often deal with a hypothetical version of the population known as a TREATMENT POPULATION. I am calling it hypothetical because it may only exist when the experiment is being conducted.

    The research we do is an attempt to describe the treatment population as it would be if it did exist – as if our newly developed antidepressant were being given to all the depressed people in the larger population. It is very difficult to study a population exhaustively, so we depend on our sample to be able to make inferences concerning the important characteristics, which we call PARAMETERS.

    TREATMENT POPULATION: A hypothetical population that experimenters create for research purposes. The parameters of the treatment population are estimated using the results obtained from the subjects in the experimental group who are exposed to a specific independent (manipulated) variable.

    We are going to use the smaller pieces of information gleaned from our sample to make larger and more complicated inferences and inductions. This is what inferential statistics enables us to do.

    INFERENTIAL PROCEDURES

    The questions that we ask using inferential procedures are:

    1. Does the behavior or feature we see in the sample represent the behavior of the population as a whole?

    2. Is it legitimate for us to use the data from the sample to claim or conclude that there is a real, significant difference between two or more experimental conditions?

    These questions represent two different types of inferential procedures known as

    – Estimation of population parameters (point estimation and confidence intervals)

    – Hypothesis testing

    In this module we will emphasize hypothesis testing, as many health research studies of interest involve the use of statistical procedures to test hypotheses.
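    Before moving on, here is a minimal sketch of the first type of procedure – point estimation and a confidence interval – using Python's scipy library. The sample numbers are invented purely for illustration, not taken from any study:

        import numpy as np
        from scipy import stats

        sample = np.array([4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 4.9])  # hypothetical data

        mean = sample.mean()             # point estimate of the population mean
        sem = stats.sem(sample)          # standard error of the mean
        df = len(sample) - 1
        t_crit = stats.t.ppf(0.975, df)  # critical t for a two-sided 95% interval

        low, high = mean - t_crit * sem, mean + t_crit * sem
        print(f"point estimate: {mean:.2f}, 95% CI: ({low:.2f}, {high:.2f})")

    The interval says: if we repeated the sampling many times, about 95% of intervals built this way would contain the true population mean.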

    PART II
    STATISTICAL HYPOTHESIS TESTING

    To contrast with point/interval estimation, in hypothesis testing we are not trying to directly estimate a parameter of the population; rather, we are comparing our sample statistic with a known or hypothesized population parameter. We want to see if there is any significant difference between our sample result and what is known about the population in general. SIGNIFICANT does not mean "important" in quantitative reasoning (not necessarily, at any rate). Rather, it means "unlikely to be due to mere chance."

    There are two general types of hypothesis testing procedures:

    – A result is compared to a known population average. An example would be the rate of cancer in people exposed to a certain hazardous chemical in their line of work compared to the national average.

    – Samples from two or more treatment populations are compared: for example, different dosage levels of a drug, or a drug versus a non-pharmacological intervention versus no treatment.

    In quantitative reasoning procedures, HYPOTHESIS TESTING is the comparison of sample results with some known or hypothesized population parameters.

    TYPES OF HYPOTHESES

    We all do hypothesis testing every day of our lives as we navigate through various situations where we have to make educated guesses about what is going on with people and situations at work and socially/interpersonally. Earlier I used the example of my introductory psychology students trying to spin hypotheses about why I was acting like such a grouch.

    The hypothesis testing that we do in research should be somewhat more rigorous and organized.

    First, we make a general statement about the relationship between the independent and dependent variables, or the magnitude of the observation (the size of the effect of interest). This is our conceptual hypothesis.

    Example: I am doing research to find out if doing 1 hour of moderate aerobic exercise 4 times a week as opposed to no regular aerobic exercise is related to longer lifespan among women over the age of 65.

    What is my independent variable?

    Exercise or no exercise.

    What is my dependent variable?

    Years of life.

    Try this one:

    You are doing a study of the effects of a vitamin supplement on energy levels of men who are recovering from heart bypass surgery.

    What is your independent variable?

    What is your dependent variable?

    We use our conceptual hypothesis as a basis for a statistical hypothesis. This is a mathematical statement that can be shown to be supported or not supported through statistical procedures.

    In my study, there should be a difference in the mean lifespan of the women who exercise and the women who don't. Specifically, the women who do exercise should have a higher mean lifespan than those who do not.

    What is the statistical hypothesis in your study (the vitamin supplement study)?

    It is customary for researchers to state first the hypothesis in terms of NO SIGNIFICANT RESULTS. It is a way of keeping one’s hopes for exciting results in check so that experimenter expectations don’t unduly influence the results.

    This is called the NULL hypothesis. Here are links to visit to learn more about the NULL hypothesis:

    Internet Glossary of Statistical Terms. (2002). Null hypothesis. Retrieved January 1, 2012, from http://www.animatedsoftware.com/statglos/sgnullhy.htm

    Null hypothesis. (n.d.). HyperStat Online. Retrieved January 1, 2012, from http://davidmlane.com/hyperstat/A29337.html

    To put this into statistical language, the null hypothesis is that the mean of the treatment group has the same value as the mean of the control group. This is how it looks in statistical notation:

    μ1 = μ2

    In my study – “There is no difference (increase) in lifespan of women who do aerobic exercise on a regular basis compared to those who do not.”

    What is the NULL hypothesis in your study?

    The result that represents a significant difference is called the ALTERNATIVE hypothesis. It can be shown a few different ways, depending on whether there is any evidence that the difference may be in a particular direction (more or less):

    μ1 < μ2;  μ1 > μ2;  μ1 ≠ μ2

    There are two types of alternative statistical hypotheses:

    Directional hypothesis

    The direction of the difference between the two populations is explicitly stated: an alternative hypothesis that states the direction in which the population parameter in the experimental group differs from that in the control group.

    Usually we are interested in a direction for our significant difference. In my study, the alternative hypothesis is "Women who do aerobic exercise regularly will live longer than women who do not." When we do not state the alternative hypothesis in terms of a specific direction for the effect or the difference, it is called a NON-DIRECTIONAL hypothesis. Think about it – if I just said "There is a difference in lifespan between women who exercise regularly and women who do not," it would seem to be quite significant if I found that the mean lifespan in the exercising group was fewer years. But this would not be a desirable outcome, given that I probably want to show that exercise helps people live longer.
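    In statistical software, the directional/non-directional choice is the choice between a one-tailed and a two-tailed test. Here is a minimal sketch using Python's scipy (version 1.6 or later for the "alternative" argument); the lifespan numbers are invented for illustration:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        exercisers = rng.normal(82, 5, 40)       # hypothetical lifespans (years)
        non_exercisers = rng.normal(80, 5, 40)

        # Non-directional alternative: mu1 is not equal to mu2 (two-tailed)
        t2, p2 = stats.ttest_ind(exercisers, non_exercisers)

        # Directional alternative: mu1 > mu2 (one-tailed)
        t1, p1 = stats.ttest_ind(exercisers, non_exercisers, alternative="greater")

        print(f"two-tailed p = {p2:.3f}; one-tailed p = {p1:.3f}")

    Notice that when the observed difference is in the hypothesized direction, the one-tailed p-value is half the two-tailed one – which is exactly why a directional hypothesis must be justified in advance rather than chosen after peeking at the results.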

    What should you do with the vitamin study? Should you state your hypothesis in a directional or non-directional manner? Which would be the more meaningful alternative hypothesis for your purposes?

    This is important to note: We MAY in fact find differences between the population means in examining our competing (null versus alternative) hypotheses. However, one of the functions of the statistical tests will be to hold us to a set size of difference between the means of the two (or more) groups we are comparing. If the difference is not SIGNIFICANT according to the requirements of our procedures, we are compelled to assume that those differences are due to RANDOM SAMPLING ERROR.

    It is also the custom that we set up our null and alternative hypotheses BEFORE we collect and analyze our data — this is termed “a priori”. It is part of the ethics of doing research.

    So one needs to determine what the null and alternative hypotheses will be (conceptually and statistically) and then determine if the alternative will be stated directionally or non-directionally.

    Whether one chooses a directional or non-directional expression will depend on the following:

    – Is there strong evidence before beginning that the difference will be in a particular (positive or negative) direction? Just hoping it will be in a particular direction doesn't mean one should use a directional alternative hypothesis.

    – If all that can be reasonably suspected is that the two population means will be different, it is more prudent to use a non-directional alternative hypothesis. It is tougher to obtain significant results under these conditions. This makes for more reliable results.

    WHEN DO WE REJECT THE NULL HYPOTHESIS?

    Every time you and your friend get together and work on your latest exciting module, you go out for dinner together afterwards and discuss all the fascinating things that you have learned. You decide to make it your custom to flip a coin to see who will pick up the check. Your friend always provides the coin.

    The first day your friend wins. But you are not especially paranoid, so you don't suspect anything yet. But then he/she wins three more times in a row. Your nerves and your budget are starting to suffer. And you know the odds of this happening are fairly low (the probability is .5 × .5 × .5 × .5 = .0625). And it continues for an additional six times, so the odds are becoming, as we say, astronomical that your friend is getting treated to free dinner every week just because of random forces in the universe. What are you starting to suspect? (Your suspicion that your friend is supplying a loaded coin is your ALTERNATIVE HYPOTHESIS. What is your NULL?)

    Here is a decision table for the possibilities you face:

                                          REALITY
    YOUR DECISION                         Friend is cheating                      Friend is not cheating
    You think friend is cheating          You are correct* (statistical power)    You get a black eye ("alpha error")
    You think friend is not cheating      You are a sucker ("beta error")         You are correct**

    *In this case, you will probably get a black eye also, but you will be ahead again financially.

    **In this case, your friend is an expensive date.

    So what is our null hypothesis? That the coin you guys are using is just fine, that no cheating is going on, the coin is not loaded or two-headed. The 10 wins in a row that your friend has pulled off, the subsidized dinners have been purely a matter of chance.

    What is our alternative hypothesis? That the 10 wins are not a matter of chance; that your friend is an operator and is providing a coin that helps him/her be assured of winning the flip and getting free food. (The probability of 10 wins in a row, according to my calculations, is .1%).
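    You can verify that arithmetic directly; a quick sketch in Python:

        from scipy import stats

        print(0.5 ** 4)    # 0.0625 – chance of 4 straight wins with a fair coin
        print(0.5 ** 10)   # ~0.000977, i.e., about 0.1% – chance of 10 straight wins

        # Equivalently, as a binomial probability of 10 successes in 10 fair trials:
        print(stats.binom.pmf(10, 10, 0.5))   # same ~0.001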

    Even so, you just can’t KNOW the true proportion of correct guesses from your friend. You could start flipping the coin in question and have your friend guess long into the night and never exhaust the population of possible outcomes. There is an unlimited population of coin toss predictions. But we can still do hypothesis testing even with these unknown values of our population parameters.

    Again, the wording of the null hypothesis is a way of using language to enforce a certain scientific discipline. Everyone likes to make cool discoveries; the problem is that when we are looking for certain results, we tend to find them (the problem of wishful thinking). By constructing a null hypothesis, we encourage ourselves not to be too invested in our expectations. We also don't use the term "proven." We reject our null hypothesis if the results are significant. We "fail to reject" it if they are not.

    If our results are not significant (that is, we fail to reject the null hypothesis) we have not closed the door on our hypothesis either. Failure to reject the null hypothesis DOES NOT automatically mean that the null hypothesis is correct. It may also mean that in this case we did not collect information sufficient to reject it. We expect in doing scientific work that we will have to repeat our procedures until we have a convincing and consistent set of results. If we do reject the null, we have not PROVED the alternative — our acceptance of the alternative hypothesis is conditional and provides the basis and justification for further research.

    People do abuse the word significant. Any time research results are reported, in academic or popular magazines, there is the potential for this abuse or misunderstanding of statistical language. So when you read a statement such as "Researchers have found a significant relationship between eating alfalfa at every meal and weight loss," don't take it at face value. If the study is reported somewhere in full, check the tests and the significance levels set by the researchers. There is a big difference between results called significant at an alpha level of .68 and at a level of .01. The smaller the alpha level, the more meaningful the term "significant" is.

    For the coin flipping, freeloading friend, you set an alpha level of .001. You have reached this level of significance at the tenth toss. You reject the null hypothesis, take the black eye, and find someone else with whom to review quantitative reasoning over dinner.

    THE TWO TYPES OF HYPOTHESIS TESTING ERROR

    In science and in life, we are always dealing with a large amount of annoying uncertainty. That is just how it goes. If we waited to be certain in all situations before speaking or acting, we would never say or do anything.

    When we do our probabilistic testing of hypotheses, we can expect to make errors periodically. There are two types of errors that we make, and from which we hope to learn.

    TYPE I OR ALPHA ERROR

    We reject the null hypothesis when it is in reality true. (We accuse our friend of using a loaded coin when our friend was really playing fair.)
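    A simulation makes this concrete. In the sketch below (synthetic data, Python with numpy and scipy), both samples in every simulated experiment come from the same population, so the null hypothesis is always true – yet at an alpha of .05 we should still "find" a significant difference about 5% of the time:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        alpha = 0.05
        n_experiments = 10_000
        false_rejections = 0

        for _ in range(n_experiments):
            a = rng.normal(100, 15, 30)   # both groups drawn from the SAME population,
            b = rng.normal(100, 15, 30)   # so the null hypothesis is true by construction
            _, p = stats.ttest_ind(a, b)
            if p < alpha:
                false_rejections += 1     # a Type I (alpha) error

        print(f"observed Type I error rate: {false_rejections / n_experiments:.3f}")  # ~0.05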

    The alpha level not only sets the cutoff point at which we will reject the null hypothesis; it also sets the likelihood of committing this type of error. When we set an alpha level of .05, we are committing ourselves to a 5% chance of rejecting a true null hypothesis; at a level of .01 it is 1%; at .001, it is .1%. The lower the alpha level, the less chance of committing this type of error. But this certainty has a cost: the lower we set the alpha level, the greater the chance that we will commit a beta error and accept a false null hypothesis. We sacrifice what we call POWER in order to use a very low alpha level. The competing type of error, whose chance we must try to balance against alpha, is:

    TYPE II OR BETA ERROR

    We fail to reject the null hypothesis when it is in reality false. (We assume our friend is playing fair when in fact he or she is playing with a loaded coin.)

    Beta is the probability of making this type of error. The implication of this type of error is that in our quest to be scrupulous about not accepting a chance or random effect as an actual systematic effect of interest, we ignore a possible effect of interest.

    We need our test and our significance levels to allow us a sufficient level of power as well as carefully guard against finding effects that are not there. Power is the potential for our research and tests to reject an actually false null hypothesis. If we set our alpha levels so low that we have little or no chance of doing so, it is not a good thing.

    Think of alpha and beta as lying on a graph with a diagonal: as one goes up, the other goes down. The .05 significance level, in most cases, is regarded as the best compromise between alpha and beta errors, although significance at the .01 level is generally more highly prized in the world of research. Although a stringent alpha is usually a very good thing, let's say your research results show that the effect you were looking at would have occurred only 2% of the time by chance. At an alpha level of .01, you still fail to reject the null hypothesis even though there has been an effect from your treatment. You probably did not allow yourself sufficient power in setting the alpha level. You commit a beta error.

    The alpha level, along with the number of subjects, is used in conjunction with statistical tables in order to set a "critical" value for our statistic ("t" for t-tests and "F" for ANOVA). If our testing procedure yields a t or F value higher than the critical value, we can reject the null. If our obtained t or F (obtained from the test) does not exceed the critical value, we FAIL TO REJECT the null (by custom we do not "accept," because nothing has been proven and a future research study may not replicate our non-significant results).
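    Statistical software can look up these critical values the same way the printed tables do. A minimal sketch in Python (the degrees of freedom here are illustrative, not from any particular study):

        from scipy import stats

        alpha = 0.05

        # Critical t for a two-tailed t-test with 28 degrees of freedom:
        t_crit = stats.t.ppf(1 - alpha / 2, df=28)

        # Critical F for an ANOVA with 2 and 27 degrees of freedom:
        f_crit = stats.f.ppf(1 - alpha, dfn=2, dfd=27)

        print(f"critical t = {t_crit:.3f}, critical F = {f_crit:.3f}")
        # Reject the null only if the obtained t or F exceeds its critical value.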

    Because of the reciprocal nature of these types of error, we need to carefully consider the consequences of each before we set our alpha level.

    Two Main Types of Research Errors

    1. Random errors can be minimized but cannot be avoided. For example, they may be related to sampling variability or measurement precision. Random errors can be determined and can be addressed using statistical analysis.

    2. Systematic errors are also called bias. There are many causes of bias, including complex human factors. Because of this, systematic errors or bias must be considered when designing any research study in order to avoid false differences between observed and true values.

    PART III
    Main Categories of Research Bias

    Selection biases, which may result in the subjects in the sample not being representative of the population you intend to study.
    Measurement biases, which include issues related to how the outcome of interest was measured.
    Intervention (exposure) biases, which involve differences in how the treatment or intervention was carried out, or how participants were exposed to the factor of interest.
    Selection Biases happen when two groups are compared but they are different in some way. Those differences may influence the outcome of the research. Examples include volunteer or referral bias, and nonrespondent bias. By definition, nonequivalent group designs also introduce selection bias.

    Volunteer or referral bias happens because people who volunteer to participate in a study (or who are referred to it) are often different than people who do not volunteer or are not referred to the study. This bias usually favors the treatment group, because volunteers tend to be more motivated and concerned about their health.

    Non-respondent bias is when the people who do not respond to a survey differ in important ways from the people who do respond. This bias can work in either direction.

    Measurement Biases involve systematic error that can happen when researchers collect data. Some examples include instrument bias, insensitive measure bias, expectation bias, recall or memory bias, attention bias, and verification or work-up bias (UMDNJ, n.d.).

    Instrument bias happens when calibration errors lead to inaccurate measurements being recorded. An example is an unbalanced scale being used to weigh people.

    Insensitive measure bias happens when the measurement tool(s) used are not sensitive enough to detect what might be important differences in the variable of interest.

    Expectation bias happens when masking or blinding is not carried out. This means the researchers know which group is the control and which is the intervention group. Observers may err in measuring data toward the expected outcome. This bias usually favors the intervention group.

    Recall or memory bias can be a problem if outcomes being measured require that subjects recall past events. People may recall positive events more than negative events. Some participants may be questioned differently or engaged in more conversation than others, which could improve their recollections more than others.

    Attention bias happens when people who know they are part of a study and are getting more attention, give more favorable responses or perform better than people who are unaware of the study’s intent.

    Intervention (Exposure) Biases

    Intervention or exposure biases include contamination bias, co-intervention bias, timing bias(es), compliance bias, withdrawal bias, and proficiency bias. This type of bias is most often associated with research that compares groups.

    Contamination bias happens when members of the ‘control’ group inadvertently receive the treatment or are exposed to the intervention. This can potentially minimize the difference in outcomes between the two groups.

    Co-intervention bias occurs when some participants are receiving some other interventions at the same time as the study treatment, but those other interventions are not accounted for.

    Timing bias depends on the timing of the study. If an intervention is provided over a long period of time, maturation could be the cause for improvement. If treatment is of very short duration, there may not have been sufficient time for a noticeable effect.

    Compliance bias occurs when participants differ in their levels of adherence to the planned intervention and this affects the study outcomes.

    Withdrawal bias happens when people who drop out of the study differ significantly from people who continue to participate and complete the study.

    Proficiency bias happens when the interventions or treatments are not applied equally to subjects. This may be due to skill or training differences among personnel and/or differences in resources or procedures used at different sites.

    UMDNJ. (n.d.). Major sources of bias in research studies. Retrieved from http://www.umdnj.edu/idsweb/shared/biases.htm

    T-TESTS

    A t-test allows us to compare the means of two samples. The difference between the means, relative to the variability within the samples, needs to be large enough to achieve statistical significance. There are three variants, listed below and followed by a short code sketch.

    Sometimes we are comparing the mean of a treatment group or other group of interest to an already well-established, known population mean (a one-sample t-test).
    Often we are comparing a treatment group to a control group, in which case we would use an independent two-sample t-test.
    Sometimes we are comparing the same group to itself – the classic "before and after" model. The two samples are the same sample observed or measured before and after the treatment. In this case we use a dependent or correlated t-test.
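    Here is that sketch in Python (scipy), with invented data standing in for real measurements:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        treatment = rng.normal(105, 10, 25)    # hypothetical treatment-group scores
        control = rng.normal(100, 10, 25)      # hypothetical control-group scores
        before = rng.normal(100, 10, 25)       # same subjects, measured twice
        after = before + rng.normal(3, 5, 25)

        # One-sample: compare a group mean to a known population mean (here, 100).
        print(stats.ttest_1samp(treatment, popmean=100))

        # Independent two-sample: treatment group versus control group.
        print(stats.ttest_ind(treatment, control))

        # Dependent (correlated/paired): the same group before and after.
        print(stats.ttest_rel(before, after))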
    ANOVA (Analysis of Variance)

    When we have more than two means to compare, or different levels of treatment, a procedure called Analysis of Variance is used. Analysis of variance compares the amount of dispersion between the groups to the amount of dispersion within the groups, instead of making direct pairwise comparisons of means, which would inflate the chance of error with more than two groups. The amount of variation between the groups is attributed to the treatment or change in conditions between the groups; the amount of variation within the groups is attributed to ERROR, or random effects that would render our results meaningless. The ratio of between-group variation to within-group variation needs to be large enough to achieve statistical significance. If significant results are achieved, direct comparisons of the means can then be done using a variety of post-hoc or "after the fact" statistical tests.
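    A minimal one-way ANOVA sketch in Python (scipy), again with invented data for three hypothetical dosage groups; the post-hoc step uses Tukey's HSD, which requires scipy 1.8 or later:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        placebo = rng.normal(50, 8, 20)     # hypothetical outcome scores
        low_dose = rng.normal(54, 8, 20)
        high_dose = rng.normal(58, 8, 20)

        f_stat, p_value = stats.f_oneway(placebo, low_dose, high_dose)
        print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

        # If p is below alpha, post-hoc tests compare the individual pairs of means:
        print(stats.tukey_hsd(placebo, low_dose, high_dose))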
    REMEMBER WHAT IS IMPORTANT IN USING STATISTICAL TESTS:
