Asking for help with data transformation for normal distribution

Posted by brookstream on 2014-04-11 in the "语料库与外语教学" (Corpora and Foreign Language Teaching) forum

  1. Hello all! Could anyone take a look at the SPSS data file I've attached here? It has two variables: a dependent variable, "noun phrase length", and an independent variable, "corpus name". I need to run an independent-samples t-test to see whether the two corpora differ significantly in noun phrase length. However, the data in the "length" variable are not normally distributed, so I need to transform them before a parametric t-test can be performed. I've tried the Lg10 function, but the transformed data are still not normally distributed. Could you advise which functions I should use for an effective transformation? If possible, could you work directly on the data and upload the transformed version?

    If there's no way to transform the data to normality, does that mean I can only use non-parametric tests on my data?

    Thank you so much for your help!
     


  2. Re: Asking for help with data transformation for normal distribution

    If the random variable is not normally distributed, how can you transform it to be so?

    I think a transformation such as standardization only affects the scale of the variable, not the shape of its distribution.
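
    To illustrate the distinction, here is a minimal sketch in base R. It uses simulated right-skewed data, not the attached file, and the helper `skewness` is defined here only for illustration. A linear rescaling such as standardization leaves skewness unchanged, whereas a nonlinear transformation such as log10 can change it.

    ```r
    # Illustrative only: simulated right-skewed data, not the attached file.
    set.seed(1)
    x <- rlnorm(500, meanlog = 1, sdlog = 0.6)

    # Simple moment-based sample skewness (hypothetical helper for this sketch).
    skewness <- function(v) {
      d <- v - mean(v)
      mean(d^3) / mean(d^2)^1.5
    }

    z <- (x - mean(x)) / sd(x)   # standardization: a linear rescaling
    l <- log10(x)                # log transformation: nonlinear

    skew_raw <- skewness(x)
    skew_z   <- skewness(z)
    skew_log <- skewness(l)

    # Skewness of the raw, standardized, and log-transformed values.
    print(c(raw = skew_raw, standardized = skew_z, log10 = skew_log))
    ```

    Whether a log transformation brings real data close enough to normality depends on the data; it is not guaranteed.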

    Yes, I think you should consider using a statistical test other than the t-test for your data.

    You may want to refer to chapter 1.2.1 (one nominal independent variable and one interval dependent variable) of Stefan Gries's book Statistics for Linguistics with R for more information.
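
    As one possible alternative to the t-test, here is a minimal sketch of a Wilcoxon rank-sum (Mann-Whitney U) test in base R. The data are simulated, and the column names `corpus` and `length` are assumptions, not taken from the attached file.

    ```r
    # Simulated stand-in for the attached data; names and values are hypothetical.
    set.seed(42)
    dat <- data.frame(
      corpus = factor(rep(c("A", "B"), each = 100)),
      # Phrase lengths as positive integer counts.
      length = c(rgeom(100, prob = 0.30), rgeom(100, prob = 0.20)) + 1
    )

    # Wilcoxon rank-sum test: compares the two groups without assuming normality.
    # exact = FALSE requests the normal approximation, since count data contain ties.
    res <- wilcox.test(length ~ corpus, data = dat, exact = FALSE)
    print(res)
    ```

    This is only one option; which test is appropriate depends on the research question and the shape of the two distributions.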

    P.S.

    I can't open the attached file because SPSS isn't installed on my computer. It would be better if you could upload your raw data in plain-text format so that more people can look at it.

    By the way, I think R is much better than SPSS for data analysis in linguistic studies. Why not give it a try?
     
  3. chrisyang (regular member)

    Re: Asking for help with data transformation for normal distribution

    To brookstream:
    What is the rationale for transforming the raw data before running an independent-samples t-test?
     
  4. chrisyang (regular member)

    Re: Asking for help with data transformation for normal distribution

    Using the following commands, I imported the SPSS data provided into R.
    > library(foreign)
    > rawdata<-read.spss(file.choose(),to.data.frame=TRUE)
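
    Once the data are in a data frame, normality within each corpus could be checked along the following lines. This is only a sketch: `rawdata` is simulated here so that the example runs on its own, and the column names `corpus` and `length` are assumptions that may differ from those in the SPSS file.

    ```r
    # Hypothetical stand-in for the imported data frame; the real column names
    # in the SPSS file may differ.
    set.seed(7)
    rawdata <- data.frame(
      corpus = factor(rep(c("A", "B"), each = 80)),
      length = c(rlnorm(80, 1.0, 0.5), rlnorm(80, 1.2, 0.5))
    )

    str(rawdata)   # variable names and types

    # Shapiro-Wilk test of normality, run separately for each corpus.
    sw_raw <- lapply(split(rawdata$length, rawdata$corpus), shapiro.test)
    # The same test after a log10 transformation.
    sw_log <- lapply(split(log10(rawdata$length), rawdata$corpus), shapiro.test)

    # p-values per corpus, before and after the transformation.
    print(sapply(sw_raw, function(r) r$p.value))
    print(sapply(sw_log, function(r) r$p.value))
    ```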

    To brookstream:
    Are the data in the file attached below the same as yours?
     

    Attachment:

    • data.txt (10.2 KB, 7 views)
  5. Re: Asking for help with data transformation for normal distribution

    Hi, qhdjason. Thanks for the reply. Since I'm not statistics-savvy, I just followed the usual practice of transforming skewed data with the Lg10 function, without knowing much about its rationale.

    Thank you for pointing me to R for my data, although I'm not sure I know how to run it. Anyway, I'll give it a go.

    Fortunately, chrisyang has already uploaded the plain-text file, which is exactly what the data looked like in the original. Thanks, chrisyang!
     
  6. Re: Asking for help with data transformation for normal distribution

    Hi, chrisyang. Thanks a lot for your feedback on my problem. As I said, I just followed the usual practice without much understanding of the underlying rationale for the data transformation.

    I'm grateful to you for uploading the plain-text file. Yes, it matches my data. Did you get any results from the comparison in R? I look forward to hearing about them!

    Cheers