 |
It is important to practise on
real datasets. One useful resource is StatLib. Here we analyse the TUMOR
data set contributed by Terry Therneau. One reason for choosing this data set is that it is
small enough for easy handling (n=86). The purpose is purely computer practice, not to
examine the quality or findings of the study. The bladder tumor data file contains 8 variables (names in parentheses): treatment group
(group), follow-up time (futime), pre-treatment number of tumors (number), largest
pre-treatment tumor size (size), and times to first, second, third, and fourth
recurrences. Only time to first recurrence is analysed in this practice. We used a word
processor to edit the raw file so that each row represents one subject and each line contains
only 5 values (the last 3 recurrence times were removed). If the fifth value was blank
(meaning no recurrence), we replaced the blank with a dot (.), preceded by a space. All
comments and descriptions in the file were also removed. The file was saved as
c:\data\tumor.dat (a plain text/ASCII file). |
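
The manual editing described above could also be scripted. The following is only a minimal sketch in Python, not the procedure actually used here. It assumes the raw values are separated by whitespace and that every data row begins with a digit, so anything else is treated as a comment or description line. The function names `clean_row` and `clean_lines` are hypothetical, and the sample rows are made up for illustration rather than taken from the real file.

```python
def clean_row(line, n_keep=5, missing="."):
    """Keep the first n_keep whitespace-separated values of one row.

    Rows with fewer than n_keep values are padded with the missing-value
    marker, so a subject with no recurrence ends in " .".
    """
    fields = line.split()[:n_keep]
    fields += [missing] * (n_keep - len(fields))
    return " ".join(fields)


def clean_lines(lines):
    """Drop blank lines and lines that do not start with a digit
    (assumed to be comments or descriptions), then clean each data row."""
    cleaned = []
    for line in lines:
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue
        cleaned.append(clean_row(line))
    return cleaned


# Made-up rows for illustration only (not values from the real data file).
sample = [
    "Bladder tumor data (description line)",
    "1 10 1 3",          # no recurrence: fifth value missing
    "2 25 2 1 3 8 15",   # three recurrences: only the first is kept
    "",
]
cleaned = clean_lines(sample)
print("\n".join(cleaned))
```

To process a real file, the same function could be applied to the lines read from the raw file and the result written to a new text file. If the raw file uses fixed-width columns with blanks in the middle of a row, splitting on whitespace would misalign the values, and a column-position approach would be needed instead.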