Sunday, June 29, 2014

stone flakes IV

In this post I want to try something new, a causal graphical model. The aim here is just as much to get myself a feel what these things do as to understand how the stone flakes data fit together.

Data

Data are stone flakes data which I analyzed previously. The first post was clustering, second linking to hominid type, third regression. Together these made for the bulk of a standard analysis. In this new analysis the same starting data is used.
r2 <- read.table('StoneFlakes.txt',header=TRUE,na.strings='?')
r1 <- read.table('annotation.txt',header=TRUE,na.strings='?')
r12 <- merge(r1,r2)

Packages

The main package used is pcalg (Methods for graphical models and causal inference). Even though it lives on cran, it requires RBGL (An interface to the BOOST graph library) which lives on Bioconductor. Plots are made via Rgraphvis (Provides plotting capabilities for R graph objects), Bioconductor again, which itself has the hard work done by graphviz, which, on my linux machine, is a few clicks to install. 
library('pcalg')
library('Rgraphviz')


First analysis

I am just following the vignette here, to get some working code.
rx <- subset(r12,,names(r2)[-1])
rx <- rx[complete.cases(rx) & !(r12$ID %in% c('ms','c','roe','sz','va','arn')),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
    p = ncol(rx), alpha = 0.01)
png('graph1.png')
plot(pc.gmG, main = "")


personally I dislike this plot since you have to know which variable is which number. I don't think that is acceptable for things one wants to share. Since I could not find documentation how to modify this via the plot statement, I took the ugly road of directly modifying an S4 object; pc.Gmc.
pc.gmG@graph@nodes <- names(rx)
names(pc.gmG@graph@edgeL) <- names(rx)
png('graph2.png')
plot(pc.gmG, main = "")
dev.off()

This makes some sense looking at the variable names.
RTI (Relative-thickness index of the striking platform) is connected to WDI (Width-depth index of the striking platform). PSF (platform primery (yes/no, relative frequency)) is related to FSF (Platform facetted (yes/no, relative frequency)). PSF is also related to PROZD (Proportion of worked dorsal surface (continuous)) which then goes to ZDF1 (Dorsal surface totally worked (yes/no, relative frequency)). ZDF1 is also influenced by FLA (Flaking angle (the angle between the striking platform and the splitting surface)).

Second analysis

Much as like this analysis, it does not lead to a connection between flakes on one hand and age or group on the other hand. Since the algorithm assumes normal distributed variables, group is out of the question. Log(-age) seems to be closest to normal distributed.
rx <- subset(r12,,names(r2)[-1])
rx$lmage <- log(-r12$age)
rx <- rx[complete.cases(rx) & !(r12$ID %in% c('ms','c','roe','sz','va','arn')),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
    p = ncol(rx), alpha = 0.01)
pc.gmG@graph@nodes <- names(rx)
names(pc.gmG@graph@edgeL) <- names(rx)
plot(pc.gmG, main = "")


T

Adding age links the two parts, while keeping most of the previous graph unchanged. The causal link however, seems reversed, does age cause change in flakes or do changes in flakes cause age? Nevertheless, it does show a different picture than before. In linear regression FSF and LBI contributed, but there I had not removed the outliers. In this approach FSF features, but is in its turn driven by PSF. The other direct influence is ZDF1, which is now also driven by WDI.

Third analysis

It is probably pushing the limits of what is normal distributed, but there are two binairy variables, stone material (1=flint, 2=other) and site (1=gravel pit, 0=other).
rx <- subset(r12,,c('site','mat',names(r2)[-1]))
rx$lmage <- log(-r12$age)
rx <- rx[complete.cases(rx) & !(r12$ID %in% c('ms','c','roe','sz','va','arn')),]
suffStat <- list(C = cor(rx), n = nrow(rx))
pc.gmG <- pc(suffStat, indepTest = gaussCItest,
    p = ncol(rx), alpha = 0.01)
pc.gmG@graph@nodes <- names(rx)
names(pc.gmG@graph@edgeL) <- names(rx)
plot(pc.gmG, main = "")

This makes a link from material to RTI (relative thickness) and a connection site to FLA (flaking angle), see below.
# site (1=gravel pit, 0=other)
boxplot(FLA ~ c('other','gravel pit')[site+1],
    data=r12,
    ylab='FLA',
    xlab='site')

#stone material (1=flint, 2=other)
boxplot(RTI ~ c('flint','other')[mat],
    data=r12,
    ylab='RTI',
    xlab='mat')

No comments:

Post a Comment