Sunday, October 18, 2015

Trying to optimize

I wanted to try some more machine learning. On Kaggle there is a competition, 'How Much Did It Rain? II'. This is a much bigger data set than Titanic. To quote from Kaggle:
Rainfall is highly variable across space and time, making it notoriously tricky to measure. Rain gauges can be an effective measurement tool for a specific location, but it is impossible to have them everywhere. In order to have widespread coverage, data from weather radars is used to estimate rainfall nationwide. Unfortunately, these predictions never exactly match the measurements taken using rain gauges.

Data

On the data themselves:
To understand the data, you have to realize that there are multiple radar observations over the course of an hour, and only one gauge observation (the 'Expected'). That is why there are multiple rows with the same 'Id'.
I have downloaded the data and at this point am just experimenting with them. It is quite a big data set: there are 9125329 rows in the training set. My idea was to do 'something' per record and then combine the records of one hour into a prediction. The 'something' is as yet undefined; the combining by Id is intended to stay.
What became clear pretty quickly is that everything is slow with this amount of data. Hence for now I will use only 10% of the training data. For ease of access the data are sitting in an R data set.
load('aaa3.RData')
###
# take 10% of data
rawdata <- rawdata[rawdata$Id < quantile(rawdata$Id,.1),]
# extract keys per hour: Id & Expected
# rawdata$Id != c(0, rawdata$Id[1:(nrow(rawdata)-1)])
# is the R way to write the SAS code: by Id; if first.Id;
r1b <- rawdata[rawdata$Id!=c(0,rawdata$Id[1:(nrow(rawdata)-1)]),c(1,24)]
Id <- factor(rawdata$Id)

Model

To get an idea of the process I started with linear regression, but that is just a temporary approach. For linear regression there are 22 parameters: one for each of the 21 observed values plus an intercept. The prediction per row follows from a simple matrix multiplication. The model, including the calculation of the error in the fit, sits in a small R function. As preparation a column of ones is added to the x data. The summary per Id can be done quickly and easily via the group_by() and summarise() functions from dplyr.
Based on the current results I have decided that such a function will have to be transferred to C++ or similar in order to get a decent computation time (a rough sketch of what that might look like follows after the code below). But that is for a future time; it has been quite some years since I programmed in C or Fortran, so I'll need a refresher first. Luckily edX has a course 'Introduction to C++' running right now.
r1m <- as.matrix(rawdata[,c(-1,-2,-24)])
rm(rawdata) # control memory usage
r1m <- cbind(rep(1,nrow(r1m)),r1m) # add column of 1
r1m[is.na(r1m) ] <- 0
betas <- rep(1,ncol(r1m))
#myerr calculates the mean prediction per Id and compares it with the Expected values
myerr <- function(betas) {
  pred <- data.frame(Id=Id,
          pred=as.numeric(tcrossprod(betas,r1m))) %>%
      group_by(.,Id) %>%
      summarise(.,m=mean(pred)) 
  sum(abs(pred$m-r1b$Expected))
}
#mmyerr is myerr for maximization
mmyerr <- function(betas) -myerr(betas) 
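As an aside on the C++ idea above: a minimal Rcpp sketch of the per-Id aggregation which myerr does. This is only an illustration of my own, not code used for the results below; it assumes the rows are sorted by Id, as in the raw data.
library(Rcpp)
cppFunction('
double cpperr(NumericVector pred, IntegerVector id, NumericVector expected) {
  // walk over consecutive runs of equal id, average pred within each run,
  // and accumulate the absolute difference with the matching expected value
  double total = 0.0;
  int start = 0, k = 0;
  int n = pred.size();
  for (int i = 1; i <= n; i++) {
    if (i == n || id[i] != id[start]) {
      double s = 0.0;
      for (int j = start; j < i; j++) s += pred[j];
      total += std::abs(s / (i - start) - expected[k]);
      k++;
      start = i;
    }
  }
  return total;
}')
# would be called as, for instance:
# cpperr(as.numeric(tcrossprod(betas, r1m)), as.integer(Id), r1b$Expected)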

Parameter estimation & optimization

The problem has now been reduced to finding the parameters which give the lowest prediction error. Just throwing this into optim() did not lead to satisfactory results, so this post shows some experiments with alternative approaches. I played with several of them and reran everything for this post to get comparable numbers. The table below gives a quick summary.

package  function              time (s)  result   converged
stats    optim (Nelder-Mead)        771  1805459  No
stats    optim (BFGS)               729  1678736  Yes
stats    optim (CG)                5527  1678775  Yes
adagio   simpleEA                   623  1722928  No
dfoptim  hjk                       4289  1678734  Yes
GA       ga                         589  1775910  NA
It appears that the default optim() (Nelder-Mead) did not function at all. In contrast, the BFGS option gave quite an improvement, which suggests that a fast way to compute the gradient could improve the result quite a bit (a sketch of such a gradient is given below). The quite decent result for simpleEA() is actually a bit of a disappointment: it started with a box and converged to the center of that box, and whatever I changed in the options, it would always end in that same center. hjk() works but is quite slow. Finally, ga() is just a bit too slow and has difficulty finding the actual optimum.
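Since BFGS profits from gradient information, here is a sketch of what an analytic gradient for myerr could look like. This is an assumption on my side and was not used for the table above; it relies on the same Id ordering as myerr and on dplyr being loaded.
# derivative of sum(|mean(pred per Id) - Expected|) w.r.t. betas:
# per Id, sign(mean - Expected) times the column means of r1m within that Id
myerrgrad <- function(betas) {
  pred <- data.frame(Id=Id,
          pred=as.numeric(tcrossprod(betas,r1m))) %>%
      group_by(.,Id) %>%
      summarise(.,m=mean(pred))
  sgn <- sign(pred$m-r1b$Expected)
  w <- sgn[as.integer(Id)]/tabulate(Id)[as.integer(Id)]
  as.numeric(crossprod(w,r1m))
}
# which could then be supplied to optim:
# optim(betas, myerr, gr=myerrgrad, method='BFGS', control=list(maxit=5000))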
> system.time(
+     ooptim <- optim(betas,
+         myerr,
+         control=list(maxit=5000)))
   user  system elapsed 
771.204   0.216 770.743 
> ooptim
$par
 [1]  5.91962946 -0.26798071 -1.19797656 -0.13297181  0.51015756  1.49454289
 [7]  0.75377772  0.47774932 -0.54917591 -1.36237144  9.32248883 -4.58509259
[13]  1.53973413 -1.28367152  1.57070010  2.80960763 -1.53639162  0.24043815
[19]  0.55385066  0.65754031  2.25820673 -0.09389256

$value
[1] 1805459

$counts
function gradient 
    5001       NA 

$convergence
[1] 1

$message
NULL

> system.time(
+     ooptimBFGS <- optim(betas,
+         myerr,
+         control=list(maxit=5000),
+         method='BFGS'))
   user  system elapsed 
730.253   0.226 729.741 
> ooptimBFGS
$par
 [1] -0.193572528  0.036277278  0.018915001  0.055201253 -0.005343143
 [6]  0.048512761  0.018770974 -0.067885397  0.006576984  0.009960527
[11]  0.354908920 -0.417610915  0.004300646 -0.313622718  0.013003956
[16]  0.030448388  0.004622354  0.026076960  0.022466472  0.015661149
[21]  0.093649287 -0.056203122

$value
[1] 1678736

$counts
function gradient 
     626       94 

$convergence
[1] 0

$message
NULL

> system.time(
+     ooptimCG <- optim(betas,
+         myerr,
+         control=list(maxit=5000),
+         method='CG'))
    user   system  elapsed 
5531.910    1.397 5527.053 
> ooptimCG
$par
 [1] -0.112575961  0.030875504  0.019111803  0.060134187 -0.009593463
 [6]  0.051076794  0.019737360 -0.077191287  0.016508372  0.003658111
[11] -0.071932138 -0.036570553 -0.051100594 -0.179869515 -0.010209693
[16]  0.066388807  0.005459227  0.039691325  0.020881573  0.020044547
[21]  0.068064112 -0.051825924

$value
[1] 1678775

$counts
function gradient 
    2283      772 

$convergence
[1] 0

$message
NULL

> library(adagio)
> system.time(
+     osimpleEA <- simpleEA(myerr,
+         lower=rep(-5,ncol(r1m)),
+         upper=rep(5,ncol(r1m))))
   user  system elapsed 
624.432   0.176 623.857 
> osimpleEA
$par
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

$val
[1] 1722928

$fun.calls
[1] 4060

$rel.scl
[1] 0.625

$rel.tol
[1] 0

> library(dfoptim)
> system.time(
+     ohjk <- hjk(betas,myerr)
+ )
    user   system  elapsed 
4291.935    2.036 4289.022 
> ohjk
$par
 [1] -0.188190460  0.035865784  0.020156860  0.056446075 -0.005027771
 [6]  0.046699524  0.018516541 -0.069412231  0.006889343  0.010589600
[11]  0.257308960 -0.412174225  0.065429688 -0.279411316  0.014663696
[16]  0.027637482  0.011859894  0.020298004  0.023769379  0.013179779
[21]  0.092247009 -0.056926727

$value
[1] 1678734

$convergence
[1] 0

$feval
[1] 28245

$niter
[1] 19

> library(GA)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Package 'GA' version 2.2
Type 'citation("GA")' for citing this R package in publications.
> system.time(
+     oga <- ga(type='real-valued',
+         fitness=mmyerr,
+         min=rep(-5,ncol(r1m)),
+         max=rep(5,ncol(r1m))))
Iter = 1  | Mean = -10979924  | Best = -2576076 
Iter = 2  | Mean = -7162806  | Best = -2576076 
Iter = 3  | Mean = -4932565  | Best = -2318262 
Iter = 4  | Mean = -4639204  | Best = -2237286 
Iter = 5  | Mean = -3638851  | Best = -2237286 
Iter = 6  | Mean = -3689674  | Best = -2068472 
Iter = 7  | Mean = -3226292  | Best = -1990599 
Iter = 8  | Mean = -2977074  | Best = -1876791 
Iter = 9  | Mean = -2775401  | Best = -1876791 
Iter = 10  | Mean = -2332140  | Best = -1865152 
Iter = 11  | Mean = -2437267  | Best = -1817242 
Iter = 12  | Mean = -2471635  | Best = -1799720 
Iter = 13  | Mean = -2168155  | Best = -1799720 
Iter = 14  | Mean = -2096314  | Best = -1799720 
Iter = 15  | Mean = -2253981  | Best = -1799720 
Iter = 16  | Mean = -2049947  | Best = -1799720 
Iter = 17  | Mean = -2089075  | Best = -1789813 
Iter = 18  | Mean = -2093853  | Best = -1789813 
Iter = 19  | Mean = -1976973  | Best = -1789813 
Iter = 20  | Mean = -1933447  | Best = -1789813 
Iter = 21  | Mean = -2000920  | Best = -1789813 
Iter = 22  | Mean = -1950218  | Best = -1788022 
Iter = 23  | Mean = -1886263  | Best = -1788022 
Iter = 24  | Mean = -1914422  | Best = -1788022 
Iter = 25  | Mean = -2024395  | Best = -1788022 
Iter = 26  | Mean = -2016097  | Best = -1788022 
Iter = 27  | Mean = -1947956  | Best = -1788022 
Iter = 28  | Mean = -1900507  | Best = -1787331 
Iter = 29  | Mean = -2066703  | Best = -1787331 
Iter = 30  | Mean = -1835251  | Best = -1787331 
Iter = 31  | Mean = -1942032  | Best = -1787331 
Iter = 32  | Mean = -2019920  | Best = -1787331 
Iter = 33  | Mean = -1891563  | Best = -1787331 
Iter = 34  | Mean = -1838287  | Best = -1787331 
Iter = 35  | Mean = -1925757  | Best = -1787331 
Iter = 36  | Mean = -1972783  | Best = -1787331 
Iter = 37  | Mean = -2053707  | Best = -1787331 
Iter = 38  | Mean = -1999298  | Best = -1787331 
Iter = 39  | Mean = -2057161  | Best = -1787331 
Iter = 40  | Mean = -2016452  | Best = -1787331 
Iter = 41  | Mean = -2106619  | Best = -1787331 
Iter = 42  | Mean = -1848483  | Best = -1787331 
Iter = 43  | Mean = -1952123  | Best = -1784611 
Iter = 44  | Mean = -1911939  | Best = -1784611 
Iter = 45  | Mean = -1867947  | Best = -1784611 
Iter = 46  | Mean = -1961794  | Best = -1784611 
Iter = 47  | Mean = -1878957  | Best = -1784611 
Iter = 48  | Mean = -2004529  | Best = -1784611 
Iter = 49  | Mean = -1819421  | Best = -1784611 
Iter = 50  | Mean = -1904183  | Best = -1784611 
Iter = 51  | Mean = -1870062  | Best = -1783617 
Iter = 52  | Mean = -1948733  | Best = -1782194 
Iter = 53  | Mean = -2023928  | Best = -1782194 
Iter = 54  | Mean = -2008359  | Best = -1782194 
Iter = 55  | Mean = -2022090  | Best = -1782194 
Iter = 56  | Mean = -2086763  | Best = -1782194 
Iter = 57  | Mean = -1959158  | Best = -1782194 
Iter = 58  | Mean = -1849578  | Best = -1782194 
Iter = 59  | Mean = -2119746  | Best = -1782194 
Iter = 60  | Mean = -2218175  | Best = -1780252 
Iter = 61  | Mean = -2096596  | Best = -1780252 
Iter = 62  | Mean = -1979876  | Best = -1779764 
Iter = 63  | Mean = -2172198  | Best = -1779187 
Iter = 64  | Mean = -1876837  | Best = -1779187 
Iter = 65  | Mean = -1880262  | Best = -1779187 
Iter = 66  | Mean = -1830083  | Best = -1779187 
Iter = 67  | Mean = -1908915  | Best = -1779187 
Iter = 68  | Mean = -1994229  | Best = -1779187 
Iter = 69  | Mean = -2090335  | Best = -1779187 
Iter = 70  | Mean = -2314393  | Best = -1779187 
Iter = 71  | Mean = -2072167  | Best = -1779187 
Iter = 72  | Mean = -1901014  | Best = -1779187 
Iter = 73  | Mean = -1805347  | Best = -1779187 
Iter = 74  | Mean = -1868138  | Best = -1779187 
Iter = 75  | Mean = -2054748  | Best = -1779187 
Iter = 76  | Mean = -1894803  | Best = -1779187 
Iter = 77  | Mean = -1789384  | Best = -1779187 
Iter = 78  | Mean = -1919701  | Best = -1779187 
Iter = 79  | Mean = -1885417  | Best = -1779187 
Iter = 80  | Mean = -1911716  | Best = -1779187 
Iter = 81  | Mean = -1951938  | Best = -1779187 
Iter = 82  | Mean = -1968106  | Best = -1779187 
Iter = 83  | Mean = -2118552  | Best = -1778650 
Iter = 84  | Mean = -1891108  | Best = -1778650 
Iter = 85  | Mean = -1967131  | Best = -1778650 
Iter = 86  | Mean = -1971338  | Best = -1778650 
Iter = 87  | Mean = -1798790  | Best = -1778508 
Iter = 88  | Mean = -1990970  | Best = -1778440 
Iter = 89  | Mean = -2103851  | Best = -1778440 
Iter = 90  | Mean = -2176462  | Best = -1778440 
Iter = 91  | Mean = -1852161  | Best = -1778440 
Iter = 92  | Mean = -1913977  | Best = -1778440 
Iter = 93  | Mean = -2110709  | Best = -1778440 
Iter = 94  | Mean = -1948281  | Best = -1778440 
Iter = 95  | Mean = -2221610  | Best = -1777905 
Iter = 96  | Mean = -2276691  | Best = -1775910 
Iter = 97  | Mean = -2141834  | Best = -1775910 
Iter = 98  | Mean = -1871241  | Best = -1775910 
Iter = 99  | Mean = -1829631  | Best = -1775910 
Iter = 100  | Mean = -1914998  | Best = -1775910 
   user  system elapsed 
590.256   0.418 589.855 
> summary(oga)
+-----------------------------------+
|         Genetic Algorithm         |
+-----------------------------------+

GA settings: 
Type                  =  real-valued 
Population size       =  50 
Number of generations =  100 
Elitism               =  2 
Crossover probability =  0.8 
Mutation probability  =  0.1 
Search domain 
    x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21
Min -5 -5 -5 -5 -5 -5 -5 -5 -5  -5  -5  -5  -5  -5  -5  -5  -5  -5  -5  -5  -5
Max  5  5  5  5  5  5  5  5  5   5   5   5   5   5   5   5   5   5   5   5   5
    x22
Min  -5
Max   5

GA results: 
Iterations             = 100 
Fitness function value = -1775910 
Solution               = 
            x1        x2        x3        x4         x5        x6       x7
[1,] 0.2099863 0.3481724 0.3910016 -1.416855 -0.3929375 0.5143907 0.319798
           x8        x9        x10         x11       x12      x13        x14
[1,] 1.795368 -0.445552 -0.8136325 -0.01177531 -0.823179 0.744795 -0.7005772
           x15       x16       x17         x18        x19       x20       x21
[1,] 0.3549757 0.4885278 0.3535035 -0.04773755 -0.5155096 0.3308792 0.1921899
           x22
[1,] 0.3446834

Sunday, October 4, 2015

Predicting Titanic deaths on Kaggle VII: More Stan

Two weeks ago I used STAN to create predictions after just throwing in all independent variables. This week I aim to refine the STAN model. For this it is convenient to use the loo package (Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models). See also the paper by Aki Vehtari, Andrew Gelman and Jonah Gabry.
Since the package does the heavy lifting, it only remains to wrap it in a function so I can quickly compare some models. A potential next step is to automate some more, but I did not do that pending the current results.

Data

Same as last time.

Model

To keep the model similar to last time, I need a full design matrix for each independent variable in the model. So I made a mechanism which takes a model formulation and creates both the design matrix and a set of indices that keep track of which column corresponds to which part of the model. To be specific, terms runs from 1 to nterm and records, for each column of the design matrix, from which variable (1 for the intercept up to the last variable) that column originates. This sits in the function des.matrix; a small illustration of its output follows.
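For illustration, this is a sketch of what des.matrix (listed in the code section below) returns for a Sex plus Pclass model:
dm <- des.matrix(Survived ~ Sex + Pclass, data=train)
dm$terms   # 1 2 2 3 3 3: intercept, two Sex columns, three Pclass columns
dm$np      # 6 columns in the design matrix
dm$nterm   # 3 variables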
The generated quantities block is purely for the LOO statistic.
It is preferred to compile the model only once, hence fit1 is calculated beforehand. Having done that preparation, mySmodel is a function which fits the model, computes the LOO statistic and prints the output in one step. With this function I can just drop in a formula and get something usable as output, so I can easily examine a bunch of models. It seemed to me that forward selection was a suitable way to examine the model space; I know it is not ideal, but at this point I mainly want to know whether this will actually work. A rough sketch of such a selection loop follows.
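This sketch of a forward-selection loop around mySmodel is only for reference; the actual selection in this post was done by hand and the candidates vector is just an example.
candidates <- c('Title','Pclass','Sex','sibsp','parch','Embarked','oe','fare')
current <- character(0)
for (step in seq_along(candidates)) {
  # score each remaining term by its LOO elpd when added to the current model
  scores <- sapply(setdiff(candidates,current), function(term) {
    form <- as.formula(paste('Survived ~',
            paste(c(current,term),collapse='+')))
    mySmodel(form,data=train)$loo1$elpd_loo
  })
  current <- c(current,names(which.max(scores)))
}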

Results

To my surprise, Title was the parameter which gave the best predictions. I had expected sex to play that role.
Survived ~ Title -445.4972 16.46314 
Computed from 4000 by 891 log-likelihood matrix

         Estimate   SE
elpd_loo   -445.5 16.5
p_loo         4.1  0.2
looic       891.0 32.9

All Pareto k estimates OK (k < 0.5)
The next variable to enter was passenger class.

Survived ~ Title + Pclass -395.5926 17.42705 
Unfortunately, after adding a few independent variables, further terms gave only minor improvements. To check this was not caused by anything faulty, I also made a classical mechanism which leaves 10% out and predicts the remainder (a sketch follows below). Those results were similar, but took more time and showed more run-to-run variation. The only real advantage was that they were on the same scale as previous cross-validations.
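A sketch of such a 10% holdout check, assuming the myPmodel and mySpred functions from the code section below:
set.seed(1)
holdout <- sample(nrow(train), round(0.1*nrow(train)))
pm <- myPmodel(~ Title + Pclass, data=train[-holdout,])
preds <- mySpred(pm, train[holdout,])
# proportion of correctly predicted passengers in the held-out 10%
mean(preds==train$Survived[holdout])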
I expanded the model formula to about 10 terms. At that point the expected prediction error decreased so slowly that I settled on an eight-term model (Title + Pclass + sibsp + Title:Pclass + Embarked + oe + Title:sibsp + parch). The functions myPmodel and mySpred refit the model and perform the actual predictions. The result was a disappointing 0.78 on Kaggle: a minor improvement on the previous STAN result, but boosting is still better.

Code

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
library(loo)

# read and combine
train <- read.csv('train.csv')
train$status <- 'train'
test  <- read.csv('test.csv')
test$status <- 'test'
test$Survived <- NA
tt <- rbind(test,train)

# generate variables
tt$Embarked[tt$Embarked==''] <- 'S'
tt$Embarked <- factor(tt$Embarked)
tt$Pclass <- factor(tt$Pclass)
tt$Survived <- factor(tt$Survived)
tt$age <- tt$Age
tt$age[is.na(tt$age)] <- 999
tt$age <- cut(tt$age,c(0,2,5,9,12,15,21,55,65,100,1000))

tt$Title <- sapply(tt$Name,function(x) strsplit(as.character(x),'[.,]')[[1]][2])
tt$Title <- gsub(' ','',tt$Title)
tt$Title[tt$Title=='Dr' & tt$Sex=='female'] <- 'Miss'
tt$Title[tt$Title %in% c('Capt','Col','Don','Sir','Jonkheer','Major','Rev','Dr')] <- 'Mr'
tt$Title[tt$Title %in% c('Lady','Ms','theCountess','Mlle','Mme','Ms','Dona')] <- 'Miss'
tt$Title <- factor(tt$Title)
# changed cabin character
tt$cabchar <- substr(tt$Cabin,1,1)
tt$cabchar[tt$cabchar %in% c('F','G','T')] <- 'X';
tt$cabchar <- factor(tt$cabchar)
tt$ncabin <- nchar(as.character(tt$Cabin))
tt$cn <- as.numeric(gsub('[[:space:][:alpha:]]','',tt$Cabin))
tt$oe <- factor(ifelse(!is.na(tt$cn),tt$cn%%2,-1))
tt$Fare[is.na(tt$Fare)]<- median(tt$Fare,na.rm=TRUE)
tt$ticket <- sub('[[:digit:]]+$','',tt$Ticket)
tt$ticket <- toupper(gsub('(\\.)|( )|(/)','',tt$ticket))
tt$ticket[tt$ticket %in% c('A2','A4','AQ3','AQ4','AS')] <- 'An'
tt$ticket[tt$ticket %in% c('SCA3','SCA4','SCAH','SC','SCAHBASLE','SCOW')] <- 'SC'
tt$ticket[tt$ticket %in% c('CASOTON','SOTONO2','SOTONOQ')] <- 'SOTON'
tt$ticket[tt$ticket %in% c('STONO2','STONOQ')] <- 'STON'
tt$ticket[tt$ticket %in% c('C')] <- 'CA'
tt$ticket[tt$ticket %in% c('SOC','SOP','SOPP')] <- 'SOP'
tt$ticket[tt$ticket %in% c('SWPP','WC','WEP')] <- 'W'
tt$ticket[tt$ticket %in% c('FA','FC','FCC')] <- 'F'
tt$ticket[tt$ticket %in% c('PP','PPP','LINE','LP','SP')] <- 'PPPP'
tt$ticket <- factor(tt$ticket)
tt$fare <- cut(tt$Fare,breaks=c(min(tt$Fare)-1,quantile(tt$Fare,seq(.2,.8,.2)),max(tt$Fare)+1))
tt$sibsp=factor(c(1:4,rep(4,6))[tt$SibSp+1])
tt$parch=factor(c(1:4,rep(4,6))[tt$Parch+1])


train <- tt[tt$status=='train',]
test <- tt[tt$status=='test',]

#end of preparation and data reading

options(width=90)


#################4444444444444444444##############################


des.matrix <- function(formula,data) {
  form2 <- strsplit(as.character(formula),'~',fixed=TRUE)
  resp <- form2[[length(form2)]]
  form3 <- strsplit(resp,'+',fixed=TRUE)[[1]]
  la <- lapply(form3,function(x) 
        model.matrix(as.formula(paste('~' , x, '-1' )),data) )
  nterm <- c(1,sapply(la,ncol))
  terms <- rep(1:length(nterm),nterm)
  ntrain <- nrow(data)
  mat <- do.call(cbind,la)
  mat <- cbind(rep(1,ntrain),mat)
  np <- ncol(mat)
  list( 
      survived = c(0,1)[data$Survived],
      np=np,
      ntrain=nrow(data),
      terms=terms,
      nterm=max(terms),
      tx=mat)
}

datain <- des.matrix(~ Sex+Pclass,data=train)

parameters=c('std','f','log_lik')
my_code <- ' 
    data {
    int<lower=0> ntrain;
    int survived[ntrain];
    int<lower=1> np;
    int<lower=1> nterm;
    int terms[np];
    matrix <lower=0,upper=1> [ntrain,np]  tx;
    }
    parameters {
    vector[np]  f;
    real <lower=0> std[nterm];
    real <lower=0> stdhyp;
    }
    model {        
    stdhyp ~ normal(0,2);
    std ~ normal(0,stdhyp);
    for (i in 1:np) {
       f[i] ~ normal(0,std[terms[i]]);
    }
    survived ~ bernoulli_logit(tx*f);
    }
    generated quantities {
    vector [ntrain] log_lik;
    for (i in 1:ntrain) {
    log_lik[i] <- bernoulli_logit_log(survived[i], tx[i]*f);
    }
    }
    '

fit1 <- stan(model_code = my_code, 
    data = datain, 
    pars=parameters,
    iter = 1000, 
    chains = 4,
    open_progress=FALSE)
#fit1

#log_lik1 <- extract_log_lik(fit1)
#loo1 <- loo(log_lik1)
#print(loo1, digits = 3)

print.mySmodel <- function(x) {
  print(x$loo1)
  cat('\n')
  invisible(x)
}

mySmodel <- function(formula,data) {
  datain <- des.matrix(formula,data)
  
  fitx <- 
      stan(model_code = my_code, 
      data = datain, 
      pars=parameters,
      fit=fit1,
      iter = 2000, 
      chains = parallel::detectCores(),
      open_progress=FALSE)
      log_lik1 <- extract_log_lik(fitx)
      loo1 <- loo(log_lik1)
      ll <-  list(myform=formula,fitx=fitx,loo1=loo1)
      class(ll) <- 'mySmodel'
      cat(format(formula),loo1$elpd_loo,loo1$se_elpd_loo,'\n')
      ll
}
mySmodel(Survived ~ 
        Title ,
    data=train)

################# prediction functions
    
myPmodel <- function(formula,data) {
  datain <- des.matrix(formula,data)
  
  fitx <- 
      stan(model_code = my_code, 
          data = datain, 
          pars=parameters,
          fit=fit1,
          iter = 2000, 
          chains = parallel::detectCores(),
          open_progress=FALSE)
      list(myform=formula,fitx=fitx)
    }    
    
PredM <- myPmodel(~ Title + Pclass + sibsp + Title:Pclass + Embarked + oe + Title:sibsp + parch
   ,data=train)

mySpred <- function(mymodel,newdata) {
  pfit <- as.matrix(mymodel$fitx)
  fmat <- pfit[,grep('^f\\[',colnames(pfit))]
  px <- des.matrix(mymodel$myform,data=newdata)$tx
  mylpred <- tcrossprod(fmat,px)
  mpred <- apply(mylpred,2,function(x) mean(x))
  pred <- as.numeric(gtools::inv.logit(mpred)>.5)
  factor(pred)
}
preds <- mySpred(PredM,test)

out <- data.frame(
    PassengerId=test$PassengerId,
    Survived=as.numeric(as.character(preds)),
    row.names=NULL)
write.csv(x=out,
    file='stanqua.csv',
    row.names=FALSE,
    quote=FALSE)