Tuesday, October 29, 2013

A first attempt at an individual-based model in R



I have been curious for a while as to how R might be used for the construction of an individually-based model (IBM), or agent-based model (ABM). In particular, what R objects lend themselves best to storing information on individuals, and allow for new individuals to be added or subtracted throughout the simulation?

In this first attempt, I have ended up opting for a multi-level list, where elements represent individuals, and sub-levels contain attribute information. The main reason is speed - In a previous post I highlighted the fact that lists are not penalized in the same way as a data.frame when the object in "grown" or concatenated with additional information (due to time spent reallocating memory).

The example models a population with a given birth rate, death rate, and carrying capacity. Attributes of individuals that are recorded in the list include their age, whether they are alive or dead, and their color (blue or red). The attribute of color is passed from parent to offspring, and there is a tendency for one phenotype to dominate over time.  The idea comes from a simple model of genetic drift that can be explored with the IBM programming platform NetLogo.

So far so good, but there are still some speed issues associated with a list that continues to grow. Part of the speed issue is due to the calculation of summary statistics during each iteration (using e.g. sapply). A cropping of the list to retain only alive individuals, has dramatic improvements on speed as well, so there appears to be a cost associated with growing the list object itself. The difference might be that my previous example was filling a empty list, where the number of elements was predefined. 

I would be interested if anyone has any thoughts on this or, more generally, on the construction of IBMs in R. While R is probably not the best programming language suitable to IBMs, it would be interesting to know if more examples exist.



Example:
b <- 0.14 # probability of birth
d <- 0.08 # probability of death
K <- 100 # carrying capacity
N0 <- 50 # starting number of individuals
t <- 500 # time of simulation
 
#create starting individual w attributes ("alive", "age", "color")
set.seed(1)
ind <- vector(mode="list", N0)
for(i in seq(ind)){
 ind[[i]]$alive <- 1
 ind[[i]]$age <- 0
 ind[[i]]$color <- c("blue", "red")[round(runif(1)+1)]
}
 
#make empty vectors to record population statistics
time <- seq(t+1)
 
pop <- NaN * time # population size
pop[1] <- N0
 
frac.blue <- NaN * time # fraction of population that is blue
cols <- sapply(ind, function(x) x$color)
frac.blue[1] <- sum(cols  == "blue") / length(cols)
 
med.age <- NaN * time
ages <- sapply(ind, function(x) x$age)
med.age[1] <- median(ages)
 
 
#simulation
save.alive.only <- TRUE # optional cropping of "ind" to include alive individuals only 
t1 <- Sys.time()
for(i in seq(t)){ # loop for each time increment
 
 is.alive <- which(sapply(ind, function(x) x$alive) == 1)
 for(j in is.alive){ #loop for each alive individual
  birth <- runif(1) <= (b * (1 - length(is.alive)/K)) # calculate a birth probability for each individual that is alive
  if(birth){
   len.ind <- length(ind)
   ind[[len.ind+1]] <- list(alive=1, age=0, color=ind[[j]]$color) # create offspring, inherits color of parent
  }
  death <- runif(1) <= d # calculate a death probability for each individual 
  if(death){
   ind[[j]]$alive <- 0 # if death, reset alive = 0
  } else { #else, advance age + 1
   ind[[j]]$age <- ind[[j]]$age + 1 # advance age of parent
  }
 }
 
 #optional cropping of list "ind"
 if(save.alive.only){
  is.dead <- which(sapply(ind, function(x) x$alive) == 0)
  if(length(is.dead) > 0) ind <- ind[-is.dead]
 }
 
 #Population stats
 is.alive <- which(sapply(ind, function(x) x$alive) == 1)
 pop[i+1] <- length(is.alive) 
 
 cols <- sapply(ind, function(x) x$color)
 frac.blue[i+1] <- sum(cols[is.alive]  == "blue") / length(is.alive)
 
 ages <- sapply(ind, function(x) x$age)
 med.age[i+1] <- median(ages[is.alive])
 
 print(paste(i, "of", t, "finished", "[", round(1/t*100), "%]"))
}
t2 <- Sys.time()
dt <- t2-t1
dt
 
 
#plot populations
png("pops_vs_time.png", width=6, height=4, units="in", res=400)
par(mar=c(4,4,1,1))
pop.blue <- pop * frac.blue
pop.red <- pop * (1-frac.blue)
ylim=range(c(pop.blue, pop.red))
plot(time, pop.blue, t="l", lwd=2, col=4, ylim=ylim, ylab="Population size")
lines(time, pop.red, lwd=2, col=2)
legend("topleft", legend=c("blue pop.", "red pop."), lwd=2, col=c(4,2), bty="n")
dev.off()
 
#plot median age
png("med_age_vs_time.png", width=6, height=4, units="in", res=400)
par(mar=c(4,4,1,1))
plot(time, med.age, t="l", lwd=2, ylab="Median age")
dev.off()
Created by Pretty R at inside-R.org


9 comments:

  1. I've been wondering about how to put together not just iBM's, but how to populate nicely-analyzable structures in general. What about putting individuals in an object-oriented framework, with each individual being represented by a single object? You would also have collections to access all individuals, etc.

    ReplyDelete
    Replies
    1. I think this is the strategy of the R package simecol, but I haven't had time to look into it. Maybe you're willing to write a guest post on the approach?

      Delete
  2. V interesting post as I'd been wondering how to implement ABM in R. One thought - don't programs such as NetLogo act on individual living agents within a population at random? So would it be better to replace "for (j in is.alive)" with "for (j in sample(is.alive))" in your code?

    ReplyDelete
    Replies
    1. I don't believe that that is a generalization that one has for all ABMs. You can have all sorts of interactions between agents - random, within a given distance, etc. In this very simple example, there is no interaction at all - only a chance of inheriting a given trait given the proportions of that trait in the population.

      Delete
  3. Marc,
    Thanks for your interesting post. I made some editions to your code: I got rid of some loops, I vectorized some computations. The code runs faster.

    b = 0.14 # probability of birth
    d = 0.08 # probability of death
    K = 100 # carrying capacity
    N0 = 50 # starting number of individuals
    t = 500 # time of simulation


    #create starting individual w attributes ("alive", "age", "color")
    set.seed(1)


    myind=cbind(alive=rep(1,N0),age=rep(0,N0),color=round(runif(N0)+1))
    # color=c("blue", "red")[myind[,3]]


    #make empty vectors to record population statistics
    time = seq(t+1)
    mypop = rep(NaN,t+1) # population size
    mypop[1] = N0


    myfrac.blue = rep(NaN,t+1) # fraction of population that4 is blue
    myfrac.blue[1] = sum(myind[,'color'] == 1) / sum(myind[,1])


    mymed.age = rep(NaN,t+1)
    myages = myind[,'age']
    mymed.age[1] = median(ages)


    #simulation
    save.alive.only = TRUE # optional cropping of "ind" to include alive individuals only
    t1 = Sys.time()


    for(i in 1:t){ # loop for each time increment

    mybirths = rbind(c(0,0,0), myind[myind[,'alive']==1,])
    n.alive=dim(mybirths)[1] - 1

    if(n.alive){
    # this is to reproduce the order of random variables in your code
    ru=runif(2*n.alive)
    # you use one call to runif for births and the next for deaths

    #select births
    mybirths[,1]=c(0, as.numeric(ru[2*(1:n.alive)-1]<=b*(1-n.alive/K)) )
    mybirths[,2]=0
    mybirths=mybirths[mybirths[,1]==1,]


    #select survivors
    myind[myind[,'alive']==1,1]=myind[myind[,'alive']==1,1]*(ru[2*(1:n.alive)]>d)

    #advance age of survivors
    myind[,2]=myind[,2]+myind[,1]

    #optional cropping of list "ind"
    if(save.alive.only) myind = myind[myind[,1]==1,]

    #old & new
    myind=rbind(myind,mybirths)
    }


    #Population stats
    mypop[i+1] = sum(myind[,1])

    myfrac.blue[i+1] = sum(myind[,'color']==1) / mypop[i+1]

    mymed.age[i+1] = median(myind[myind[,1]==1,2])

    print(paste(i, "of", t, "finished", "[", round(1/t*100), "%]"))
    }

    t2 <- Sys.time()
    dt <- t2-t1
    dt


    #plot populations
    #png("pops_vs_time.png", width=6, height=4, units="in", res=400)
    par(mar=c(4,4,1,1))
    mypop.blue <- mypop * myfrac.blue
    mypop.red <- mypop * (1-myfrac.blue)
    ylim=range(c(mypop.blue, mypop.red))
    plot(time, mypop.blue, t="l", lwd=2, col=4, ylim=ylim, ylab="Population size")
    lines(time, mypop.red, lwd=2, col=2)
    legend("topleft", legend=c("blue pop.", "red pop."), lwd=2, col=c(4,2), bty="n")
    #dev.off()

    #plot median age
    #png("med_age_vs_time.png", width=6, height=4, units="in", res=400)
    par(mar=c(4,4,1,1))
    plot(time, med.age, t="l", lwd=2, ylab="Median age")
    #dev.off()

    ReplyDelete
  4. Hey Marc,

    Your code has been very helpful for me. I have recently transitioned a matrix pop model that I have been working on into an IBM for various reasons. Im hoping to do the IBM in R to avoid learning a whole new language. I have looked through your code extensively and even tried re-working it a bit for my own purposes (hopefully you dont mind). I think the list is a good way to visualize how to keep track of individuals and have them "birth" new individuals. I have one big problem so far that I was hoping to get your take on: I need each individual to birth more than one offspring. I work with amphibians that lay sometimes 1000s of eggs. For now I was trying to get the code to work for 2 offspring from each parent, but the "list" aspect was giving me errors. Ive messed around extensively with this line of code:

    if(birth){
    len.ind <- length(ind)
    ind[[len.ind+1]] <- list(alive=1, age=0, color=ind[[j]]$color)
    }

    with little success so far. Im currently considering trying to build a function to do it several times, if it can only be done one individual at a time because of the list aspect. If you have any thoughts I would be very interested!

    Thanks,

    -Scott

    ReplyDelete
    Replies
    1. Hi Scott - thanks for you kind words. I have actually gone a bit away from using lists since I wrote this post and now find data.frames to be a bit more handy for summary statistics. In any case, this is more a stylistic change, and the following comments should hold true for lists. I have employed the following strategy when reproducing individuals. First, if you are able to calculate the number of offspring for each adult, then you can make a new empty list of that length. Second, you can fill it up this new list with the offspring's attributes. And finally, when they are all ready to add to the population, you can bind the two together. With data.frames, this is simply rbind(ind, offspring). With lists, you can use ind <- c(ind, offspring). Your post has given me the motivation to write an update on this topic. I'll try to get an example posted soon. Cheers, Marc

      Delete
    2. Hi Scott - for some additional advice into your problem, I have written the following post: http://menugget.blogspot.de/2014/09/additional-tips-for-structuring.html

      Delete