Thursday, June 9, 2011

Rejection Sampling


An interesting sampling method that was covered briefly in my Bayesian statistics course was rejection sampling. Since I have nothing better to do, I thought it would be fun to make an acceptance-rejection algorithm using R. FUN!

The Rejection Sampling method is usually used to simulate data from an unknown distribution. To do this one samples from a distribution that covers the suport of the unknown distribution and use certain criteria for accepting/rejecting the sampled values. One way to do this is as follows (Rice, p 92).

Step 1: Generate T with density m.

Step 2: Generate U, uniform on [0,1] and independent of T. If M(T)*U ≤ ƒ(T), then let X = T (accept T). Otherwise, go to Step 1 (Reject T).

Where M(x) is a function such that M(x) ≥ ƒ(x) on [a,b].

To keep things simple for myself I will be simulating a Beta distribution with parameters 6 and 3 (ƒ). To do this I will sample T's from a scaled uniform distribution (M), and reject sampled values where M(T)*U ≥ ƒ(T).

In a plot of the beta distribution with parameters 6 and 3 we can see that the ƒ(x) never goes above 3. For this reason I chose to scale the uniform distribution M by multiplying it by 3.
Here is the R code to implement rejection sampling for 100,000 observations in this example.

sample.x = runif(100000,0,1)
accept = c()

for(i in 1:length(sample.x)){
U = runif(1, 0, 1)
if(dunif(sample.x[i], 0, 1)*3*U <= dbeta(sample.x[i], 6, 3)) {
accept[i] = 'Yes'
}
else if(dunif(sample.x[i],0,1)*3*U > dbeta(sample.x[i], 6, 3)) {
accept[i] = 'No'
}
}

T = data.frame(sample.x, accept = factor(accept, levels= c('Yes','No')))

We can plot the results along with the true distribution with the following code.

hist(T[,1][T$accept=='Yes'], breaks = seq(0,1,0.01), freq = FALSE, main = 'Histogram of X', xlab = 'X')
lines(x, dbeta(x,6,3))


With 100,000 observations sampled, the data fit very well.
We can look at the densities of both the accepted and rejected values to get an idea of what's going on.

library(ggplot2)
print(qplot(sample.x, data = T, geom = 'density', color = accept))


Looking at a stacked histogram of all the sampled values together we can really see how much wasted data there are in this example.

print(qplot(sample.x, data = T, geom = 'histogram', fill = accept, binwidth=0.01))


In fact, when I ran this example I got 33,114 accepted values and 66,886 rejected values. I probably could have chosen a better value than 3 to scale the uniform distribution, but ideally rejection sampling uses a known distribution that is only slightly different from the unknown distribution we're trying to estimate.

6 comments:

  1. cool work...do you have any examples of 3d graphs..

    ReplyDelete
  2. Thanks for sharing. This is quite helpful for me as I am starting to learn R. I hope you continue to update the blog, and keep the posts on your R work coming =)

    ReplyDelete
  3. Hi there! your post is great for R beginners just like me!!
    one question do you happen to have kind of the same example for a Normal distribution?
    thanks in advance! best regards from Mexico City

    ReplyDelete
  4. Wow! great helpful work!! I am a beginner in R and it helps.Thanks.

    ReplyDelete
  5. awesome,just curious,is this the same code for every distibution?would it work with the normal distribution instead of the beta in this case??

    ReplyDelete
  6. lines(x, dbeta(x,6,3))

    It is:
    lines(sample.x, dbeta(sanple.x,6,3))

    right?

    ReplyDelete