Simulate a pairwise comparison between alleles at a single location with coverage error. To do this, we use a random binomial distribution that takes in coverage and true allele frequency and outputs the estimated allele frequency.
Code
# how to simulate coverage## rbinom(1, cvg, AFtrue) / cvg## output is estimated AFcvg <-20trueAF <-0.43estAF <-rbinom(1,cvg,trueAF) / cvgestAF
[1] 0.45
Function that calculates estimated allele frequency when it is given coverage and true allele frequency:
Code
# function calculating estimated allele frequencyestAF <-function(cvg,trueAF){ est <-rbinom(1,cvg,trueAF) / cvgreturn(est)}
Now, we can scale our code and plot the results:
Code
# run this n times, scale up n <-1000tru <-0.43cvg <-sample(2:150,n,replace =TRUE)af <-tibble("cvg"= cvg,"trueAF"=rep(tru,times=n))af$estAF <- af %>%pmap(estAF) %>%unlist()# plot estimated allele frequency by coverageplt <- af %>%ggplot(aes(x = estAF, y = cvg)) +geom_point(alpha =0.15,colour="red") +geom_smooth(method ="gam", colour ="red") +geom_vline(xintercept=tru, size =0.8,linetype ="dashed") +labs(x ="Estimated allele frequency", y ="Coverage") +annotate("label",x = tru, y =-50, label ="True allele frequency") +theme_cowplot()plt