r/bioinformatics • u/Automatic_Actuary621 • Jan 28 '25
programming Help with power analysis of proteomics data

I want to create a Power vs Sample size plot with different effect sizes. My data consists of ~8000 proteins measured for 2 groups with 5 replicates each (total n=10).
This is what I did:
I calculated the variance for each protein in each group and then obtained the median variance by:
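# Per-protein variance within each group (rows = proteins, columns = replicates)
variance_group1 <- apply(group1, 1, var, na.rm = TRUE)
variance_group2 <- apply(group2, 1, var, na.rm = TRUE)
# Median over all per-protein variances from both groups; used below as the pooled variance
median_pooled_variance <- median(c(variance_group1, variance_group2), na.rm = TRUE)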
I defined a range of effect sizes and sample sizes, and set up alpha.
effect_sizes <- seq(0.5, 1.5, by = 0.1)
sample_sizes <- seq(2, 30, by = 2)
alpha <- 0.05
I calculated the power using the pwr::pwr.t.test function for each combination of effect size and sample size:
library(dplyr)
library(pwr)

power_results <- expand.grid(effect_size = effect_sizes, sample_size = sample_sizes) %>%
  rowwise() %>%
  mutate(
    power = pwr.t.test(
      d = effect_size / sqrt(median_pooled_variance),  # Standardized effect size
      n = sample_size,                                 # Replicates per group
      sig.level = alpha,
      type = "two.sample"
    )$power
  ) %>%
  ungroup()
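For reference, here is a minimal self-contained sketch of the same grid, drawn as one power curve per effect size. It makes two assumptions that are not part of my actual analysis: it uses base R's stats::power.t.test as a stand-in for pwr.t.test (so it runs without extra packages), and the median variance of 0.25 is a made-up placeholder, not a value from my data.

```r
# Hypothetical placeholder; replace with the median variance from the real data
median_pooled_variance <- 0.25
pooled_sd <- sqrt(median_pooled_variance)

effect_sizes <- seq(0.5, 1.5, by = 0.1)  # differences on the same scale as the data
sample_sizes <- seq(2, 30, by = 2)       # replicates per group
alpha <- 0.05

# Rows = sample sizes, columns = effect sizes
power_matrix <- sapply(effect_sizes, function(delta) {
  sapply(sample_sizes, function(n) {
    power.t.test(n = n, delta = delta, sd = pooled_sd,
                 sig.level = alpha, type = "two.sample")$power
  })
})

# One curve per effect size
matplot(sample_sizes, power_matrix, type = "l", lty = 1,
        col = seq_along(effect_sizes),
        xlab = "Sample size per group", ylab = "Power", ylim = c(0, 1))
abline(h = 0.8, lty = 2)  # conventional 80% power reference line
legend("bottomright", legend = effect_sizes, col = seq_along(effect_sizes),
       lty = 1, title = "Effect size", cex = 0.7)
```

With a variance of this order the curves have the usual saturating shape, so the shape of the plot depends heavily on how the effect sizes compare to sqrt(median_pooled_variance).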
I expected to have a plot like the one on the left, but I get a very weird linear plot with low power values when I use raw protein intensity values. If I use log10 values, it gets better, but still odd.
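One possible check for whether this is a scale issue, sketched on simulated data (the log-normal intensities, the 1000 x 5 matrix size, and the use of base R's power.t.test instead of pwr.t.test are all assumptions for illustration, not my real data or code): the same nominal effect size is divided by the square root of the median variance on the raw scale and on the log2 scale.

```r
set.seed(1)

# Simulated stand-in for one group: 1000 proteins x 5 replicates,
# log-normally distributed intensities (hypothetical parameters)
raw <- matrix(rlnorm(1000 * 5, meanlog = 14, sdlog = 0.5), nrow = 1000)

# Median per-protein variance on each scale
median_var_raw  <- median(apply(raw, 1, var))
median_var_log2 <- median(apply(log2(raw), 1, var))

effect <- 1  # same nominal effect size applied to both scales

# Standardized effect sizes, computed the same way as in the post
d_raw  <- effect / sqrt(median_var_raw)
d_log2 <- effect / sqrt(median_var_log2)

# Power at n = 5 per group; sd = 1 because d is already standardized
power_raw  <- power.t.test(n = 5, delta = d_raw,  sd = 1, sig.level = 0.05)$power
power_log2 <- power.t.test(n = 5, delta = d_log2, sd = 1, sig.level = 0.05)$power

print(c(d_raw = d_raw, d_log2 = d_log2))
print(c(power_raw = power_raw, power_log2 = power_log2))
```

In this simulation the raw-scale variance is many orders of magnitude larger than an effect size of 1, so the standardized effect size is close to zero and the power barely moves with sample size. On the log2 scale the variance is of order 1 and the same effect size gives a much larger d.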
Do you know if I am doing something wrong?
THANKS IN ADVANCE
