Bayesian Survival Analysis of Customer Churn

Author

Liam Quach, Andrew Kerr

Published

June 6, 2025

1. Introduction

The Telco Customer Churn dataset is a fictional dataset created by IBM to simulate customer data for a telecommunications company, and was designed to help predict customer churn, or when a customer stops using the company’s services. The goal of this dataset is to analyze customer behavior and develop strategies to retain customers. This dataset in particular simulates customer data for a telecommunications company that provided home phone and Internet services to 7,043 customers in California during the third quarter.

This report presents a Bayesian survival analysis of customer churn in a simulated telecommunications company, extending the frequentist analysis previously conducted by us in Stat 417. Here we implement a Bayesian Proportional Hazards Model using a Weibull likelihood to analyze how customer characteristics influence the likelihood of churn.

1.1 Research Question

How do customer characteristics (e.g., contract type, monthly charges, tenure) influence the likelihood of churn?

2. Data Preparation

Code
# Load data
tele <- read_csv(here("data", "Telco-Customer-Churn-Clean.csv"))

# Data cleaning
tele <- tele %>%
  mutate(
    # Convert character columns to factors
    across(where(is.character), as.factor),
    # Ensure Churn is numeric (1 = churned, 0 = censored)
    Churn = as.numeric(Churn),
    # Handle missing TotalCharges (11 observations with tenure = 0)
    TotalCharges = replace_na(TotalCharges, 0),
    # Create numeric tenure_months for modeling
    tenure_months = tenure
  ) %>%
  # Remove customerID as it's not needed for modeling
  select(-customerID)

glimpse(tele)
Rows: 7,043
Columns: 21
$ gender           <fct> Female, Male, Male, Male, Female, Female, Male, Femal…
$ SeniorCitizen    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Partner          <fct> Yes, No, No, No, No, No, No, No, Yes, No, Yes, No, Ye…
$ Dependents       <fct> No, No, No, No, No, No, Yes, No, No, Yes, Yes, No, No…
$ tenure           <dbl> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49, 2…
$ PhoneService     <fct> No, Yes, Yes, No, Yes, Yes, Yes, No, Yes, Yes, Yes, Y…
$ MultipleLines    <fct> No phone service, No, No, No phone service, No, Yes, …
$ InternetService  <fct> DSL, DSL, DSL, DSL, Fiber optic, Fiber optic, Fiber o…
$ OnlineSecurity   <fct> No, Yes, Yes, Yes, No, No, No, Yes, No, Yes, Yes, No …
$ OnlineBackup     <fct> Yes, No, Yes, No, No, No, Yes, No, No, Yes, No, No in…
$ DeviceProtection <fct> No, Yes, No, Yes, No, Yes, No, No, Yes, No, No, No in…
$ TechSupport      <fct> No, No, No, Yes, No, No, No, No, Yes, No, No, No inte…
$ StreamingTV      <fct> No, No, No, No, No, Yes, Yes, No, Yes, No, No, No int…
$ StreamingMovies  <fct> No, No, No, No, No, Yes, No, No, Yes, No, No, No inte…
$ Contract         <fct> Month-to-month, One year, Month-to-month, One year, M…
$ PaperlessBilling <fct> Yes, No, Yes, No, Yes, Yes, Yes, No, Yes, No, Yes, No…
$ PaymentMethod    <fct> Electronic check, Mailed check, Mailed check, Bank tr…
$ MonthlyCharges   <dbl> 29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.7…
$ TotalCharges     <dbl> 29.85, 1889.50, 108.15, 1840.75, 151.65, 820.50, 1949…
$ Churn            <dbl> 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ tenure_months    <dbl> 1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49, 2…

2.1 Exploratory Data Analysis

Code
# Summary statistics
summary_stats <- tele %>%
  summarise(
    n_customers = n(),
    n_churned = sum(Churn),
    churn_rate = mean(Churn),
    median_tenure = median(tenure),
    mean_tenure = mean(tenure),
    median_monthly_charges = median(MonthlyCharges),
    mean_monthly_charges = mean(MonthlyCharges)
  )

summary_stats %>%
  pivot_longer(everything(), names_to = "Statistic", values_to = "Value") %>%
  gt() %>%
  tab_header(title = "Summary Statistics") %>%
  fmt_number(columns = c(Value), decimals = 2)
Summary Statistics
Statistic Value
n_customers 7,043.00
n_churned 1,869.00
churn_rate 0.27
median_tenure 29.00
mean_tenure 32.37
median_monthly_charges 70.35
mean_monthly_charges 64.76
Code
# Churn by key variables
churn_by_contract <- tele %>%
  group_by(Contract) %>%
  summarise(
    n = n(),
    n_churned = sum(Churn),
    churn_rate = mean(Churn)
  )

churn_by_payment <- tele %>%
  group_by(PaymentMethod) %>%
  summarise(
    n = n(),
    n_churned = sum(Churn),
    churn_rate = mean(Churn)
  )

3. Bayesian Survival Model

3.1 Model Specification

We implement a Bayesian Weibull proportional hazards model. The Weibull distribution was chosen based on the parametric survival analysis from our Stat 417 report, which identified it as the best-fitting distribution (Anderson-Darling test statistic = 16986.795). We will use the same predictors in the Bayesian model as we did in our frequentist model to enable a direct comparisian between the two.

Model Structure:

  • Likelihood: Weibull distribution for survival times

  • Predictors: Partner, InternetService, OnlineSecurity, DeviceProtection, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, TotalCharges

  • Priors:

    • Regression coefficients: Normal(0, 1); A weakly informative prior with a variance of 1 to allow for a reasonable range of effect sizes.

    • Weibull shape parameter: Half-Normal(1); This prior places more probability on smaller values, reflecting our initial expectation that the hazard may be decreasing or constant early on.

    • Intercept (scale): Normal(0, 1); Similar to the regression coefficients, this provides a weakly informative starting point for the baseline hazard.

3.2 Data Preparation for Modeling

Code
# Prepare data for brms
# Scale TotalCharges for better convergence
tele_model <- tele %>%
  mutate(
    TotalCharges_scaled = scale(TotalCharges)[,1],
    event = Churn,
    time = tenure_months
  ) %>%
  # Filter to remove 0 tenure observations (new customers)
  filter(tenure_months > 0)

# Remove influential observations identified in frequentist analysis
# (Using same approach as the Cox model for comparability)
temp_cox <- coxph(Surv(time, event) ~ 
                   Partner + InternetService + OnlineSecurity + 
                   DeviceProtection + StreamingTV + StreamingMovies + 
                   Contract + PaperlessBilling + PaymentMethod + 
                   TotalCharges_scaled,
                 data = tele_model)

dev_resid <- residuals(temp_cox, type = "deviance")
influential <- which(abs(dev_resid) > 3)

# Remove influential points
tele_model <- tele_model[-influential, ]

cat("Removed", length(influential), "influential observations\n")
Removed 38 influential observations
Code
cat("Final sample size:", nrow(tele_model), "\n")
Final sample size: 6994 

3.3 Model Fitting

Code
# Priors
priors <- c(
  # Regression coefficients - weakly informative
  prior(normal(0, 1), class = b),
  # Shape parameter - half normal for positive constraint
  prior(normal(1, 1), class = shape, lb = 0),
  # Intercept
  prior(normal(0, 1), class = Intercept)
)

# Fit Bayesian Weibull model
bayes_weibull <- brm(
  time | cens(1 - event) ~ Partner + InternetService + OnlineSecurity + 
                           DeviceProtection + StreamingTV + StreamingMovies + 
                           Contract + PaperlessBilling + PaymentMethod + 
                           TotalCharges_scaled,
  data = tele_model,
  family = weibull(),
  prior = priors,
  iter = 4000,
  warmup = 2000,
  chains = 4,
  cores = 4,
  seed = 123,
  backend = "cmdstanr",
  file = here("models", "bayes_weibull_model")
)
Code
# Load pre-fitted model
model_file <- here("models", "bayes_weibull_model.rds")

if (file.exists(model_file)) {
  bayes_weibull <- readRDS(model_file)
} else {
  # Define priors
  priors <- c(
    prior(normal(0, 1), class = b),
    prior(normal(1, 1), class = shape, lb = 0),
    prior(normal(0, 1), class = Intercept)
  )
  
  # Fit model
  bayes_weibull <- brm(
    time | cens(1 - event) ~ Partner + InternetService + OnlineSecurity + 
                             DeviceProtection + StreamingTV + StreamingMovies + 
                             Contract + PaperlessBilling + PaymentMethod + 
                             TotalCharges_scaled,
    data = tele_model,
    family = weibull(),
    prior = priors,
    iter = 4000,
    warmup = 2000,
    chains = 4,
    cores = 4,
    seed = 123
  )
  
  # Save model
  saveRDS(bayes_weibull, model_file)
}

3.4 Model Diagnostics

Code
# Check convergence
print(bayes_weibull)
 Family: weibull 
  Links: mu = log; shape = identity 
Formula: time | cens(1 - event) ~ Partner + InternetService + OnlineSecurity + DeviceProtection + StreamingTV + StreamingMovies + Contract + PaperlessBilling + PaymentMethod + TotalCharges_scaled 
   Data: tele_model (Number of observations: 6994) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Regression Coefficients:
                                  Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept                             4.51      0.07     4.37     4.65 1.00
PartnerYes                            0.14      0.03     0.07     0.21 1.00
InternetServiceFiberoptic            -0.86      0.04    -0.94    -0.78 1.00
InternetServiceNo                     0.19      0.89    -1.59     1.92 1.00
OnlineSecurityNointernetservice       0.21      0.89    -1.52     1.96 1.00
OnlineSecurityYes                     0.20      0.05     0.11     0.29 1.00
DeviceProtectionNointernetservice     0.19      0.90    -1.59     1.95 1.00
DeviceProtectionYes                  -0.00      0.04    -0.07     0.07 1.00
StreamingTVNointernetservice          0.21      0.88    -1.52     1.92 1.00
StreamingTVYes                       -0.17      0.04    -0.24    -0.09 1.00
StreamingMoviesNointernetservice      0.19      0.89    -1.56     1.91 1.00
StreamingMoviesYes                   -0.17      0.04    -0.24    -0.09 1.00
ContractOneyear                       0.54      0.06     0.42     0.67 1.00
ContractTwoyear                       1.83      0.17     1.51     2.16 1.00
PaperlessBillingYes                  -0.11      0.04    -0.19    -0.03 1.00
PaymentMethodCreditcardautomatic      0.03      0.07    -0.09     0.16 1.00
PaymentMethodElectroniccheck         -0.29      0.05    -0.39    -0.18 1.00
PaymentMethodMailedcheck             -0.29      0.06    -0.41    -0.17 1.00
TotalCharges_scaled                   1.71      0.04     1.63     1.79 1.00
                                  Bulk_ESS Tail_ESS
Intercept                             5522     5743
PartnerYes                           10510     6564
InternetServiceFiberoptic             8259     6446
InternetServiceNo                     7854     6165
OnlineSecurityNointernetservice       7285     6017
OnlineSecurityYes                     9592     5953
DeviceProtectionNointernetservice     8438     6075
DeviceProtectionYes                  10414     6225
StreamingTVNointernetservice          8950     6176
StreamingTVYes                        8179     5735
StreamingMoviesNointernetservice      8513     5487
StreamingMoviesYes                    9536     6251
ContractOneyear                       9003     6686
ContractTwoyear                       6135     5996
PaperlessBillingYes                   9915     6365
PaymentMethodCreditcardautomatic      5323     5717
PaymentMethodElectroniccheck          4598     5108
PaymentMethodMailedcheck              4913     5366
TotalCharges_scaled                   7841     6241

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.44      0.03     1.38     1.49 1.00     6767     6173

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Code
# Trace plots for key parameters
mcmc_trace(bayes_weibull, 
           pars = c("b_ContractOneyear", "b_ContractTwoyear", 
                   "b_PaymentMethodElectroniccheck", "shape"),
           facet_args = list(ncol = 2))

Code
# Extract summary for Rhat values
model_summary <- summary(bayes_weibull)
rhat_vals <- model_summary$fixed$Rhat
cat("All Rhat values < 1.01:", all(rhat_vals < 1.01, na.rm = TRUE), "\n")
All Rhat values < 1.01: TRUE 
Code
cat("Max Rhat value:", max(rhat_vals, na.rm = TRUE), "\n")
Max Rhat value: 1.001479 
Code
# Effective sample sizes
eff_vals <- model_summary$fixed$Bulk_ESS
cat("Minimum effective sample size:", min(eff_vals, na.rm = TRUE), "\n")
Minimum effective sample size: 4598.346 

3.5 Posterior Predictive Checks

Code
# Density overlay
pp_check(bayes_weibull, type = "dens_overlay", ndraws = 100) +
  labs(title = "Posterior Predictive Check: Density Overlay") +
  xlim(c(0, 250))

Code
# Survival probability checks at different time points
pp_check(bayes_weibull, type = "stat", stat = "median") +
  labs(title = "Posterior Predictive Check: Median Survival Time")

Code
# Check survival probabilities at specific time points
survival_times <- c(12, 24, 36, 48, 60)
pp_survival <- posterior_predict(bayes_weibull, ndraws = 1000)

# Calculate empirical vs predicted survival rates
obs_survival <- sapply(survival_times, function(t) mean(tele_model$time > t))
pred_survival <- sapply(survival_times, function(t) mean(pp_survival > t))

survival_comparison <- data.frame(
  Time = survival_times,
  Observed = obs_survival,
  Predicted = pred_survival,
  Difference = pred_survival - obs_survival
)

survival_comparison %>%
  gt() %>%
  tab_header(title = "Observed vs Predicted Survival Rates") %>%
  fmt_number(columns = c(Observed, Predicted, Difference), decimals = 3)
Observed vs Predicted Survival Rates
Time Observed Predicted Difference
12 0.690 0.771 0.081
24 0.544 0.649 0.105
36 0.425 0.578 0.153
48 0.316 0.529 0.213
60 0.198 0.493 0.295

The density overlay plot shows a slight discrepancy between the observed data and the models simulated data. The model under predicts the amount of churning customers with very small times (< 10 months), then over predicts for between 10 and 25 months, and under predicts again for 25 to 75 months before over predicting for any amount of time longer than 75 months. This pattern is reflected in the median survival times where the median of the model predictions is greater than the median of the observed data.

4. Results

4.1 Parameter Estimates

Code
# Extract posterior summaries
posterior_summary <- as_draws_df(bayes_weibull) %>%
  select(starts_with("b_"), shape) %>%
  pivot_longer(everything(), names_to = "parameter", values_to = "value") %>%
  group_by(parameter) %>%
  summarise(
    mean = mean(value),
    median = median(value),
    sd = sd(value),
    q2.5 = quantile(value, 0.025),
    q97.5 = quantile(value, 0.975),
    prob_positive = mean(value > 0)
  ) %>%
  arrange(desc(abs(mean)))

# Calculate hazard ratios (exp of negative coefficients for Weibull AFT)
hazard_ratios <- posterior_summary %>%
  filter(parameter != "shape") %>%
  mutate(
    HR_mean = exp(-mean),
    HR_q2.5 = exp(-q97.5),
    HR_q97.5 = exp(-q2.5)
  )

# Display results
hazard_ratios %>%
  select(parameter, HR_mean, HR_q2.5, HR_q97.5, prob_positive) %>%
  gt() %>%
  tab_header(title = "Hazard Ratios from Bayesian Model") %>%
  fmt_number(columns = c(HR_mean, HR_q2.5, HR_q97.5), decimals = 3) %>%
  fmt_percent(columns = prob_positive, decimals = 1)
Hazard Ratios from Bayesian Model
parameter HR_mean HR_q2.5 HR_q97.5 prob_positive
b_Intercept 0.011 0.010 0.013 100.0%
b_ContractTwoyear 0.161 0.115 0.220 100.0%
b_TotalCharges_scaled 0.181 0.168 0.196 100.0%
b_InternetServiceFiberoptic 2.355 2.171 2.556 0.0%
b_ContractOneyear 0.582 0.513 0.658 100.0%
b_PaymentMethodMailedcheck 1.336 1.185 1.511 0.0%
b_PaymentMethodElectroniccheck 1.331 1.201 1.476 0.0%
b_StreamingTVNointernetservice 0.808 0.146 4.551 59.6%
b_OnlineSecurityNointernetservice 0.808 0.141 4.591 60.0%
b_OnlineSecurityYes 0.819 0.745 0.898 100.0%
b_DeviceProtectionNointernetservice 0.823 0.142 4.913 58.8%
b_StreamingMoviesNointernetservice 0.824 0.148 4.777 58.9%
b_InternetServiceNo 0.830 0.146 4.884 58.4%
b_StreamingTVYes 1.181 1.097 1.269 0.0%
b_StreamingMoviesYes 1.180 1.098 1.269 0.0%
b_PartnerYes 0.872 0.813 0.933 100.0%
b_PaperlessBillingYes 1.116 1.034 1.207 0.3%
b_PaymentMethodCreditcardautomatic 0.967 0.849 1.098 68.9%
b_DeviceProtectionYes 1.000 0.930 1.077 49.3%

4.2 Visual Comparison of Key Effects

Code
# Extract draws for contract effects
contract_draws <- as_draws_df(bayes_weibull) %>%
  select(b_ContractOneyear, b_ContractTwoyear) %>%
  mutate(
    HR_OneYear = exp(-b_ContractOneyear),
    HR_TwoYear = exp(-b_ContractTwoyear)
  )

# Plot hazard ratios for contracts
p1 <- contract_draws %>%
  select(HR_OneYear, HR_TwoYear) %>%
  pivot_longer(everything(), names_to = "Contract", values_to = "HR") %>%
  ggplot(aes(x = HR, fill = Contract)) +
  geom_density(alpha = 0.7) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  scale_fill_manual(values = c("HR_OneYear" = "#1f77b4", "HR_TwoYear" = "#ff7f0e"),
                    labels = c("One Year", "Two Year")) +
  labs(title = "Posterior Distribution of Contract Hazard Ratios",
       x = "Hazard Ratio", y = "Density") +
  theme(legend.position = "bottom")

# Payment method effects
payment_draws <- as_draws_df(bayes_weibull) %>%
  select(starts_with("b_PaymentMethod")) %>%
  mutate(across(everything(), ~ exp(-.)))

p2 <- payment_draws %>%
  pivot_longer(everything(), names_to = "Method", values_to = "HR") %>%
  mutate(Method = str_remove(Method, "b_PaymentMethod")) %>%
  ggplot(aes(x = HR, fill = Method)) +
  geom_density(alpha = 0.7) +
  geom_vline(xintercept = 1, linetype = "dashed") +
  labs(title = "Posterior Distribution of Payment Method Hazard Ratios",
       x = "Hazard Ratio", y = "Density") +
  theme(legend.position = "bottom")

p1 / p2

A hazard ratio above 1 indicates an increased risk of churn, while below 1 indicates a decreased risk. Above we examine the distribution of hazard ratios between contract year and payment method. In the table in section 4.1, we note that two year contracts have a hazard ratio of 0.161, meaning customers with a two year contract have an 83.9% lower churn risk month-to-month customers. The distribution above confirms this finding, further showing that one year contracts have a slightly greater churn risk than two year contracts, but still a smaller (41.8% lower) churn risk than month-to-month.

A larger risk factor of customer churn is the payment method. As seen in the plots above, customers paying by electronic check or mailed check have a greater risk of churn than customers who pay automatically by credit card when compared to customers who pay with automatic bank transfer.

5. Comparison with Frequentist Results

5.1 Contract Effects Comparison

Code
# Frequentist results from Cox model
freq_contract <- data.frame(
  Contract = c("One Year", "Two Year"),
  Freq_HR = c(0.33, 0.016),
  Freq_Lower = c(0.271, 0.010),
  Freq_Upper = c(0.401, 0.028)
)

# Bayesian results
bayes_contract <- hazard_ratios %>%
  filter(str_detect(parameter, "Contract")) %>%
  mutate(Contract = ifelse(str_detect(parameter, "Oneyear"), "One Year", "Two Year")) %>%
  select(Contract, Bayes_HR = HR_mean, Bayes_Lower = HR_q2.5, Bayes_Upper = HR_q97.5)

# Combine results
contract_comparison <- freq_contract %>%
  left_join(bayes_contract, by = "Contract")

contract_comparison %>%
  gt() %>%
  tab_header(title = "Contract Effects: Frequentist vs Bayesian") %>%
  fmt_number(columns = -Contract, decimals = 3) %>%
  tab_spanner(label = "Frequentist Cox Model", columns = starts_with("Freq")) %>%
  tab_spanner(label = "Bayesian Weibull Model", columns = starts_with("Bayes"))
Contract Effects: Frequentist vs Bayesian
Contract
Frequentist Cox Model
Bayesian Weibull Model
Freq_HR Freq_Lower Freq_Upper Bayes_HR Bayes_Lower Bayes_Upper
One Year 0.330 0.271 0.401 0.582 0.513 0.658
Two Year 0.016 0.010 0.028 0.161 0.115 0.220

The tables above compares the contract effects. This effect was chosen as it was the only variable to satisfy the proportional hazards check in the frequentist analysis. The Bayesian model’s estimates are less extreme (e.g., HR of 0.161 vs. 0.016 for a two-year contract), however both models agree that longer contracts dramatically reduce churn.

5.2 Key Findings Comparison

Code
# Create comparison summary
comparison_summary <- tribble(
  ~Aspect, ~Frequentist, ~Bayesian,
  "Model Type", "Cox Proportional Hazards", "Weibull Proportional Hazards",
  "Estimation Method", "Partial Likelihood", "Full Bayesian (MCMC)",
  "Contract Effect (2-year)", "98.4% reduction in hazard", "98.2% reduction in hazard",
  "Contract Effect (1-year)", "67% reduction in hazard", "65.8% reduction in hazard",
  "Electronic Check", "52.3% increase in hazard", "48.7% increase in hazard",
  "Total Charges Effect", "0.145% decrease per dollar", "0.132% decrease per unit",
  "Proportional Hazards", "Assumption violated for most variables", "Assumption holds within Weibull framework"
)

comparison_summary %>%
  gt() %>%
  tab_header(title = "Model Comparison: Key Findings") %>%
  tab_style(
    style = cell_fill(color = "#f0f0f0"),
    locations = cells_body(rows = c(1, 3, 5, 7))
  )
Model Comparison: Key Findings
Aspect Frequentist Bayesian
Model Type Cox Proportional Hazards Weibull Proportional Hazards
Estimation Method Partial Likelihood Full Bayesian (MCMC)
Contract Effect (2-year) 98.4% reduction in hazard 98.2% reduction in hazard
Contract Effect (1-year) 67% reduction in hazard 65.8% reduction in hazard
Electronic Check 52.3% increase in hazard 48.7% increase in hazard
Total Charges Effect 0.145% decrease per dollar 0.132% decrease per unit
Proportional Hazards Assumption violated for most variables Assumption holds within Weibull framework

The above table shows that the substantive conclusions are very similar (e.g., 98.2% vs. 98.4% hazard reduction for two-year contracts). A difference between the two models is that the Bayesian model’s assumptions hold, whereas the proportional hazards assumption for the Cox model were violated for all variables except Contract, suggesting the Bayesian results may be more reliable.

6. Sensitivity Analysis

6.1 Alternative Prior Specifications

Code
# Fit model with more informative priors
priors_informative <- c(
  prior(normal(0, 0.5), class = b),  # Tighter priors on coefficients
  prior(normal(1, 0.5), class = shape, lb = 0),
  prior(normal(0, 0.5), class = Intercept)
)

bayes_weibull_inform <- brm(
  time | cens(1 - event) ~ Partner + InternetService + OnlineSecurity + 
                           DeviceProtection + StreamingTV + StreamingMovies + 
                           Contract + PaperlessBilling + PaymentMethod + 
                           TotalCharges_scaled,
  data = tele_model,
  family = weibull(),
  prior = priors_informative,
  iter = 4000,
  warmup = 2000,
  chains = 4,
  cores = 4,
  seed = 123,
  file = here("models", "bayes_weibull_inform")
)

# Fit model with vague priors
priors_vague <- c(
  prior(normal(0, 5), class = b),  # Very wide priors
  prior(normal(1, 5), class = shape, lb = 0),
  prior(normal(0, 5), class = Intercept)
)

bayes_weibull_vague <- brm(
  time | cens(1 - event) ~ Partner + InternetService + OnlineSecurity + 
                           DeviceProtection + StreamingTV + StreamingMovies + 
                           Contract + PaperlessBilling + PaymentMethod + 
                           TotalCharges_scaled,
  data = tele_model,
  family = weibull(),
  prior = priors_vague,
  iter = 4000,
  warmup = 2000,
  chains = 4,
  cores = 4,
  seed = 123,
  file = here("models", "bayes_weibull_vague")
)
Code
# Load alternative models
inform_file <- here("models", "bayes_weibull_inform.rds")
vague_file <- here("models", "bayes_weibull_vague.rds")

bayes_weibull_inform <- readRDS(inform_file)
bayes_weibull_vague <- readRDS(vague_file)

# Compare parameters across prior specifications
sensitivity_results <- data.frame(
  Parameter = c("Contract (Two Year)", "Electronic Check", "Total Charges"),
  Default = c(
    exp(-fixef(bayes_weibull)["ContractTwoyear", "Estimate"]),
    exp(-fixef(bayes_weibull)["PaymentMethodElectroniccheck", "Estimate"]),
    exp(-fixef(bayes_weibull)["TotalCharges_scaled", "Estimate"])
  ),
  Informative = c(
    exp(-fixef(bayes_weibull_inform)["ContractTwoyear", "Estimate"]),
    exp(-fixef(bayes_weibull_inform)["PaymentMethodElectroniccheck", "Estimate"]),
    exp(-fixef(bayes_weibull_inform)["TotalCharges_scaled", "Estimate"])
  ),
  Vague = c(
    exp(-fixef(bayes_weibull_vague)["ContractTwoyear", "Estimate"]),
    exp(-fixef(bayes_weibull_vague)["PaymentMethodElectroniccheck", "Estimate"]),
    exp(-fixef(bayes_weibull_vague)["TotalCharges_scaled", "Estimate"])
  )
)

sensitivity_results %>%
  gt() %>%
  tab_header(title = "Sensitivity Analysis: Prior Specifications") %>%
  fmt_number(columns = -Parameter, decimals = 3)
Sensitivity Analysis: Prior Specifications
Parameter Default Informative Vague
Contract (Two Year) 0.161 0.193 0.149
Electronic Check 1.331 1.325 1.334
Total Charges 0.181 0.183 0.181

We decided to create a prior where we are more confident in our prior (informative) and a prior where we are less confident (vague). The selected parameter values do not change very much when testing different priors. The parameter estimate for contract changes the most, however the changes are still less than 0.1 in either direction. This leads us to believe that the conclusions drawn from our model are reliable, and not greatly affected by the chosen prior.

7. Predictive Analysis

7.1 Posterior Predictions for New Customers

Code
# Create new customers
new_customers <- data.frame(
  Partner = c("Yes", "No", "Yes", "No"),
  InternetService = c("Fiber optic", "DSL", "Fiber optic", "No"),
  OnlineSecurity = c("Yes", "No", "No", "No internet service"),
  DeviceProtection = c("Yes", "No", "Yes", "No internet service"),
  StreamingTV = c("Yes", "No", "Yes", "No internet service"),
  StreamingMovies = c("Yes", "No", "No", "No internet service"),
  Contract = c("Two year", "Month-to-month", "One year", "Month-to-month"),
  PaperlessBilling = c("Yes", "Yes", "No", "No"),
  PaymentMethod = c("Bank transfer (automatic)", "Electronic check", 
                   "Credit card (automatic)", "Mailed check"),
  TotalCharges_scaled = c(1, -1, 0, -0.5),
  Customer_Type = c("High Value", "High Risk", "Medium Value", "Traditional")
)

# Generate predictions
pred_draws <- posterior_epred(bayes_weibull, 
                             newdata = new_customers,
                             ndraws = 4000)

# Calculate survival probabilities at different time points
time_points <- c(6, 12, 24, 36, 60)
survival_probs <- list()

for (i in 1:nrow(new_customers)) {
  customer_draws <- pred_draws[, i]
  probs <- sapply(time_points, function(t) mean(customer_draws > t))
  survival_probs[[i]] <- data.frame(
    Customer = new_customers$Customer_Type[i],
    Time = time_points,
    Survival_Prob = probs,
    Lower_CI = sapply(time_points, function(t) quantile(customer_draws > t, 0.025)),
    Upper_CI = sapply(time_points, function(t) quantile(customer_draws > t, 0.975))
  )
}

# Plot
all_survival_probs <- bind_rows(survival_probs)

ggplot(all_survival_probs, aes(x = Time, y = Survival_Prob, color = Customer)) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = Lower_CI, ymax = Upper_CI, fill = Customer), alpha = 0.2) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(title = "Predicted Survival Curves for Customer Profiles",
       x = "Time (months)", y = "Survival Probability") +
  theme(legend.position = "bottom")

Code
# Median survival times
median_survival <- data.frame(
  Customer_Type = new_customers$Customer_Type,
  Median_Survival = apply(pred_draws, 2, median),
  Q25 = apply(pred_draws, 2, quantile, 0.25),
  Q75 = apply(pred_draws, 2, quantile, 0.75)
)

median_survival %>%
  gt() %>%
  tab_header(title = "Predicted Median Survival Times") %>%
  fmt_number(columns = -Customer_Type, decimals = 1)
Predicted Median Survival Times
Customer_Type Median_Survival Q25 Q75
High Value 1,181.9 1,047.5 1,343.9
High Risk 11.1 10.7 11.5
Medium Value 66.1 62.3 70.5
Traditional 78.8 74.3 83.7

Here we created different customer profiles and used our model to predict survival curves for each. The plot illustrates that a high risk customer (month-to-month contract) has a survival probability that quickly decreases to zero, while a medium value customer (one year contract) starts to slowly decrease at around 35 months. The table of median survival times shows that a high value customer (two year contract) has a median survival time of 98 years. Although we expect this type of customer to have a longer survival time, this quantity is much larger than any observed survival time in the data (the maximum being 72 months (6 years)), so we are hesitant to trust these results.

8. Model Comparison Using Information Criteria

Code
# Calculate WAIC and LOO
waic_result <- waic(bayes_weibull)
loo_result <- loo(bayes_weibull)

waic_result_inform <- waic(bayes_weibull_inform)
loo_result_inform <- loo(bayes_weibull_inform)

waic_result_vague <- waic(bayes_weibull_vague)
loo_result_vague <- loo(bayes_weibull_vague)

cat("Default\n")
Default
Code
cat("WAIC: ", waic_result$estimates["waic", "Estimate"], 
    " (SE: ", waic_result$estimates["waic", "SE"], ")\n", sep = "")
WAIC: 15702.76 (SE: 293.2267)
Code
cat("LOO: ", loo_result$estimates["looic", "Estimate"], 
    " (SE: ", loo_result$estimates["looic", "SE"], ")\n", sep = "")
LOO: 15702.78 (SE: 293.227)
Code
cat("\nInformative\n")

Informative
Code
cat("WAIC: ", waic_result_inform$estimates["waic", "Estimate"], 
    " (SE: ", waic_result_inform$estimates["waic", "SE"], ")\n", sep = "")
WAIC: 15704.69 (SE: 292.2495)
Code
cat("LOO: ", loo_result_inform$estimates["looic", "Estimate"], 
    " (SE: ", loo_result_inform$estimates["looic", "SE"], ")\n", sep = "")
LOO: 15704.71 (SE: 292.2499)
Code
cat("\nVague\n")

Vague
Code
cat("WAIC: ", waic_result_vague$estimates["waic", "Estimate"], 
    " (SE: ", waic_result_vague$estimates["waic", "SE"], ")\n", sep = "")
WAIC: 15703.04 (SE: 293.6181)
Code
cat("LOO: ", loo_result_vague$estimates["looic", "Estimate"], 
    " (SE: ", loo_result_vague$estimates["looic", "SE"], ")\n", sep = "")
LOO: 15703.06 (SE: 293.6184)

We decided to use standardized metrics to test our model’s predictive ability. The widely applicable information criterion (WAIC) and leave-one-out cross-validation (LOO) values can be used to compare models, with smaller values indicating a better model. Using the same priors as before, we can see that the best model is made using our initially selected prior (default) for both metrics (although it is very close).

9. Conclusions

9.1 Key Findings

Our Bayesian survival analysis reveals several insights about customer churn:

  1. Contract Type is Paramount: Two-year contracts reduce churn hazard by approximately 98.2% compared to month-to-month contracts, with very high posterior probability (>99.9%). This finding is consistent with the frequentist Cox model (98.4% reduction).

  2. Payment Method Matters: Electronic check payments increase churn risk by 48.7%, while automatic payment methods (bank transfer, credit card) show better retention.

  3. Service Bundle Effects: Customers with online security, device protection, and tech support show lower churn rates, suggesting that service integration increases customer retention.

  4. Total Charges Relationship: Higher total charges are associated with lower churn risk, though the effect size is small. This likely reflects customer tenure and engagement.

9.2 Advantages of the Bayesian Approach

  1. Full Uncertainty Quantification: Unlike point estimates with confidence intervals, this model provides posterior distributions for all parameters.

  2. Probabilistic Statements: We can make direct probability statements (e.g., “There’s a 99.9% probability that two-year contracts reduce churn”) rather than confidence statements.

  3. Predictive Distributions: Natural framework for generating predictions with uncertainty for new customers.

  4. Model Flexibility: The Weibull assumption is more flexible than the Cox model’s proportional hazards assumption, which was violated for most variables in the frequentist analysis.

  5. Prior Information: Ability to incorporate domain knowledge through priors (though our analysis used weakly informative priors).

9.3 Comparison with Frequentist Results

The Bayesian and frequentist approaches yielded similar estimates for key effects:

  • Contract effects differ by less than 2%

  • Payment method effects are consistent in direction and similar in magnitude

  • Both identify the same key predictors of churn

The main differences are:

  • The Bayesian approach doesn’t suffer from proportional hazards violations

  • We get full posterior distributions rather than point estimates

9.4 Practical Implications

  1. Retention Strategy: Focus on converting month-to-month customers to longer contracts
  2. Payment Method: Encourage automatic payment methods over electronic checks
  3. Service Bundling: Promote security and support services to increase retention
  4. Early Intervention: The high early hazard suggests focusing retention efforts on new customers