Statistical Analysis of Soil Data

Author

Andrew Kerr, Kyle Bistrain, Rachel Roggenkemper, Liam Quach

Introduction and Data Preparation

Here, we are loading the data and preparing it for analysis. This includes cleaning the column names, converting data types, and creating new variables that will be useful for modeling.

Data Read-in

This section reads in the data from the master Excel file. We’ll be working with the combined dataset for the analyses.

Code

# Read in the data from the Excel file
df_combined <- read_xlsx(here::here("data", "Master.xlsx"), sheet = "combined")

# Convert relevant columns to factors for modeling
df_combined <- df_combined %>%
  mutate(
    sample_id = as.factor(sample_id),
    treatment = as.factor(treatment),
    site = as.factor(site),
    block = as.factor(block),
    study_year = as.factor(study_year)
  )

# Define the response variables we'll be analyzing
response_vars <- c("nh4", "no3", "mg", "p", "ec", "p_h")

Handling Values Below the Limit of Detection (LOD)

A common challenge in environmental data is dealing with measurements that are below the limit of detection (LOD). These are recorded as “LOD” in the dataset. We will explore two common approaches to handle these values:

Treating LOD as Censored Data: This is a more statistically robust approach. It acknowledges that the true value is somewhere between 0 and the LOD, rather than assuming it is exactly 0.
Replacing LOD with 0: This is a simple approach, but it can bias the results, especially if there are many LOD values.

Here, we create two datasets to reflect these two approaches.

Code

# Create a dataset where LOD is replaced with 0
df_combined_LOD_0 <- df_combined %>%
  mutate(across(
    all_of(response_vars),
    ~ as.numeric(ifelse(.x == "LOD", 0, .x))
  ),
  
    date = as.factor(date),
    # Create combined grouping factors
    site_block_id = interaction(site, block, drop = TRUE),
    trt_site_block_id = interaction(treatment, site, block, drop = TRUE)
  )

# Create a dataset for censored models
# A 'censored' column is added for each response variable.
# The actual response variable is set to a placeholder (e.g., the detection limit) when censored.
lc_model_data <- df_combined %>%
  mutate(
    nh4_cens = ifelse(nh4 == "LOD", -1, 0),
    no3_cens = ifelse(no3 == "LOD", -1, 0),
    mg_cens = ifelse(mg == "LOD", -1, 0),
    p_cens = ifelse(p == "LOD", -1, 0),
    
    nh4 = ifelse(nh4 == "LOD", nh4_lod, as.numeric(nh4)),
    no3 = ifelse(no3 == "LOD", n03_lod, as.numeric(no3)),
    mg = ifelse(mg == "LOD", mg_lod, as.numeric(mg)),
    p = ifelse(p == "LOD", p_lod, as.numeric(p)),

    date = as.factor(date),
    # Create combined grouping factors
    site_block_id = interaction(site, block, drop = TRUE),
    trt_site_block_id = interaction(treatment, site, block, drop = TRUE)
  )

Depending on which method we use, the models answer slightly different research questions.

Treating LOD as 0

Assumption: You are assuming that any measurement below the Limit of Detection (LOD) is a true zero. However, since your data does not have any true zeros this allows us to use a zero-inflated model to model (1) probability below LOD and (2) mean if above LOD.
Research Question(s) it Answers: This approach splits your research question into two parts:
1. Presence/Absence: Does the treatment affect the probability that the response is LOD?
2. Concentration when Present: If the response is present (i.e., not LOD), how does the treatment affect its concentration?
Downside: In most environmental systems, a value below the LOD doesn’t mean the response is completely absent, just that its concentration is too low for your equipment to measure accurately. By substituting zero, you are artificially deflating the mean and variance of your data, which can bias your results and lead you to underestimate the true average concentration.

Treating LOD as Left-Censored

Assumption: You assume that a value below the LOD is not a known value, only that its true concentration is somewhere between 0 and the detection limit. The model doesn’t force it to be zero; it incorporates this uncertainty into the analysis.
Research Question(s) it Answers: This approach answers a single, more direct question: How does the treatment affect the true underlying concentration of the response, while properly accounting for the limitations and uncertainty of the measurement instrument?
Advantage: It doesn’t “invent” data by substituting zero. Instead, it uses all the information you have (i.e., “we know it’s less than X”) to provide a less biased and more accurate estimate of the treatment effects on the actual response concentrations.

Exploratory Data Analysis

Before we build our models, it’s important to explore the data to understand its structure and identify any patterns or issues.

Proportion of Zero/LOD Values

Here, we’ll look at the proportion of zeros (after converting LOD to 0) for each response variable. This will help us understand the extent of the “zero problem”.

Proportions in each Response
Response	LODs	Zeros
nh4	0.8220	0
no3	0.7165	0
p	0.2971	0
mg	0.0032	0
ec	0.0000	0
p_h	0.0000	0

As we can see, nh4 and no3 have a very high proportion of zero values (over 70%). p also has a substantial number of zeros. ec and p_h have no zeros. This confirms that standard linear models assuming normality will be inappropriate for the first three, but may be suitable for ec and p_h.

Visualizing the Data

Boxplots show the distribution of the non-zero data to reveal potential differences between treatments.

Code

# Function to create boxplots for each response variable
plot_concentration <- function(r, data) {
  data %>%
    filter(.data[[r]] > 0) %>% # We only plot non-zero values for clarity
    ggplot(aes(x = treatment, y = .data[[r]], fill = treatment)) +
    geom_boxplot(alpha = 0.7) +
    geom_point(alpha = 0.3) +
    facet_wrap(vars(study_year), scales = "free") +
    labs(
      title = paste(tools::toTitleCase(r), "Concentration (Non-Zero) by Treatment"),
      x = "Treatment",
      y = "Value",
      fill = "Treatment"
    ) +
    theme_bw()
}

# Create and display the plots
plots_0 <- map(response_vars, plot_concentration, data = df_combined_LOD_0)
wrap_plots(plots_0, ncol = 2, guides = "collect")

Modeling Approach 1: Censored Models for Data with LODs (`brms`)

This is a statistically robust approach for nh4, no3, mg, and p. It accounts for the fact that LOD values are not true zeros but are values below a certain threshold. We use the brms package for Bayesian modeling.

Note: Bayesian models are computationally intensive and can take a long time to run. For this report, we are loading pre-fitted model objects.

Censored Model for Nitrate (`no3`)

Code

# fit_no3 <- brm(
#   bf(no3 | cens(no3_cens) ~ treatment * date + (1 | site_block_id) + (1 | trt_site_block_id)),
#   data = lc_model_data, family = Gamma(link = "log"),
#   iter = 4000, warmup = 1000, chains = 4, cores = 4, seed = 12345,
#   file = "models/fit_no3"
# )
fit_no3 <- read_rds(here::here("models", "fit_no3.rds"))

Code

# summary(fit_no3)
pp_check(fit_no3, ndraws = 100) + xlim(0, 100)

Warning: Censored responses are not shown in 'pp_check'.

Warning: Removed 3787 rows containing non-finite outside the scale range
(`stat_density()`).

Warning: Removed 41 rows containing non-finite outside the scale range
(`stat_density()`).

The cens(no3_cens) term tells brms to treat the data as left-censored. The model summary shows the estimated effects. The pp_check plot shows the model fits the overall distribution reasonably well, though the many warnings about divergent transitions indicate potential convergence problems. The results should be interpreted with caution.

Censored Model for Ammonium (`nh4`)

Code

# fit_nh4 <- brm(
#   bf(nh4 | cens(nh4_cens) ~ treatment * date + (1 | site_block_id) + (1 | trt_site_block_id)),
#   data = lc_model_data, family = Gamma(link = "log"),
#   iter = 4000, warmup = 1000, chains = 4, cores = 4, seed = 12345,
#   file = "models/fit_no3"
# )
fit_nh4 <- read_rds(here::here("models", "fit_nh4.rds"))

Code

# summary(fit_nh4)
pp_check(fit_nh4, ndraws = 100) + xlim(0, 10)

Warning: Censored responses are not shown in 'pp_check'.

Warning: Removed 728 rows containing non-finite outside the scale range
(`stat_density()`).

Similar to the nitrate model, the nh4 model shows many divergent transitions, which is a red flag for model reliability. The posterior predictive check (pp_check) suggests the model struggles to capture the shape of the data perfectly, which is common with highly zero-inflated datasets.

Censored Model for Magnesium (`mg`)

Code

# fit_mg <- brm(
#   bf(mg | cens(mg_cens) ~ treatment * date + (1 | site_block_id)) + (1 | trt_site_block_id)),
#   data = lc_model_data, family = Gamma(link = "log"),
#   iter = 4000, warmup = 1000, chains = 4, cores = 4, seed = 12345,
#   file = "models/fit_mg"
# )
fit_mg <- read_rds(here::here("models", "fit_mg.rds"))

Code

summary(fit_mg)

Warning: There were 149 divergent transitions after warmup. Increasing
adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

 Family: gamma 
  Links: mu = log; shape = identity 
Formula: mg | cens(mg_censored) ~ date * treatment + (1 | site_block_id) + (1 | trt_site_block_id) 
   Data: lc_model_data (Number of observations: 633) 
  Draws: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
         total post-warmup draws = 16000

Multilevel Hyperparameters:
~site_block_id (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.40      0.18     0.17     0.87 1.00     2032      743

~trt_site_block_id (Number of levels: 48) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.38      0.05     0.29     0.50 1.00     6068     9749

Regression Coefficients:
                            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
Intercept                       3.90      0.30     3.29     4.45 1.00      869
date2024M01M12                  0.20      0.24    -0.28     0.68 1.00     2141
date2024M01M23                  0.25      0.24    -0.23     0.71 1.00     1933
date2024M02M07                 -0.01      0.24    -0.48     0.46 1.00     2122
date2024M02M21                  0.12      0.25    -0.36     0.62 1.00     2153
date2024M03M06                  0.19      0.25    -0.30     0.66 1.00     1806
date2024M03M22                  0.38      0.24    -0.09     0.86 1.00     1848
date2024M04M04                  0.48      0.26    -0.02     0.98 1.00     2203
date2024M04M16                  0.34      0.24    -0.14     0.81 1.00     2128
date2025M02M08                 -0.66      0.24    -1.14    -0.19 1.00     1980
date2025M02M15                 -0.48      0.25    -0.97     0.00 1.00     2085
date2025M03M04                 -0.35      0.26    -0.85     0.16 1.00     2299
date2025M03M16                 -0.14      0.24    -0.61     0.33 1.00     2104
date2025M03M27                  0.03      0.26    -0.47     0.53 1.00     2299
date2025M04M11                  0.21      0.27    -0.31     0.75 1.00     1922
treatmentMC1                    0.86      0.31     0.25     1.47 1.00     2513
treatmentMC2                    0.58      0.32    -0.04     1.22 1.00     2455
treatmentPC0                    0.01      0.32    -0.62     0.62 1.00     2577
treatmentPC1                    0.48      0.31    -0.12     1.09 1.00     2187
treatmentPC2                    0.10      0.35    -0.57     0.79 1.00     1728
date2024M01M12:treatmentMC1    -0.27      0.33    -0.92     0.38 1.00     2988
date2024M01M23:treatmentMC1    -0.32      0.33    -0.96     0.33 1.00     2830
date2024M02M07:treatmentMC1    -0.55      0.32    -1.20     0.07 1.00     2961
date2024M02M21:treatmentMC1    -0.73      0.33    -1.39    -0.09 1.00     2731
date2024M03M06:treatmentMC1    -0.81      0.33    -1.47    -0.15 1.00     2874
date2024M03M22:treatmentMC1    -0.80      0.33    -1.44    -0.15 1.00     2825
date2024M04M04:treatmentMC1    -0.92      0.35    -1.60    -0.23 1.00     2874
date2024M04M16:treatmentMC1    -1.01      0.33    -1.66    -0.37 1.00     2227
date2025M02M08:treatmentMC1     0.01      0.34    -0.67     0.66 1.00     2500
date2025M02M15:treatmentMC1    -0.07      0.33    -0.72     0.57 1.00     2770
date2025M03M04:treatmentMC1    -0.10      0.34    -0.78     0.56 1.00     3144
date2025M03M16:treatmentMC1    -0.47      0.32    -1.12     0.16 1.00     2787
date2025M03M27:treatmentMC1    -0.50      0.34    -1.19     0.16 1.00     2750
date2025M04M11:treatmentMC1    -0.40      0.36    -1.11     0.29 1.00     3045
date2024M01M12:treatmentMC2    -0.04      0.33    -0.69     0.62 1.00     2875
date2024M01M23:treatmentMC2     0.06      0.33    -0.59     0.70 1.00     2775
date2024M02M07:treatmentMC2    -0.19      0.33    -0.84     0.45 1.00     2798
date2024M02M21:treatmentMC2    -0.14      0.34    -0.82     0.51 1.00     2710
date2024M03M06:treatmentMC2    -0.10      0.34    -0.76     0.57 1.00     2825
date2024M03M22:treatmentMC2    -0.11      0.33    -0.77     0.53 1.00     2511
date2024M04M04:treatmentMC2    -0.19      0.34    -0.86     0.48 1.00     2886
date2024M04M16:treatmentMC2    -0.37      0.34    -1.04     0.29 1.00     2516
date2025M02M08:treatmentMC2     0.19      0.35    -0.49     0.86 1.00     2463
date2025M02M15:treatmentMC2     0.66      0.34    -0.01     1.31 1.00     2887
date2025M03M04:treatmentMC2     0.55      0.35    -0.15     1.23 1.00     3094
date2025M03M16:treatmentMC2     0.21      0.33    -0.44     0.85 1.00     2700
date2025M03M27:treatmentMC2     0.12      0.34    -0.56     0.79 1.00     3062
date2025M04M11:treatmentMC2     0.12      0.39    -0.64     0.88 1.00     3304
date2024M01M12:treatmentPC0    -0.31      0.33    -0.95     0.35 1.00     2992
date2024M01M23:treatmentPC0    -0.56      0.32    -1.19     0.07 1.00     2745
date2024M02M07:treatmentPC0    -0.40      0.33    -1.05     0.23 1.00     2922
date2024M02M21:treatmentPC0    -0.37      0.33    -1.02     0.28 1.00     3116
date2024M03M06:treatmentPC0    -0.36      0.33    -1.00     0.30 1.00     2483
date2024M03M22:treatmentPC0    -0.54      0.33    -1.19     0.11 1.00     2706
date2024M04M04:treatmentPC0    -0.59      0.34    -1.25     0.09 1.00     2953
date2024M04M16:treatmentPC0    -0.51      0.33    -1.17     0.13 1.00     3257
date2025M02M08:treatmentPC0    -0.59      0.39    -1.35     0.17 1.00     3075
date2025M02M15:treatmentPC0    -0.55      0.33    -1.21     0.10 1.00     2969
date2025M03M04:treatmentPC0    -0.46      0.36    -1.17     0.26 1.00     3014
date2025M03M16:treatmentPC0    -0.81      0.34    -1.47    -0.14 1.00     3140
date2025M03M27:treatmentPC0    -0.67      0.38    -1.39     0.08 1.00     3335
date2025M04M11:treatmentPC0    -0.53      0.37    -1.25     0.19 1.00     2247
date2024M01M12:treatmentPC1    -0.46      0.33    -1.11     0.19 1.00     2995
date2024M01M23:treatmentPC1    -0.60      0.33    -1.23     0.05 1.00     2892
date2024M02M07:treatmentPC1    -0.91      0.32    -1.54    -0.29 1.00     2752
date2024M02M21:treatmentPC1    -0.84      0.33    -1.48    -0.21 1.00     2476
date2024M03M06:treatmentPC1    -0.73      0.33    -1.37    -0.08 1.00     2880
date2024M03M22:treatmentPC1    -0.82      0.33    -1.46    -0.18 1.00     2772
date2024M04M04:treatmentPC1    -0.95      0.34    -1.61    -0.28 1.00     2979
date2024M04M16:treatmentPC1    -1.03      0.34    -1.70    -0.37 1.00     2945
date2025M02M08:treatmentPC1    -1.01      0.33    -1.66    -0.36 1.00     2785
date2025M02M15:treatmentPC1    -1.05      0.34    -1.72    -0.39 1.00     2882
date2025M03M04:treatmentPC1    -1.12      0.34    -1.80    -0.45 1.00     3182
date2025M03M16:treatmentPC1    -1.05      0.34    -1.71    -0.39 1.00     3087
date2025M03M27:treatmentPC1    -1.04      0.35    -1.72    -0.34 1.00     3212
date2025M04M11:treatmentPC1    -1.23      0.40    -2.00    -0.44 1.00     3125
date2024M01M12:treatmentPC2    -0.29      0.36    -1.00     0.43 1.00     2778
date2024M01M23:treatmentPC2    -0.55      0.36    -1.25     0.14 1.00     2333
date2024M02M07:treatmentPC2    -0.78      0.36    -1.47    -0.07 1.00     2159
date2024M02M21:treatmentPC2    -0.61      0.36    -1.31     0.11 1.00     2491
date2024M03M06:treatmentPC2    -0.58      0.37    -1.30     0.13 1.00     1856
date2024M03M22:treatmentPC2    -0.55      0.36    -1.26     0.15 1.00     2027
date2024M04M04:treatmentPC2    -0.62      0.37    -1.34     0.12 1.00     2706
date2024M04M16:treatmentPC2    -0.68      0.36    -1.39     0.02 1.00     3141
date2025M02M08:treatmentPC2    -0.65      0.37    -1.37     0.07 1.00     2916
date2025M02M15:treatmentPC2    -0.36      0.36    -1.07     0.35 1.00     2638
date2025M03M04:treatmentPC2    -0.22      0.38    -0.96     0.52 1.00     2541
date2025M03M16:treatmentPC2    -0.40      0.36    -1.11     0.30 1.00     2830
date2025M03M27:treatmentPC2    -0.38      0.38    -1.14     0.36 1.00     2334
date2025M04M11:treatmentPC2    -0.56      0.43    -1.38     0.30 1.00     2570
                            Tail_ESS
Intercept                        489
date2024M01M12                  4611
date2024M01M23                  4161
date2024M02M07                  4299
date2024M02M21                  5113
date2024M03M06                  5267
date2024M03M22                  5165
date2024M04M04                  5257
date2024M04M16                  5252
date2025M02M08                  4562
date2025M02M15                  5271
date2025M03M04                  5281
date2025M03M16                  5087
date2025M03M27                  5755
date2025M04M11                  5094
treatmentMC1                    5223
treatmentMC2                    6142
treatmentPC0                    6150
treatmentPC1                    5902
treatmentPC2                    1876
date2024M01M12:treatmentMC1     7291
date2024M01M23:treatmentMC1     6716
date2024M02M07:treatmentMC1     6611
date2024M02M21:treatmentMC1     5988
date2024M03M06:treatmentMC1     6631
date2024M03M22:treatmentMC1     6946
date2024M04M04:treatmentMC1     6233
date2024M04M16:treatmentMC1     5203
date2025M02M08:treatmentMC1     6029
date2025M02M15:treatmentMC1     5923
date2025M03M04:treatmentMC1     6228
date2025M03M16:treatmentMC1     5853
date2025M03M27:treatmentMC1     7492
date2025M04M11:treatmentMC1     7125
date2024M01M12:treatmentMC2     7081
date2024M01M23:treatmentMC2     6424
date2024M02M07:treatmentMC2     6043
date2024M02M21:treatmentMC2     6387
date2024M03M06:treatmentMC2     7069
date2024M03M22:treatmentMC2     5811
date2024M04M04:treatmentMC2     6399
date2024M04M16:treatmentMC2     7370
date2025M02M08:treatmentMC2     6477
date2025M02M15:treatmentMC2     6186
date2025M03M04:treatmentMC2     6360
date2025M03M16:treatmentMC2     6414
date2025M03M27:treatmentMC2     7132
date2025M04M11:treatmentMC2     6804
date2024M01M12:treatmentPC0     6866
date2024M01M23:treatmentPC0     7129
date2024M02M07:treatmentPC0     6789
date2024M02M21:treatmentPC0     6619
date2024M03M06:treatmentPC0     5943
date2024M03M22:treatmentPC0     7353
date2024M04M04:treatmentPC0     6545
date2024M04M16:treatmentPC0     7638
date2025M02M08:treatmentPC0     8050
date2025M02M15:treatmentPC0     6781
date2025M03M04:treatmentPC0     7508
date2025M03M16:treatmentPC0     6850
date2025M03M27:treatmentPC0     7941
date2025M04M11:treatmentPC0     5367
date2024M01M12:treatmentPC1     6645
date2024M01M23:treatmentPC1     6957
date2024M02M07:treatmentPC1     7158
date2024M02M21:treatmentPC1     6270
date2024M03M06:treatmentPC1     7322
date2024M03M22:treatmentPC1     6564
date2024M04M04:treatmentPC1     6755
date2024M04M16:treatmentPC1     7499
date2025M02M08:treatmentPC1     6374
date2025M02M15:treatmentPC1     7465
date2025M03M04:treatmentPC1     7572
date2025M03M16:treatmentPC1     7202
date2025M03M27:treatmentPC1     7874
date2025M04M11:treatmentPC1     8764
date2024M01M12:treatmentPC2     6428
date2024M01M23:treatmentPC2     6396
date2024M02M07:treatmentPC2     6143
date2024M02M21:treatmentPC2     6478
date2024M03M06:treatmentPC2     3113
date2024M03M22:treatmentPC2     4921
date2024M04M04:treatmentPC2     7305
date2024M04M16:treatmentPC2     6883
date2025M02M08:treatmentPC2     7278
date2025M02M15:treatmentPC2     7244
date2025M03M04:treatmentPC2     6715
date2025M03M16:treatmentPC2     6105
date2025M03M27:treatmentPC2     3233
date2025M04M11:treatmentPC2     7080

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     5.78      0.35     5.11     6.48 1.00    13355    11217

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Code

pp_check(fit_mg, ndraws = 100)

Warning: Censored responses are not shown in 'pp_check'.

The mg model appears to have converged much better than the nh4 and no3 models, with few warnings. The pp_check shows a good fit between the model’s predictions and the observed data. This suggests the censored Gamma model is a good choice for mg.

Censored Model for Phosphate (`p`)

Code

# fit_p <- brm(
#   bf(p | cens(p_cens) ~ treatment * date + (1 | site_block_id)) + (1 | trt_site_block_id)),
#   data = lc_model_data, family = Gamma(link = "log"),
#   iter = 4000, warmup = 1000, chains = 4, cores = 4, seed = 12345,
#   file = "models/fit_p"
# )
fit_p <- read_rds(here::here("models", "fit_p.rds"))

Code

# summary(fit_p)
pp_check(fit_p, ndraws = 100) + xlim(0, 1)

Warning: Censored responses are not shown in 'pp_check'.

Warning: Removed 1714 rows containing non-finite outside the scale range
(`stat_density()`).

Warning: Removed 18 rows containing non-finite outside the scale range
(`stat_density()`).

The phosphate model also shows signs of convergence issues. This is likely due to the combination of censored data and the complex interaction structure. The model fit, as seen in the pp_check, is reasonable but not perfect.

Modeling Approach 2: Zero-Inflated Models (`glmmTMB`)

This approach models the data in two parts: a logistic regression for the probability of a zero vs. a non-zero value, and a separate model (e.g., Gamma or Lognormal) for the non-zero values. This is an alternative for nh4, no3, and p.

Note: We are only modeling Study Year 1 for these models.

Zero-Inflated Model for Nitrate (`no3`)

Code

fit_no3_zi <- glmmTMB(
  no3 ~ treatment * date + (1 | site_block_id) + (1 | trt_site_block_id),
  ziformula = ~treatment,
  data = df_model,
  family = glmmTMB::lognormal(link = "log")
)

Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

Code

# summary(fit_no3_zi)

The warning about the non-positive-definite Hessian matrix indicates convergence problems, meaning the results may not be reliable. The DHARMa residual plots also show some some deviations, suggesting this model is not a great fit.

Zero-Inflated Model for Ammonium (`nh4`)

Code

fit_nh4_zi <- glmmTMB(
  nh4 ~ treatment * date + (1 | site_block_id) + (1 | trt_site_block_id),
  ziformula = ~treatment,
  data = df_model,
  family = glmmTMB::lognormal(link = "log")
)

Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

Code

# summary(fit_nh4_zi)

Code

emmeans(fit_nh4_zi, specs = ~ treatment | date, type = "response") %>%
   cld(Letters = LETTERS, adjust = "tukey")

pred_effects <- ggpredict(fit_nh4_zi, terms = c("date", "treatment"))
pred_data <- as.data.frame(pred_effects)

ggplot() +

  geom_point(
    data = df_model %>% mutate(group = treatment),
    aes(x = date, y = nh4),
    alpha = 0.5, 
    color = "gray50", 
    position = position_jitter(width = 0.2, height = 0) 
  ) +

  geom_ribbon(
    data = pred_data,
    aes(x = x, ymin = conf.low, ymax = conf.high, group = group),
    alpha = 0.2,
    fill = "cornflowerblue"
  ) +

  geom_line(
    data = pred_data,
    aes(x = x, y = predicted, group = group),
    color = "cornflowerblue",
    linewidth = 1
  ) +

  facet_wrap(~ group, scales = "free_y") +
  
  labs(
    x = "Date",
    y = "Ammonium (NH4) Concentration",
    title = "Predicted Values of NH4, Faceted by Treatment"
  ) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# print(pred_effects)

Similar to the nitrate model, the zero-inflated model for nh4 fails to converge properly. The DHARMa diagnostics, however, look slightly better than for nitrate. Still, the convergence failure is a major issue.

Zero-Inflated Model for Phosphate (`p`)

Code

fit_p_zi <- glmmTMB(
  p ~ treatment * date + (1 | site_block_id) + (1 | trt_site_block_id),
  ziformula = ~treatment,
  data = df_model,
  family = glmmTMB::lognormal(link = "log")
)

Anova(fit_p_zi)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: p
                 Chisq Df Pr(>Chisq)    
treatment       26.422  5  7.389e-05 ***
date           358.255  8  < 2.2e-16 ***
treatment:date  36.069 40     0.6479    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

simulationOutput <- simulateResiduals(fittedModel = fit_p_zi, n = 500)
plot(simulationOutput)

Code

emmeans(fit_p_zi, specs = ~ treatment, type = "response") %>%
  cld(Letters = LETTERS, adjust = "tukey")

NOTE: Results may be misleading due to involvement in interactions

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

 treatment response      SE  df asymp.LCL asymp.UCL .group
 PC0         0.0580 0.00523 Inf    0.0457    0.0735  A    
 MC1         0.0625 0.00578 Inf    0.0490    0.0797  A    
 MC2         0.0632 0.00599 Inf    0.0493    0.0811  A    
 PC1         0.0693 0.00583 Inf    0.0555    0.0865  AB   
 MC0         0.0726 0.00636 Inf    0.0576    0.0914  AB   
 PC2         0.0878 0.00689 Inf    0.0714    0.1079   B   

Results are averaged over the levels of: date 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 6 estimates 
Intervals are back-transformed from the log scale 
P value adjustment: tukey method for comparing a family of 6 estimates 
Tests are performed on the log scale 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same.

Code

pred_effects <- ggpredict(fit_p_zi, terms = c("date", "treatment"))

You are calculating adjusted predictions on the population-level (i.e.
  `type = "fixed"`) for a *generalized* linear mixed model.
  This may produce biased estimates due to Jensen's inequality. Consider
  setting `bias_correction = TRUE` to correct for this bias.
  See also the documentation of the `bias_correction` argument.

Code

pred_data <- as.data.frame(pred_effects)

ggplot() +

  geom_point(
    data = df_model %>% mutate(group = treatment),
    aes(x = date, y = p),
    alpha = 0.5, 
    color = "gray50", 
    position = position_jitter(width = 0.2, height = 0) 
  ) +

  geom_ribbon(
    data = pred_data,
    aes(x = x, ymin = conf.low, ymax = conf.high, group = group),
    alpha = 0.2,
    fill = "cornflowerblue"
  ) +

  geom_line(
    data = pred_data,
    aes(x = x, y = predicted, group = group),
    color = "cornflowerblue",
    linewidth = 1
  ) +

  facet_wrap(~ group, scales = "free_y") +
  
  labs(
    x = "Date",
    y = "Phosphate (P) Concentration",
    title = "Predicted Values of P, Faceted by Treatment"
  ) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Warning: Removed 31 rows containing missing values or values outside the scale range
(`geom_point()`).

Code

# print(pred_effects)

The zero-inflated model for p converges properly, and the DHARMa diagnostics show some deviations, suggesting this model may not be a good fit.

Models for Data with No LODs (`p_h` and `ec`)

For variables like pH and EC that do not have a zero-inflation problem, we can use more standard generalized linear mixed models.

Model for pH (`p_h`)

Code

fit_lme_ph <- lme(
  fixed = p_h ~ as.factor(date) * treatment,
  random = list(
    site_block_id = ~ 1, # Random intercept for site:block combinations
    trt_site_block_id = ~ 1 # Random intercept for treatment:site:block combinations
  ),
  data = df_model,
  na.action = na.exclude
)

anova(fit_lme_ph)

                          numDF denDF  F-value p-value
(Intercept)                   1   302 9775.941  <.0001
as.factor(date)               8   302    7.600  <.0001
treatment                     5    35    4.906  0.0017
as.factor(date):treatment    40   302    0.519  0.9933

Code

plot(fit_lme_ph)

Code

emmeans(fit_lme_ph, specs = ~ treatment) %>%
  cld(Letters = LETTERS, adjust = "tukey")

NOTE: Results may be misleading due to involvement in interactions

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

 treatment emmean     SE df lower.CL upper.CL .group
 MC2         7.69 0.0928  7     7.35     8.02  A    
 PC0         7.75 0.0925  7     7.42     8.09  A    
 PC2         7.80 0.0933  7     7.47     8.14  A    
 PC1         7.81 0.0929  7     7.48     8.15  A    
 MC1         7.83 0.0929  7     7.49     8.16  AB   
 MC0         8.04 0.0932  7     7.70     8.38   B   

Results are averaged over the levels of: date 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 6 estimates 
P value adjustment: tukey method for comparing a family of 6 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same.

Code

emmeans(fit_lme_ph, specs = ~ treatment | date) %>%
  cld(Letters = LETTERS, adjust = "tukey")

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

date = 2023-12-27:
 treatment emmean    SE df lower.CL upper.CL .group
 MC1         7.60 0.133  7     7.12     8.08  A    
 PC1         7.67 0.127  7     7.20     8.13  A    
 MC2         7.67 0.133  7     7.19     8.15  A    
 PC2         7.77 0.152  7     7.22     8.32  A    
 PC0         7.90 0.127  7     7.44     8.36  A    
 MC0         8.07 0.141  7     7.56     8.58  A    

date = 2024-01-12:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.77 0.123  7     7.32     8.21  A    
 PC2         7.84 0.133  7     7.36     8.32  A    
 PC1         7.85 0.127  7     7.38     8.31  A    
 PC0         7.86 0.123  7     7.42     8.31  A    
 MC1         7.94 0.127  7     7.48     8.40  A    
 MC0         8.18 0.127  7     7.72     8.64  A    

date = 2024-01-23:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.78 0.133  7     7.30     8.26  A    
 PC0         7.84 0.123  7     7.39     8.28  A    
 PC1         7.92 0.127  7     7.46     8.38  A    
 PC2         8.01 0.123  7     7.56     8.45  A    
 MC1         8.04 0.133  7     7.55     8.52  A    
 MC0         8.11 0.123  7     7.67     8.55  A    

date = 2024-02-07:
 treatment emmean    SE df lower.CL upper.CL .group
 PC0         7.82 0.123  7     7.38     8.26  A    
 PC2         7.88 0.123  7     7.43     8.32  A    
 MC2         7.91 0.123  7     7.47     8.35  A    
 MC1         7.96 0.123  7     7.52     8.41  A    
 PC1         7.99 0.123  7     7.55     8.44  A    
 MC0         8.17 0.123  7     7.73     8.62  A    

date = 2024-02-21:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.63 0.123  7     7.18     8.07  A    
 PC0         7.70 0.123  7     7.26     8.15  A    
 MC1         7.75 0.123  7     7.30     8.19  A    
 PC2         7.76 0.123  7     7.32     8.21  A    
 PC1         7.78 0.123  7     7.34     8.22  A    
 MC0         7.96 0.127  7     7.50     8.42  A    

date = 2024-03-06:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.67 0.123  7     7.22     8.11  A    
 PC0         7.73 0.123  7     7.28     8.17  A    
 PC1         7.80 0.123  7     7.35     8.24  A    
 PC2         7.80 0.127  7     7.34     8.26  A    
 MC1         7.85 0.123  7     7.41     8.29  A    
 MC0         8.04 0.123  7     7.59     8.48  A    

date = 2024-03-22:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.55 0.123  7     7.10     7.99  A    
 PC0         7.64 0.123  7     7.20     8.09  A    
 PC2         7.75 0.123  7     7.30     8.19  A    
 MC1         7.79 0.123  7     7.35     8.23  A    
 PC1         7.85 0.123  7     7.41     8.30  A    
 MC0         7.91 0.127  7     7.45     8.37  A    

date = 2024-04-04:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.74 0.123  7     7.29     8.18  A    
 PC0         7.74 0.123  7     7.29     8.18  A    
 PC1         7.80 0.123  7     7.35     8.24  A    
 MC1         7.80 0.127  7     7.34     8.26  A    
 PC2         7.80 0.127  7     7.34     8.26  A    
 MC0         7.99 0.133  7     7.51     8.47  A    

date = 2024-04-16:
 treatment emmean    SE df lower.CL upper.CL .group
 MC2         7.49 0.127  7     7.03     7.95  A    
 PC0         7.54 0.123  7     7.09     7.98  AB   
 PC2         7.62 0.123  7     7.18     8.07  AB   
 PC1         7.66 0.133  7     7.18     8.14  AB   
 MC1         7.71 0.123  7     7.26     8.15  AB   
 MC0         7.93 0.127  7     7.47     8.39   B   

Degrees-of-freedom method: containment 
Confidence level used: 0.95 
Conf-level adjustment: sidak method for 6 estimates 
P value adjustment: tukey method for comparing a family of 6 estimates 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same.

Code

emmip(fit_lme_ph, ~ date | treatment, CIs = T) + 
  geom_point(data = df_model,
             aes(x = date, y = p_h, color = treatment),
             alpha = 0.2,
             position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.1)) +
  theme_bw()

Warning: Removed 34 rows containing missing values or values outside the scale range
(`geom_point()`).

The model summary shows significant effects of treatment. The diagnostic plots show randomly scattered residuals, suggesting the model assumptions are met and this is a reliable model.

Model for EC (`ec`)

The residuals for the GLMM for EC showed fanning, therefore a Gamma GLMM is more appropriate than a standard linear model.

Code

fit_ec_gamma <- glmer(
  ec ~ date * treatment +
    (1 | site_block_id) +
    (1 | trt_site_block_id),
  data = df_model,
  family = Gamma(link = "log"),
  control = glmerControl(optimizer = "bobyqa"), # default optimizer did not converge
  na.action = na.exclude
)

Warning in commonArgs(par, fn, control, environment()): maxfun < 10 *
length(par)^2 is not recommended.

Code

Anova(fit_ec_gamma)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: ec
                 Chisq Df Pr(>Chisq)    
date           190.311  8  < 2.2e-16 ***
treatment       25.604  5  0.0001065 ***
date:treatment  71.103 40  0.0017755 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Code

plot(fit_ec_gamma)

Code

emmeans(fit_ec_gamma, specs = ~ treatment | date, type = "response") %>%
  cld(Letters = LETTERS, adjust = "tukey")

Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons

date = 2023-12-27:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.907 0.212 Inf     0.490      1.68  A    
 PC2          1.031 0.250 Inf     0.545      1.95  AB   
 PC1          1.283 0.297 Inf     0.697      2.36  AB   
 MC0          1.319 0.311 Inf     0.709      2.45  AB   
 MC2          1.862 0.439 Inf     1.002      3.46   B   
 MC1          1.874 0.436 Inf     1.016      3.46   B   

date = 2024-01-12:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.827 0.192 Inf     0.449      1.52  A    
 PC1          1.005 0.233 Inf     0.546      1.85  AB   
 PC2          1.054 0.246 Inf     0.570      1.95  AB   
 MC0          1.281 0.295 Inf     0.699      2.35  AB   
 MC1          1.674 0.390 Inf     0.907      3.09   B   
 MC2          1.781 0.411 Inf     0.969      3.27   B   

date = 2024-01-23:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.697 0.162 Inf     0.379      1.28  A    
 PC2          0.792 0.181 Inf     0.434      1.45  A    
 PC1          0.915 0.212 Inf     0.497      1.68  AB   
 MC0          1.129 0.257 Inf     0.620      2.06  ABC  
 MC1          1.614 0.373 Inf     0.879      2.96   BC  
 MC2          1.896 0.438 Inf     1.033      3.48    C  

date = 2024-02-07:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC1          0.841 0.193 Inf     0.460      1.54  A    
 PC2          0.865 0.198 Inf     0.474      1.58  A    
 PC0          0.883 0.205 Inf     0.480      1.63  A    
 MC0          1.166 0.266 Inf     0.640      2.12  A    
 MC1          1.306 0.301 Inf     0.711      2.40  A    
 MC2          1.519 0.351 Inf     0.827      2.79  A    

date = 2024-02-21:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC1          0.902 0.207 Inf     0.493      1.65  A    
 PC0          0.921 0.213 Inf     0.500      1.69  AB   
 PC2          0.921 0.211 Inf     0.504      1.68  AB   
 MC1          1.204 0.278 Inf     0.656      2.21  AB   
 MC0          1.220 0.281 Inf     0.667      2.23  AB   
 MC2          1.654 0.382 Inf     0.901      3.04   B   

date = 2024-03-06:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.733 0.170 Inf     0.399      1.35  A    
 PC2          0.764 0.175 Inf     0.418      1.39  A    
 PC1          0.778 0.179 Inf     0.426      1.42  A    
 MC1          1.050 0.242 Inf     0.572      1.93  AB   
 MC0          1.182 0.269 Inf     0.649      2.15  AB   
 MC2          1.703 0.393 Inf     0.928      3.13   B   

date = 2024-03-22:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.892 0.207 Inf     0.485      1.64  A    
 PC1          0.941 0.216 Inf     0.515      1.72  A    
 PC2          0.988 0.226 Inf     0.541      1.80  A    
 MC1          1.216 0.281 Inf     0.663      2.23  AB   
 MC0          1.343 0.309 Inf     0.733      2.46  AB   
 MC2          1.906 0.440 Inf     1.038      3.50   B   

date = 2024-04-04:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC0          0.695 0.161 Inf     0.378      1.28  A    
 PC1          0.746 0.171 Inf     0.408      1.36  A    
 PC2          0.770 0.178 Inf     0.420      1.41  A    
 MC1          1.054 0.245 Inf     0.571      1.94  AB   
 MC0          1.188 0.276 Inf     0.644      2.19  AB   
 MC2          1.688 0.390 Inf     0.919      3.10   B   

date = 2024-04-16:
 treatment response    SE  df asymp.LCL asymp.UCL .group
 PC1          0.569 0.133 Inf     0.307      1.05  A    
 PC2          0.604 0.138 Inf     0.331      1.10  A    
 PC0          0.616 0.143 Inf     0.335      1.13  A    
 MC1          0.779 0.180 Inf     0.424      1.43  AB   
 MC0          1.008 0.232 Inf     0.551      1.85  AB   
 MC2          1.228 0.286 Inf     0.665      2.27   B   

Confidence level used: 0.95 
Conf-level adjustment: sidak method for 6 estimates 
Intervals are back-transformed from the log scale 
P value adjustment: tukey method for comparing a family of 6 estimates 
Tests are performed on the log scale 
significance level used: alpha = 0.05 
NOTE: If two or more means share the same grouping symbol,
      then we cannot show them to be different.
      But we also did not show them to be the same.

Code

emmip(fit_ec_gamma, ~ date | treatment, type = "response", CIs = T) + 
  geom_point(data = df_model,
             aes(x = date, y = ec, color = treatment),
             alpha = 0.2,
             position = position_jitterdodge(jitter.width = 0.1, dodge.width = 0.1)) +
  theme_bw()

Warning: Removed 28 rows containing missing values or values outside the scale range
(`geom_point()`).

The model converged without warnings. The diagnostic plots show randomly scattered residuals, suggesting the model assumptions are met and this is a reliable model.

Summary and Recommendations

Models for p_h and ec: For pH (p_h) and EC (ec), the standard linear mixed model and Gamma GLMM, respectively, fit the data well, converged without issues, and passed diagnostic checks.
Censored Models are Promising: The Bayesian censored models (brms) are theoretically the most robust approach for the data with LODs. The mg model worked very well. However, the models for nh4, no3, and p struggled to converge.
- Recommendation: For the brms models that had issues, I suggest increasing the number of iterations (e.g., iter = 6000, warmup = 2000) and perhaps trying a more informative prior distribution (currently brms is selecting this for us) if you have prior knowledge about the parameters. This may help the models converge.
Zero-Inflated Models are a Good Alternative: The zero-inflated models (glmmTMB) also faced convergence issues.
- Recommendation: Try simplifying the zero-inflated models. For instance, you could remove the interaction term (treatment * date) and use additive effects (treatment + date) to see if that helps with convergence.
Since we did not find a good model fit for nh4, no3, and p, we suggest fitting a binomial GLMM where 1 = above LOD and 0 = below LOD since the convergence issues on our models may come from some combinations of treatment and date not having any non-LOD values.

Introduction and Data Preparation

Data Read-in

Handling Values Below the Limit of Detection (LOD)

Treating LOD as 0

Treating LOD as Left-Censored

Exploratory Data Analysis

Proportion of Zero/LOD Values

Visualizing the Data

Modeling Approach 1: Censored Models for Data with LODs (brms)

Censored Model for Nitrate (no3)

Censored Model for Ammonium (nh4)

Censored Model for Magnesium (mg)

Censored Model for Phosphate (p)

Modeling Approach 2: Zero-Inflated Models (glmmTMB)

Zero-Inflated Model for Nitrate (no3)

Zero-Inflated Model for Ammonium (nh4)

Zero-Inflated Model for Phosphate (p)

Models for Data with No LODs (p_h and ec)

Model for pH (p_h)

Model for EC (ec)

Summary and Recommendations

Modeling Approach 1: Censored Models for Data with LODs (`brms`)

Censored Model for Nitrate (`no3`)

Censored Model for Ammonium (`nh4`)

Censored Model for Magnesium (`mg`)

Censored Model for Phosphate (`p`)

Modeling Approach 2: Zero-Inflated Models (`glmmTMB`)

Zero-Inflated Model for Nitrate (`no3`)

Zero-Inflated Model for Ammonium (`nh4`)

Zero-Inflated Model for Phosphate (`p`)

Models for Data with No LODs (`p_h` and `ec`)

Model for pH (`p_h`)

Model for EC (`ec`)