Non-negative Matrix Factorization to Identify Latent Factors Underlying Psychopathology

Laura Sità

Problem

Classification of psychopathology

Traditional taxonomies: categorical diagnostic systems (e.g., DSM, ICD)

Recent models: dimensional and integrative approaches (e.g., Kotov et al., 2017; Borsboom & Cramer, 2013)

Research Question

How can we identify latent factors?

  • Factor Analysis
  • Proposed approach (inspired by Landy et al., 2025a): Non-negative Matrix Factorization (NMF)

NMF

\(M \in \mathbb{R}_{\ge 0}^{K \times G}\) is the full observed data matrix (\(K\) test items × \(G\) subjects), with entries \(M_{kg}\)

NMF approximates \(M\) by the product of two non-negative matrices of lower rank \(N\):

  • \(P \in \mathbb{R}_{\ge 0}^{K \times N}\) → loadings of the observed variables on \(N\) latent factors (test items × latent factors)
  • \(E \in \mathbb{R}_{\ge 0}^{N \times G}\) → expression of the latent factors across individuals (latent factors × subjects)

\[ M \approx P E, \qquad M_{kg} \approx \sum_{n=1}^{N} P_{kn} E_{ng} \]
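To make the decomposition concrete, here is a minimal base-R sketch that fits \(M \approx PE\) with the classic Lee and Seung multiplicative updates for squared error. This is a swapped-in, non-Bayesian algorithm chosen only for illustration: it is not how bayesNMF estimates \(P\) and \(E\), the rank \(N\) is fixed by hand rather than learned, and `nmf_mu` and the toy matrices are hypothetical.

```r
# Minimal NMF via multiplicative updates (squared-error loss).
# Hypothetical helper for illustration; NOT the bayesNMF algorithm.
nmf_mu <- function(M, N, n_iter = 500, eps = 1e-9, seed = 1) {
  stopifnot(all(M >= 0))                 # NMF requires non-negative data
  set.seed(seed)
  K <- nrow(M); G <- ncol(M)
  P <- matrix(runif(K * N), K, N)        # K x N: item loadings
  E <- matrix(runif(N * G), N, G)        # N x G: factor expression per subject
  loss <- numeric(n_iter)
  for (it in seq_len(n_iter)) {
    # Update E, then P; each ratio is non-negative, so P and E stay non-negative
    E <- E * (t(P) %*% M) / (t(P) %*% P %*% E + eps)
    P <- P * (M %*% t(E)) / (P %*% E %*% t(E) + eps)
    loss[it] <- sum((M - P %*% E)^2)     # squared Frobenius reconstruction error
  }
  list(P = P, E = E, loss = loss)
}

# Toy example: K = 6 items, G = 40 subjects, true rank N = 2
set.seed(42)
P_true <- matrix(c(3, 3, 3, 0, 0, 0,
                   0, 0, 0, 2, 2, 2), nrow = 6)
E_true <- matrix(rgamma(2 * 40, shape = 2), nrow = 2)
M <- P_true %*% E_true
fit <- nmf_mu(M, N = 2)
dim(fit$P)   # K x N
dim(fit$E)   # N x G
```

Because every update multiplies the current value by a non-negative ratio, \(P\) and \(E\) never become negative, which is the constraint that separates NMF from an unconstrained low-rank factorization.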

First Step

  • Dataset composed of ordinal variables (e.g., items measured on a Likert scale)

  • Estimate latent dimensions using both Factor Analysis and Non-negative Matrix Factorization (NMF)

  • Compare the two methods by correlating their item-by-factor loading matrices
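NMF needs a non-negative \(K \times G\) matrix with items in rows, while questionnaire data usually arrive as a subject-by-item table. A minimal sketch of that preparation, assuming Likert responses coded 1 to 5 and complete-case handling of missing responses; the data frame `likert_df`, the helper `to_nmf_matrix`, and the optional shift of the lowest category to 0 are hypothetical choices, not necessarily the preprocessing used in this project.

```r
# Simulated subject-by-item Likert data (hypothetical: 100 subjects, 9 items, codes 1-5)
set.seed(1)
likert_df <- as.data.frame(matrix(sample(1:5, 100 * 9, replace = TRUE), nrow = 100,
                                  dimnames = list(NULL, paste0("item_", 1:9))))
likert_df[3, "item_2"] <- NA                 # one missing response

# Hypothetical helper: subject x item data frame -> K items x G subjects matrix
to_nmf_matrix <- function(df, shift_to_zero = FALSE) {
  X <- as.matrix(df[complete.cases(df), ])   # keep subjects with complete responses
  if (shift_to_zero) X <- X - min(X)         # optional: lowest category becomes 0
  stopifnot(is.numeric(X), all(X >= 0))      # NMF needs non-negative input
  t(X)                                       # transpose: items in rows, subjects in columns
}

M <- to_nmf_matrix(likert_df)
dim(M)   # 9 items x 99 subjects (one incomplete row dropped)
```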

EFA vs NMF: correlation matrix of factor loadings

CFA vs NMF: correlation matrix of factor loadings
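In both correlation matrices the order of the factors is arbitrary (and factor-analytic loadings are identified only up to sign), so each NMF factor has to be paired with its closest counterpart before the matrix can be read. A minimal sketch, assuming two item-by-factor matrices whose rows list the same items in the same order; `match_factors` is a hypothetical helper that pairs factors greedily by largest absolute correlation, and the toy matrices are simulated.

```r
# Pair factors of two item-by-factor matrices greedily (hypothetical helper).
match_factors <- function(A, B) {
  stopifnot(nrow(A) == nrow(B))
  C <- cor(A, B)                  # ncol(A) x ncol(B) cross-correlations over items
  stopifnot(!anyNA(C))            # constant columns would give NA correlations
  n_pairs <- min(dim(C))
  out <- data.frame(a = integer(n_pairs), b = integer(n_pairs), r = numeric(n_pairs))
  work <- abs(C)                  # match on |r|: FA loadings are sign-indeterminate
  for (i in seq_len(n_pairs)) {
    idx <- which(work == max(work), arr.ind = TRUE)[1, ]  # largest remaining |r|
    out$a[i] <- idx[[1]]
    out$b[i] <- idx[[2]]
    out$r[i] <- C[idx[[1]], idx[[2]]]                     # report the signed r
    work[idx[[1]], ] <- -Inf      # remove the paired factor of A
    work[, idx[[2]]] <- -Inf      # and the paired factor of B
  }
  out
}

# Toy check: B is a noisy, column-permuted copy of A (45 items, 5 factors)
set.seed(7)
A <- matrix(runif(45 * 5), nrow = 45, dimnames = list(NULL, paste0("NMF", 1:5)))
B <- A[, c(3, 1, 5, 2, 4)] + matrix(rnorm(45 * 5, sd = 0.05), nrow = 45)
colnames(B) <- paste0("F", 1:5)
match_factors(A, B)
```

A greedy rule can miss the globally best pairing when several correlations are close; with five factors an exhaustive search over all permutations is cheap and safer.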

Second Step

How can NMF capture the known structure of a questionnaire (items → factors)?

Confirmatory NMF?
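One possible way to make NMF "confirmatory", offered only as a hypothetical sketch and not as the method proposed in this project, is to fix the item → factor pattern in advance: entries of \(P\) that the questionnaire's scoring key sets to zero are initialized at zero, and because multiplicative updates only rescale existing entries, those zeros stay zero. The function `nmf_confirmatory`, the squared-error multiplicative updates, and the toy scoring key are all assumptions for illustration.

```r
# Hypothetical "confirmatory" NMF: multiplicative updates with a fixed zero pattern in P.
nmf_confirmatory <- function(M, mask, n_iter = 500, eps = 1e-9, seed = 1) {
  # mask: K x N 0/1 matrix; 1 = item allowed to load on that factor (hypothesized key)
  stopifnot(all(M >= 0), nrow(mask) == nrow(M))
  set.seed(seed)
  N <- ncol(mask)
  P <- mask * matrix(runif(length(mask)), nrow(mask), N)   # zeros where mask == 0
  E <- matrix(runif(N * ncol(M)), N, ncol(M))
  for (it in seq_len(n_iter)) {
    E <- E * (t(P) %*% M) / (t(P) %*% P %*% E + eps)
    P <- P * (M %*% t(E)) / (P %*% E %*% t(E) + eps)       # 0 * ratio stays 0
  }
  list(P = P, E = E)
}

# Toy key: 6 items; items 1-3 load on factor 1, items 4-6 on factor 2
mask <- cbind(c(1, 1, 1, 0, 0, 0), c(0, 0, 0, 1, 1, 1))
set.seed(3)
M <- mask %*% matrix(rgamma(2 * 50, shape = 2), nrow = 2) +
  matrix(runif(6 * 50, 0, 0.1), nrow = 6)                  # small positive noise
fit <- nmf_confirmatory(M, mask)
round(fit$P, 2)   # entries outside the key remain exactly 0
```

This only fixes the zero pattern of \(P\); unlike CFA it provides no formal test of model fit, so the hypothesized structure would still need to be judged by other means, such as reconstruction error.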

Extensions

If the results are encouraging …

  • Third step: use NMF to identify latent factors shared within the same spectrum of symptoms

  • Fourth step: use causal NMF (Landy et al., 2025b) to study the effect of different treatments on the same spectrum of symptoms

Materials

All materials are available on GitHub at laurasitaunipd/nmf

Bibliography

Borsboom, D., & Cramer, A. O. (2013). Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9(1), 91-121.

Kotov, R., Krueger, R. F., Watson, D., Achenbach, T. M., Althoff, R. R., Bagby, R. M., … & Zimmerman, M. (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454-477.

Landy, J. M., Basava, N., & Parmigiani, G. (2025a). bayesNMF: Fast Bayesian Poisson NMF with Automatically Learned Rank Applied to Mutational Signatures. arXiv preprint arXiv:2502.18674.

Landy, J. M., Zorzetto, D., De Vito, R., & Parmigiani, G. (2025b). Causal Inference for Latent Outcomes Learned with Factor Models. arXiv preprint arXiv:2506.20549.

Supplemental Materials

EFA

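# Exploratory factor analysis with 5 factors (stats::factanal)
risultato_fa <- factanal(item_data_complete,    # subject x item data, complete cases only
                         factors = 5,
                         rotation = "promax",   # oblique rotation: factors may correlate
                         scores = "regression")
efa_loadings <- unclass(risultato_fa$loadings)  # item x factor loading matrix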

R package bayesNMF

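library(bayesNMF)

# M_t: non-negative item x subject data matrix (K x G)
result <- bayesNMF(
  data = M_t,
  likelihood = "normal",
  prior = "truncnormal",
  rank = 5
)

MAP <- result$get_MAP()   # maximum a posteriori estimates
P <- as.matrix(MAP$P)     # item x factor loadings (K x N)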

Comparison of Item-by-Factor matrices obtained through EFA and NMF

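library(corrplot)

# Rows of P and efa_loadings must refer to the same items in the same order
colnames(P) <- paste0("NMF", seq_len(ncol(P)))
allLoadings <- cbind(P, efa_loadings)
round(cor(allLoadings), 3)

corrplot(
  cor(allLoadings),
  method = "color",        # fill cells with colors
  type = "full",           # show full matrix
  addCoef.col = "black",   # show correlation coefficients
  number.cex = 0.7,        # text size for numbers
  tl.cex = 0.8             # text size for labels
)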

CFA

library(lavaan)

# Hypothesized five-factor structure: nine items per factor
model <- "
SMD =~ bessi_1 + bessi_6 + bessi_11 + bessi_16 + bessi_21 + bessi_26 + bessi_31 + bessi_36 + bessi_41
IND =~ bessi_5 + bessi_10 + bessi_15 + bessi_20 + bessi_25 + bessi_30 + bessi_35 + bessi_40 + bessi_45
COD =~ bessi_3 + bessi_8 + bessi_13 + bessi_18 + bessi_23 + bessi_28 + bessi_33 + bessi_38 + bessi_43
SED =~ bessi_2 + bessi_7 + bessi_12 + bessi_17 + bessi_22 + bessi_27 + bessi_32 + bessi_37 + bessi_42
ESD =~ bessi_4 + bessi_9 + bessi_14 + bessi_19 + bessi_24 + bessi_29 + bessi_34 + bessi_39 + bessi_44
"
fit <- cfa(model = model, data = dati, ordered = TRUE)   # all items treated as ordinal
summary(fit, standardized = TRUE)
fitMeasures(fit, fit.measures = c("rmsea", "srmr", "cfi", "nnfi"))

modificationIndices(fit, sort. = TRUE)[1:10, ]   # ten largest modification indices

Comparison of Item-by-Factor matrices obtained through CFA and NMF

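# Standardized CFA loadings (item x factor), reordered by item number
# so that their rows follow the same item order as P
lambda <- inspect(fit, what = "std")$lambda
lambda <- lambda[order(as.numeric(gsub("bessi_", "", rownames(lambda)))), ]

allLoadings <- cbind(P, lambda)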

corrplot(
  cor(allLoadings),
  method = "color",   # fill cells with colors
  type = "full",      # show full matrix
  addCoef.col = "black",   # show correlation coefficients
  number.cex = 0.7,        # text size for numbers
  tl.cex = 0.8             # text size for labels
)