11. February 2026

Samuel Okoe Mensah (University of Bergen)

Research Seminar: Text Classification for the Automation of Article Selection in Medical Systematic Reviews

Abstract:

Medical researchers conducting systematic reviews often spend up to six months manually screening thousands of research abstracts to identify relevant studies. This presentation examines how computational linguistic methods can address this bottleneck, focusing on supervised text classification enriched with domain-specific ontologies.
The research centered on a key challenge: achieving high classification accuracy with severely limited training data (150 labeled abstracts). Traditional machine learning approaches typically require thousands of examples, making them impractical for specialized medical domains where expert annotation is expensive and time-consuming.
The proposed solution integrates structured medical knowledge from SNOMED-CT (Systematized Nomenclature of Medicine-Clinical Terms) and neurological ontologies directly into the feature space. This approach allows the classifier to leverage semantic relationships between medical concepts rather than relying solely on surface-level word patterns. For instance, when the model encounters terms like “cognitive decline” or “memory impairment,” the ontology provides hierarchical context that connects these to broader categories of neurological conditions.
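One way to picture this kind of enrichment is the minimal sketch below. A matched medical phrase adds a feature for its concept and for that concept's ancestors, on top of ordinary word features. The hierarchy is a toy stand-in and not actual SNOMED-CT content. The concept labels, `TERM_TO_CONCEPT`, `PARENT`, and the `max_depth` parameter (one possible reading of "levels of ontological enrichment") are hypothetical illustrations, not the study's implementation.

```python
import re
from collections import Counter

# Toy ontology fragment: hypothetical labels, NOT real SNOMED-CT concepts or codes.
TERM_TO_CONCEPT = {
    ("cognitive", "decline"): "CognitiveDecline",
    ("memory", "impairment"): "MemoryImpairment",
}
PARENT = {
    "CognitiveDecline": "CognitiveDisorder",
    "MemoryImpairment": "CognitiveDisorder",
    "CognitiveDisorder": "NeurologicalCondition",
}

def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def count_phrase(tokens, phrase):
    """Count occurrences of a multi-word phrase (tuple of tokens) in a token list."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == phrase)

def ancestors(concept, max_depth):
    """Yield the concept itself plus at most max_depth ancestors up the hierarchy."""
    depth = 0
    while concept is not None and depth <= max_depth:
        yield concept
        concept = PARENT.get(concept)
        depth += 1

def enriched_features(text, max_depth=2):
    """Bag of words ('w:' prefix) plus ontology concept counts ('c:' prefix)."""
    tokens = tokenize(text)
    feats = Counter(f"w:{t}" for t in tokens)
    for phrase, concept in TERM_TO_CONCEPT.items():
        hits = count_phrase(tokens, phrase)
        if hits:
            # Propagate each match to the concept and its ancestors.
            for c in ancestors(concept, max_depth):
                feats[f"c:{c}"] += hits
    return feats
```

For example, `enriched_features("Patients showed cognitive decline and memory impairment.")` gives `c:CognitiveDisorder` a count of 2 even though that phrase never appears in the text. Two abstracts with different surface wording can therefore still share features. Setting `max_depth=0` keeps only the directly matched concepts.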
The methodology involved a systematic comparison of eleven model configurations, testing the impact of stopword removal, n-gram features, and varying levels of ontological enrichment. The best-performing model achieved an F1-score of 93%, enabling a reduction in screening time from approximately six months to one week in a pilot setting.
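The configuration toggles and the reported metric can be illustrated with a small sketch. The abstract does not list the eleven configurations it compared, so the stopword list, function names, and parameters below (`remove_stopwords`, `ngram_max`) are hypothetical. The sketch also computes accuracy alongside F1 to show that the two metrics generally differ.

```python
# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "of", "and", "in", "a", "with"}

def ngram_features(tokens, remove_stopwords, ngram_max):
    """Return the set of word n-grams (n = 1..ngram_max) as feature strings."""
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    feats = set()
    for n in range(1, ngram_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats

def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive ('include') class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With gold labels `[1, 1, 0, 0, 1]` and predictions `[1, 0, 0, 1, 1]`, F1 is about 0.67 while accuracy is 0.6. In screening, relevant abstracts are usually a minority, so F1 on the "include" class is generally more informative than raw accuracy.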
This work demonstrates how applied computational linguistics can bridge theoretical knowledge representation with practical natural language processing, offering insights relevant to fields beyond medicine wherever domain expertise must be encoded into automated systems.
The presentation will discuss both the technical approach and broader implications for the intersection of linguistics, artificial intelligence, and specialized knowledge domains.