Hi, I'm Sam
An NLP engineer with a background in linguistics. That shapes how I approach AI: I tend to start from how domain experts actually reason and work outward from there.
I build systems for healthcare and legal contexts where the people using the tools need to trust and understand what the model is doing. Most of my work uses ontologies and structured domain knowledge to get there.
Research Background
MPhil in Linguistics (Computational focus), University of Bergen (2024)
• Thesis: Ontology-enhanced ML for Medical Literature Screening → achieved 90% F1-score, reduced review time from 6 months to 1 week.
• Experience mentoring graduate students and teaching technical topics.
• Legal-AI startup experience (Innovation Norway–supported).
Focus areas: ontology-enhanced ML, leakage-safe evaluation, calibration, iterative human-in-the-loop reviews, and reproducible pipelines.
Projects
Featured Projects
1. Medical Intervention Text Triage (Systematic Reviews)
Automated medical literature screening using ontology-augmented classifiers (SNOMED-CT).
Outcome: Achieved 90% F1-score with 150 training samples. Reduced prescreening time from 6 months to 1 week in pilot settings.
Approach: Started from a simple baseline, performed error analysis, and introduced targeted model complexity with a full audit trail for transparency.
Assurance: External cohort checks and reviewer-level agreement testing.
Stack: Python, TensorFlow, scikit-learn, spaCy, SNOMED-CT ontology
2. GDPR Article 9 Compliance Checker (Healthcare AI)
Open-source rule engine for scanning healthcare privacy documents and DPIAs against 42 GDPR Article 9 requirements on special-category data.
Outcome: Automatically detects missing legal bases and documentation gaps.
Approach: YAML-based rule logic with evidence extraction and versioned decisions for transparency.
Assurance: Keyword-driven scoring, contextualized for focused DPIAs (typical coverage 10–30%); explicit limitations documented (semantic scope, English-only).
Stack: Python, Streamlit, PyMuPDF, YAML, Pandas.
3. Legal AI Analysis System (Oslo Startup)
Production RAG system for regulatory text analysis, processing 25+ legal cases weekly.
Outcome: Achieved 98% accuracy through iterative prompt engineering (up from an initial 75%). Reduced manual review load and ensured reproducible outputs for compliance teams.
Approach: Retrieval-Augmented Generation combining LLM analysis with a Norwegian legal case database. Collaborated with 3 lawyers and 2 developers to validate outputs.
Assurance: Version-controlled prompts, traceable rationales, and governance hooks to meet audit standards.
Stack: Python, Claude API, MongoDB, Azure
4. Customer Analytics with Uncertainty (Selected Non-Medical)
Built explainable churn prediction models with calibrated confidence intervals.
Outcome: Delivered interpretable drivers of churn and improved decision confidence in retention models.
Assurance: Leakage detection, stability testing, and calibration across time splits.
Stack: Python, scikit-learn, SHAP, XGBoost.
5. Human–AI Creative Analysis (Research)
Studied 1,298 prompt–image interactions in generative models to understand creative decision patterns.
Outcome: Produced reproducible methodology for prompt analysis and interpretability insights into multimodal model behavior.
Assurance: Versioned datasets and transparent evaluation scripts.
Stack: Python, Hugging Face Transformers, CLIP, Pandas.
Core Tools & Methods
Programming / Data: Python, R, Bash, SQL
ML / NLP: PyTorch, TensorFlow, Hugging Face, spaCy, scikit-learn
LLM Integration: Claude API, OpenAI API, RAG architectures, Prompt engineering
Regulatory / Audit: YAML-based rule engines, PyMuPDF, pdfminer
MLOps / Deployment: Streamlit, FastAPI, Docker, GitHub Actions, Azure
Databases: MongoDB, PostgreSQL, SQL
Research / Reproducibility: Jupyter, Pandas, Versioned datasets, Prompt auditing
Current Projects
GDPR Healthcare AI Compliance Scorer
GDPR Article 9 Compliance Checker
Open-source tool that scans healthcare AI documentation for GDPR Article 9 compliance.
• Checks privacy policies, DPIAs, and compliance docs against 42 special-category data requirements
• Identifies which legal bases are documented and highlights gaps
• Tested on real DPIAs from healthcare organizations
• Built with: Python, Streamlit, PyMuPDF, YAML-based rules engine
🔗 GitHub | 🎯 Try Demo
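To give a feel for how a keyword-driven rule check with evidence extraction can work, here is a minimal sketch. It is not the tool's actual code: the rule IDs, keywords, and function names are illustrative assumptions, and the rules are inlined as Python dicts (rather than loaded from YAML) so the example runs on its own.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical rules for illustration only. In a YAML-based engine these
# entries would live in a rules file; the keywords are made-up examples.
RULES = [
    {"id": "art9-2a-explicit-consent", "keywords": ["explicit consent"]},
    {"id": "art9-2h-health-care", "keywords": ["medical diagnosis", "provision of health"]},
    {"id": "art9-2j-research", "keywords": ["scientific research"]},
]


@dataclass
class Finding:
    rule_id: str
    satisfied: bool
    evidence: list[str]  # sentences that triggered the rule


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_document(text: str, rules: list[dict] = RULES) -> list[Finding]:
    # A rule counts as satisfied if any sentence contains one of its keywords.
    sentences = split_sentences(text)
    findings = []
    for rule in rules:
        evidence = [
            s for s in sentences
            if any(keyword in s.lower() for keyword in rule["keywords"])
        ]
        findings.append(Finding(rule["id"], bool(evidence), evidence))
    return findings


def coverage(findings: list[Finding]) -> float:
    # Share of rules with at least one piece of supporting evidence.
    return sum(f.satisfied for f in findings) / len(findings)


sample = (
    "We process health data only with the explicit consent of the patient. "
    "Records are stored in Norway."
)
findings = check_document(sample)
```

Keeping the matched sentences alongside each result is what makes a gap report reviewable: a reader can see why a rule passed, not just that it did. Plain keyword matching also misses paraphrases, which is the kind of semantic-scope limitation worth documenting.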
Teaching
Sharing knowledge matters to me. I'm currently teaching at the University of Ghana:
• Python for NLP
• Language as Data
• Advanced NLP
Previously at the University of Bergen, I taught Python programming to 30+ graduate students over two semesters. Most came from non-technical backgrounds: linguistics, philosophy, and the social sciences. The completion rate was 91%.
Outside work, you'll usually find me at a piano working through jazz standards, or in the kitchen trying to get a new recipe right. I read a lot of sci-fi.
I've lived and worked across Ghana, China, and Norway: three years on the Volta River coordinating between Ghanaian and Chinese engineering teams, then three years in Bergen. These days I'm trying to keep my Norwegian from getting rusty and picking up bits of German and Japanese on the side.
I judge cities by their coffee shops.
Technical blog coming soon: insights on explainable AI and healthcare compliance.
Built with Carrd — Content © Samuel Okoe-Mensah 2025.