MERKIUM

Research

We approach AI as a scientific discipline

We publish our research openly across interpretability, alignment, and evaluation, because understanding how these systems work is the prerequisite for making them safe.

Interpretability

We map the internal representations of Kili models to understand the mechanisms behind their outputs — moving beyond black-box evaluation toward genuine transparency.

Alignment

We develop training methodologies, including Constitutional training, that systematically steer models toward honesty, helpfulness, and harmlessness without sacrificing capability.

Evaluation

We design rigorous benchmarks that measure reliability, reasoning depth, and agentic performance across long-horizon, multi-step tasks in realistic environments.

Recent publications