About us:
Articul8 was born from a simple belief: GenAI should work for the enterprise, not the other way around. Our platform combines domain-specific models, autonomous agentic reasoning through ModelMesh(TM), reliable model evaluation through LLM-IQ(TM), and multimodal understanding to serve regulated industries including energy, semiconductor, finance, aerospace, and supply chain. Trusted by Fortune 500 enterprises, we bring together research, engineering, product, and domain expertise to deliver AI that meets the accuracy, explainability, and auditability standards that high-stakes environments demand.
Job Description:
Articul8 AI is seeking a Principal Research Scientist to define how we build, evaluate, and scale domain-specific models as a durable source of competitive advantage. You will lead research across the full model development lifecycle: domain data strategy, continued pre-training, supervised fine-tuning, post-training, evaluation methodology, and the strategic decisions that determine where Articul8 can create and sustain model superiority in the market.
Responsibilities:
- Set company-level technical direction for domain-specific model strategy: define how Articul8 builds, evaluates, scales, and sustains model superiority across continued pre-training, fine-tuning, post-training, and release quality standards, leveraging massively parallel agentic AI systems to compress strategic exploration cycles from months to days
- Architect the agentic model development paradigm for the organization: design the agent-orchestrated research infrastructure (experiment orchestration, data pipeline automation, continuous evaluation, competitive benchmarking) that enables every researcher at Articul8 to operate at a fundamentally higher level of depth, breadth, and velocity than would be possible alone
- Go deep: push the frontier of domain-specific model science. Lead research on model adaptation methodology, data curation strategies, post-training methods (preference optimization, reward modeling, reasoning improvement, alignment), and training dynamics, deploying fleets of agentic systems to run exhaustive ablation studies, mixture experiments, and failure analyses in parallel
- Go broad: shape model strategy across all of Articul8's domains and verticals. Define how the company identifies, prioritizes, and enters new model domains based on technical feasibility, customer value, and strategic differentiation, using agent-driven competitive intelligence and market analysis to scan the landscape continuously
- Define evaluation strategy as an agentic discipline: establish benchmark design, expert-grounded assessment, model failure analysis, and robustness standards, building always-on agentic evaluation harnesses that compare Articul8 models against leading open and closed alternatives and translate findings into concrete investment decisions in real time
- Lead cross-cutting research initiatives that multiply organizational capability: ensure advances in data perception, retrieval, post-training, and runtime orchestration strengthen the model layer, orchestrating parallel agent-driven research tracks across pillars so breakthroughs in one area compound across the platform
- Influence platform-level decisions: shape model lifecycle management, portfolio strategy, release criteria, and integration architecture, ensuring the platform is designed for humans and agentic systems to co-evolve and amplify each other
- Mentor senior researchers and raise the ceiling on human potential: coach Staff and Senior researchers on designing agent-augmented research programs, raise the bar on technical judgment and experimental rigor, and shape hiring for researchers who are driven to redefine what's possible
- Maintain hands-on research impact at the highest level: sustain a meaningful personal research contribution through technical work, publications, patents, and externally visible output, modeling what it means to be a world-class researcher who uses massively parallel agentic systems to achieve what was previously impossible
Required Qualifications:
- Education: PhD or MSc in Computer Science, Machine Learning, NLP, or a related field.
- Experience: 10+ years in AI/ML research with an exceptional track record of impact: models or systems you built are in production and measurably changed outcomes. 4+ years developing LLM-based systems.
- Model lifecycle mastery: Deep hands-on experience across the full model development lifecycle: continued pretraining, supervised fine-tuning, post-training alignment, and production evaluation. You've made the hard calls about when a model is ready to ship and when it isn't.
- Evaluation rigor: You have designed evaluation methodology that goes beyond leaderboard metrics: domain-expert grounded assessment, systematic error analysis, robustness under distribution shift, and readiness criteria for high-stakes deployment.
- Training at scale: Direct experience training or adapting models on large GPU clusters using distributed frameworks (DeepSpeed, FSDP, Megatron-LM). You understand the interplay between data mixture, training compute, and model quality at a level that informs strategic decisions.
- Software engineering: Proficient in Python and PyTorch. You still write code, review code, and go deep when the problem demands it.
- Strategic leadership: You have shaped research direction at the organizational level, defining what bets to make, what to stop, and how to allocate research investment across competing priorities. People follow your direction because your judgment has been proven right.
Preferred Qualifications:
- Experience building domain-specialized models that outperform general-purpose alternatives on specific, measurable tasks: not just fine-tuned checkpoints, but models with genuine domain understanding.
- Hands-on experience with post-training methods (RLHF, DPO, reward modeling, constitutional approaches) applied to real alignment problems, not just benchmark reproduction.
- Deep experience in data curation for model development (deduplication, mixture design, quality scoring) where your data decisions measurably changed model outcomes.
- Track record of designing evaluation frameworks for enterprise or regulated-industry use cases where a wrong answer has real consequences.
- Publication record at top-tier venues with evidence of sustained research leadership and influence on the field.
- Experience taking model research from prototype to production in a commercial setting where customers depend on the output.
- Domain expertise in one or more of: energy, semiconductor, finance, aerospace, or supply chain. You understand the data, the workflows, and why off-the-shelf models fail.
Professional Attributes (Code42):
- Practice Humility: You lead with questions, not answers. You actively seek evidence that contradicts your strategy and revise publicly when warranted. You build an environment where senior researchers feel safe challenging your direction, because that's how the best decisions get made.
- Bias for Outcomes: You measure your impact by whether Articul8's models win in the market, not by the elegance of the research agenda. You make the hard calls about what to stop, what to double down on, and what to defer, and you own the results.
- Care Deeply: You treat the researchers you mentor as whole people, not output functions. You care about the quality of every model that ships under Articul8's name and intervene personally when standards are at risk. You build systems of feedback and recognition that make excellence visible.
- Dare to Do the Impossible & Embrace Scarcity: You define research bets that could change Articul8's competitive position for years. You don't let current scale limit the ambition of the model strategy. When resources are tight, you find the highest-leverage experiments and execute them with precision.
- Build a Better World: You ensure Articul8's model strategy serves not just business value but the industries and people who depend on these models for critical decisions. You hold the organization accountable for building AI that is trustworthy, auditable, and genuinely useful, because that's the only kind worth building.