Senior Linux/Systems Engineer

Join FlexAI:

FlexAI is at the forefront of revolutionizing AI computing by reengineering infrastructure at the system level. Our groundbreaking architecture, combined with sophisticated software intelligence, abstraction, and an orchestration layer, allows developers to leverage a diverse array of compute, resulting in more efficient, more reliable computing at a fraction of the cost. We are seeking a skilled and experienced Senior Linux/Systems Engineer.

Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel, Lifen, and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Paris, Silicon Valley, and Bangalore, united by a shared mission: to deliver more compute with less complexity.

If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!

What we are looking for:

A Senior Linux/Systems Engineer to design, build, and operate bare-metal AI/HPC GPU clusters. You’ll own platform bring-up (UEFI/BIOS → bootloaders → OS), kernel/device enablement, low-level networking (RoCEv2/InfiniBand), GPU/accelerator stack readiness, and repeatable automation for provisioning and compliance.
This role suits someone who enjoys getting their hands dirty in firmware, the kernel, and PCIe, and then scaling that knowledge with Ansible/Python.

Key Responsibilities:

Platform & Boot Enablement:

Own server bring-up: UEFI/BIOS configuration, Secure Boot/TPM/Measured Boot, GRUB, PXE/iPXE flows.
Integrate and automate BMC/IPMI/Redfish workflows for out-of-band provisioning and fleet management.

OS & Kernel Engineering:

Build, customize, and harden Ubuntu images (cloud-init, Debos) and tune systemd/init for low-latency, high-throughput workloads.
Diagnose and fix kernel/user-space issues using perf, ftrace, eBPF/bpftrace; configure NUMA, IRQ affinity, cgroups/namespaces.

PCIe/Driver Enablement:

Validate PCIe topologies and features (ACS/ARI/ATS), SR-IOV, IOMMU/VFIO; bring up NIC/GPU drivers and firmware.
Root-cause device initialization and performance regressions across kernel, drivers, and userspace.

Provisioning & Automation at Scale:

Author idempotent Ansible playbooks/roles; implement Python/Pytest test harnesses for pre/post-provision validation.
Build golden images and repeatable pipelines for server provisioning, configuration drift detection, and remediation.

GPU/Accelerator & HPC Stack Readiness:

Enable NVIDIA CUDA/NCCL/GPUDirect RDMA and AMD ROCm; validate multi-GPU/multi-node performance.
Stand up and tune NCCL/UCX, MPI (OpenMPI), torchrun/PyTorch for distributed training workloads.

Containers & Build Tooling:

Build and maintain minimal, reproducible Docker images and docker-compose environments for CI and validation.
Use C/Go/Python, Make/CMake, and CI (GitHub Actions/GitLab CI) to publish and maintain validation and automation tools.

High-Performance Networking:

Configure and tune RoCEv2 and/or InfiniBand fabrics; validate rdma-core/libibverbs paths end-to-end.
Optimize congestion control, MTU/jumbo frames, NUMA/RSS/IRQ steering for consistent throughput/latency.

Security & Compliance:

Apply CIS hardening baselines; maintain Secure Boot policy, measured boot attestations, and patch compliance.
Implement access controls and auditability across firmware, OS, and cluster automation.

What you'll need to be successful:

Educational Background:

Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.

Technical Skills:

Platform/Boot: UEFI/BIOS, GRUB, Secure Boot, PXE/iPXE, BMC/IPMI/Redfish.
OS/Kernel: Linux (Ubuntu), systemd/init, eBPF, perf/ftrace/bpftrace, cgroups, namespaces, NUMA, IRQ affinity.
Drivers/PCIe: PCIe fundamentals (ACS/ARI/ATS), SR-IOV, VFIO, IOMMU, NIC/GPU drivers.
Provisioning/Automation: Ansible, Python, Pytest, Debos, cloud-init.
Containers: Docker, docker-compose.
Build/Dev: C, Python, Go (optional), Make, CMake, CI (GitHub Actions/GitLab CI).
Networking (HPC): RoCEv2, InfiniBand, libibverbs/rdma-core, NCCL/UCX, MPI (OpenMPI).
GPU/Accel: NVIDIA (CUDA, NCCL, GPUDirect RDMA), AMD ROCm.
Security/Compliance: CIS hardening, Secure Boot, TPM/Measured Boot.

Professional Experience:

7+ years in Linux systems engineering, including kernel/userspace debugging and performance tuning.
Proven ownership of bare-metal server bring-up and fleet-scale provisioning via Ansible/Python.
Hands-on with PCIe device enablement (SR-IOV/VFIO/IOMMU) and NIC/GPU driver stacks.
Demonstrated success enabling multi-GPU/multi-node training over RoCEv2 or InfiniBand.
Track record building reproducible OS images and container artifacts for production use.

Soft Skills:

Ability to mentor peers, partner with researchers/ML engineers, and influence cross-functional roadmaps.
Clear, concise documentation habits; you turn tribal knowledge into automation and runbooks.

Preferred Qualifications:

Experience in cloud-based AI solutions and infrastructure.
Familiarity with performance benchmarking and optimization.
Knowledge of modern development practices and Agile methodologies.

What we offer:

A competitive salary and benefits package, tailored to recognize your dedication and contributions.
The opportunity to collaborate with and learn from leading experts in AI and cloud computing, fostering continuous growth.
An environment that values innovation, collaboration, and mutual respect.
Support for personal and professional development, empowering you with the tools and resources to elevate your skills and leave a lasting impact.
A pivotal role in the AI revolution, shaping the technologies that power the innovations of tomorrow.

Offices:

Our teams are strategically distributed across three continents (Europe, North America, and Asia), united by a shared mission: to deliver more compute with less complexity.

Paris - HQ
San Francisco (Bay Area) - US office

Apply NOW!

You’ve seen what this role entails. Now we want to hear from you! Does this opportunity align with your aspirations? If you’re even slightly curious, we encourage you to apply – it could be the start of something extraordinary!

At FlexAI, we believe diverse teams are the most innovative teams. We’re committed to creating an inclusive environment where everyone feels valued, and we proudly offer equal opportunities regardless of gender, sexual orientation, origin, disabilities, veteran status, or any other facets of your identity that make you uniquely you.
