AI Stack Engineer - Features (Confirmed/Senior)
Join us ! 🚀
We usually respond within a day
Join FlexAI:
FlexAI is at the forefront of revolutionizing AI computing by reengineering infrastructure at the system level. Our groundbreaking architecture, combined with sophisticated software intelligence, abstraction, and an orchestration layer, allows developers to leverage a diverse array of compute, resulting in more efficient, more reliable computing at a fraction of the cost. We are seeking a skilled and experienced AI Stack Engineer.
Founded by Brijesh Tripathi and Dali Kilani, who bring experience from Nvidia, Apple, Tesla, Intel, Lifen, and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Paris, Silicon Valley, and Bangalore, united by a shared mission: to deliver more compute with less complexity.
Position Overview:
FlexAI is seeking a talented and driven AI Stack Engineer to join our innovative team. In this role, you will be responsible for improving the reliability, scalability, and performance of our custom AI software stack, with a focus on PyTorch optimization and model deployment. The ideal candidate will have significant experience in software engineering for large-scale distributed systems, with a passion for enhancing AI infrastructure and ensuring seamless model training and deployment processes.
What you’ll do:
- Improve LLM Training Reliability and Elasticity:
Design and implement solutions to make PyTorch more elastic and resilient, enabling fault tolerance and dynamic scaling of training jobs.
Collaborate with teams to enhance PyTorch functionalities and reduce training downtime, optimizing large language model (LLM) training workflows.
- Optimize Model Packaging and Production Runtime:
Ensure seamless integration of customer models with our custom PyTorch stack.
Manage and improve the production runner, focusing on performance, scalability, and deployment processes to ensure efficient model execution.
- Develop and Maintain Internal Tools for Model Training:
Contribute to the development of internal libraries and tools to improve the training process, including implementing asynchronous operations and fault recovery mechanisms.
Maintain code quality, enforce best practices, and ensure continuous integration for production readiness.
What you’ll need to be successful:
- 5+ years of experience in software engineering, with a focus on runtime systems or performance optimization for large-scale distributed systems.
- Strong expertise in low-level performance optimizations and systems programming (C/C++, Go, etc.), with Python experience preferred.
- A Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related technical field.
- Proven experience working with distributed systems, and the ability to optimize runtime environments for AI workloads.
- Excellent problem-solving skills with the ability to innovate and think outside the box in a fast-paced, evolving environment.
- Strong communication and collaboration skills, comfortable working cross-functionally with engineering, research, and product teams.
Bonus Points:
- Experience working with PyTorch or similar deep learning frameworks.
- Familiarity with AI model training pipelines, model deployment processes, or high-performance computing (HPC) environments.
- Experience in start-up environments or high-growth tech companies, and an entrepreneurial mindset.
What we offer:
- A competitive salary and benefits package, tailored to recognize your dedication and contributions.
- The opportunity to collaborate with leading experts in AI and cloud computing, learning from the best and the brightest, fostering continuous growth.
- An environment that values innovation, collaboration, and mutual respect.
- Support for personal and professional development, empowering you with the tools and resources to elevate your skills and leave a lasting impact.
- A pivotal role in the AI revolution, shaping the technologies that power the innovations of tomorrow.
Offices:
Our teams are strategically distributed across three continents—Europe, North America, and Asia—united by a shared mission: to deliver more compute with less complexity.
- Paris - HQ
- San Francisco (Bay Area) - US office
- Bangalore - India office
Apply NOW!
You’ve seen what this role entails. Now we want to hear from you! Does this opportunity align with your aspirations? If you’re even slightly curious, we encourage you to apply – it could be the start of something extraordinary!
At FlexAI, we believe diverse teams are the most innovative teams. We’re committed to creating an inclusive environment where everyone feels valued, and we proudly offer equal opportunities regardless of gender, sexual orientation, origin, disabilities, veteran status, or any other facets of your identity that make you uniquely you.
- Department: R&D SW
- Role: AI Stack Engineer
- Locations: Paris
- Remote status: Hybrid
- Employment type: Full-time