IT DevOps Director for an AI and Machine Learning Company
Purpose
We are looking for a DevOps Director to maintain, grow, and optimize our development and production infrastructure. In this role, you will work closely with the product development team and enable them to build and ship faster and more securely.
Responsibilities
- Plan for horizontal scaling of environments and anticipate infrastructure growth as the business grows
- Provide senior networking expertise to the team
- Support a large Linux infrastructure running in the cloud (AWS, Azure, etc.)
- Work collaboratively to provide senior infrastructure guidance to the team for short-term issues and long-term planning and vision
- Provide tactical production support to the operations team and customers
- Plan, build, maintain and evolve our infrastructure
- Improve the continuous testing and deployment process
- Work heavily with Kubernetes and Docker for containerization, orchestration, and deployment configuration
- Apply proficiency in cloud technologies, serverless, microservices, and Kubernetes
- Work closely with the technology team to understand and anticipate the agile infrastructure needs of the development and testing processes; create an environment where dev and QA work can proceed securely, efficiently, and flexibly, helping to unlock the value of AI financial microservices
- Develop a monitoring platform for all our services and for AI bot interaction/feedback data
- Scale our infrastructure with high availability in mind
- Share your experience in building reliable, immutable infrastructure by working with the team to document processes and by advising your colleagues
Requirements
- Experience in software engineering, DevOps/automation, designing reusable serverless platforms, web infrastructure, and open source tooling with Docker
- Proficiency in system administration (Linux, BSD, etc.)
- A good understanding of UNIX/Linux platforms (RedHat, Debian, Ubuntu, FreeBSD), SQL/NoSQL databases (MySQL, Redis), GraphQL, protocols (TCP/IP, HTTP), and containers (Docker, Kubernetes, OpenShift, AKS, etc.)
- Understanding of AI frameworks and deployment models (Keras, TensorFlow, PyTorch, model serving, NVIDIA, etc.)
- Experience supporting cloud and pub/sub platform technologies (S3, MySQL, MongoDB, Kafka, etc.)
- Experience supporting serverless and container environments (Athena, OpenFaaS, Fission, etc.)
- Experience with agile/scrum methodologies and continuous deployment
- Experience with data microservices and catalog management platforms
- Experience with cloud management and data lake technologies (Glue, Athena, S3, Elasticsearch, EMR, ELK, etc.)