IT DevOps Director for an AI and Machine Learning Company


Purpose

We are looking for a DevOps Director who will maintain, grow, and optimize our development and production infrastructure. In this role, you will work closely with the product development team to help them build and ship faster and more securely.


Responsibilities

  1. Plan for horizontal scaling of environments and anticipate infrastructure growth as the business expands
  2. Provide senior networking expertise to the team
  3. Support our large Linux-based infrastructure in the cloud (AWS, Azure, etc.)
  4. Work collaboratively to provide senior infrastructure guidance to the team on both short-term issues and long-term planning and vision
  5. Provide tactical production support to the operations team and customers
  6. Plan, build, maintain and evolve our infrastructure
  7. Improve the continuous testing and deployment process
  8. Work extensively with Kubernetes and Docker for container orchestration and configuration management
  9. Apply proficiency in cloud technologies, serverless architectures, microservices, and Kubernetes
  10. Work closely with the technology team to understand and anticipate the infrastructure needs of agile development and testing; create an environment where development and QA work can proceed securely, efficiently, and flexibly, helping unlock the value of our AI financial microservices
  11. Develop a monitoring platform for all our services and for AI bot interaction and feedback data
  12. Scale our infrastructure with high availability in mind
  13. Share your experience building reliable, immutable infrastructure by documenting processes with the team and advising your colleagues


Requirements

  1. Experience in software engineering, DevOps/automation, designing reusable serverless platforms, web infrastructure, and open-source tooling with Docker
  2. Proficiency in system administration (Linux, BSD, etc.)
  3. A good understanding of UNIX/Linux platforms (Red Hat, Debian, Ubuntu, FreeBSD), SQL/NoSQL databases (MySQL, Redis), GraphQL, protocols (TCP/IP, HTTP), and container technologies (Docker, Kubernetes, OpenShift, AKS, etc.)
  4. Understanding of AI frameworks and deployment models (Keras, TensorFlow, PyTorch, model serving, NVIDIA, etc.)
  5. Experience supporting cloud storage, database, and pub/sub technologies (S3, MySQL, MongoDB, Kafka, etc.)
  6. Experience supporting serverless environments (Athena, OpenFaaS, Fission, etc.)
  7. Experience with Agile/Scrum methodologies and continuous deployment
  8. Experience with data microservices and data catalog management platforms
  9. Experience with cloud management and data lake technologies (Glue, Athena, S3, Elasticsearch, EMR, ELK, etc.)