Machine Learning Operations Platform Lead
Job description
This dynamic position involves collaborating closely with the GenAI lead to develop a platform that facilitates the deployment and scaling of GenAI products. The ideal candidate will possess a robust MLOps background and have experience in building resilient and reliable AI/ML platforms.
Responsibilities
You will be the central point for refactoring, optimising, containerising, deploying and monitoring the quality of ML models. Your responsibilities will include:
Review ML models for compliance with overall platform governance principles such as versioning, data/model lineage and code best practices, and provide feedback to data scientists on potential improvements.
Develop pipelines for continuous operation, feedback and monitoring of ML models, leveraging CI/CD best practices from the MLOps domain. This can include monitoring for data drift, triggering model retraining and setting up rollbacks (see the sketch after this list).
Optimise AI development environments (development, testing, production) for usability, reliability and performance.
Build strong relationships with the infrastructure and application development teams in order to understand the best method of integrating ML models into enterprise applications (e.g., exposing resulting models as APIs).
Work with data engineers to ensure that data storage (data warehouses or data lakes), the data pipelines feeding these repositories, and the ML feature or data stores are working as intended.
Evaluate open-source AI/ML platforms and tools for feasibility of use and integration from an infrastructure perspective. This also involves staying up to date on the newest developments, patches and upgrades to the ML platforms used by the data science teams.
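For illustration, the drift-monitoring responsibility above could look roughly like the following Python sketch. It is a minimal sketch only: the model registry (with current() and promote() methods) and the train_fn and eval_fn hooks are hypothetical placeholders, and the per-feature two-sample Kolmogorov-Smirnov test stands in for whatever drift detection the platform's own monitoring tooling provides.

# Minimal drift-check sketch; registry, train_fn and eval_fn are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # below this threshold a feature is treated as drifted


def detect_drift(reference: np.ndarray, live: np.ndarray) -> bool:
    """Compare live feature distributions against the training reference,
    column by column, with a two-sample Kolmogorov-Smirnov test."""
    for col in range(reference.shape[1]):
        if ks_2samp(reference[:, col], live[:, col]).pvalue < DRIFT_P_VALUE:
            return True
    return False


def monitor_and_retrain(reference, live, registry, train_fn, eval_fn):
    """If drift is detected, retrain; promote the candidate only when it beats
    the current model, otherwise keep (i.e. roll back to) the existing one."""
    if not detect_drift(reference, live):
        return registry.current()
    candidate = train_fn()  # hypothetical retraining hook
    if eval_fn(candidate) > eval_fn(registry.current()):
        registry.promote(candidate)  # hypothetical registry API
    return registry.current()

The rollback here is simply keeping the current model whenever the retrained candidate fails to improve the evaluation metric; in practice, promotion and rollback would go through the platform's model registry and deployment pipeline.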
Requirements
Proficiency in Python for both ML and automation tasks.
Working knowledge of Bash and the Unix/Linux command-line toolkit.
Hands-on experience developing CI/CD pipelines orchestrated with Jenkins, GitLab CI, GitHub Actions or similar tools.
Knowledge of OpenShift / Kubernetes.
Demonstrable understanding of ML libraries such as pandas, NumPy, H2O or TensorFlow.
Knowledge in the operationalisation of Data Science projects (MLOps) using at least one of the popular frameworks or platforms (e.g., Kubeflow, AWS Sagemaker, Google AI Platform, Azure Machine Learning, DataRobot, Dataiku, H2O, or DKube).
Knowledge of distributed data processing frameworks, such as Spark or Dask.
Knowledge of workflow orchestrators, such as Airflow or Control-M (see the orchestration sketch after this list).
Knowledge of logging and monitoring tools, such as Splunk and Geneos.
Experience in defining the processes, standards, frameworks, prototypes and toolsets in support of AI and ML development, monitoring, testing and operationalisation.
Experience with ML operationalisation and orchestration (MLOps) tools, techniques and platforms. This includes scaling model delivery, managing and governing ML models, and managing and scaling AI platforms.
Knowledge of cloud platforms (e.g. AWS, GCP) would be an advantage.
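As a rough illustration of the orchestration knowledge listed above, a scheduled retraining workflow in Airflow might be sketched as follows. The DAG id, schedule and task bodies are hypothetical placeholders, and the example assumes a recent Airflow 2.x installation; in practice each task would call into the feature store, training and deployment services described in the responsibilities section.

# Minimal Airflow DAG sketch; dag_id, schedule and task bodies are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_features():
    print("pull refreshed features from the feature store")


def train_model():
    print("retrain the model on the refreshed features")


def evaluate_model():
    print("evaluate the candidate and decide whether to promote it")


with DAG(
    dag_id="weekly_model_retrain",
    start_date=datetime(2024, 1, 1),
    schedule="@weekly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_features", python_callable=extract_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    evaluate = PythonOperator(task_id="evaluate_model", python_callable=evaluate_model)

    # Run the steps sequentially: extract, then retrain, then evaluate.
    extract >> train >> evaluate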
Benefits
22 days annual leave plus medical insurance
Salary: SGD 170K - SGD 210K per year (negotiable)
If you are interested in this job and would like to have a discussion, please contact joel@tenten-partners.com.
Equal Opportunity Statement
TENTEN Partners is an equal opportunity firm and is committed to providing equal employment opportunities to all qualified individuals without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, age, disability, or any other protected characteristic as outlined by applicable law.