Platform Operations Engineer II

About the Team

We are building a new-generation Platform Operations team that reimagines the traditional CloudOps/SRE model. Instead of siloed specialists, every engineer on this team is expected to operate across Cloud Infrastructure, Databases, Networking, Operating Systems, and Data Pipelines, using Generative AI as a force multiplier.

While you will develop deeper expertise in chosen domains, the baseline expectation is full-stack operational capability, AI-first problem solving, and the ability to mentor junior engineers joining the team.

Role Summary

As a Platform Operations Engineer II, you will be a key contributor responsible for the reliability, performance, and security of customer cloud infrastructure. You bring hands-on experience across multiple technology domains and are comfortable leading incident response, driving automation, and integrating AI into operational workflows. You will help define processes, build runbooks, and mentor entry-level engineers as we scale the team.

Key Responsibilities

Own end-to-end incident management for customer cloud environments across GCP, Azure, and/or AWS—from detection through resolution and post-mortem.
Lead troubleshooting across the full stack: cloud services, compute/OS, networking, databases, and application-layer dependencies.
Design and implement automation for recurring operational tasks using scripting (Python, Bash, Go), configuration languages (YAML), and Infrastructure-as-Code (Terraform, Pulumi, or CloudFormation).
Perform advanced database operations: performance tuning, replication management, migration planning, and disaster recovery testing across RDBMS and NoSQL systems.
Configure and troubleshoot complex networking setups: hybrid connectivity, VPC peering, transit gateways, WAFs, and DDoS mitigation.
Manage OS hardening, patch management strategies, and security compliance across Linux and Windows fleets.
Monitor and troubleshoot data pipelines and ETL/ELT workflows on platforms such as Databricks and Snowflake; collaborate with data engineering teams on reliability and performance improvements.
Integrate Generative AI into daily operations using platforms such as Google Gemini, Anthropic Claude, OpenAI, and open-source LLMs: build AI-assisted runbooks, use LLM-powered diagnostics, and evaluate new AI tools for the team.
Define and track SLIs/SLOs for customer environments; produce capacity plans and reliability reports.
Mentor Platform Operations Engineer I team members; conduct knowledge-sharing sessions and contribute to team training programs.
Participate in architecture reviews and provide operational readiness assessments for new customer deployments.
Drive continuous improvement by identifying patterns in incidents and proposing systemic fixes.

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field.
2–5 years of hands-on experience in Cloud Operations, SRE, DevOps, or Infrastructure Engineering roles.
Solid working knowledge of at least one major cloud platform (GCP, Azure, or AWS) with practical experience in compute, storage, networking, IAM, and managed services.
Proficiency in Linux system administration; working knowledge of Windows Server is a plus.
Strong networking skills: TCP/IP, DNS, load balancing, VPNs, firewall rules, and network troubleshooting using tools like tcpdump, traceroute, and Wireshark.
Hands-on experience with relational databases (PostgreSQL, MySQL, or SQL Server) including query optimisation, index management, and backup/restore workflows.
Familiarity with NoSQL databases (MongoDB, DynamoDB, Redis, or Cassandra) in production environments.
Working knowledge of data engineering fundamentals: data pipeline architectures, data platforms (Databricks, Snowflake), scheduling tools (Airflow, Step Functions), and common data formats and stores.
Practical understanding of AI/ML concepts and demonstrated ability to use Generative AI tools (Google Gemini, Anthropic Claude, OpenAI, open-source LLMs) to accelerate operational work.
Proficiency in at least one scripting/programming language (Python, Bash, or Go) and configuration languages (YAML), with experience writing automation and tooling.
Experience with DevOps practices, Infrastructure-as-Code (Terraform preferred), and CI/CD pipelines.
Excellent communication skills: ability to lead incident bridges, write clear post-mortems, and present operational reports to stakeholders.

Preferred Qualifications

Associate cloud certification (e.g., GCP Associate Cloud Engineer, AZ-104, AWS Solutions Architect Associate).
Experience with container orchestration (Kubernetes, ECS/EKS) and service mesh technologies.
Familiarity with observability platforms (Datadog, New Relic, Grafana + Prometheus stack) and log management (ELK, Splunk).
Exposure to building or consuming APIs for operational automation (REST, GraphQL).
Experience with security and compliance frameworks (SOC 2, ISO 27001, CIS Benchmarks).
Track record of building internal tools, ChatOps integrations, or AI-powered automation workflows.
Prior experience mentoring junior engineers or leading small project teams.

What We Offer

An opportunity to build a team from the ground up and shape its culture, tooling, and operational philosophy.
Clear growth path toward specialisation (Cloud Architect, DBA Lead, Network Lead, or AI Operations Lead) or people management.
Hands-on experience with cutting-edge, AI-augmented infrastructure operations.
Exposure to diverse, complex customer environments across industries.
Investment in certifications, conference attendance, and continuous learning.

Role Snapshot

Level: Engineer II
Experience: 2–5 Years
Location: Jaipur, India
Department: Platform Operations
Team: AI-Native Infra Ops
Employment: Full-Time

Tech You'll Own

GCP, Azure, AWS, Terraform, Kubernetes, Python, Go, PostgreSQL, MySQL, MongoDB, Redis, Databricks, Snowflake, Airflow, Datadog, Prometheus, Gemini, Claude, OpenAI

Ready to lead with us?

We review every application. Share your story and we'll get back to you within a week.
