Principal Software Development Engineer

Insight Engines

Software Engineering

Bengaluru, Karnataka, India

Posted on Jan 29, 2026

At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.

Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.

If you're passionate about engineering robust solutions for large-scale distributed software infrastructure, especially systems spanning both the control and data planes, this position offers end-to-end ownership of mission-critical components that power advanced networking capabilities and future-proof connectivity.

Why This Role is Unique:

Our SaaS is hybrid, running across public cloud and a global network of 50+ PoPs, delivering terabits of capacity through many thousands of devices installed in diverse infrastructure.

What You'll Do:

Be the Architect Behind Key Platform Enhancements for Resiliency and Scale

Serve as the technical expert for automation platform design and architecture, defining technical roadmaps and standards for automation systems and runbook execution development
Drive reliability and resiliency as part of the platform evolution
Drive the re-architecture and scaling of key components of the automation and remediation platform
Own the platform roadmap
Lead the design and development of highly available, scalable automation, remediation, and runbook orchestration services across multi-cloud and hybrid environments
Research, investigate, and define new areas of technology in ML/AI-powered operations and intelligent runbook automation to enhance existing products or identify new product directions
Drive advancements in high-performance networking capabilities, ensuring the platform remains future-ready for evolving connectivity standards and global traffic demands
Scale and optimize essential control and data plane components to support seamless expansion and robust system reliability across diverse environments
Lead the development of innovative services that enhance operational efficiency and resource utilization in complex network architectures
Continuously reduce toil in the development lifecycle through comprehensive testing, intelligent automation, and strategic platform enhancements focused on resilient connectivity
Design and architect cloud-native solutions leveraging AWS, Azure, and GCP services for automation and remediation platforms
Design and implement self-healing systems with closed-loop automation and automated runbook execution capabilities
Lead cloud infrastructure strategy and multi-cloud deployment architectures for automation platforms

Collaborate & Mentor

Work closely with SREs, QA, and engineering teams to improve reliability and performance of the product
Mentor junior engineers, fostering a culture of SDLC thinking
Partner with leadership, product management, and cross-functional teams to shape technical direction and establish engineering best practices
Promote engineering excellence across the organization
Conduct presentations internally and externally on automation architecture, runbook orchestration, cloud best practices, and operational innovation
Participate in hiring and onboarding processes for engineering talent
Uphold the company's Business Code of Ethics and perform other related duties as assigned

What Makes You a Great Fit?

Deep expertise in runbook automation platforms, observability tools, and automated remediation frameworks
Expert-level proficiency with Linux systems engineering, containerization, and Kubernetes orchestration
Strong SRE/DevOps mindset with hands-on experience building resilient, scalable infrastructure
Demonstrated expertise in re-architecting and scaling large-scale distributed systems, particularly in hybrid SaaS environments that integrate both cloud and global infrastructure
Proficiency with Kubernetes, Golang, and L3-L7 networking principles
Strong commitment to deep observability, network analytics, and edge automation at scale
Understanding of AI/ML technologies for intelligent automation, anomaly detection, and predictive operations
If you love creating a highly resilient platform that scales, automating everything with runbooks and observability, and working in a hybrid cloud + networking environment, we want to talk to you!

Qualifications:

Must-Have:

Reliability and Resiliency Expertise – Strong experience developing highly resilient and reliable systems by applying SRE principles
Runbook Automation & Observability Expertise – Deep hands-on experience with:
- Runbook automation platforms (Rundeck, StackStorm, Ansible Tower/AWX)
- Workflow orchestration engines (Temporal, Apache Airflow, Camunda, Prefect)
- Observability platforms (Prometheus, Grafana, OpenTelemetry, Datadog, Splunk, ELK Stack)
- Time-series databases (Prometheus, InfluxDB, TimescaleDB, VictoriaMetrics)
- Distributed tracing and log aggregation systems
- Alerting and incident response automation
- Self-healing and closed-loop automation systems
- Integration with incident management and ITSM platforms (ServiceNow, PagerDuty, Jira)
Linux & Kubernetes Mastery – Expert-level experience with:
- Linux system administration, kernel tuning, and performance optimization
- Kubernetes architecture, operators, custom controllers, and CRDs
- Container runtimes (Docker, containerd, CRI-O) and image optimization
- Helm, Kustomize, and GitOps workflows (ArgoCD, Flux)
- Service mesh technologies (Istio, Linkerd)
- Kubernetes security, RBAC, network policies, and pod security standards
SRE/DevOps Excellence – Proven expertise in:
- Site Reliability Engineering principles (SLIs, SLOs, SLAs, error budgets)
- Infrastructure as Code (Terraform, CloudFormation, ARM Templates, Ansible)
- CI/CD pipelines and automation (Jenkins, GitLab CI/CD, GitHub Actions, ArgoCD)
- Chaos engineering and resilience testing
- Capacity planning and performance engineering
- Incident management, postmortem culture, and blameless retrospectives
AI/ML for Operations – Understanding and knowledge of:
- AI/ML concepts and their application in IT operations
- Anomaly detection techniques and predictive analytics approaches
- Intelligent alerting and automated root cause analysis
- Log analysis and incident correlation using machine learning
- Predictive scaling and resource optimization strategies
Cloud & Hybrid SaaS Experience – Hands-on experience developing cloud-native services on AWS, GCP, or Azure, including:
- AWS: EC2, Lambda, ECS/EKS, CloudWatch, CloudFormation, Step Functions, EventBridge, Systems Manager, SNS/SQS, DynamoDB, RDS, S3, IAM, VPC
- Azure: Virtual Machines, Azure Functions, AKS, Azure Monitor, ARM Templates/Bicep, Logic Apps, Event Grid, Azure Automation, Service Bus, Cosmos DB, Azure SQL, Blob Storage, Azure AD, Virtual Networks
- GCP: Compute Engine, Cloud Functions, GKE, Cloud Monitoring, Deployment Manager, Cloud Workflows, Pub/Sub, Cloud Firestore, Cloud SQL, Cloud Storage, IAM, VPC
L3-L7 Networking – Hands-on experience working across the networking stack
14+ years of software engineering experience, with 10+ years designing and implementing large-scale distributed systems
Strong coding proficiency in Python, Go, Java, or C/C++ with proven track record of leading complex software development efforts
Experience with cloud-native design patterns: serverless architectures, microservices, event-driven architectures, and container orchestration
Expertise in multi-cloud and hybrid cloud architectures, cloud migration strategies, and cloud cost optimization
Strong experience with streaming platforms (Kafka, Kinesis, Event Hubs, Pub/Sub) and event-driven architectures
Excellent analytical and debugging skills, with the ability to manage ambiguity
Excellent organizational agility and the ability to communicate effectively across the organization
Ability to present ideas verbally and in writing with clarity and precision

Nice-to-Have:

Advanced AI/ML for Operations – Hands-on experience with:
- ML frameworks for anomaly detection and predictive analytics (TensorFlow, PyTorch, Scikit-learn)
- AIOps techniques for intelligent alerting and root cause analysis
- MLOps pipelines and model deployment
- Natural language processing for log analysis and incident correlation
- Predictive scaling and intelligent resource optimization
Advanced Runbook & Observability Tools – Experience with additional automation and observability platforms:
- APM tools (New Relic, Dynatrace, AppDynamics)
- Configuration management tools (Chef, Puppet, SaltStack)
- ChatOps frameworks (Hubot, Slack/MS Teams integrations)
- Incident response automation platforms (Shoreline, Resolve, BigPanda)
- Network performance monitoring (ThousandEyes, Kentik)
Mentorship and Cross-Functional Collaboration – Proven ability to guide junior engineers and work effectively with SREs, QA, application developers, and network engineers on complex platform initiatives
High Availability & Disaster Recovery – Hands-on experience designing or migrating highly available systems and implementing disaster recovery strategies across hybrid cloud infrastructures
Performance Tuning & Profiling – Skills in profiling distributed systems and optimizing for latency, throughput, and resource efficiency at scale
Experience with eBPF for advanced observability and networking
Knowledge of OpenTelemetry collector customization and extensions
Cloud certifications (AWS Solutions Architect Professional, Azure Solutions Architect Expert, GCP Professional Cloud Architect, CKA, CKAD, CKS)
Contributions to open-source observability, automation, or Kubernetes projects

Education:

Typically requires at least 18 years of related experience with a bachelor's degree, 15 years with a master's degree, or 12 years with a PhD; or equivalent experience

Environment:

Empowered Work Culture: Experience an environment that values autonomy, fostering a culture where creativity and ownership are encouraged
Continuous Learning: Benefit from the mentorship of experienced professionals with solid backgrounds across diverse domains, supporting your professional growth
Team Cohesion: Join a collaborative and supportive team where you'll feel at home from day one, contributing to a positive and inspiring workplace

F5 Networks, Inc. is an equal opportunity employer and strongly supports diversity in the workplace.

The Job Description is intended to be a general representation of the responsibilities and requirements of the job. However, the description may not be all-inclusive, and responsibilities and requirements are subject to change.

Please note that F5 only contacts candidates through an F5 email address (ending with @f5.com) or automated email notifications from Workday (ending with f5.com or @myworkday.com).

Equal Employment Opportunity

It is the policy of F5 to provide equal employment opportunities to all employees and employment applicants without regard to unlawful considerations of race, religion, color, national origin, sex, sexual orientation, gender identity or expression, age, sensory, physical, or mental disability, marital status, veteran or military status, genetic information, or any other classification protected by applicable local, state, or federal laws. This policy applies to all aspects of employment, including, but not limited to, hiring, job assignment, compensation, promotion, benefits, training, discipline, and termination. F5 offers a variety of reasonable accommodations for candidates. Requesting an accommodation is completely voluntary. F5 will assess the need for accommodations in the application process separately from those that may be needed to perform the job. To request an accommodation, contact accommodations@f5.com.
