Job Summary
As a Site Reliability/DevOps Engineer, you will introduce processes, tools, and methodologies to balance needs throughout the software development life cycle, from coding and deployment to maintenance and updates.
Responsibilities
- Focus on improving the scalability, robustness, and automation of our tools and processes, as well as expanding capabilities to support new features.
- Deploy and optimize our production systems.
- Ensure our support tools work and enable continuous integration, testing, and production deployment.
- Implement strategic solutions to ensure our systems stay on the bleeding edge of technology.
- Contribute to the technology stack, covering web development, API development, DB communication/handshake, performance, and security measures.
- Make sure systems are operational, visible, and designed for auto-recovery in case of disaster.
- Own end-to-end development and production system operations, including system maintenance, monitoring (application, system, log), notification, automation, and backend operation.
- Provide visibility into system performance and stability, and improve both.
- Propose new technologies and tools to improve development, testing, and production operations.
- Work closely with various teams across functions including the project team, product team, tech team, and QA team.
- Build, maintain, and scale infrastructure for Production, QA, and Dev environments.
- Develop and maintain Continuous Integration/Continuous Delivery systems.
- Deploy automation solutions in a public cloud environment such as AWS.
- Write and maintain infrastructure documentation.
- Deliver high-uptime Software-as-a-Service applications.
Qualifications and Skills
- 5+ years of experience in development and operations, plus one or more formal qualifications.
- Bachelor's degree in Engineering, IT, Computer Science, or a related field.
- Experience in handling high-traffic production systems, troubleshooting, automation, and regular operation.
- Familiarity with web development technologies and exposure to their build/deployment processes.
- Experience with Continuous Integration/Deployment mechanisms using Jenkins, Nexus, Docker Registry, GitLab, Ansible/Terraform.
- Good knowledge of SaaS, cloud infrastructure, and other enterprise-related technologies (AWS).
- Experience with AWS and Google Cloud.
- Experience with container orchestration (Kubernetes) and strong scripting skills (Shell scripting, Python, etc.).
- Strong knowledge of Unix-based systems.
- Experience with deployment and configuration tools (Ansible, Chef, Puppet, etc.).
- Willingness to learn modern-day tools and processes.
- Good problem-solving and troubleshooting skills.
- Creativity and accountability.