- System Administration: Manage and maintain server environments, ensuring high availability and performance. Troubleshoot and resolve system issues promptly.
- Coding and Automation: Develop and maintain automation scripts and tools in Python to improve operational efficiency and reduce manual intervention.
- UNIX Scripting: Write and optimize UNIX shell scripts for monitoring, alerting, and automation of routine tasks.
- Collaboration: Work closely with development teams to implement reliable systems and applications, providing insights on best practices for reliability and performance.
- Incident Response: Participate in on-call rotations and respond to incidents, performing root cause analysis and implementing preventive measures.
- Documentation: Create and maintain documentation for system architecture, processes, and troubleshooting procedures.
- Continuous Improvement: Identify opportunities for process improvement and implement solutions to enhance system reliability and performance.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience in system administration and managing Linux/UNIX environments.
- Strong coding skills in Python and experience with scripting languages.
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
- Excellent problem-solving skills and ability to troubleshoot complex systems.
- Strong communication and collaboration skills.