SRE (LATAM)

  • Remote
    • Buenos Aires, Buenos Aires, Argentina
    • Bogotá, Distrito Capital de Bogotá, Colombia
    • São Paulo, São Paulo, Brazil
    • Colombia, Distrito Capital de Bogotá, Colombia
    • Lima, Lima, Peru
    • Santiago, Región Metropolitana de Santiago, Chile
    +7 more
  • Product

We are looking for a motivated and detail-oriented SRE to join our Infrastructure team. You will focus on incident response, system monitoring, and maintaining the reliability of our services.

Job description


Site Reliability Engineer

About RebelMouse

RebelMouse is the always-modern SaaS CMS where more than 100 enterprise brands and media companies grow their digital audience. Websites running on RebelMouse serve more than half a billion page views per month thanks to powerful tools and incredible distribution across search and social. We blend technology and strategy to move the needle where it matters most: traffic, loyalty, and revenue.

Our People

Our fully distributed team lives in 33 countries around the world. Led by Andrea Breanna, our Mexican-American, gender-fluid founder and CEO, we offer a very safe, positive, and loving environment where diversity matters. We enjoy interesting tasks and strong challenges, value a sense of humor, and strive for work-life balance.

Job Summary

We are looking for a motivated and detail-oriented Site Reliability Engineer (SRE) to join our Infrastructure team. In this role, you will focus on incident response, system monitoring, and maintaining the reliability of our services. Over time, you will have the opportunity to take on broader responsibilities within the SRE function. We are seeking someone who is passionate about infrastructure, eager to learn, and ready to grow by supporting and improving the stability and performance of our platform.

Key Responsibilities

  • Assist with incident investigation and root cause analysis

  • Design and implement preventive measures based on incident patterns

  • Create and update runbooks and documentation for operational procedures

  • Develop automation to prevent recurring incidents

  • Monitor service health and implement proactive improvements

  • Collaborate with existing SRE team members to enhance system reliability

  • Identify and address technical debt related to infrastructure stability

  • Help reduce alert noise by refining monitoring thresholds and rules

Growth Opportunities

  • Develop expertise in cloud infrastructure management

  • Learn advanced Kubernetes orchestration

  • Gain experience with performance optimization

  • Contribute to automation and tooling development

  • Participate in system architecture discussions

Benefits Package

  • Remote work forever

  • Monthly wellness subsidy

  • Flexible work hours

  • Flexible paid time off (PTO) with 12 national holidays and 20 days of vacation per year, as well as paid sick days and personal celebration days :)


RebelMouse is committed to providing a diverse work environment. We appreciate the unique competencies that each person brings to the company, and we provide equal employment opportunity to all applicants and employees without regard to race, color, religion, age, sex, sexual orientation, gender identity/expression, protected veteran status, or disability status.



Job requirements

Technical Environment

You'll be working with a hybrid infrastructure including:

  • AWS services (EC2, EKS, RDS, ElastiCache, DocumentDB, OpenSearch)

  • Kubernetes for production applications

  • Multiple database technologies (MongoDB, Redis, Memcached, MySQL, PostgreSQL)

  • Monitoring systems (ELK Stack, Prometheus, Grafana, ClickHouse, OpenTelemetry)

Qualifications and Skills

  • At least 2 years of experience in IT operations, DevOps, or a related field

  • Basic knowledge of AWS cloud services (EC2, EKS, RDS)

  • Familiarity with Kubernetes and container orchestration

  • Experience with at least one database technology (MongoDB, Redis, MySQL, or PostgreSQL)

  • Understanding of monitoring systems (ELK, Prometheus, Grafana)

  • Experience with Linux systems administration

  • Basic scripting skills (Bash, Python)

  • Problem-solving mindset and ability to work under pressure

  • Good written and verbal communication skills in English

Nice to Have

  • Experience with OpenTelemetry or distributed tracing

  • Knowledge of ClickHouse or time-series databases

  • Experience with infrastructure-as-code tools (Terraform, Ansible)

  • Understanding of CI/CD pipelines

  • Previous experience in incident management

  • Familiarity with self-hosted and managed database environments
