Job offer

Site Reliability Engineer

Salary:
1400 - 1600 PLN per man-day
Type of employment:
B2B
Date:
2025.06.09
Location:
Kraków
Offer
  • We are flexible about the form of cooperation and can adapt it to your preferences
  • Work with an experienced and engaged team that is eager to learn, share knowledge, and open to growth and new ideas
  • Hybrid work (two days a week at the office in Kraków)
  • Mindbox is a dynamically growing IT company, but still small enough that everybody can have a real impact on where we go next
  • We invest in developing our employees' skills and abilities
  • We offer attractive benefits and provide all the tools you need for work, e.g. a computer
  • Interpolska Health Care, Multisport, Warta Insurance, and a training platform (Sages)

We create an inspiring place where talented people can thrive, and we use their expertise and courage to bring the technology of the future into your business. This is the foundation of Mindbox and the goal of our business and technology journey. We operate and develop in four areas:

🤖 Autonomous Enterprise - automation of business processes using RPA, OCR, and AI.

🌐 Business Management Systems (ERP) - we implement, adapt, optimize, and maintain flexible, secure, and open ERP systems for production and distribution companies worldwide.

🤝 Talent Network - we provide access to the best specialists.

☁️ Modern Architecture - we build integrated, sustainable, and open container-based CI/CD environments that enable safer and more frequent delivery of proven application code changes.

We treat technology as a tool for achieving a goal. Thanks to our consultants' reliability and proactive approach, initial projects usually grow into long-term cooperation. For over 16 years, Mindbox has provided a range of services that support clients in their digital transformation.

Tasks
  • Manage application support operations, focusing on resiliency, availability, and monitoring system health and performance.
  • Coordinate the resolution of production incidents and conduct post-mortems/RCAs to identify root causes and improve processes.
  • Investigate, triage, and resolve production incidents with a focus on technical signals and root cause analysis.
  • Document post-incident recovery steps, contribute to process improvements, identify deviations, and build a knowledge base.
  • Actively participate in the service management community, engaging in Incident Management, Problem Management, etc.
  • Define and deliver tactical and strategic service improvements across the technical and process landscape.
  • Apply SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability.
  • Develop observability tools and techniques for monitoring, alerting, incident detection, response, capacity management, and release safety.
Requirements
  • 5+ years of experience developing and supporting distributed systems written in Java.
  • Experience implementing observability tools such as Grafana, Prometheus, Loki, OpenTelemetry, etc.
  • A methodical approach to troubleshooting and strong problem-solving skills.
  • Experience with application lifecycle management tooling and practices: JIRA/Confluence, Ansible, vulnerability remediation, CI/CD automation.
  • Experience implementing and managing logging, monitoring, and alerting frameworks for hybrid cloud, using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar.
  • Understanding of relational databases (RDBMS), cloud technologies, Unix/Linux, and job scheduling (e.g. Control-M or a similar tool).
  • Ability to lead technical conversations with various technical support groups.
  • Excellent communication skills and experience working in an Agile environment.