We are seeking a Site Reliability Engineer to help build and operate our Real Time Analytics Platform. The operations team uses cutting-edge technology to simplify an otherwise complex environment.
Candidates should have a passion for building infrastructure for high-performance "Big Data" systems. In this role, you will work with open-source tools like ZooKeeper, Hadoop, HBase, Hive, and Couchbase.
Candidates should be at least familiar with the technologies we use, even if previous jobs have not offered the opportunity to acquire deep experience with them. However, you should have a true passion for systems engineering that is apparent in your past work.
Responsibilities:
- Build tools to ease provisioning and scaling of TubeMogul Analytics infrastructure
- Monitor and improve service performance and stability
- Continuously extend and improve infrastructure components to handle growth
- Investigate failures and offer suggestions for future improvement
- Work closely with development teams to ensure that platforms are designed with "operability" in mind
- Help our software engineering team ensure that proper monitoring and metrics are built into applications before they go to production