Fixing The ClickHouse Node Failure On Distributed Systems - A How-To Guide

Written by Instana | Published 2020/10/04
Tech Story Tags: sre | devops | distributed-systems | observability | monitoring | application-performance | cicd | good-company

TLDR: Part One: ClickHouse Failures, by Marcel Birkner. We had data node failures and had to do root-cause analysis, fix the issues, and find ways to prevent the same problem from occurring again. We were able to quickly fix the broken ClickHouse node and have the cluster back in a stable state in less than 15 minutes. In part 2 of this mini blog series, we will cover the problems we encountered with one of our Cassandra clusters. If you want to learn more about how we use ClickHouse, check out this guest webinar with Yoann Buch and myself.


Written by Instana | www.instana.com
Published by HackerNoon on 2020/10/04