The service mesh era: Using Istio and Stackdriver to build an SRE service

(cloud.google.com)

85 points by crcsmnky 7 years ago | 30 comments

I am still praying that some day soon AWS will announce that they are joining OpenCensus (along with Google, MS, Datadog, Prometheus)[1] in the hopes that we can move towards standard tooling for observability.

They also seriously need to give CloudWatch a UI/UX overhaul.

1. https://opencensus.io/introduction/#partners-contributors

sciurus 7 years ago | |

OpenCensus seems like it's really a Google-only project. OpenTracing and OpenMetrics appear to have more community and vendor engagement.

E.g., Datadog is basing their newer tracing libraries on OpenTracing, and Prometheus devs are behind OpenMetrics.

manigandham 7 years ago | | |

That's exactly the fragmentation that we don't need. OpenCensus has backing from Microsoft too and is designed as a single API and library to support both tracing and metrics using the same context.

OpenTracing and OpenMetrics are more like API specs with libraries left to others to implement, and they're never really used standalone, so there's little reason for them to be separate projects. The best option for the industry would be to fold OT and OM into OC and make a single stack, and hopefully include structured logging as well.

50cpermetric 7 years ago | |

Never happen. That might further reveal how absurd their pricing is.

andriosr 7 years ago |

Zero-instrumentation visibility with the service mesh, but the demo app is instrumented. I’ve seen this point being sold everywhere for service mesh, but the vanilla tracing data given by Istio or others is not that useful by itself. There is no magic; you need to instrument your code.

williamallthing 7 years ago | |

Distributed tracing requires instrumentation / app modification (for header forwarding if nothing else), but metrics don't.
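To make the header-forwarding point concrete, here is a minimal sketch of what the app-side change can look like. It assumes the B3-style header set commonly used with Envoy-based meshes; the function name `forward_trace_headers` is hypothetical, and the exact header list for a given mesh should be checked against its documentation.

```python
# Headers typically propagated for B3-style tracing behind an Envoy-based
# sidecar. This list is an assumption; confirm it against your mesh's docs.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
)


def forward_trace_headers(incoming: dict) -> dict:
    """Pick the trace headers out of an incoming request's headers.

    The result is meant to be attached to any outbound request made while
    handling that incoming request, so the proxy can stitch spans together.
    """
    # HTTP header names are case-insensitive, so normalise before matching.
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}
```

The app never creates spans here; it only copies a handful of headers from the inbound request to its outbound calls, which is the minimum the proxy cannot do on its own.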

E.g. Linkerd gives you service "golden metrics" (success rate, latency distribution, request volumes) without any app changes. It can draw the service topology too, since it's observing everything in realtime. https://linkerd.io/2/features/telemetry/
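As a rough illustration of why these metrics need no app changes: everything below is computed from what a proxy can observe on the wire (status code and latency per request). This is a sketch under stated assumptions, not how Linkerd implements it; the `Request` record, the `golden_metrics` function, and the choice to count any non-5xx response as a success are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code seen by the proxy
    latency_ms: float  # time between request and response


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is in (0, 100]."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_metrics(requests, window_seconds):
    """Success rate, request volume, and latency percentiles for one window."""
    if not requests:
        raise ValueError("no requests observed in this window")
    latencies = sorted(r.latency_ms for r in requests)
    # Assumption: anything below 500 counts as a success.
    ok = sum(1 for r in requests if r.status < 500)
    return {
        "success_rate": ok / len(requests),
        "requests_per_second": len(requests) / window_seconds,
        "p50_ms": percentile(latencies, 50),
        "p99_ms": percentile(latencies, 99),
    }
```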

kcmastrpc 7 years ago | |

There is magic (caveat: I work here): Instana (https://instana.com) will instrument most major languages and frameworks auto-magically. As in, I don't have to declare a dependency, change a configuration, or anything. Our agent finds processes running on the system, bootstraps the libraries while they are running, and monkey-patches a huge number of standard libraries and frameworks with no restarts. (Don't believe me? Give the trial a shot.)

There is literally nothing else quite like it in the market, and it gives you distributed tracing, automatic metric collection, and pre-defined alerts for a reasonable price.

https://docs.instana.io/core_concepts/tracing/#supported-tec...

Diggsey 7 years ago | | |

That sounds horrific for whoever is going to be supporting that system...

The last thing I would want in a production environment is to have some 3rd party software monkey-patching the code at runtime.

What happens when:

- a bug only occurs (due to timing or some other extremely subtle issue) when this monkey-patching is applied.
- there's a bug in the monkey-patching itself (sounds like a fun debugging session!)
- a library is accidentally monkey-patched with a slightly different version, or falsely detected as a known library (maybe it is a fork)

Give me statically compiled, reproducible, dependency-free musl binaries, bit-for-bit identical with what has been thoroughly tested in CI, any day. That's how you avoid getting woken up at 4am.

This kind of magic should happen at compile time, if at all.
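For anyone unfamiliar with the term, runtime monkey-patching looks roughly like the sketch below. It is a generic Python illustration, not Instana's implementation; `Client`, `patch_with_timing`, and `recorded_spans` are hypothetical names made up for the example.

```python
import functools
import time


class Client:
    """Stand-in for a third-party library class (hypothetical)."""

    def fetch(self, key):
        return f"value-for-{key}"


recorded_spans = []  # (method name, duration in seconds) per wrapped call


def patch_with_timing(cls, method_name):
    """Swap cls.method_name at runtime for a wrapper that records durations.

    Returns the original function so the patch can be undone.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Runs on success and on exceptions alike.
            recorded_spans.append((method_name, time.perf_counter() - start))

    setattr(cls, method_name, wrapper)
    return original
```

Every caller of `Client.fetch` now runs the wrapper without any change to its own source, which is both the appeal and the source of the concerns listed above: the code that runs in production is no longer the code that was tested.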

pmlnr 7 years ago |

SRE service. Not any kind of service - an SRE service!

Can we please stop the buzzword train?

bogomipz 7 years ago |

I would be interested in anyone's feedback on embracing and rolling out a service mesh/Istio in a non-GCP environment.

I apologize if this is a naive question but how come this wasn't included as part of the Kubernetes project given that it has the same Google origins?

twblalock 7 years ago | |

The Istio project is not supposed to be tied to Kubernetes. It is supposed to be a general-purpose service mesh.

That being said, I have been looking for a while and I can't find anyone who uses it in production on a platform other than Kubernetes.

barbecue_sauce 7 years ago | | |

Also worth noting that Istio is not part of the CNCF while Linkerd is.

pm90 7 years ago | |

So Kubernetes seems to be one of Google's biggest efforts at really building a healthy open source project for Borg v2. Perhaps they wanted it to be mostly community driven? It’s for this reason that they collaborated with others to introduce Istio so the community doesn’t feel like Google is taking over or whatever fears a lot of OSS folks have of the company.

jcims 7 years ago | |

Google's a big place.

jcims 7 years ago |

n00b question: I always see service meshes used in the context of containers, and mostly with kube. Would they work with a more monolithic/traditional n-tier architecture deployed directly on the host OS as well? Or, put another way, are there likely to be pain points that don't exist in containerized architectures?

twblalock 7 years ago | |

The main pain point at the moment is that meshes were written for containerized environments first, and attempts to extend full functionality to other environments are still pretty immature.

Meshes are a lot more than just sidecar proxying -- they are what make sidecar proxying manageable, and they add a lot of other features like authentication, network policies, various other traffic control policies, service discovery, etc. They are an attempt to do for service-to-service communication what Kubernetes has done for container deployment -- make it abstract and declarative, with configurations that are independent from the underlying implementation.

The underlying implementation that works right now is the Kubernetes API and etcd, and alternate implementations need to be provided for those features to work well outside of Kubernetes. I think it will happen sometime in the next few years.

orthoxerox 7 years ago | |

Meshes are there to abstract away stuff that is much more manageable in a monolith. If you have 100 microservices implemented using 5 different platforms you need libraries and programming discipline to implement retries with backoffs, circuit breakers, health checks, tracing, service discovery and other similar stuff in every single one of them.

In a monolith you need to implement some of this stuff only once and you don't need a lot of it at all because you are not making remote procedure calls.
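To give a sense of what "retries with backoffs" and "circuit breakers" mean when each service has to carry them as library code, here is a minimal sketch. All names and default values (`retry_with_backoff`, `CircuitBreaker`, the thresholds and delays) are hypothetical choices for illustration, not taken from any particular mesh or library.

```python
import random
import time


def retry_with_backoff(call, attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, backing off exponentially with jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call
    once `reset_after` seconds have passed."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset window elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Multiply this by every language and framework in the fleet and the appeal of pushing it into a sidecar becomes clearer; in a monolith the same calls are in-process and most of this is unnecessary.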

pm90 7 years ago | |

If your monolith interacts with many other monoliths, service meshes might be useful. If not, maybe not so much.