The lack of a billing API and the lack of centralized management are really painful if you're trying to adopt it across an organization.
There have been some recent introductions of account and key management that help too.
Disclaimer: I work on GCE.
I had a very simple question about billing (why was my bill higher than it seemed like it should be). Each reply took a week and often consisted of copy-and-paste messages asking me to enter information I had already supplied, or requests that I take screenshots of my console (all information they should readily have available). Then, right at the end, they swapped out who I was talking to for someone else, who asked me to look up more information they already had and simply ignored my question in the last email, which would have meant another week before I got an answer.
I had, luckily, experimented with the configuration and figured out what was wrong. The default instance class is in reality one higher (F2) than the documentation says it is ("If you do not specify a class, F1 is assigned by default."). Nowhere on the Console does it list what instance class is being used (which would have made the problem obvious), so there was really no way of knowing this without just guessing what the problem was. They never did answer my question "What is the default instance class?" (instead they just abruptly ended the support ticket after I proposed my theory about what was wrong).
Then I started getting emails about a billing account being past due. It was an old billing account from before I moved to High Replication (I have no idea how I ended up with two billing accounts...it was during the dark time when the console was even worse than it is now). That billing account was assigned to no projects and had no outstanding balance. I jumped in and just deleted the unused billing account. Then a few days later they sent a scary email saying that the billing account had been terminated (even though I had deleted it), which made me scramble to make sure they hadn't closed my in-use billing account out of nowhere (they hadn't, thankfully).
None of this has left me with any confidence in Google's Cloud offerings.
I plan to migrate off GAE as soon as I can rewrite the app (luckily it's not very big).
One big benefit of Bigtable is its scalability. To scale up, you turn the 'scale' knob. By contrast, Cassandra and HBase are headaches to scale (Apple has acquired Cassandra companies to aid in operation and scale).
Here's a couple of guys from Sungard, who scaled to about 3,000,000 writes per second with a couple weekends' worth of effort (something only a few beyond the likes of Facebook, Netflix, and Apple can achieve): https://cloud.google.com/bigtable/pdf/SunGardCATCaseStudy.pd...
> "At the same time, analysts say, the company’s offerings in cloud development services — computing, storage, data analytics and others — are already comparable to Amazon’s."
AWS is far, far ahead of GC and it is in no way comparable. Plus the ecosystem around AWS has evolved and is much more stable. There are a lot more articles explaining "how to fix X or how to do Y" with AWS than with GC.
I also don't think Google will ever have the level of customer obsession that Amazon has. Your account got hacked? No worries, AWS will waive the fee, but I honestly don't think Google will ever do that.
Google is a technology company and might outrun Amazon in terms of technical superiority, but I don't think they can simply outrun Amazon in cloud business.
> What is known about her life is that she grew up in Annapolis, on the coast of Maryland, in a house on the shores of the Chesapeake Bay. Her father was an engineer and her mother a teacher. It was on the north-eastern seaboard that she developed a passion for water sports, especially sailing and later windsurfing. She helped to organise the first windsurfing world championship in 1974 and two years later won the women's national double-handed dinghy championship.
> Her love of the sea influenced her choice of college education after she studied mechanical engineering at the University of Vermont. She moved to MIT to study naval architecture before a brief spell working for an oil consultancy based in San Francisco. She left that job relatively quickly to go to Hawaii to design windsurfing equipment, but returned to the US a few years later to study computer science at Berkeley. She worked for a succession of Silicon Valley stalwarts: Sybase, Silicon Graphics and Tandem. But her first big break came with the founding of her own media streaming business, VXtreme, in the early days of the dotcom boom. It was sold for a rumoured $75m in 1997.
It's just the first roadblock.
I wonder how effective the Microsoft-style API lock-in strategy will be for AWS. My personal guess is very effective.
Depending on how much you're storing, the one time hit to move may make sense given our ~25% lower cost per byte (and as you mention way better GCE pricing).
By the way, it is often really hard to see just how much other people put into their projects, and so from the outside their success does not seem very justified (hence the phenomenon of the ten-year overnight success). I find myself making this mistake as well, but it is useful to remind yourself that it is the wrong way to think.
I'll check it out again. Really, the frustrating management aspect was the lack of org oversight over multiple projects started within your domain.
In addition to free egress within GCP regions, we also recently announced reduced egress pricing for major CDN partners: https://cloud.google.com/interconnect/cdn-interconnect .
So how much data do you store and serve? ;)
It did take a lot more work than "a couple weekends" though :).
It runs on all major cloud platforms, VM environments and bare metal. Supported by every major player except Amazon (for obvious reasons: why would they want to support something that makes it easy to migrate away from them?).
AWS, Microsoft, and Google.
When AWS first started, it was EC2 and S3, so the model was about VMs without worrying about the bare metal. But as the platform continues to grow to challenge its competitors, it will begin to add more services which are only available on, and proprietary to, its own platform.
Our lack of IAM is beyond painful. We're sorry. We're fixing it.
PD has had snapshots since Day 1; they're differential, fast and we even encourage people to use them for super-fast "rsync"!
With Lambda, it's the whole ecosystem around it which makes it better than App Engine. A file changes in S3 and you want to do something? Lambda, in a few simple lines of code.
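For a rough sense of what "a few simple lines" can look like, here is a minimal sketch of a Python Lambda handler for an S3 notification. The `Records` / `s3.bucket.name` / `s3.object.key` layout follows the S3 event notification format; the name `handler` and the returned list are illustrative choices only, and the `print` stands in for whatever you actually want to do with the object.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect (bucket, key) for every object named in an S3 event."""
    changed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        changed.append((bucket, key))
        print(f"object changed: s3://{bucket}/{key}")
    return changed
```

Wiring the bucket to the function (the notification configuration and permissions) happens outside this code and is not shown here.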
Elastic Container Service - "Why the heck is it Elastic!?" was my reaction the first time I read the term.
I realise this isn't any better :-)
(Disclaimer: I work on GCP)
https://engineering.linkedin.com/distributed-systems/log-wha...
Yes. Google's strategy with Kubernetes is to commoditize the cloud - making them all functionally interchangeable. Write to Kubernetes, and your app runs on AWS, GCE, Azure, etc...
They are betting that they can deliver raw CPU cycles, network bandwidth, lower latency etc. - better/faster/cheaper than their competitors.
> When you create a subscription, the system establishes a sync point. That is, your subscriber is guaranteed to receive any message published after this point.
[1] https://cloud.google.com/pubsub/subscriber
With Kafka or Kinesis, I can write events to a stream/topic completely independently of any consumer. I can then bring as many consumers online as I want, and they can start processing from the beginning of my stream if they want. If one of my consumers has a bug in it, I can ask it to go back and start again. That's what I mean by an immutable stream in Kafka or Kinesis.
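To make that model concrete, here is a toy in-memory sketch (not Kafka or Kinesis client code). `Log`, `Consumer`, `poll` and `seek` are hypothetical names chosen for illustration; the point is only that the log is append-only and each consumer owns its own position.

```python
class Log:
    """Append-only list of events; records are never changed or removed."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, offset, limit=100):
        return self._events[offset:offset + limit]


class Consumer:
    """Tracks its own position; the log knows nothing about it."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, limit=100):
        batch = self.log.read(self.offset, limit)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset


log = Log()
for e in ["signup", "login", "purchase"]:
    log.append(e)

late = Consumer(log)      # created after the writes, still sees everything
first_pass = late.poll()
late.seek(0)              # e.g. after fixing a bug in the consumer
second_pass = late.poll()
```

Because writers only ever append, any number of consumers can be added later and each can rewind without affecting the others.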
If I understand your point correctly, the only expectation we haven't matched is the ability to "go back and start again". We hear you.
1. Can you direct the consumer to a point in the stream? (ideally time-based, e.g. messages from 16 Nov UTC)
2. Can old events be removed automatically according to rules?
1. Each group id represents a point in the stream that a consumer is processing from. You could technically have multiple processes consuming from a single group id.
2. There was a configuration for how long to keep things, as well as for how much space to use, if I remember correctly. Basically, there has to be: there's a pretty hard limit on what you can store on disk.
edit: changed consumer id to group id. If you want more info, feel free to ping me about the ecosystem
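As a rough illustration of the time-and-space retention idea in point 2, here is a simplified sketch. `apply_retention`, `max_age_s` and `max_bytes` are hypothetical names, not Kafka settings (Kafka's topic-level equivalents are `retention.ms` and `retention.bytes`), and this works per record, whereas Kafka removes whole log segments.

```python
from collections import deque


def apply_retention(records, now, max_age_s, max_bytes):
    """Drop the oldest records until both limits hold.

    records: deque of (timestamp_s, payload_bytes), oldest first.
    """
    # Time rule: expire anything older than max_age_s.
    while records and now - records[0][0] > max_age_s:
        records.popleft()
    # Size rule: expire oldest records until the total fits.
    total = sum(len(payload) for _, payload in records)
    while records and total > max_bytes:
        _, payload = records.popleft()
        total -= len(payload)
    return records
```

Either rule alone can trim the log; applying both means the oldest data goes first whichever limit is hit.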
Let me see if I'm understanding the criticism: when creating a consumer, the sync point of a new consumer really should start from the very beginning of the topic, at a predictable explicit start point, rather than at the current end of the topic. This makes a lot of sense, and yes, there is a disconnect between the models. We think the capabilities you are talking about are great and those use cases are important. All I can say is keep your eyes open.
We went with defaults from Google's internal use of Pub/Sub, which is older than the public release of Kinesis and Kafka. Internal use involves an approach where topics and consumers are very long-lived. Topics are high throughput, in terms of bytes published per unit time. Retaining all messages and starting consumers from the very beginning wasn't a sensible default; our focus was more centered on making sure that, once topics and consumers were set up, consumers could keep up over time.
One example use case to help illustrate this thinking is doing real-time sentiment analysis on tweets: https://www.youtube.com/watch?v=O3mfuc-syTI
In the work described by that video, they were essentially publishing tweets in real time into a Cloud Pub/Sub topic, thus making an "all tweets on Twitter in realtime" topic. This is a great example of a topic where producers and consumers are completely decoupled from each other. It doesn't necessarily make sense to retain all tweets forever by default (although there certainly are use cases for that). There are plenty of use cases where a consumer might want to say "ok, please start retaining all tweets made from here on out" rather than starting from a specific tweet.
> when creating a consumer, the sync point of a new consumer really should start from the very beginning of the topic, at a predictable explicit start point, rather than at the current end of the topic
I'll talk about Kinesis because that's the technology we use more at Snowplow. When creating a Kinesis consumer, I can specify whether I want to start reading from a) TRIM_HORIZON, which means the earliest events in the stream that haven't yet been expired (aka "trimmed"), b) LATEST, which is the Cloud Pub/Sub capability, c) AT_SEQUENCE_NUMBER {x}, which means from the event in the stream with the given offset ID, or d) AFTER_SEQUENCE_NUMBER {x}, which means from the event immediately after the one in c).
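The four options can be sketched over an in-memory list. This is a toy model, not the Kinesis API: `start_index` is a hypothetical helper, and plain integers stand in for real sequence numbers.

```python
def start_index(records, iterator_type, sequence_number=None):
    """Return the index in `records` where a new consumer begins reading.

    records: untrimmed (sequence_number, data) pairs, oldest first.
    An index equal to len(records) means "wait for new records".
    """
    if iterator_type == "TRIM_HORIZON":
        return 0
    if iterator_type == "LATEST":
        return len(records)
    seqs = [seq for seq, _ in records]
    if iterator_type == "AT_SEQUENCE_NUMBER":
        return seqs.index(sequence_number)
    if iterator_type == "AFTER_SEQUENCE_NUMBER":
        return seqs.index(sequence_number) + 1
    raise ValueError(f"unknown iterator type: {iterator_type}")
```

In this sketch an unknown sequence number raises `ValueError` from `list.index`; how a real service reports that case is not modelled.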
Kinesis streams or Kafka topics don't themselves care about the progress of any individual consumer - consumers are responsible for tracking their own position in the stream via sequence numbers / offset IDs.
> It doesn't necessarily make sense to retain all tweets forever by default (although there certainly are use cases for that)
Completely agree. I think a good point of distinction between pub/sub systems and unified log is: use pub/sub when the messages are a means-to-an-end (which is feeding one or more downstream apps); use unified log when the events are an end-in-themselves (i.e. you would still want to preserve the events even if there were no consumers live).
Anyway, I could talk about this stuff all day :-) - if you'd like to chat further, my details are in my profile!