The Changing of the Guard

IBM, Cisco, Intel, Microsoft, Dell, HP, and other large legacy hardware and software companies have something in common these days: a declining revenue stream. Big time.

It's not just one or two companies; it's most of the big ones. And the results will ultimately affect employees as each quarter passes and the companies are forced to reckon with Wall Street and their earnings reports. The stock market takes no prisoners.

Online newcomers with ‘disruptive’ business models and software are flourishing. Box, Dropbox, Workday, Amazon, Salesforce, Facebook, and LinkedIn have all reported record quarters. Even Apple, despite its recently declining stock price, is still growing. Just about anything that has to do with mobile phones and tablets has the ‘Midas touch’. Google Inc. said last Thursday that its revenue grew 31% in the first quarter, while profit rose 16%.

IBM last Thursday reported its revenue dropped.

Software giant Microsoft Corp., once known for rapid sales of PC software, reported that the business that includes its Windows operating system turned in essentially zero growth.

Intel Corp., which has struggled to get its chips into mobile devices, reported a first-quarter profit drop of 25% on revenue that declined 2.5%.

Oracle Corp. reported a 1% drop in revenue in its most recent quarter.

The disparities are the result of technology shifts: the rise of mobile devices and slowing growth in personal computers, the replacement of conventional software with online versions, and cloud outsourcing by corporations. Companies want to rent software and computer systems. The deals are smaller and take less time to implement. Companies want to get out of the construction business of building and rolling out expensive software and hardware systems.

Web-based technology makes it easier for consumers and corporate employees to try new things, and makes it harder for older technology suppliers to keep rolling out huge hardware and software deals month in and month out. Hardware, chips, and hard drives now get faster, smaller, and cheaper every 3-6 months, making large corporate purchases outdated almost before the equipment is installed.

Workday Inc., which was founded in 2005 and went public in October, reported that revenue for its fourth quarter, which ended in February, rose 89%.

Box Inc., founded in 2005, lets customers store their data online and tap into it from mobile phones and PCs. Its revenue grew more than 150% in 2012, and it expects to double again this year.

“Their biggest challenge is they live in a world of legacy business models,” said Ed Anderson, an analyst with technology research firm Gartner Inc.

The Great Chaos Monkey!

Apr 25, 2011
Working with the Chaos Monkey

Late last year, the Netflix Tech Blog wrote about five lessons they learned moving to Amazon Web Services. AWS is, of course, the preeminent provider of so-called “cloud computing”, so this can essentially be read as key advice for any website considering a move to the cloud. And it’s great advice, too. Here’s the one bit that struck me as most essential:

We’ve sometimes referred to the Netflix software architecture in AWS as our Rambo Architecture. Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.

If our recommendations system is down, we degrade the quality of our responses to our customers, but we still respond. We’ll show popular titles instead of personalized picks. If our search system is intolerably slow, streaming should still work perfectly fine.

One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.

Which, let’s face it, seems like insane advice at first glance. I’m not sure many companies even understand why this would be a good idea, much less have the guts to attempt it. Raise your hand if where you work, someone deployed a daemon or service that randomly kills servers and processes in your server farm.

Now raise your other hand if that person is still employed by your company.

Who in their right mind would willingly choose to work with a Chaos Monkey?
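
To make the idea in the Netflix quote concrete, here is a minimal sketch of what a Chaos Monkey does, written as a toy in-memory simulation. It is not Netflix's implementation and it touches no real AWS API; the `Instance` and `Service` classes and the `chaos_monkey` function are hypothetical names invented for illustration.

```python
import random


class Instance:
    """Hypothetical stand-in for one server or process."""

    def __init__(self, name):
        self.name = name
        self.alive = True


class Service:
    """Hypothetical service backed by one or more redundant instances."""

    def __init__(self, name, instance_count):
        self.name = name
        self.instances = [Instance(f"{name}-{i}") for i in range(instance_count)]

    def healthy(self):
        # The service survives as long as at least one instance is alive.
        return any(inst.alive for inst in self.instances)


def chaos_monkey(services, rng):
    """Kill one randomly chosen live instance; return it, or None if none remain."""
    live = [inst for svc in services for inst in svc.instances if inst.alive]
    if not live:
        return None
    victim = rng.choice(live)
    victim.alive = False
    return victim


if __name__ == "__main__":
    rng = random.Random()
    services = [Service("search", 2), Service("recommendations", 1)]
    victim = chaos_monkey(services, rng)
    down = [svc.name for svc in services if not svc.healthy()]
    print(f"killed {victim.name}; services now down: {down}")
```

Run it a few times and the point becomes obvious: the service with two instances always survives a single kill, while the service with one instance sometimes does not.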

Sometimes you don’t get a choice; the Chaos Monkey chooses you. At Stack Exchange, we struggled for months with a bizarre problem. Every few days, one of the servers in the Oregon web farm would simply stop responding to all external network requests. No reason, no rationale, and no recovery except for a slow, excruciating shutdown sequence requiring the server to bluescreen before it would reboot.

We spent months — literally months — chasing this problem down. We walked the list of everything we could think of to solve it, and then some:

swapping network ports
replacing network cables
a different switch
multiple versions of the network driver
tweaking OS and driver level network settings
simplifying our network configuration and removing TProxy for more traditional X-FORWARDED-FOR
switching virtualization providers
changing our TCP/IP host model
getting Kernel hotfixes and applying them
involving high-level vendor support teams
some other stuff that I’ve now forgotten because I blacked out from the pain

At one point in this saga our team almost came to blows because we were so frustrated. (Well, as close to “blows” as a remote team can get over Skype, but you know what I mean.) Can you blame us? Every few days, one of our servers — no telling which one — would randomly wink off the network. The Chaos Monkey strikes again!

Even in our time of greatest frustration, I realized that there was a positive side to all this:

Where we had one server performing an essential function, we switched to two.
If we didn’t have a sensible fallback for something, we created one.
We removed dependencies all over the place, paring down to the absolute minimum we required to run.
We implemented workarounds to stay running at all times, even when services we previously considered essential were suddenly no longer available.
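
The "sensible fallback" idea, which is the same one Netflix describes when it shows popular titles instead of personalized picks, can be sketched in a few lines. This is only an illustration under assumed names: `with_fallback`, `personalized_picks`, and `popular_titles` are hypothetical, and the simulated outage is hard-coded.

```python
def with_fallback(primary, fallback):
    """Wrap primary so that any exception it raises triggers fallback instead."""

    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            # Degrade the quality of the response, but still respond.
            return fallback(*args, **kwargs)

    return call


def personalized_picks(user_id):
    # Hypothetical dependency; here it simulates an outage.
    raise ConnectionError("recommendations system is down")


def popular_titles(user_id):
    # Hypothetical static fallback that needs no other service.
    return ["Popular Title A", "Popular Title B"]


recommend = with_fallback(personalized_picks, popular_titles)

if __name__ == "__main__":
    print(recommend(42))
```

The catch-all `except Exception` is a deliberate simplification; a real system would more likely catch specific errors, add timeouts, and log the failure.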

Every week that went by, we made our system a tiny bit more redundant, because we had to. Despite the ongoing pain, it became clear that Chaos Monkey was actually doing us a big favor by forcing us to become extremely resilient. Not tomorrow, not someday, not at some indeterminate “we’ll get to it eventually” point in the future, but right now where it hurts.

Now, none of this is new news; our problem is long since solved, and the Netflix Tech Blog article I’m referring to was posted last year. I’ve been meaning to write about it, but I’ve been a little busy. Maybe the timing is prophetic; AWS had a huge multi-day outage last week, which took several major websites down, along with a constellation of smaller sites.

Notably absent from that list of affected AWS sites? Netflix.

When you work with the Chaos Monkey, you quickly learn that everything happens for a reason. Except for those things which happen completely randomly. And that’s why, even though it sounds crazy, the best way to avoid failure is to fail constantly.

Guest Post by Jeff Atwood