To scale your web app, think Disney World
If you’re a successful growth-stage technology company, you will inevitably run into technological scaling problems. They’ll hit you like a ton of bricks, and you’ll need to get rid of them as quickly as possible.
For anybody who is not on the engineering team when this happens, it can be jarring. For anybody who is on the engineering team, the questions that arise can be frustrating. Questions such as:
Isn’t technology running in the cloud supposed to scale easily?
Weren’t these problems supposed to be dealt with during the initial architecting of the services?
Can’t we just throw money at this?
The answer to these questions is usually “no” for one simple reason: scalability is the result of a complex web of choices made as part of human-centered design. Understanding the needs of actual humans, and then designing around them, usually can’t be done effectively before real-world experimentation (i.e., the product/market fit phase). Nor can it usually be purchased from a cloud provider or any other vendor, at least not without first investing in understanding how to use the tools they’re selling.
So for a few minutes, transport your mind to Disney World. We’re going to explore the scaling challenges such a theme park faces and the solutions it employs. By the end of this, I promise you’ll be convinced that scaling challenges are best solved by first focusing on user experience, then looking to technology for the tools that facilitate such experiences.
1. Selling the park
If you’re an executive running Disney World, your guests engage with you in many ways before they ever set foot in the park. They view content, see advertisements, purchase tickets, and so on. Many of these engagement points are highly scalable. But what happens once people actually arrive at the park?
2. If you build it, too many people might come
There are way too many people coming to the park. A team member has a great idea: Expand it! Perhaps new parks with new themes, all close to each other and part of the same experience. But another team member has a concern: Isn’t that wasteful? How do you know exactly how many parks to build? Isn’t there a ceiling somewhere?
3. Limitations
There is always an upper limit on a market, your team decides. In Disney World’s case, there are some key factors:
Intra-region population
Bandwidth for travel into the region
Lodging availability
Of course, some of these factors can be altered, or may even change naturally, over time, but the key word here is time. Knowing what the next decade or so looks like is essential because you don’t want to waste capital building and maintaining parks that will sit idle for years. Your team concludes that this initial wave of guests is temporary, and that you are comfortable with a park that will often be full and sometimes overcrowded over the next decade or so.
You just learned about the importance of domain knowledge and requirement definition in scaling.
4. Letting guests in
This should be the easy part, but there’s already a bottleneck. Guests need their tickets scanned, bags checked, and so on. With more people coming to the park than expected, you need to build more gates so that guests don’t spend half their day waiting to get in. Traffic needs to be distributed somewhat evenly across the gates, and guests need to be able to work out which gate is theirs on their own. How?
One team member proposes assigning entrances by last name: one for A-D, another for E-H, and so on. But that’s confusing for groups whose members have different last names, and inconvenient for guests whose assigned gate is far from where they start. Another team member proposes assigning entrances by ticket level: one for 1-day passes, another for 3-day passes, and so on. But traffic might not stay evenly distributed as seasons and consumer demand shift, and it has the same starting-location problem. Finally, another team member proposes assigning entrances by hotel. Everyone agrees this makes a lot of sense, because a gate can be built for each hotel and each gate’s traffic is implicitly bounded by its hotel’s capacity.
You just learned about load balancing and sharding.
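In software, the hotel decision amounts to choosing a shard key. Below is a minimal Python sketch, assuming each hotel maps to exactly one gate and each gate spreads its own guests across its turnstiles round-robin. The names `HOTEL_TO_GATE`, `Gate`, and `route`, along with the hotel and gate labels, are hypothetical and exist only for illustration.

```python
from itertools import cycle

# Hypothetical shard map: the shard key is the guest's hotel, and each
# hotel is assigned to exactly one gate.
HOTEL_TO_GATE = {
    "Hotel A": "gate_north",
    "Hotel B": "gate_west",
    "Hotel C": "gate_east",
}


class Gate:
    """One shard: a gate that spreads its own guests across its turnstiles."""

    def __init__(self, name, turnstile_count):
        self.name = name
        self._next_turnstile = cycle(range(turnstile_count))

    def admit(self, guest):
        # Load balancing inside the shard: round-robin over the turnstiles.
        return (self.name, next(self._next_turnstile), guest)


# One Gate object per distinct gate name in the shard map.
GATES = {name: Gate(name, turnstile_count=3) for name in set(HOTEL_TO_GATE.values())}


def route(guest, hotel):
    # Sharding: the hotel alone determines the gate, so every guest can
    # work out their own gate without a central dispatcher.
    return GATES[HOTEL_TO_GATE[hotel]].admit(guest)


# Example:
print(route("Alice", "Hotel A"))  # ('gate_north', 0, 'Alice')
print(route("Bob", "Hotel A"))    # ('gate_north', 1, 'Bob')
```

As with the hotel scheme, a shard key only works if it matches how traffic actually arrives. The last-name and ticket-level proposals struggle for exactly that reason: groups and seasonal demand don’t split evenly along those keys.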
5. Counting
Now that guests are arriving at the right gates, your team turns its attention to getting an accurate, real-time count of guests in the park so that it can continually adjust its logistics. This seems easy, but there’s one problem: there are dozens of turnstiles per gate, and each keeps its own count. Until now, those counts haven’t been collected until the end of each day. Asking turnstile attendants to halt all guests in place every minute to report their numbers would be inefficient.
A team member has an idea: What if a separate group of employees rotates among the turnstiles, collecting their counts and reporting them back to a central location? Everyone agrees that even though this isn’t real-time, and the central numbers may not always be precise, it’s probably the best they’ll get at the scale they’re dealing with.
You just learned about database tradeoffs between consistency, availability, and partition tolerance.
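The runner scheme favors availability: each turnstile keeps counting no matter what, and the central total is allowed to lag behind (eventual consistency). Here is a minimal Python sketch, assuming runners simply hand over each turnstile’s latest count. `Turnstile` and `CentralTally` are hypothetical names, and keeping the maximum of the reported counts borrows the idea behind a grow-only counter rather than anything the park scheme itself specifies.

```python
class Turnstile:
    """Keeps its own local count and never waits on anyone else to record a guest."""

    def __init__(self):
        self.count = 0

    def click(self):
        self.count += 1


class CentralTally:
    """Central office that only knows what the runners have reported so far."""

    def __init__(self):
        self.last_reported = {}  # turnstile id -> latest count a runner brought back

    def report(self, turnstile_id, count):
        # Turnstile counts only ever grow, so keeping the maximum means a late
        # or duplicate report can never make the total go backwards.
        previous = self.last_reported.get(turnstile_id, 0)
        self.last_reported[turnstile_id] = max(previous, count)

    def total(self):
        # Eventually consistent: correct once every turnstile has been visited
        # since its last click, but possibly behind the true count until then.
        return sum(self.last_reported.values())


# Example: three guests enter, one per turnstile, but a runner has only
# visited turnstile 0 so far.
turnstiles = {i: Turnstile() for i in range(3)}
for t in turnstiles.values():
    t.click()
tally = CentralTally()
tally.report(0, turnstiles[0].count)
print(tally.total())  # 1
```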
6. Where are we going?
Guests are now filing in. But the park is big, the attractions are plentiful, and nobody knows where to start. In particular, your frontline customer service employees are getting slammed with basic questions about which attractions should be avoided at the moment because they are too crowded. There has to be a better way.
A team member has an idea: Add screens around the park showing the wait time for each ride. Everybody thinks this is an excellent idea. That said, there are tradeoffs. One approach is to update wait times in the reporting system in real time as they change, which would require a lot of procedural changes. Another is to check them periodically, which would introduce some lag in the wait times guests see.
You just learned about caching and content delivery networks.
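Periodically checking wait times is essentially a time-to-live (TTL) cache: each screen shows a stored value and only goes back to the source once that value is older than some threshold. Below is a minimal Python sketch, assuming a hypothetical `fetch` function that asks a ride for its current wait. `WaitTimeBoard`, the 300-second TTL, and the wait numbers are illustrative; the injectable `clock` just makes the staleness easy to see.

```python
import time


class WaitTimeBoard:
    """Caches wait times and asks the ride again only after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # hypothetical: polls a ride for its current wait
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}         # ride -> (wait_minutes, fetched_at)

    def wait_for(self, ride):
        now = self._clock()
        entry = self._cache.get(ride)
        if entry is None or now - entry[1] >= self._ttl:
            # Cache miss or stale entry: pay the cost of asking the ride.
            entry = (self._fetch(ride), now)
            self._cache[ride] = entry
        return entry[0]


# Example with a fake clock so the lag is visible.
waits = {"Ride A": 45}
fake_now = [0.0]
board = WaitTimeBoard(lambda ride: waits[ride], ttl_seconds=300, clock=lambda: fake_now[0])
print(board.wait_for("Ride A"))  # 45
waits["Ride A"] = 60
fake_now[0] = 120.0
print(board.wait_for("Ride A"))  # 45: the cached value is only 2 minutes old
fake_now[0] = 300.0
print(board.wait_for("Ride A"))  # 60: the TTL expired, so the ride was asked again
```

A shorter TTL means fresher numbers at the cost of more trips to the source. A content delivery network makes the same trade-off, with each edge location playing the role of a screen.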
7. One place to eat
Now all of the guests know where they’re going, but that doesn’t mean their travels through the park are efficient. One area of particular concern is food service. The food stands are scattered throughout the park rather than gathered in one area. That means that if one member of a family wants pizza and another wants chicken, they can’t eat together. And this is more than a nuisance for each family: it’s one of the lingering sources of unnecessary foot traffic in the park, which in turn affects the experience of every other guest.
Luckily, there’s an obvious fix. Most of the food services should be grouped together and clearly marked on the park map. That way, families can look up where they are, head straight to the food, and get their needs met while keeping the park’s foot traffic to a minimum.
You just learned about indexing.
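The park map works like a database index: instead of walking the whole park to find food, guests look up one entry that points straight to it. Here is a minimal Python sketch comparing a full scan with a dictionary-based index; the location list and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical park locations: (name, category, area).
LOCATIONS = [
    ("Pizza stand", "food", "Food Court"),
    ("Chicken shack", "food", "Food Court"),
    ("Coaster", "ride", "North"),
    ("Gift shop", "shop", "West"),
]


def find_by_scan(category):
    # Without an index: walk every location in the park, O(n) per lookup.
    return [name for name, cat, _ in LOCATIONS if cat == category]


def build_index(locations):
    # Built once, like the park map's legend: category -> [(name, area), ...].
    index = defaultdict(list)
    for name, cat, area in locations:
        index[cat].append((name, area))
    return index


INDEX = build_index(LOCATIONS)


def find_by_index(category):
    # With the index: one dictionary lookup, O(1) on average.
    # .get() avoids silently inserting empty entries for unknown categories.
    return INDEX.get(category, [])


print(find_by_index("food"))  # [('Pizza stand', 'Food Court'), ('Chicken shack', 'Food Court')]
```

The index has to be rebuilt or updated whenever a location opens or moves, which mirrors the extra write cost an index adds in a database.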
8. Lines
At any theme park, the attractions are front and center. And their popularity can be a problem, with literally thousands of people competing for a couple dozen spots on an attraction at a time. Imagine if it were simply left up to the guests to work this out for themselves. There might be mobs pushing toward the entrances, with no rhyme or reason to who gets in first. Wait times would then grow even longer as employees struggle to manage an angry crowd. Some people would simply leave in frustration.
No, team members insist: the park needs clear lines. And depending on the attraction, those lines might need special properties. For example, a ride with eight rows of seats might need its one long line to split into eight smaller lanes as it nears the ride.
You just learned about cascading performance problems, timeouts, and FIFO queuing.
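Here is a minimal Python sketch of such a line, assuming a ride that seats one guest per row per vehicle, a cap on line length, and guests who give up after a fixed patience. `RideLine`, `max_length`, and `patience` are hypothetical names. The cap and the timeout are the pieces that keep one overloaded attraction from turning into a mob: new arrivals are turned away early instead of piling up behind a backlog that can never clear.

```python
from collections import deque


class RideLine:
    """A bounded FIFO line feeding a ride with several rows of seats."""

    def __init__(self, rows, max_length, patience):
        self.rows = rows
        self.max_length = max_length  # cap on the line so it can't grow forever
        self.patience = patience      # ticks a guest waits before giving up
        self.line = deque()           # (guest, arrived_at), oldest first

    def join(self, guest, now):
        if len(self.line) >= self.max_length:
            return False              # turn the guest away up front
        self.line.append((guest, now))
        return True

    def board(self, now):
        """Seat the next ride vehicle: one guest per row, first in, first out."""
        # Timeouts: a guest who has waited longer than `patience` has left.
        # Because the line is FIFO, the longest-waiting guests are at the front.
        while self.line and now - self.line[0][1] > self.patience:
            self.line.popleft()
        # The single long line fans out into one short lane per row.
        seated = []
        while self.line and len(seated) < self.rows:
            guest, _ = self.line.popleft()
            seated.append(guest)
        return seated                 # seated[i] rides in row i


# Example:
line = RideLine(rows=2, max_length=3, patience=10)
for t, guest in enumerate(["ann", "ben", "cal", "dee"]):
    print(guest, line.join(guest, now=t))  # "dee" is turned away: the line is full
print(line.board(now=5))                   # ['ann', 'ben']
print(line.board(now=20))                  # []: "cal" waited too long and left
```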
9. Waiting without resentment
Of course, while lines make waiting for an attraction less chaotic, they are still extremely frustrating. Anger can still bubble up in the crowd, and some guests will still leave.
The key, one team member hypothesizes, is in understanding everything that makes a line frustrating, and doing everything possible to reduce that frustration:
Knowing where one stands - put markers along the line that show how long the wait is from each point
Not having other needs met - put restrooms and small concession stands near the lines
Not being entertained - position screens throughout the line and play relevant video clips
Being denied the opportunity to visit other attractions - create virtual lines via a program like FastPass
You just learned about user experience design.
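Two items on that list have straightforward software counterparts: a progress estimate and a virtual queue. Below is a minimal Python sketch, assuming steady ride throughput (the estimate is in the spirit of Little’s law, wait = people ahead / throughput) and a hypothetical FastPass-like scheme that hands out fixed-size return windows in order. The function names and parameters are illustrative and are not a description of how FastPass actually works.

```python
import math


def estimated_wait_minutes(position, guests_per_minute):
    """Wait shown on a line marker: guests ahead of you / ride throughput."""
    # Assumes steady throughput; real rides vary, so this is only an estimate.
    return math.ceil(position / guests_per_minute)


def assign_return_window(reservations_so_far, slot_capacity, slot_minutes, opening_minute):
    """Virtual line: hand the next reservation a return window instead of a spot in line."""
    # Hypothetical scheme: each window admits `slot_capacity` guests, filled in order.
    slot = reservations_so_far // slot_capacity
    start = opening_minute + slot * slot_minutes
    return start, start + slot_minutes


# Example: times are minutes after midnight, so 600 is 10:00.
print(estimated_wait_minutes(120, 20))            # 6
print(assign_return_window(50, 50, 30, 600))      # (630, 660)
```

The virtual line is what frees guests to visit other attractions: holding a return window costs them nothing while they wait, which is the experience the physical line can’t offer.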
10. Controlling demand
No matter how the team slices it, guests will inevitably prefer shorter lines, however pleasant the experience of waiting in one is made. So, the team decides, it must find ways to reduce the number of people waiting for any one attraction.
One team member suggests distributing popular attractions throughout the park, in between less popular ones, to encourage crowds to explore all of the attractions rather than cluster in one corner of the park. The fact that the gates are located in different areas of the park will also help. Everyone on the team agrees.
Another team member suggests that for live performance attractions, video recordings can be made to allow guests to watch later. Better yet, these videos can be played while guests are waiting in other lines. This is a good way to reduce demand for certain live performances while simultaneously enhancing the experience of another attraction’s line.
Of course, there will come a point where more rides and more park real estate are the only solutions left. But everyone on the team agrees that if these solutions are employed prematurely, Disney World will end up with a compromised customer experience and poor unit economics. There won’t be a viable business to scale.
You just learned about the merit of easing demand.
Scaling challenges are user experience challenges
Technology companies usually can’t solve their scaling challenges by counting on technological progress or outside vendors as a panacea, for the same reason Disney World can’t. Every component of every system, digital or physical, has unique strengths and weaknesses when it comes to scale, and each must be carefully chosen and tuned to strike the right balance. This starts with understanding humans: their behaviors, their limitations, and the metrics that result. It ends with implementing creative solutions using the right tools.