This post is by Grammarly software engineers Anton Pets and Yaroslav Voloshchuk. See part 2 here.
Unsupportable spaghetti code. Scope creep. Instant legacy status for a new product the whole team had been excited about. Most engineers have a horror story about a project’s complexity being underestimated or misunderstood. But the damage from a spiraling project like this doesn’t always end with the last commit and deploy: the consequences can extend to ruined architecture, reduced team morale, and increased risk of employee attrition. And all of that can happen before the project even ships. Once it does, malfunctioning software can, in some cases, put people in actual physical danger.
Complexity is a constant in computer science, and there’s no one way to conquer it. The Grammarly engineering team has wrestled with it for years, and we’ve worked out methods for identifying it and reducing it, ensuring that our features are finished cleanly and efficiently.
This post details some of our battles with complexity and what we learned from them. By walking you through the development of three Grammarly features, we’ll showcase our systematic approach and leave you with several concrete ways of finding and addressing complexity in software projects—while also offering up some of our favorite outside resources. Our hope is that your engineering team can plan more effectively and unearth complexity at the beginning of the project instead of being surprised when it reveals itself too late.
Real-world example: the Grammarly Editor
When planning the creation of the Grammarly Editor, we anticipated complexity coming from needing to handle texts of varying lengths. Working with a user’s whole text can be difficult and can affect performance, particularly if the text is large, as often occurs in professional-level use.
One way to approach the Grammarly Editor would be to sync with the back-end every time a user types a character. This would ensure that the user is always seeing checked text. But an average writer types 120 characters per minute, so for performance it is better to group keystrokes into batches and send them together. For that approach, we have three requirements to keep in mind:
- We want to keep UI lag under control as the user types, while also recognizing that reacting to every keystroke with immediate feedback could come across as an interruption.
- Even if we pace feedback to match the rate desirable for a user, we still have to check the user’s text continuously for a smooth experience.
- We have to save the text efficiently and not overwhelm servers with overly frequent requests.
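The batching idea can be sketched in a few lines. This is only an illustration and not Grammarly's implementation: the `EditBatcher` class, the 0.5-second quiet period, and the 20-character cap are all assumptions chosen for the example. Timestamps are passed in explicitly so the sketch stays deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EditBatcher:
    """Groups keystrokes into batches instead of sending one request per character.

    Hypothetical sketch: the thresholds below are illustrative, not real settings.
    """
    send: Callable[[str], None]   # called once per flushed batch
    quiet_seconds: float = 0.5    # flush after this long without typing
    max_chars: int = 20           # flush early if the batch grows this large
    _pending: List[str] = field(default_factory=list)
    _last_key_at: float = 0.0

    def on_key(self, char: str, now: float) -> None:
        """Record a keystroke typed at time `now` (in seconds)."""
        # A long enough pause before this keystroke closes the previous batch.
        self.tick(now)
        self._pending.append(char)
        self._last_key_at = now
        if len(self._pending) >= self.max_chars:
            self.flush()

    def tick(self, now: float) -> None:
        """Called periodically; flushes if the user has paused long enough."""
        if self._pending and now - self._last_key_at >= self.quiet_seconds:
            self.flush()

    def flush(self) -> None:
        """Send everything typed since the last flush as a single batch."""
        if self._pending:
            self.send("".join(self._pending))
            self._pending.clear()


# Usage: five quick keystrokes followed by a pause produce one request.
sent: List[str] = []
batcher = EditBatcher(send=sent.append)
for i, ch in enumerate("Hello"):
    batcher.on_key(ch, now=i * 0.1)
batcher.tick(now=1.0)
```

Flushing on a pause addresses the first and third requirements (fewer interruptions, fewer requests), while the size cap keeps long uninterrupted typing from going unchecked for too long.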
These three requirements are our primary concerns for good user experience. However, our mission is to help people improve their written communication, so we do need to inform users about grammar mistakes and offer other suggestions for improving their text in a way that lets them react and learn. Alerting them to these mistakes alongside the text in progress adds the following complications:
- We have to show several animations without lag.
- We must intelligently manage the appearance of alerts, showing them at the right time.
- We need to calculate and manage error ranges in text.
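To make the last point concrete, here is a minimal sketch of how alert ranges might be kept in step with the text after an insertion. The `shift_ranges` helper and its half-open `(start, end)` convention are assumptions for illustration; this post does not describe how the Grammarly Editor actually tracks ranges.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) character offsets


def shift_ranges(ranges: List[Range], at: int, inserted: int) -> List[Range]:
    """Adjust alert ranges after `inserted` characters are typed at offset `at`."""
    adjusted = []
    for start, end in ranges:
        if at <= start:
            # Insertion is before the range: the whole range moves right.
            adjusted.append((start + inserted, end + inserted))
        elif at < end:
            # Insertion is inside the range: the range grows to cover it.
            adjusted.append((start, end + inserted))
        else:
            # Insertion is at or after the end of the range: nothing changes.
            adjusted.append((start, end))
    return adjusted


text = "I has a apple"
alerts = [(2, 5), (6, 7)]  # the words "has" and "a"

# The user types "really " at offset 2, just before "has".
text = text[:2] + "really " + text[2:]
alerts = shift_ranges(alerts, at=2, inserted=7)
```

Even this small case raises questions a real editor has to answer, such as what should happen to an alert when the user edits inside the flagged word, or deletes it entirely.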
Beyond that, we also have to navigate the complexity that comes from network-related issues. For example, we use WebSockets to communicate with the backend, which requires a constant connection to be maintained from the client side. On top of that, we have to deal with frequent network bottlenecks.
This is all from a single product central to Grammarly’s business. Whew!
We try to address this fairly common level of complexity by approaching it from different sides, a common but undervalued approach. All this complexity arises from one fairly straightforward function: processing and offering guidance on the submitted text. How can engineers hope to reduce complexity when the central functions of their product are so complex? Code problems are universal, but our work affects our users directly and consistently. Because of this, code problems can become human problems faster than in other systems.
Fortunately, there are ways, both theoretical and practical, to find the origins of complexity, to recognize it, and to deal with it in a consistent, effective way. Here’s how we handle this.
Avoiding complexity assessment mistakes
Complexity is an inherent part of any planning and assessing, and it only intensifies when you bring software engineering into the picture. We can’t avoid complexity entirely, but we can reduce its risks by keeping some things in mind—and ensuring that we really, fully understand the problem at hand. Here’s another example from Grammarly engineering.
Real-world example: full-text search
We received a task to create a feature that could perform full-text search on a set of documents within the Grammarly Editor. This proved to be far more complicated than we initially understood.
Unfortunately, when an engineering team is initially estimating the time it will take to finish a task, they are not always operating with a full understanding of the task at hand: they don’t take edge cases into account, they rely on third-party libraries whose behavior they haven’t fully explored, or they base their thinking on incomplete requirements. Even so, the primary source of software complexity is usually the inherent complexity of the problem.
People tend to be optimistic, which can be an admirable trait in everyday life but a disastrous one in software engineering. For example, opening a window is a straightforward action for the user, so if we receive a requirement to create a window in the product, we may think it’s a pretty straightforward component to add. But how many different types of windows are there? And how many functions can they perform?
As a user, one doesn’t think, “I’m going to open a window of type 17, and clicking this button needs to update three database entries.” It just opens. But as engineers, trying to encode this as a program involves different methods of pulling the handle, updating handle state, and other details. Before we start working with complexity at the code level, we need to understand the task more fully. Without doing so, the process might look like this:
- Perform some planning and resource allocation
- Choose the approach, architecture, and tools
- Write code
- Test
- Deploy
None of this is technically incorrect; the problems come when we follow those steps and begin to think that we’re at the point of having production-ready software—and only then take into account edge cases and requirements.
So for the search problem, we needed to ask the right questions while planning. In our case, we learned that the function should be able to search all of a user’s documents, not only the visible ones. (For context, the client only has access to the user’s first 100 documents, which is something we engineers thought was widely understood.) But loading all the documents into memory would take too much time, so we realized we would need to perform a search on the server.
Once that aspect of the request was understood, more requirements appeared. We’d need to make sure the server search was fast enough. And perhaps the search would even need to work differently than initially assumed. Here’s our solution, which unearthed even more complexity:
- If the user has fewer than 100 documents, then we’d perform a front-end-only search.
- Otherwise, we’d make two parallel searches—both client- and server-side—and show the server results as soon as they were available.
- However, this solution creates an interesting edge case: a front-end search might show zero results, but then the user may suddenly see more results once the ones from the server are populated. To make this more user-friendly, we decided that if the front-end results are empty, we’d show a spinner until the server results appear.
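The rules above can be written down as a small decision function. This is a sketch rather than the shipped code: the function name, the convention that `None` means "still waiting", and the choice to display server results on their own once they arrive are all assumptions.

```python
from typing import List, Optional

CLIENT_DOCUMENT_LIMIT = 100  # below this, the front-end search is used on its own


def results_to_show(
    total_documents: int,
    client_results: List[str],
    server_results: Optional[List[str]],  # None while the request is in flight
) -> Optional[List[str]]:
    """Return the results to display, or None to show a spinner."""
    if total_documents < CLIENT_DOCUMENT_LIMIT:
        # Front-end-only search: no server round trip is needed.
        return client_results
    if server_results is not None:
        # Server results cover all documents, so they win once available.
        return server_results
    if not client_results:
        # Avoid flashing "no results" and then filling the list later.
        return None
    # Show what we have while the server search is still running.
    return client_results
```

Three inputs already produce four distinct outcomes, which gives a feel for how quickly each added requirement multiplies the cases to design and test.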
These snowballing requirements help our user, but they bring additional complexity to our architecture—particularly because we didn’t account for them from the beginning.
Let’s walk back through these steps and call out some of the problems that arose. In the next section, we’ll explain each avoidable problem and how to prevent it.
Three problems stand out:
- Incorrect initial assessment: we realized only late that searching all documents would require a server-side search, and that this search would be too slow for our purposes.
- Underestimating technological complexity: our response was to search on both the client and the server and then merge the results in real time.
- Underestimating project complexity: the result, spaghetti code and a bloated timeline, would increase costs over time.
These can all be catastrophic to a team striving for good software engineering practices. Fortunately, they’re all also avoidable with awareness and good process.
Avoiding complexity catastrophe
Before we write a single line of code, we try to describe the problem in as much detail as possible. We talk to colleagues, specialists, and non-specialists, and to the debug duck. Then we run requirements testing with engineers, ask how to break a feature, and brainstorm edge cases. We think about life cycles and how a feature will develop a month, a year, or more from now. We also look at how competitors implement similar features and learn from their approaches. We ask questions until no new answers turn up. This helps avoid incorrect initial assessment. David Hogue’s talk “Simplicity Is Not Simple” offers more ways to dig into a situation to uncover, categorize, and address hidden complexity, with examples of successful complexity reduction from his work.
To complement our questions, we use a variety of tools. Our designers not only draw but also make interactive product prototypes to ease implementation. In very complicated cases, we build real models, using prototype-quality code to simulate full functionality.
We’ve also had success using a large mindmap. When we were completely rewriting the Grammarly Editor, we gathered all of its features this way, which made iteration much simpler and ensured that we didn’t miss a single requirement. It also helps us to avoid underestimating project complexity.
If the product’s components use logic that is particularly complex, we typically also draw a state diagram (or statechart) with events and transitions. Here is an example of a state diagram of a standard alarm clock.
Though the step seems simple, creating a state diagram is an effective exercise in understanding that state is anything but straightforward. We’ve found this to be a great way to avoid underestimating technological complexity: it not only documents this area of complexity but also ensures that any lack of clarity or detail is deliberately eliminated before valuable engineering hours are allocated and used up.
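A state diagram translates almost directly into a transition table in code. The sketch below is one possible alarm clock; its states and events are assumed for illustration and are not taken from the diagram referred to in this post.

```python
from typing import Dict, Tuple

# (current state, event) -> next state. Any pair not listed is not allowed.
# These states and events are illustrative assumptions.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("idle", "set_alarm"): "armed",
    ("armed", "cancel"): "idle",
    ("armed", "time_reached"): "ringing",
    ("ringing", "snooze"): "snoozed",
    ("ringing", "dismiss"): "idle",
    ("snoozed", "snooze_elapsed"): "ringing",
    ("snoozed", "dismiss"): "idle",
}


def next_state(state: str, event: str) -> str:
    """Apply one event; reject transitions the table does not define."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


# Walk through one morning: the alarm rings, is snoozed once, then dismissed.
state = "idle"
for event in ["set_alarm", "time_reached", "snooze", "snooze_elapsed", "dismiss"]:
    state = next_state(state, event)
```

Filling in the table forces the questions a diagram is meant to surface: every missing `(state, event)` pair is either deliberately forbidden or a requirement nobody has thought about yet.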
We found these tools through research and experimentation, and we recommend that your team do the same when creating its own process.
Let’s look at these steps in action.
Real-world example: the Grammarly Editor sidebar
Before we begin coding, and after the intense research we do with subject-matter experts (SMEs) and other stakeholders, we’ve typically created and abandoned about five different designs for the UI. There are two general categories of abandoned designs:
- Designs rejected by our UI designers or product managers
- Designs that passed those reviews but were problematic to implement from an engineering point of view. Some are hard to implement in general, and some have just a few difficult edge cases with no UX solution.
After we have two ideas that are good enough to prototype, we use the 80/20 principle: How can we make 80 percent of a product in 20 percent of the time without architecting or testing? We then iterate until we as a team are happy with the result. In the case of the Grammarly Editor sidebar, we decided to use the sixth design as the basis for our product.
We then designed the architecture on a whiteboard and made an initial version of the feature to show a small number of real users. We did this four times, collecting and implementing feedback, and showing the new version to users to ensure we were on the right track.
Once we landed on a relatively stable version of the sidebar design, we began to create the animation, keeping in mind that smooth animation is critical for this feature to work. Our designer created an interactive prototype that covered 80 percent of our predicted uses, but we couldn’t implement the other 20 percent of edge-case uses without a real prototype.
We decided to use paper prototypes as seen below. The feedback we got from those helped us to finally implement the animation. But then we got feedback from our users: the motions in our 3D animation made them feel queasy—so we backed off and made everything much simpler and in 2D.
All of this happened before we wrote a single line of code meant for production. Since then, we’ve implemented and released the sidebar. We continue to get feedback, and we’re still actively working on improving it. However, by asking questions, gathering feedback, and collecting requirements beyond the obvious early ideas, we were able to work through a lot of the feature’s needs before using expensive coding hours and without forcing our users to become beta testers. The process was still complex, but it remained organized—and our code reflects that.
Making friends with necessary complexity
Programming essentially consists of composition and managing the complexity that comes with it. With the use of basic math and logical operations, you can build any software system, provided you can cope with the complexity. No matter what programming language you are using, you’re just combining smaller blocks into bigger ones until you get a program. It sounds simple, but complexity lurks even in that stripped-down description.
So no one can avoid all complexity—but it is possible to understand and account for it before diving into code. At Grammarly, we use some classic ideas from computer science to guide the planning and iteration we described above, which help us have an idea of how much complexity we can reasonably expect to reduce through that process. We can’t cover the entirety of the vast field of complexity in one post, but we can describe our view of the issue and detail a few high-level practices that have been effective for us in reducing unnecessary complexity in our engineering organization.
One way to define complexity is as a measure of the interaction of the components and the level of entropy in the software, that is, how disorganized things are. In his 1987 Computer magazine essay “No Silver Bullet: Essence and Accidents of Software Engineering,” Fred Brooks wrote that complexity comes from two places:
- From the task itself: the so-called essential complexity, which depends on the input and output of your task. It is caused by the outside world, so it cannot disappear or be reduced. We can only be aware of it and work with it. Think of our earlier example about the intersection of technical and behavioral needs in the Grammarly Editor.
- From what we bring to the code: accidental complexity, the incidental component that comes from our tools and implementation choices rather than from the problem. This is what we mainly struggle with. Our responsibility as software engineers is to write working code with minimal accidental complexity.
Within the code itself, two useful measures of complexity are coupling and cohesion. Coupling measures how spaghetti-like your code is: to what extent the components are related to one another and how many dependencies you have. One goal of good early planning is to reduce this as much as possible. Cohesion shows how focused your components are and whether they have a clear purpose or blurred functionality. The lower the cohesion, the more diffuse your components are and the more they need to know about one another, which leads to longer parameter lists and other clarity-reducing workarounds. Our task, when working with the code, is to lower the coupling and increase cohesion. (Daniel Westheide goes into greater depth on these ideas in “The Complexity Trap: Think Before You Leap.”)
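One rough way to put a number on coupling is to count, for each component, how many other components it depends on (fan-out) and how many depend on it (fan-in). The sketch below does this for a hand-written dependency map. The module names are made up for illustration, and these counts are only a proxy for coupling, not a complete measure.

```python
from typing import Dict, List, Tuple

# Hypothetical dependency map: module -> modules it depends on.
DEPENDENCIES: Dict[str, List[str]] = {
    "editor": ["alerts", "network", "storage"],
    "alerts": ["network"],
    "network": [],
    "storage": ["network"],
}


def coupling_report(deps: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {module: (fan_out, fan_in)} for every module in the map."""
    fan_in = {name: 0 for name in deps}
    for targets in deps.values():
        for target in targets:
            fan_in[target] = fan_in.get(target, 0) + 1
    return {name: (len(targets), fan_in[name]) for name, targets in deps.items()}


report = coupling_report(DEPENDENCIES)
```

A module with high fan-out is hard to change in isolation, and a module with high fan-in is risky to change at all, so both numbers are worth watching as a design evolves.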
Toward a less complex future
The great and terrible thing about software engineering is that there are few simple problems. However, even the most complicated projects can be made easier to understand and complete by taking the right steps at the right time—ideally before writing any code. Ask questions, account for edge cases, and always assume there is more to be understood.
Complexity manifests itself in our code at different levels of abstraction. The main challenges are structural and behavioral complexity. Our next post will complement this one by diving into some of the more classic computer science principles and best practices for identifying and avoiding complexity. Until then, let us know if you enjoyed this article by hitting the applause button or leaving a comment below!