At Grammarly, as at many companies around the world, navigating remote work over the past year and a half has presented certain challenges. For our engineers, one of these challenges has been working apart from our on-premise infrastructure. A distributed workforce pushes teams to consider how to move internal systems to the cloud for faster connections and flexible scalability. It’s certainly been top of mind at Grammarly, where our team has grown significantly across our four global offices during this period of remote work.
Grammarly has used Amazon Web Services (AWS), the industry-leading secure cloud platform, to host our infrastructure for many years. However, there’s always been one notable obstacle: building for iOS and macOS.
Grammarly for Mac, Grammarly for iPad, Grammarly for Chrome, and Grammarly for Safari are among our most popular product offerings. They have also been among the trickiest applications for us to develop, build, and release, because Apple requires that products built for its ecosystem be developed on Apple hardware. Since this hardware is expensive and complex to provision, for years it’s been extremely hard to find a good solution for building iOS and macOS apps in the cloud. However, new offerings are emerging in this space: notably, Amazon released support for EC2 Mac instances in November 2020.
Around the same time, we on Grammarly’s Platform team decided to transition our iOS and macOS build environment from our fleet of on-premise Apple devices to these brand-new EC2 Mac instances. We believed this would help us scale our development and achieve a smoother and more standardized build process, resulting in being able to deploy bug fixes and features more quickly.
We’re glad to report that we’ve made the move successfully, after navigating some of the bumps that are to be expected (we were probably among the first AWS customers to switch to the EC2 Mac instances!). Here we share specific takeaways and tips to help anyone looking to undertake a similar transition.
From “in-house” to “in-cloud”
Two main factors drove us to search for a better solution than our on-premise build system for macOS:
- Scalability: As we were rapidly taking on new teammates and projects while working remotely, we’d started to come up against resource limitations. With an on-premise solution, the number of physical devices you have access to at any given time is static. If it were simple to provision new macOS devices during the pandemic, this might not have been a problem. But even in normal times, provisioning isn’t easy and can take a long time: you have to physically purchase the hardware (easier said than done during lockdown!), set up all the software, and do a lot of manual configuration on top of that.
- Performance: An on-premise solution is designed for a team sharing a physical office, posing an obvious challenge for distributed work. For example, Grammarly engineers in San Francisco often needed to VPN into machines in our Kyiv office to build their macOS projects. Coordinating across time zones and over a long-distance VPN connection meant we had a variety of connectivity problems, slow performance issues, and logistical headaches.
Evaluating solutions
We ran an extensive evaluation of vendors for a cloud build system and came up with a shortlist that included MacStadium, MacinCloud, and Amazon EC2 Mac instances. We were impressed by MacStadium’s Orka solution, where they are managing to run macOS inside Docker containers on Kubernetes—that’s basically rocket science! But in the end, we decided to go with AWS, for several reasons:
- We expected that Amazon would provide a high level of stability.
- Our existing infrastructure toolset was supported, namely Packer for Amazon Machine Image (AMI) management and Terraform for resource provisioning.
- Since we were already using AWS at Grammarly, staying in the same ecosystem would eliminate the need to perform compliance and other legal verifications for a new vendor.
The known and unknown challenges
When we started working on this project in November 2020, Amazon’s Mac instances were brand new: they had only just launched at the end of that month. What kinds of issues and technical nuances would we encounter as some of the first customers to use this system? We could only guess.
There were, however, a few “known unknowns” that we could start figuring out right away. Namely, we wondered:
- How could we automate the macOS software configuration for AMIs?
- How could we automate the provisioning of build servers?
- How could we automatically register the build servers with GitLab, which we use for CI/CD?
We’ll answer these questions and explain some of the other issues we encountered. But first, let’s dive into an overview of Amazon EC2 Mac instances.
Deep dive: EC2 Mac instances
Supported features
The official documentation describes the new AWS offering pretty well. But here we’d like to focus on specific features that may be valuable for your build environment:
- Mac instances support a user data feature, letting you specify some scripts that should be run right after launch. We use this to do things like register the instance in GitLab and register the device with our vulnerability scanner.
- They also support Systems Manager (SSM) Agent and Session Manager to set up access controls and other kinds of configurations for your services (see the sketch after this list).
- Several AWS CLI tools are pre-installed on macOS from the Homebrew Tap, maintained by AWS.
- Mac instances come with the Elastic Network Adapter (ENA) for enhanced networking; our test upload/download to an S3 bucket ran at roughly 300 MB/s.
- Mac instances can report CPU, network, and EBS metrics to CloudWatch.
- Mac instances boot from EBS volumes, but the device’s built-in SSD drive is available. However, AWS does not guarantee data safety on that drive—so be careful.
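To make the Session Manager and CloudWatch points above more concrete, here’s a minimal sketch of how you might reach an instance and read its metrics from your workstation. The instance ID is a placeholder, and we’re assuming the SSM agent is enabled on the instance and your IAM credentials allow these calls:

```shell
# Open an interactive shell on the Mac instance through Session Manager (no SSH key required).
# The instance ID is a placeholder.
aws ssm start-session --target i-0123456789abcdef0

# Pull the instance's average CPU utilization from CloudWatch for the last hour.
# Note: the date invocations below use macOS/BSD syntax.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Average \
  --period 300 \
  --start-time "$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```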
Missing features
We’d also like to call out some features that, as of June 2021, were not yet available. While they were not critical blockers to our implementation, we are hopeful that AWS will continue developing EC2 Mac instances along some of the following paths:
- Mac instances do not support Auto Scaling. It would be great to have this so that build environments can be dynamically added and removed as resource needs change (for example, based on the number of build requests we have in GitLab).
- If you attach an EBS volume while the instance is running, the instance won’t recognize it. You have to reboot to make the volume visible to the system.
- Mac instances do not support some AWS services that rely on additional custom software, such as EC2 Instance Connect and Amazon Inspector.
- They also do not support the instance screenshot and instance console output troubleshooting features that other EC2 instances offer.
- Mac instances are currently available in a limited number of AWS Availability Zones. You can check the availability for a specific region using the CLI. Here’s an example:
```shell
aws ec2 describe-instance-type-offerings \
  --filters Name=instance-type,Values=mac1.metal \
  --location-type availability-zone-id \
  --region us-east-1
```
Instance lifecycle timings
We’ve noticed some lifecycle-related latencies with Mac instances that may be helpful to call out for engineers looking to work with them:
- 5–7 minutes from starting the instance until you’re able to SSH into it.
- 60–120 minutes to clean the data on an EC2 Dedicated Host after instance termination. You won’t be able to launch a new mac1.metal instance on that Dedicated Host during this time.
Mac instances are available only as bare metal instances on Dedicated Hosts. In other words, each instance is running one-to-one on a physical Mac mini. This means that much like your computer when you trade it in at the Apple Store, the Dedicated Host needs to go through some cleanup when it’s released to make sure that your data is always scrubbed and completely protected from the next user.
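Because of that scrubbing window, it’s useful to poll the Dedicated Host state before trying to place a new instance on it. Here’s a minimal sketch (the host ID is a placeholder); once the host reports available again, a fresh mac1.metal can be launched on it:

```shell
# Poll until the Dedicated Host finishes its post-termination scrubbing.
# The host ID is a placeholder.
HOST_ID="h-0123456789abcdef0"
until [ "$(aws ec2 describe-hosts --host-ids "$HOST_ID" \
           --query 'Hosts[0].State' --output text)" = "available" ]; do
  echo "Host ${HOST_ID} is still being scrubbed, waiting..."
  sleep 300
done
echo "Host ${HOST_ID} is available again."
```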
Automating EC2 Mac instance configuration
Software provisioning with Ansible
Ansible is a popular platform for configuration as code, and we decided it would be an apt tool for configuring the software for our EC2 Mac instances. Ansible Galaxy doesn’t have many management modules for macOS, unfortunately. But it does have the basics:
- homebrew with homebrew_cask and homebrew_tap, to install software
- launchd, to manage services
- osx_defaults, to manage some user settings
It’s important to note that when you create a Mac instance with a custom-sized EBS, macOS does not automatically resize the Apple File System (APFS) container volume to match the EBS size.
Therefore, you need to do it on your own. Here is an example of resizing an APFS container with Ansible:
```yaml
- name: Get root disk identifier
  shell: "diskutil list physical external | grep 'GUID_partition_scheme' | tr -s ' ' | cut -d' ' -f6"
  register: root_fs_disk

- name: Get APFS container identifier
  shell: "diskutil list physical external | grep 'Apple_APFS' | tr -s ' ' | cut -d' ' -f8"
  register: apfs_container

- name: Repair FS on root disk
  shell: "yes | diskutil repairDisk {{ root_fs_disk.stdout }}"

- name: Resize APFS container
  shell: "diskutil apfs resizeContainer {{ apfs_container.stdout }} 0"
```
With the container resizing out of the way, software installation can take place through a mix of modules and shell commands. Though Ansible does not have many modules for macOS, there is fortunately plenty of documentation for the built-in macOS terminal utilities that can fill the gaps.
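To give a flavor, here are a few of the built-in utilities that tend to show up in shell tasks like ours (the values are purely illustrative):

```shell
# Built-in macOS utilities that are handy in provisioning scripts (illustrative values only)
systemsetup -settimezone "UTC"                                # set the system time zone (requires root)
defaults write NSGlobalDomain AppleShowAllExtensions -bool true  # show all file extensions in Finder
softwareupdate --list                                         # list pending macOS updates
diskutil list                                                 # inspect attached disks and APFS containers
```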
As a fuller example, this is how we install and bootstrap Xcode during AMI creation:
```yaml
- name: Get Xcode installation file
  get_url:
    url: "https://{{ xcode_url }}/{{ item }}.xip"
    dest: "{{ xcode_xip_location }}"

- name: Verify Xcode xip archive
  shell: "pkgutil --check-signature {{ xcode_xip_location }} | grep 'Status: signed Apple Software'"
  changed_when: false

- name: Install Xcode
  shell: |
    xip --expand {{ xcode_xip_location }}
    mv Xcode.app {{ xcode_app }}
  args:
    chdir: /Applications
    creates: "{{ xcode_app }}"
  poll: 5
  async: 3600 # Prevents SSH connections timing out during installation

- name: Accept License Agreement
  shell: "{{ xcode_build }} -license accept"

- name: Run Xcode first launch
  shell: "{{ xcode_build }} -runFirstLaunch"

- name: Switch into newly installed Xcode
  shell: "xcode-select --switch {{ xcode_app }}/Contents/Developer"
```
Hint: Although it’s possible to automate the license agreement acceptance, be sure to read it before running this automation.
Here’s another example. This time we’re installing some software using Homebrew:
```yaml
- name: Update homebrew and upgrade all packages
  community.general.homebrew:
    update_homebrew: yes
    upgrade_all: yes

- name: Install build software
  community.general.homebrew:
    name: "{{ item }}"
    state: installed
  loop:
    - git-lfs
    - swiftlint
    - swiftformat
    - wget
    - carthage
    - imagemagick
    - ag
    - "ruby@{{ ruby_version }}"
```
Creating AMIs with Packer
We use Packer for AMI creation. For Mac instances, we’ve found that our Packer configuration and usage don’t differ much from our other use cases. However, because Mac instances can take longer to spin up, we suggest increasing some timeout-related directives for a Packer builder:
```json
"aws_polling": {
"delay_seconds": 30,
"max_attempts": 60
},
"ssh_timeout": "20m"
```
Helpfully, the builder configuration does not require you to explicitly declare the Dedicated Host ID. However, to benefit from this feature, you need to enable the auto-placement parameter during Dedicated Host allocation.
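We allocate our hosts through CloudFormation (shown later in this post), but for illustration, the equivalent CLI call with auto-placement enabled looks roughly like this; the Availability Zone and quantity are placeholders:

```shell
# Allocate a Dedicated Host for mac1.metal with auto-placement enabled,
# so instance launches (and Packer builds) don't need an explicit host ID.
# Availability Zone and quantity are illustrative.
aws ec2 allocate-hosts \
  --instance-type mac1.metal \
  --availability-zone us-east-1a \
  --auto-placement on \
  --quantity 1
```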
Configuration at launch
As we mentioned earlier, Mac instances support user data. And it’s important to note that Mac instances have a new implementation of this feature that’s specific to macOS. It is called EC2 macOS Init, and it’s quite powerful.
In a nutshell, this utility is a macOS launch daemon that runs on behalf of the root user at system boot. It executes commands according to “Priority Groups” (in other words, it follows a predefined sequence).
EC2 macOS Init uses a human-readable configuration file in TOML format, where commands and options are defined. These are divided into modules with different purposes. For example, there is a module to manage network settings and a module to execute the script passed through the user data EC2 parameter. When EC2 macOS Init runs on an instance for the first time, it creates a unique directory keyed on the instance ID to store the execution history and user data copy. This allows it to track its invocations and decide whether to run again on the next instance boot (you can control this behavior).
We use EC2 macOS Init for a variety of post-provisioning configuration tasks (a simplified sketch follows this list):
- Registering the instance as a GitLab Runner with the provided token
- Installing and registering our vulnerability scanner
- Scheduling the reboot to run the GitLab Runner in a user session on the next boot
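For reference, here is a heavily simplified, hypothetical sketch of the kind of script we pass through user data. The GitLab URL, registration token, and scanner installer path are placeholders for values we inject at provisioning time:

```shell
#!/bin/bash
# Simplified, hypothetical user data script; EC2 macOS Init runs it as root at first boot.
# GITLAB_URL, RUNNER_TOKEN, and the scanner installer path are placeholders.
# Assumes gitlab-runner was already installed during AMI creation.

# Register this instance as a GitLab Runner
gitlab-runner register \
  --non-interactive \
  --url "${GITLAB_URL}" \
  --registration-token "${RUNNER_TOKEN}" \
  --executor shell \
  --description "ec2-mac-runner-$(hostname)"

# Install and register the vulnerability scanner (placeholder installer path)
/usr/local/bin/install-vuln-scanner.sh

# Schedule a reboot so the GitLab Runner starts in a user session on the next boot
shutdown -r +1 &
```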
How to provision (almost) any number of Mac instances
By default, there is a Service Quota of three mac1.metal Dedicated Hosts per region. If you’re trying to spawn a fleet of Mac instances, you’ll first want to increase this quota manually through your AWS Web Console.
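The console works fine for this, but if you prefer the CLI, the Service Quotas API can file the same request. The quota code and desired value below are placeholders; look up the real values for mac1.metal Dedicated Hosts in your account first:

```shell
# Find the quota code for mac1 Dedicated Hosts...
aws service-quotas list-service-quotas \
  --service-code ec2 \
  --query "Quotas[?contains(QuotaName, 'mac1')]"

# ...then request an increase (quota code and value are placeholders)
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-XXXXXXXX \
  --desired-value 6
```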
We provision our infrastructure with Terraform. However, as of June 2021, there is no support for the Dedicated Host resource type in Terraform AWS provider. Fortunately, we found a workaround to overcome this: using the CloudFormation template inside Terraform. It’s never ideal to mix different infrastructure-as-code tools, but this is the only reliable way for us to provision the Mac instances for now. However, we will change this to the Terraform resource once it’s available, for the sake of codebase consistency.
Here is a code snippet that illustrates this approach:
```hcl
locals {
  cf_template_body = <<STACK
{
  "Resources" : {
    "MyDedicatedHost" : {
      "Type" : "AWS::EC2::Host",
      "Properties" : {
        "AutoPlacement" : "on",
        "AvailabilityZone" : "${data.aws_availability_zone.az_name_by_id.name}",
        "HostRecovery" : "off",
        "InstanceType" : "${var.instance_type}"
      }
    }
  },
  "Outputs" : {
    "HostID" : {
      "Description" : "Host ID",
      "Value" : { "Ref" : "MyDedicatedHost" }
    }
  }
}
STACK
}

resource "aws_cloudformation_stack" "dedicated_host" {
  count = var.desired_count

  tags               = local.resource_tags
  name               = "${local.service_name}-${var.environment}-host-${count.index}"
  timeout_in_minutes = 20
  template_body      = local.cf_template_body
}

resource "aws_instance" "runner_instance" {
  count = var.desired_count

  ami                    = local.ami_channels[var.ami_channel]
  instance_type          = var.instance_type
  host_id                = aws_cloudformation_stack.dedicated_host[count.index].outputs["HostID"]
  tenancy                = "host"
  iam_instance_profile   = aws_iam_instance_profile.this.name
  key_name               = var.key_name
  vpc_security_group_ids = [aws_security_group.default.id]
  user_data              = local.user_data
  tags                   = merge({ "Name" : "${local.service_name}-${var.environment}" }, local.resource_tags)
  volume_tags            = local.resource_tags
  subnet_id              = data.aws_subnet.instance_subnet.id

  timeouts {
    delete = "20m"
  }
}
```
What we’ve achieved, and what’s next
By offering Mac instances to our engineering team, we solved the main issues we had with on-premise devices. We achieved several things:
- Fast deployment of build servers
- Better network performance (because build servers no longer require VPN)
- Standardized software and resource provisioning
- The ability to horizontally scale our build servers to the required number
We wanted to share our experience with other engineers who may be in a similar situation or are looking for ideas in this space. As is always the case at Grammarly, we’re still looking for ways to add and improve, and we know other engineers will have their own creative solutions.
Mainly, Grammarly’s Platform team wants to give teams at Grammarly the ability to customize their macOS software in a way that’s self-service. If an engineering team needs certain packages or tools, we’d like them to be able to specify that when they request an instance—ideally in a way that’s simple, so they don’t have to dig into the provisioning and configuration tools themselves.
Overcoming challenges like this using creativity and innovation is what we do best at Grammarly, where our product helps 30 million people and 30,000 professional teams around the world write effectively every day. We’re seeking engineers to help us further our mission of improving lives by improving communication. If you are interested in bringing your talent and ideas to solve other challenges we have, check out open roles on the Platform team!