Etsy’s Service Platform on Cloud Run cuts deployment times from days to less than an hour
Overview
Etsy is a popular online marketplace for vintage, handcrafted, and unusual items, and it strives to deliver excellent service. Like many fast-growing organizations, Etsy has needed more people, technology, and resources as it scaled. Its gross merchandise sales grew by more than 1,400% between 2012 and 2021, reaching $13.5 billion.
To keep pace with this growth, Etsy moved all of its infrastructure from conventional data centers to Google Cloud. Beyond being a major technology upgrade, the move forced Etsy to rethink how it approaches service development. The result was “ESP” (Etsy’s Service Platform), a service platform built on Cloud Run and tailored to Etsy that simplifies the development, deployment, and administration of microservices.
The need for change and architectural vision
As Etsy grew, so did the need for its engineering teams to support more sophisticated features and more traffic in its marketplace. The 2018 migration to Google Cloud let Etsy developers explore and adopt Google Cloud-based service platforms. However, this surge of technological innovation also exposed new problems, such as duplicated code and scaffolding, and unsupported infrastructure with unclear ownership.
In order to overcome these obstacles, Etsy brought together a group of architects to create a blueprint outlining the direction of the company’s future service growth. The objective was unambiguous: establish a platform that frees developers from the burden of backend complexity and enables them to swiftly and securely launch new services by separating service development from infrastructure.
Transforming vision into reality
The resulting architectural concept became the foundation for Etsy’s Service Platform, or ESP, and a newly assembled team was given the job of making the vision a reality. The first step was to put together a team that could bridge the gap between application development and infrastructure. Made up of seasoned engineers with a variety of specialties, the team brought a wide range of skills.
Understanding how important it was to connect with the platform’s future users, the team worked closely with Etsy’s engineering and architecture teams. The Ads Platform team, which was already involved in service development, played a crucial role by agreeing to embed one of its senior engineers in the service platform team. As part of the ESP experiment, the two teams jointly built a Minimum Viable Platform (MVP) to support the rollout of a new Ads Platform service.
Choosing Cloud Run for accelerated development
The architectural vision for a successful service platform was to simplify the developer experience by abstracting infrastructure away and automating its provisioning. The team also realized that its prospective customers in the wider engineering organization needed a platform that fit seamlessly into their workflow. To achieve this, the service platform team decided to concentrate on Etsy-specific elements, including observability, the service catalog, security, compliance, CI/CD, integration with existing services, developer experience, and language support.
Using Google Cloud services, and Cloud Run in particular, was a deliberate choice. Options like GKE were attractive, but the team wanted to deliver value as soon as possible. Because Cloud Run is robust and easy to use, and takes care of the more difficult and time-consuming parts of running containerized services, the team was able to concentrate on core platform functionality.
The Toolbox: A Closer Look
Etsy’s Service Platform uses a carefully chosen toolkit to provide a reliable and effective development and operational experience:
- Developer Interface: A purpose-built CLI tool that streamlines developer interactions.
- Protocols: protobuf and gRPC for standardized communication.
- Language Support: Go, Python, Node.js, PHP, Java, and Scala.
- CI/CD: GitHub Actions, providing a seamless pipeline for integration and deployment.
- Observability: Prometheus, Alertmanager, Google Cloud Monitoring and Logging, and OpenTelemetry (OTEL).
- Client Library: ESP-generated clients published to Artifactory.
- Service Catalog: Centralized service visibility via Backstage.
- Runtime: Cloud Run, selected for its compatibility and ease of use.
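To make the runtime choice in the list above a little more concrete, here is a minimal sketch of what a Cloud Run-compatible service looks like in Python, one of the supported languages. It relies only on the documented Cloud Run container contract: the container listens on all interfaces on the port given by the `PORT` environment variable, which defaults to 8080. The names `resolve_port`, `HelloHandler`, and `make_server` are hypothetical, and nothing here reflects how ESP scaffolds services internally; a real ESP service would most likely expose gRPC rather than plain HTTP.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DEFAULT_PORT = 8080  # Cloud Run's default when PORT is not set


def resolve_port(env=os.environ):
    """Return the port the container should listen on (hypothetical helper)."""
    return int(env.get("PORT", DEFAULT_PORT))


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small plain-text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(port):
    # Bind to 0.0.0.0 so the platform can route traffic into the container.
    return ThreadingHTTPServer(("0.0.0.0", port), HelloHandler)


def main():
    # Blocks forever; call main() as the container entrypoint.
    make_server(resolve_port()).serve_forever()
```

Packaged in a container image with `main()` as its entrypoint, a service like this needs no cluster or node configuration, which is the kind of work the team wanted to hand off to the runtime.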
Navigating Challenges
Building the service platform was not without challenges. The VPC connector became overloaded, and some services had to be tuned to make better use of their allocated resources. These difficulties led to platform-level improvements that will benefit future adopters.
Flexibility was a top priority in ESP’s design, to account for Etsy’s varied technology landscape. Despite the team’s experience across many technologies, it was difficult to build a platform that could accommodate a wide range of service and client languages and use cases. The team decided to concentrate first on a core feature set and then add incremental capabilities and workarounds based on customer feedback.
Important lessons learned throughout ESP’s development influenced both its ongoing operations and its future direction.
Sandbox Feature: A “sandbox” environment gave developers a rapid iteration loop, letting them deploy development versions of new services on Cloud Run in less than five minutes, complete with CI/CD and observability.
Familiar Observability Tools: ESP simplified engineers’ workflows by integrating with tools they already used, such as PromQL and Grafana.
Security Considerations: ESP favors TLS and layer 7 authentication via Google IAM, and collaboration with the Google Serverless Networking team ensured secure connectivity with legacy applications.
Encouraging AI/ML Innovation: ESP’s flexibility was demonstrated at a company-wide hackathon, where a service that interfaced with Google’s Vertex AI was deployed quickly.
Real-World Success: As client support for new languages became available, the Ads Platform service expanded to three more systems. Cloud Run’s auto-scaling handled the increased load without difficulty.
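The security point above mentions layer 7 authentication via Google IAM. The article does not describe how ESP implements this, so the following is only a sketch of the general Cloud Run service-to-service pattern: the caller asks the metadata server for an identity token whose audience is the receiving service’s URL, then sends it as a bearer token. The function names and the example URL are hypothetical, and a production client would more likely use Google’s auth libraries than raw HTTP calls.

```python
import urllib.parse
import urllib.request

# Metadata server endpoint that issues identity tokens for the service's
# default service account (only reachable from inside Google Cloud).
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience):
    """Build the metadata-server request for a token scoped to `audience`."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def fetch_identity_token(audience, opener=urllib.request.urlopen):
    """Fetch the token; `opener` is injectable so the sketch can be tested offline."""
    with opener(identity_token_request(audience), timeout=5) as resp:
        return resp.read().decode("utf-8").strip()


def authenticated_request(service_url, path, token):
    """Build a request to the receiving service carrying the bearer token."""
    return urllib.request.Request(
        service_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
    )
```

A caller would combine the two steps, for example `authenticated_request(url, "/healthz", fetch_identity_token(url))` with `url` set to a hypothetical address such as `https://ads-service.example.internal`. The receiving Cloud Run service then accepts the call only if the caller’s service account has been granted permission to invoke it.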
Conclusion and Future Outlook
Etsy’s Service Platform is being adopted steadily across the company, allowing engineers to be bold, quick, and safe. Customer demand for workloads beyond the serverless model has prompted collaboration with Etsy’s internal GKE team and with Google. The aim is to extend ESP’s tooling to accommodate a growing range of services while preserving a consistently high standard of developer and operational experience.