Google published the Zanzibar whitepaper in 2019, and it has drawn a lot of attention since. Companies such as Carta and Airbnb have even begun moving their legacy authorization systems to Zanzibar-style designs. Google Zanzibar is a global authorization system that handles authorization for Google services and products such as Drive, Maps, YouTube, Cloud, and Calendar. Google designed it to cope with the demanding access requirements of those products, and by publishing the paper it made the design available for others to learn from. Since then, the number of Zanzibar-based solutions has grown. Let’s look at what Google Zanzibar is.
What is Google Zanzibar?
Zanzibar is the name of a whitepaper describing the authorization system Google uses to handle authorization across its huge number of services. The term is now common in the IAM space, where it is used as shorthand for fine-grained authorization, much as RBAC (Role-Based Access Control) is shorthand for role-based authorization systems.
It is known for its distributed architecture, which offers consistency and scalability. Developers use it to model their permissions as data in a graph, which makes it a natural fit for implementing ReBAC (relationship-based access control).
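To make "permissions as data in a graph" concrete, here is a minimal sketch of how relation tuples could be represented. It assumes the `object#relation@user` text form used in the Zanzibar paper; the class name, the parser, and the sample object and user names are made up for illustration and are not part of any real API.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """One edge of the permission graph: <object>#<relation>@<user>."""
    object: str    # e.g. "doc:readme"
    relation: str  # e.g. "viewer"
    user: str      # a user id ("user:alice") or a userset ("group:eng#member")


def parse_tuple(text: str) -> RelationTuple:
    """Parse the 'object#relation@user' text form into a RelationTuple."""
    left, user = text.split("@", 1)       # everything after the first "@" is the user
    obj, relation = left.split("#", 1)    # the left side is object#relation
    return RelationTuple(obj, relation, user)


# Hypothetical sample data: alice owns the doc, and every member of the
# "eng" group (currently just bob) can view it.
tuples = [
    parse_tuple("doc:readme#owner@user:alice"),
    parse_tuple("doc:readme#viewer@group:eng#member"),
    parse_tuple("group:eng#member@user:bob"),
]
```

The second tuple points at another `object#relation` pair instead of a single user. Edges like that are what turn a flat list of tuples into a graph.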
Why Did Google Build Zanzibar?
Google Search, Sheets, Docs, and Gmail are among Google’s most heavily trafficked apps. Because Google accounts are shared across these systems, authorization decisions have to be coordinated between them. These apps also run at enormous scale, so constant inter-service communication was not practical.
Their authorization system has to handle a vast number of objects shared among billions of users and must return results with very low latency. It also has to answer filtering questions, such as which documents a given user can see.
Their authorization system must be:
- Error-free:
An incorrect authorization decision could let someone view a document that isn’t meant for their eyes.
- Fast:
Other apps wait on authorization decisions from Zanzibar. The company’s target was under 10 ms per query.
- Highly available:
Authorization needs to be at least as available as the apps that depend on it.
- High-throughput:
The system has to serve millions of authorization requests per second.
What Does Zanzibar Do Well?
Zanzibar acts as a centralized source of authorization decisions, an approach that is beneficial for two reasons. First, every service can call Zanzibar and get a “yes” or “no” answer to its question, and those answers are consistent between services. Second, every service calls the same API, which makes authorization simpler to use across multiple services.
It supports reverse indexing, also called data filtering. Once permissions have been assigned to a user, it is possible to ask, “Which resources does this user have access to?” This is a fundamental authorization request, and being able to answer it also helps when maintaining and debugging access controls.
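The filtering question can be sketched with a small reverse index. This is only an illustration under simplifying assumptions: it covers direct grants only (no groups or inherited relations), and the data and the `list_objects` helper are hypothetical names, not an actual Zanzibar call.

```python
from collections import defaultdict

# Forward data: (object, relation, user) tuples. All names are made up.
tuples = [
    ("doc:readme", "viewer", "user:alice"),
    ("doc:roadmap", "viewer", "user:alice"),
    ("doc:roadmap", "viewer", "user:bob"),
]

# Reverse index: (user, relation) -> set of objects.
reverse_index = defaultdict(set)
for obj, relation, user in tuples:
    reverse_index[(user, relation)].add(obj)


def list_objects(user: str, relation: str) -> list[str]:
    """Return every object on which `user` directly holds `relation`."""
    return sorted(reverse_index.get((user, relation), set()))


print(list_objects("user:alice", "viewer"))  # ['doc:readme', 'doc:roadmap']
print(list_objects("user:carol", "viewer"))  # []
```

Indexing by user answers "what can this user see?" with one lookup instead of running a check against every object.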
Functionality:
This globally distributed authorization system exposes five core methods: read, write, watch, check, and expand.
The first three methods interact directly with the authorization data, while the other two focus on authorization decisions.
Let’s look at each method:
- Read (2.4.1):
It lets you query the graph data directly and filter the stored relation tuples.
- Write (2.4.2):
It lets you write relation tuples to the data storage. In a standard app backed by a relational database that adopts a Zanzibar-style system, these tuples can be written to both the Zanzibar system and the app’s database within the same flow.
- Watch (2.4.3):
As the name implies, it lets you watch for changes to relation tuples in the graph data.
- Check (2.4.4):
It is used for enforcement. In a Zanzibar-based system, you check access with a question of the form “Does subject S have permission P on resource R?” Both the resource and the subject must be specified. The check engine then walks the relation graph to compute the decision.
- Expand (2.4.5):
Its purpose is to make access permissions easier to reason about and observe. Given an ‘object#relation’ pair, it returns the effective set of users (the userset) for that pair.
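The methods above can be sketched with a toy in-memory store. This is a simplified illustration, not Zanzibar’s real API: the `TupleStore` class, its method signatures, and the sample data are assumptions made for this example. Watch is left out, and expand here returns a flat set of users where the paper describes a tree of usersets.

```python
from typing import Optional


class TupleStore:
    """Toy in-memory stand-in for the tuple storage (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._tuples: set[tuple[str, str, str]] = set()

    def write(self, obj: str, relation: str, user: str) -> None:
        """Write: store one relation tuple."""
        self._tuples.add((obj, relation, user))

    def read(self, obj: Optional[str] = None, relation: Optional[str] = None):
        """Read: return stored tuples, optionally filtered by object and relation."""
        return sorted(
            t for t in self._tuples
            if (obj is None or t[0] == obj) and (relation is None or t[1] == relation)
        )

    def expand(self, obj: str, relation: str, _seen=None) -> set[str]:
        """Expand: return the effective set of users for object#relation."""
        seen = _seen if _seen is not None else set()
        if (obj, relation) in seen:  # already visited: guards against cycles
            return set()
        seen.add((obj, relation))
        users = set()
        for _, _, user in self.read(obj, relation):
            if "#" in user:  # a userset such as "group:eng#member": follow the edge
                child_obj, child_rel = user.split("#", 1)
                users |= self.expand(child_obj, child_rel, seen)
            else:
                users.add(user)
        return users

    def check(self, user: str, relation: str, obj: str) -> bool:
        """Check: does `user` hold `relation` on `obj`, directly or via a userset?"""
        return user in self.expand(obj, relation)


store = TupleStore()
store.write("doc:readme", "viewer", "user:alice")
store.write("doc:readme", "viewer", "group:eng#member")
store.write("group:eng", "member", "user:bob")

print(store.check("user:bob", "viewer", "doc:readme"))    # True, via group:eng
print(store.check("user:carol", "viewer", "doc:readme"))  # False
print(sorted(store.expand("doc:readme", "viewer")))       # ['user:alice', 'user:bob']
```

Implementing check on top of expand keeps the sketch short. A real engine would stop walking the graph as soon as it finds the subject rather than computing the whole user set.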
Data Consistency:
This authorization system is hit constantly with access checks, so it has to be secure. For a system with actively managed data, security is closely tied to the consistency of that data.
Inconsistent data can produce false-positive or false-negative results on authorization checks. So whenever a schema update happens, every app using the authorization system needs to refresh its cache to avoid such results.
To prevent this issue, the system follows a snapshot-read approach, which ensures that each enforcement decision is evaluated against a consistent view of the data. The Zanzibar team introduced tokens known as zookies. A zookie encodes a timestamp that is compared during access checks, ensuring that the snapshot used for enforcement is at least as fresh as the timestamp of the resource version.
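The "at least as fresh" rule can be illustrated with a toy versioned store. This is a sketch under loud assumptions: the zookie here is a plain integer from a logical clock, whereas real zookies are opaque tokens, and `VersionedStore`, `cached_ts`, and the sample data are hypothetical names invented for this example.

```python
import itertools


class VersionedStore:
    """Toy store that stamps every write with a logical timestamp (illustrative only)."""

    def __init__(self) -> None:
        self._clock = itertools.count(1)
        self._log: list[tuple[int, str, tuple[str, str, str]]] = []

    def write(self, op: str, obj: str, relation: str, user: str) -> int:
        """Apply "add" or "remove" and return a zookie (here, just the timestamp)."""
        ts = next(self._clock)
        self._log.append((ts, op, (obj, relation, user)))
        return ts

    def check(self, obj: str, relation: str, user: str,
              cached_ts: int, zookie: int = 0) -> bool:
        """Evaluate a direct check at a snapshot no older than `zookie`."""
        snapshot_ts = max(cached_ts, zookie)  # never read older than the zookie
        allowed = False
        for ts, op, tup in self._log:
            if ts > snapshot_ts:
                break  # later writes are invisible at this snapshot
            if tup == (obj, relation, user):
                allowed = (op == "add")
        return allowed


store = VersionedStore()
t1 = store.write("add", "doc:plan", "viewer", "user:bob")
zookie = store.write("remove", "doc:plan", "viewer", "user:bob")  # bob loses access

# A replica still serving the old snapshot would wrongly say yes:
print(store.check("doc:plan", "viewer", "user:bob", cached_ts=t1))                 # True
# Passing the zookie stored with the new content forces a fresh enough snapshot:
print(store.check("doc:plan", "viewer", "user:bob", cached_ts=t1, zookie=zookie))  # False
```

The first call shows the false positive that stale data can cause. The second shows the zookie preventing old permissions from being applied to newer content.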
How Does This Globally Distributed Authorization System Solve These Problems?
- Correctness:
This authorization system limits both user and system errors. To limit user errors, its semantics are designed very carefully. For a resource such as a git repository, the API exposes who is able to view, edit, delete, or otherwise act on the repository. It also exposes why they have that access and how to revoke it.
It also limits system errors. Because this is a distributed system, new permissions take time to propagate. To prevent stale data, the system stores permissions in Google’s Spanner database. Spanner offers strong consistency guarantees, which is why the system does not apply old permissions to new content.
- Speed And Availability:
To reduce latency, this authorization system uses several tricks. First, it uses multiple caching layers. The outermost layer is Leopard, an indexing system built to answer authorization checks quickly. Next, read requests are cached across the servers that store permissions. Finally, calls between the services inside Zanzibar are cached as well.
By replicating data, the system moves it physically closer to where it is accessed, much as a CDN does. Google maintains several instances of Zanzibar around the globe.
The system also relies on some hand-tuning. In any authorization policy, a few fundamental permissions are used far more than others. The Zanzibar team hand-tunes such hotspots. For example, they have enabled cache prefetching for them.
- Scale:
Thanks to its caching and replication, the system can store a huge number of access control rules and handle a large volume of requests per second.
Challenges Of Using Google Zanzibar:
Now you know how this system works and what Google gained by creating it. It is also worth knowing its drawbacks.
Deploying a graph based on this system introduces a complex component into your cloud environment, and sometimes a dependency on a hosted service. Such a dependency can introduce latency concerns, which in turn lead to scaling challenges.
- The Centralized Graph Problem:
You can use a graph to query permissions, but the graph needs to be centralized to give correct and reliable responses. Google reportedly keeps identical replicas of a 2-trillion-record graph around the world. That is feasible when you own the internet, but for an average organization, storing and maintaining several copies of such a large graph is not possible.
Some Zanzibar implementations address this by sharding the graph’s data and using event-based consistency updates to keep policy decisions reliable.
The Bottom Line:
People sometimes treat this authorization system as synonymous with fine-grained authorization. When it comes to handling attributes, however, its opinionated basic structure lacks granularity. ABAC is a better fit for cases that require fine-grained ownership models, except those that need built-in implicit hierarchies. For applications where such hierarchies are a major part of the authorization system, Zanzibar is a good choice.