OpenStack Keystone in Rust
What would happen if OpenStack Keystone were rewritten in Rust? Is it possible? How complex is it? Which improvements become possible?
This project exists to answer these questions.
The primary goal of the project is to implement Keystone functionality as a Rust library, making it possible to split the Keystone monolith into smaller pieces (similar to microservices). Once that is done, adding an API layer on top becomes pretty simple.
The project targets deploying the Python and Rust implementations in parallel, with request routing on the web server level: requests handled by the Rust implementation gain its speed and security, while functions not (yet) implemented are still served by the original Keystone. This approach also makes it possible to deploy the Rust implementation next to a much older version of Keystone, giving operators the possibility to enable new features while still running the older version (whatever the reason for that is).
Compatibility
The highest priority is to ensure that this implementation is compatible with the original Python Keystone: authentication issued by the Rust implementation is accepted by the Python Keystone and vice versa. At the same time it is expected that the new implementation may provide new features not supported by the Python implementation. In this case it is still expected that such features do not break the authentication flows. It must be possible to deploy the Python and Rust implementations in parallel and do request routing on the web server level.
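To illustrate the idea, such routing could be expressed on the web server level roughly as in the following sketch, which writes a hypothetical nginx fragment (the ports and the path split are assumptions, not part of the project):
cat > /etc/nginx/conf.d/keystone.conf <<'EOF'
# route the endpoints implemented in Rust to keystone-ng,
# everything else to the original Python Keystone
upstream keystone_rust { server 127.0.0.1:8080; }
upstream keystone_python { server 127.0.0.1:5000; }
server {
    listen 443 ssl;
    # the new v4 APIs exist only in keystone-ng
    location /v4/ { proxy_pass http://keystone_rust; }
    location / { proxy_pass http://keystone_python; }
}
EOF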
Database
Adding new features will almost certainly require database changes. Such changes must not interfere with the Python implementation, so that it keeps working correctly.
API
New API resources are expected to be added here as well. As above, such changes must not interfere with the Python implementation, so that it keeps working correctly and existing clients do not break.
Installation
The easiest way to get started with keystone-ng is the container image. It is also possible to use a compiled version, which can be either compiled locally or downloaded from the project artifacts.
Using pre-compiled binaries
At the time of writing there have been no releases, so no pre-compiled binaries are available yet. Every release of the project will include pre-compiled binaries for a variety of platforms.
Compiling
In order to compile keystone-ng it is necessary to have the Rust compiler available. It may be installed from the system packages or using rustup.rs:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Afterwards, the following command may be executed in the root of the project source tree to invoke cargo:
cargo build --release
It produces 2 binaries:
- target/release/keystone (the API server)
- target/release/keystone-db (the database management tool)
Currently keystone depends on openssl (through one of its dependencies). Depending on the environment it may be linked statically or dynamically. There are signals that this dependency may become unnecessary once all dependencies transition to rustls.
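Whether the produced binary links openssl dynamically can be checked with standard Linux tooling:
# no output means libssl is not dynamically linked
ldd target/release/keystone | grep -i ssl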
Using containers
It is possible to run keystone-ng inside containers. A sample Dockerfile is present in the project source tree to build a container image with the keystone server and the keystone-db utility. When no ready image is available, it can be built like this:
docker build . -t keystone:rust
Since keystone itself communicates with the database and the OpenPolicyAgent, those must be provided separately. docker-compose.yaml demonstrates how this can be done.
docker run -v /etc/keystone/:/etc/keystone -p 8080:8080 ghcr.io/gtema/keystone:main -v /etc/keystone/keystone.conf
Database migrations
Rust Keystone uses a different ORM and implements migrations that co-exist with the alembic migrations of the Python Keystone. It also ONLY manages the database schema additions and does NOT include the original database schema. Therefore it is necessary to apply both migrations:
keystone-db -u <DB_URL>
It is also important to understand that the DB_URL may differ between the Python and Rust deployments due to the optional database driver in the URL (e.g. postgresql+psycopg2://... on the Python side versus postgresql://... here). keystone-ng ignores the driver in the application itself, but for the migrations the user may need to remove it manually, since there the URL is processed by the ORM directly and not by the keystone-ng code.
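Putting both migrations together, a complete schema setup could look like the following sketch (the URL is a placeholder; the up subcommand is described further below):
# apply the original schema managed by the Python Keystone
keystone-manage db_sync
# apply the keystone-ng additions (note: no SQLAlchemy driver in the URL)
keystone-db -u postgres://keystone:secret@db/keystone up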
OpenPolicyAgent
keystone-ng relies on OPA for policy enforcement. Default policies are provided with the project and can be passed directly to the OPA process or compiled into a bundle.
opa run -s policies
NOTE: by default the OPA process listens on localhost only, which makes it unreachable from other containers. Please use -a 0.0.0.0:8181 to start listening on all interfaces.
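For example:
# listen on all interfaces so that other containers can reach OPA
opa run -s -a 0.0.0.0:8181 policies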
Architecture
Keystone requires 2 additional components to run:
- database (the same as the py-keystone uses)
- OpenPolicyAgent, that implements API policy enforcement
architecture-beta
    service db(database)[Database]
    service keystone(server)[Keystone]
    service opa(server)[OpenPolicyAgent]

    db:L -- R:keystone
    opa:L -- T:keystone
Database
Python Keystone uses sqlalchemy as the ORM and migration tool. It cannot be used from Rust efficiently, therefore keystone-ng uses sea-orm, which provides async support natively and also allows database type abstraction. Current development focuses on the PostgreSQL database. MySQL should be supported, but is not currently tested against.
New APIs and resources are being added, and this requires database changes. sea-orm also comes with migration tools. However, there is a slight difference between sqlalchemy and sea-orm: the latter suggests doing the database schema first, and in the next step the entity types are generated out of the database. That means the database migration must be written first and cannot be automatically generated from the code (at least not easily, though there is a way). The current migrations do not create the database schema that is managed by the py-keystone. Therefore, in order to get a fully populated database schema it is necessary to apply keystone-manage db_sync and keystone-db up independently.
The target of keystone-ng is to be deployed in a pair with the Python Keystone of "any" version. Due to that it is not possible to assume the state of the database, nor to apply any changes to the schema managed by the py-keystone. The federation rework assumes a model change. To keep it working with the python-keystone, artificial table entries may be created (for example, when a new identity provider is created, sanitized entries are automatically added for the legacy identity provider together with the necessary idp protocols).
Fernet
keystone-ng uses the same token mechanism to provide compatibility. The fernet keys repository must be provided at runtime (i.e. by mounting it as a volume into the container). There is no tooling to create or rotate the keys as the py-keystone has.
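This means the key repository is created and rotated with the Python tooling and only shared with keystone-ng, for example:
# create (and later rotate) the fernet key repository using py-keystone tooling
keystone-manage fernet_setup --keystone-user keystone --keystone-group keystone
# make the repository available to keystone-ng, e.g. as a container volume
docker run -v /etc/keystone/fernet-keys:/etc/keystone/fernet-keys ...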
API policy enforcement
API policy is implemented using the Open Policy Agent (OPA). It is a very powerful tool and allows implementing policies much more complex than what oslo.policy would ever allow. The policy folder contains the default policies. They can be overridden by the deployment.
OPA can be integrated into Keystone in 2 ways:
- HTTP. This is the default and recommended way of integrating applications with OPA. Usually the OPA process is started as a sidecar container to keep network latencies as low as possible. The policies themselves are bundled into a container which the OPA process is capable of downloading and even periodically refreshing. It can be started as opa run -s --log-level debug tools/opa-config.yaml. Alternatively, the OPA process can itself run in a container, in which case the configuration file should be mounted as a volume and referred to from the entrypoint.
- WASM. Policies can be built into a WASM binary module. As of now this method supports neither feeding additional data nor dynamic policy reload. Unfortunately there is also a memory access violation error in the wasmtime crate happening for big policy files. The investigation is in progress, so it is preferred not to rely on this method anyway. While running OPA as WASM eliminates any network communication, it heavily reduces the feature set: in particular hot policy reload, decision logging and external calls done by the policies themselves are not possible by design. Using this way of policy enforcement requires the wasm feature to be enabled.
All policies currently use the same policy names and definitions as the original Keystone to keep the deviation as small as possible. For the newly added APIs this is no longer the case.
With the Open Policy Agent it is not only possible to return a decision (allowed or forbidden), but also to produce additional information describing, for example, the reason for the request refusal. This is currently used by the policies to return an array of "violation" objects explaining the missing permissions.
Sample policy for updating the federated IDP mapping:
package identity.mapping_update

# update mapping
default allow := false

allow if {
    "admin" in input.credentials.roles
}

allow if {
    own_mapping
    "manager" in input.credentials.roles
}

own_mapping if {
    input.target.domain_id != null
    input.target.domain_id == input.credentials.domain_id
}

foreign_mapping if {
    input.target.domain_id != null
    input.target.domain_id != input.credentials.domain_id
}

global_mapping if {
    input.target.domain_id == null
}

violation contains {"field": "domain_id", "msg": "updating mapping for other domain requires `admin` role."} if {
    foreign_mapping
    not "admin" in input.credentials.roles
}

violation contains {"field": "role", "msg": "updating global mapping requires `admin` role."} if {
    global_mapping
    not "admin" in input.credentials.roles
}

violation contains {"field": "role", "msg": "updating mapping requires `manager` role."} if {
    own_mapping
    not "manager" in input.credentials.roles
}
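Such a policy can be exercised against a running OPA process through its standard data API. The input below is a hypothetical example shaped after the rules above:
# a manager of domain d1 trying to update a mapping owned by domain d2
curl -s -H 'Content-Type: application/json' \
  http://localhost:8181/v1/data/identity/mapping_update -d '{
    "input": {
      "credentials": {"roles": ["manager"], "domain_id": "d1"},
      "target": {"domain_id": "d2"}
    }
  }'
# the result carries the decision ("allow") and any "violation" entries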
As can be guessed, such a policy permits the API request when the admin role is present in the current credentials roles, or when the mapping in scope is owned by the domain the user is currently scoped to with the manager role.
List operation
All query parameters are passed into the policy engine to provide the capability of making the decision based on the passed parameters. For example, an admin user may specify the domain_id parameter when the current authentication scope does not match the given domain_id, or a user with the manager role may list shared federated identity providers.
The policy is evaluated before the real data is fetched from the backend.
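For example (the token and domain values are placeholders):
# the domain_id query parameter becomes part of the policy input
curl -H "X-Auth-Token: ${TOKEN}" \
  "https://keystone/v4/federation/identity_providers?domain_id=${OTHER_DOMAIN}"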
Show operation
Policy evaluation for GET operations on a resource is executed with the requested entity in scope. This allows the policy to deny the operation if the user requested a resource it should not have access to. It also means that a 404 error may be raised before validating whether the user is allowed to perform such an operation at all.
Create operation
A resource creation operation passes the whole object to be created in the context to the policy enforcement engine.
Update operation
For the update operation the context contains both the current state of the resource and the new one. This allows defining policies that prevent a resource update under certain conditions (e.g. when a "locked" tag is present).
Delete operation
Resource deletion also passes the current resource state in the context to allow comprehensive logic.
Federation support
Python Keystone does not implement federation natively (neither SAML2 nor OIDC). It relies on a proxy server for the authentication protocol specifics and tries to map the resulting users into the local database. This leads to a pretty big number of limitations, including (but not limited to):
- Identity providers can be configured by cloud administrators only
- Pretty much any change of the IdP configuration requires a restart of the service
- Certain protocol specifics cannot be implemented at all (i.e. backend-initiated logout)
- It forces deployment of a proxy service in front of Keystone, relying on modules for the SAML2 and/or OIDC implementation (such modules may be abandoned or removed)
- Client authentication right now is complex and error prone (every public provider has implementation specifics that are often not even cross-compatible)
In order to address those challenges a complete reimplementation with a different design is being done. This allows implementing features not technically possible in the py-keystone:
- Federation is controlled on the domain level by the domain managers. This means that the domain manager is responsible for configuring how users are federated from external IdPs.
- Identity providers and/or attribute mappings can be reused by different domains, allowing the implementation of social logins.
- Keystone serves as the relying party in the OIDC authentication flow. This reduces the number of different flows to a minimum, making client applications much simpler and more reliable.
API changes
A series of brand new API endpoints have been added to the Keystone API:
- /v4/federation/identity_providers (manage the identity providers)
- /v4/federation/mappings (manage the mappings tied to the identity provider)
- /v4/federation/auth (initiate the authentication and get the IdP url)
- /v4/federation/oidc/callback (exchange the authorization code for the Keystone token)
- /v4/federation/identity_providers/{idp_id}/jwt (exchange the JWT token issued by the referred IdP for the Keystone token)
DB changes
The following tables are added:
- federated_identity_provider
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.7

use sea_orm::entity::prelude::*;

#[derive(Clone, Debug, PartialEq, DeriveEntityModel, Eq)]
#[sea_orm(table_name = "federated_identity_provider")]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: String,
    pub name: String,
    pub domain_id: Option<String>,
    pub oidc_discovery_url: Option<String>,
    pub oidc_client_id: Option<String>,
    pub oidc_client_secret: Option<String>,
    pub oidc_response_mode: Option<String>,
    pub oidc_response_types: Option<String>,
    pub jwks_url: Option<String>,
    #[sea_orm(column_type = "Text", nullable)]
    pub jwt_validation_pubkeys: Option<String>,
    pub bound_issuer: Option<String>,
    pub default_mapping_name: Option<String>,
    pub provider_config: Option<Json>,
}

#[derive(Copy, Clone, Debug, EnumIter, DeriveRelation)]
pub enum Relation {
    #[sea_orm(
        belongs_to = "super::project::Entity",
        from = "Column::DomainId",
        to = "super::project::Column::Id",
        on_update = "NoAction",
        on_delete = "Cascade"
    )]
    Project,
}

impl Related<super::project::Entity> for Entity {
    fn to() -> RelationDef {
        Relation::Project.def()
    }
}

impl ActiveModelBehavior for ActiveModel {}
- federated_mapping
#[sea_orm(primary_key, auto_increment = false)]
pub id: String,
pub name: String,
pub idp_id: String,
pub domain_id: Option<String>,
pub r#type: MappingType,
pub allowed_redirect_uris: Option<String>,
pub user_id_claim: String,
pub user_name_claim: String,
pub domain_id_claim: Option<String>,
pub groups_claim: Option<String>,
pub bound_audiences: Option<String>,
pub bound_subject: Option<String>,
pub bound_claims: Option<Json>,
pub oidc_scopes: Option<String>,
pub token_user_id: Option<String>,
- federated_auth_state
pub idp_id: String,
pub mapping_id: String,
#[sea_orm(primary_key, auto_increment = false)]
pub state: String,
pub nonce: String,
pub redirect_uri: String,
pub pkce_verifier: String,
pub expires_at: DateTime,
pub requested_scope: Option<Json>,
Compatibility notes
Since the federation is implemented very differently to how it was done before, certain compatibility steps are implemented:
- The identity provider is "mirrored" into the existing identity_provider table with a subset of the attributes
- For every identity provider an "oidc" protocol entry is created in the federation_protocol table, pointing to the "<>" mapping
Testing
Federation is very complex and needs to be tested with every supported public provider. Only this can guarantee that issues with not-fully-compliant OIDC implementations are identified early enough.
The authorization code flow requires the presence of a browser. Due to that, the tests need to rely on Selenium.
At the moment the following integrations are tested automatically:
- Keycloak (login using browser)
- Keycloak (login with JWT)
- GitHub (workload federation with JWT)
Authentication using the Authorization Code flow and Keystone serving as RP
sequenceDiagram
    Actor Human
    Human ->> Cli: Initiate auth
    Cli ->> Keystone: Fetch the OP auth url
    Keystone --> Keystone: Initialize authorization request
    Keystone ->> Cli: Returns authURL of the IdP with cli as redirect_uri
    Cli ->> User-Agent: Go to authURL
    User-Agent -->> IdP: opens authURL
    IdP -->> User-Agent: Ask for consent
    Human -->> User-Agent: give consent
    User-Agent -->> IdP: Proceed
    IdP ->> Cli: callback with Authorization code
    Cli ->> Keystone: Exchange Authorization code for Keystone token
    Keystone ->> IdP: Exchange Authorization code for Access token
    IdP ->> Keystone: Return Access token
    Keystone ->> Cli: return Keystone token
    Cli ->> Human: Authorized
TLDR
The user client (cli) sends an authentication request to Keystone specifying the identity provider, the preferred attribute mapping and optionally the scope (no credentials in the request). In the response the user client receives a time-limited URL of the IdP that the user must open in the browser. When the authentication in the browser is completed, the user is redirected to the callback that the client sent in the initial request (most likely on localhost). The user client catches this callback containing the OIDC authorization code. Afterwards this code is sent to Keystone together with the authentication state, and the user receives a regular scoped or unscoped Keystone token.
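A hypothetical sketch of this exchange with curl (the request field names are illustrative, not the authoritative API schema):
# 1. ask Keystone for the IdP authorization URL (no credentials involved)
curl https://keystone/v4/federation/auth \
  -d '{"identity_provider": "...", "mapping": "...", "redirect_uri": "http://localhost:8050/callback"}'
# 2. the user opens the returned authURL in the browser and logs in at the IdP
# 3. exchange the captured authorization code and state for a Keystone token
curl https://keystone/v4/federation/oidc/callback -d '{"code": "...", "state": "..."}'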
Authenticating with the JWT
It is possible to authenticate with a JWT issued by a federated IdP. More precisely, it is possible to exchange a valid JWT for a Keystone token. There are a few different usage scenarios that are covered.
Since the JWT was issued without any knowledge of Keystone scopes, it becomes hard to control the scope. In the case of a real human login Keystone may issue an unscoped token, allowing the user to rescope it further. In the case of workload federation that would introduce a potential security vulnerability; in this scenario the attribute mapping is therefore responsible for fixing the scope.
A login request looks like the following:
curl https://keystone/v4/federation/identity_providers/${IDP}/jwt -X POST -H "Authorization: bearer ${JWT}" -H "openstack-mapping: ${MAPPING_NAME}"
Regular user obtains JWT (ID token) at the IdP and presents it to Keystone
In this scenario a real user (human) obtains a valid JWT from the IdP using any available method, without any communication with Keystone. This may be the authorization code grant, password grant, device grant or any other enabled method. The JWT is then presented to Keystone, and after verification of the JWT signature, expiration and the configured bound claims, the explicitly requested attribute mapping converts the JWT claims to the Keystone internal representation.
Workload federation
Automated workflows (a Zuul job, GitHub workflows, GitLab CI, etc.) are typical workloads not bound to any specific user; they are more commonly considered to be triggered by certain services. Such workflows are usually in possession of a JWT issued by the IdP owned by the service. Keystone allows exchanging such tokens for regular Keystone tokens after validating the token issuer signature and expiration and applying the configured attribute mapping. Since in this case there is no real human, the mapping also needs to be configured slightly differently:
- It is strongly advised that the attribute mapping fills token_user_id, token_project_id (and soon token_role_ids). This allows strong control over which technical account is used (a concept of service accounts will soon be introduced in Keystone) and which project such a request can access.
- The attribute mapping should use bound_audiences, bound_claims, bound_subject, etc. to control which workflows' tokens are allowed to access OpenStack resources.
GitHub workflow federation
In order for a GitHub workflow to be able to access OpenStack resources it is necessary to register GitHub as a federated IdP and establish a corresponding attribute mapping of the jwt type.
IdP:
"identity_provider": {
"name": "github",
"bound_issuer": "https://token.actions.githubusercontent.com",
"jwks_url": "https://token.actions.githubusercontent.com/.well-known/jwks"
}
Mapping:
"mapping": {
"type": "jwt",
"name": "gtema_keystone_main",
"idp_id": <IDP_ID>,
"domain_id": <DOMAIN_ID>,
"bound_audiences": ["https://github.com"],
"bound_subject": "repo:gtema/keystone:pull_request",
"bound_claims": {
"base_ref": "main"
},
"user_id_claim": "actor_id",
"user_name_claim": "actor",
"token_user_id": <UID>
}
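Both objects are managed through the federation API endpoints listed earlier; a hypothetical registration with curl could look like this:
# register the identity provider (payload as above)
curl -X POST https://keystone/v4/federation/identity_providers \
  -H "X-Auth-Token: ${TOKEN}" -d @identity_provider.json
# create the jwt-type mapping tied to it
curl -X POST https://keystone/v4/federation/mappings \
  -H "X-Auth-Token: ${TOKEN}" -d @mapping.json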
TODO: add more claims according to docs
A way for the workflow to obtain the JWT is described here.
...
permissions:
  id-token: write
  contents: read

jobs:
  ...
    - name: Get GitHub JWT token
      id: get_token
      run: |
        TOKEN_JSON=$(curl -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
          "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=https://github.com")
        TOKEN=$(echo "$TOKEN_JSON" | jq -r .value)
        echo "token=$TOKEN" >> "$GITHUB_OUTPUT"
...
# TODO: build a proper command for capturing the actual token and/or write a dedicated action for that.
    - name: Exchange GitHub JWT for Keystone token
      run: |
        KEYSTONE_TOKEN=$(curl -H "Authorization: bearer ${{ steps.get_token.outputs.token }}" -H "openstack-mapping: gtema_keystone_main" https://keystone_url/v4/federation/identity_providers/IDP/jwt)
PassKey (WebAuthN)
A new way of authentication using a security device (a passkey type) is being added to allow authenticating the user more securely.
An important thing to mention is that operating system passkeys (Apple keychain passkey, Google passkey, Microsoft ???) require a browser to be running. This makes them unsuitable for remote access. It would be possible to implement client authentication similar to the OIDC login, which also requires a browser, but it is not implemented for now. Therefore only authentication with a bare security device (YubiKey or similar) is implemented.
Authenticate with Security Device
sequenceDiagram
    participant Authenticator
    Client ->> Server: Authentication request
    Server ->> Client: Challenge to be signed
    Client ->> Authenticator: Challenge
    Authenticator ->>+ Authenticator: Sign with the private key and verify user presence
    Authenticator ->> Client: Signed Challenge
    Client ->> Server: Signed Challenge
    Server ->> Server: Verify signature
    Server ->> Client: Token
API changes
A few dedicated API resources are added to control the necessary aspects:
- /users/{user_id}/passkeys/register_start (initialize the registration of a security device for the user)
- /users/{user_id}/passkeys/register_finish (complete the security key registration)
- /users/{user_id}/passkeys/login_start (initialize the login with the security device of the user)
- /users/{user_id}/passkeys/login_finish (complete the security key login)
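The WebAuthn ceremony maps onto these endpoints roughly as follows (a hypothetical sketch; the exact request methods and payloads follow the WebAuthn specification and are produced and consumed by the authenticator client):
# obtain the challenge for the registration ceremony
curl -X POST https://keystone/users/${USER_ID}/passkeys/register_start
# ... let the security device sign the challenge, then complete the ceremony
curl -X POST https://keystone/users/${USER_ID}/passkeys/register_finish \
  -d @signed_challenge.json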
DB changes
The following DB tables are added:
- webauthn_credential
pub id: i32,
pub user_id: String,
pub credential_id: String,
pub passkey: String,
pub r#type: String,
pub aaguid: Option<String>,
pub created_at: DateTime,
pub last_used_at: Option<DateTime>,
pub last_updated_at: Option<DateTime>,
- webauthn_state
pub user_id: String,
pub state: String,
pub r#type: String,
pub created_at: DateTime,
Performance comparison
TODO