OpenStack Keystone in Rust

What would happen if OpenStack Keystone were rewritten in Rust? Is it possible? How complex would it be? Which improvements would be possible?

This project exists to answer these questions.

The primary goal of the project is to implement a Rust library providing Keystone functionality, so that the Keystone monolith can be split into smaller pieces (similar to microservices). Once this is done, adding an API on top becomes fairly simple.

The project targets deploying the Python and Rust implementations in parallel, with request routing done at the web server level. This provides the speed and security of the Rust implementation, while functionality that is not implemented (yet) keeps being served by the original Keystone. This approach also makes it possible to deploy the Rust implementation alongside a much older version of Keystone, giving operators the possibility to enable new features while still running an older Keystone (whatever the reason for that is).
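The split can be pictured as a routing table in the web server: requests whose path matches a prefix handled by the Rust implementation go there, and everything else falls back to the Python Keystone. Below is a minimal sketch of that decision, assuming a simple prefix rule and using the new federation endpoints listed later in this document as example prefixes; the `Backend` type and `route` function are hypothetical. In a real deployment this logic would live in the reverse proxy configuration (for example nginx `location` blocks), not in application code.

```rust
/// Hypothetical backends a reverse proxy could forward a request to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Rust,
    Python,
}

/// Example path prefixes assumed to be served by the Rust implementation.
/// Everything else is forwarded to the Python Keystone.
const RUST_PREFIXES: &[&str] = &[
    "/v3/federation/identity_providers",
    "/v3/federation/mappings",
    "/v3/federation/auth",
    "/v3/federation/oidc/callback",
];

/// True if `path` equals `prefix` or continues with `/` or `?` right after it,
/// so "/v3/federation/auth" does not accidentally match "/v3/federation/authx".
fn matches_prefix(path: &str, prefix: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/') || rest.starts_with('?'),
        None => false,
    }
}

/// Picks the Rust backend if any configured prefix matches, otherwise falls
/// back to the Python Keystone.
pub fn route(path: &str) -> Backend {
    if RUST_PREFIXES.iter().any(|p| matches_prefix(path, p)) {
        Backend::Rust
    } else {
        Backend::Python
    }
}
```

Falling back to Python by default is what lets functionality that is not implemented (yet) keep working unchanged.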

Compatibility

The highest priority is to ensure that this implementation is compatible with the original Python Keystone: authentication issued by the Rust implementation is accepted by the Python Keystone and vice versa. At the same time, the new implementation is expected to add features not supported by the Python implementation. Such features must still not break the authentication flows. It must be possible to deploy the Python and Rust implementations in parallel and do request routing at the web server level.

Database

Adding new features will most certainly require database changes. Such changes must not interfere with the Python implementation, so that it keeps working correctly.

API

Here, too, new API resources are expected to be added. As above, such changes must not interfere with the Python implementation, so that it keeps working correctly and existing clients do not break.

Installation

TODO:

  • Prepare the binary (download from GH releases, build yourself, use the container image, ...)

  • Perform the DB migration: keystone-db up

  • Start the binary as keystone -c <PATH_TO_THE_KEYSTONE_CONFIG>

Database migrations

Rust Keystone uses a different ORM and implements migrations that co-exist with the alembic migrations of the Python Keystone. It ONLY manages the database schema additions and does NOT include the original database schema. Therefore, both sets of migrations must be applied.
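Because the two migration histories are independent, a deployment has to bring both up to date, and the Python (alembic) schema should come first: the Rust entities reference existing tables, e.g. federated_identity_provider declares a foreign key from domain_id to project. The following std-only sketch builds such a plan, assuming each track records applied migrations by name; `Track`, `plan` and the migration names are hypothetical and do not correspond to real revision IDs.

```rust
use std::collections::HashSet;

/// A migration track: an ordered list of migrations plus the set already applied.
/// Hypothetical model; real tools keep this in their own bookkeeping tables.
pub struct Track<'a> {
    pub name: &'a str,
    pub ordered: &'a [&'a str],
    pub applied: HashSet<&'a str>,
}

impl<'a> Track<'a> {
    /// Migrations of this track that are not applied yet, in declaration order.
    pub fn pending(&self) -> Vec<&'a str> {
        self.ordered
            .iter()
            .copied()
            .filter(|m| !self.applied.contains(m))
            .collect()
    }
}

/// Full plan: all pending steps of the base (Python/alembic) track first, then
/// the pending steps of the additive (Rust) track, because the additive tables
/// depend on the base schema.
pub fn plan<'a>(base: &Track<'a>, additive: &Track<'a>) -> Vec<(&'a str, &'a str)> {
    let mut steps = Vec::new();
    for m in base.pending() {
        steps.push((base.name, m));
    }
    for m in additive.pending() {
        steps.push((additive.name, m));
    }
    steps
}
```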

API policy enforcement

API policy enforcement is implemented using the Open Policy Agent (OPA). OPA is a very powerful tool that allows implementing policies far more complex than oslo.policy would ever allow. The policy folder contains the default policies, which can be overridden by the deployment.

OPA can be integrated into Keystone in two ways:

  • HTTP. This is the default and recommended way of integrating applications with OPA. Usually the OPA process is started as a sidecar container to keep network latencies as low as possible. The policies themselves are packaged into a bundle, which the OPA process can download and even refresh periodically. OPA can be started as opa run -s --log-level debug -c tools/opa-config.yaml. Alternatively, OPA can run in its own container, in which case the configuration file should be mounted as a volume and referenced from the entrypoint.

  • WASM. Policies can be compiled into a WASM binary module. While running OPA as WASM eliminates any network communication, it heavily reduces the feature set: feeding additional data is not supported as of now, and hot policy reload, decision logging and external calls made by the policies themselves are not possible by design. Unfortunately, there is also a memory access violation error in the wasmtime crate that happens with big policy files. The investigation is in progress, so it is preferable not to rely on this method anyway. Using this way of policy enforcement requires the wasm feature to be enabled.
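For the HTTP integration, OPA's REST Data API evaluates a policy package when the caller POSTs to `/v1/data/<package path>` with the document to check wrapped in an `input` field; the decision comes back under `result`. The std-only sketch below only builds that request for a package such as identity.mapping_update (dots become path segments) and leaves JSON serialization of the input to the caller. The `OpaRequest` type and `opa_request` function are hypothetical, and the sidecar address `http://localhost:8181` (OPA's default listen port) is an assumption, not necessarily what Keystone uses.

```rust
/// A minimal description of an HTTP request to the OPA Data API.
#[derive(Debug, PartialEq, Eq)]
pub struct OpaRequest {
    pub method: &'static str,
    pub url: String,
    pub body: String,
}

/// Assumed sidecar address; 8181 is OPA's default listen port.
pub const OPA_BASE_URL: &str = "http://localhost:8181";

/// Builds the Data API request for evaluating a whole policy package.
/// `package` is the Rego package name (e.g. "identity.mapping_update") and
/// `input_json` is the already serialized JSON object passed as `input`.
pub fn opa_request(base_url: &str, package: &str, input_json: &str) -> OpaRequest {
    // Rego package segments map 1:1 onto Data API path segments.
    let path = package.replace('.', "/");
    OpaRequest {
        method: "POST",
        url: format!("{}/v1/data/{}", base_url.trim_end_matches('/'), path),
        // The Data API expects the document under the "input" key.
        body: format!("{{\"input\":{}}}", input_json),
    }
}
```

Querying the package rather than a single rule returns all of its rules in one object, so both `allow` and `violation` arrive in the same `result`.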

All policies currently use the same policy names and definitions as the original Keystone to keep deviations to a minimum. For the newly added APIs this is no longer the case.

With the Open Policy Agent it is possible not only to produce a decision (allowed or forbidden), but also additional information describing, e.g., the reason why a request was refused. The policies currently use this by defining a set of "violation" objects explaining the missing permissions.

Sample policy for updating the federated IdP mapping:

package identity.mapping_update

import rego.v1

# update mapping.

default allow := false

allow if {
	"admin" in input.credentials.roles
}

allow if {
	own_mapping
	"manager" in input.credentials.roles
}

# Mapping belongs to the domain the user is scoped to.
own_mapping if {
	input.target.domain_id != null
	input.target.domain_id == input.credentials.domain_id
}

# Mapping belongs to a different domain.
foreign_mapping if {
	input.target.domain_id != null
	input.target.domain_id != input.credentials.domain_id
}

# Mapping is not bound to any domain.
global_mapping if {
	input.target.domain_id == null
}

violation contains {"field": "domain_id", "msg": "updating mapping for other domain requires `admin` role."} if {
	foreign_mapping
	not "admin" in input.credentials.roles
}

violation contains {"field": "role", "msg": "updating global mapping requires `admin` role."} if {
	global_mapping
	not "admin" in input.credentials.roles
}

violation contains {"field": "role", "msg": "updating mapping requires `manager` role."} if {
	own_mapping
	not "manager" in input.credentials.roles
	not "admin" in input.credentials.roles
}

As can be seen, this policy permits the API request when the admin role is present in the current credentials, or when the mapping in scope is owned by the domain the user is currently scoped to and the user holds the manager role.
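To make the decision table concrete, the same logic can be restated in plain Rust, e.g. for reasoning about test cases. This is a hand translation of the sample policy, not code that Keystone runs; it assumes a mapping is "foreign" when its domain differs from the credentials' domain and "global" when it has no domain, and it checks the `manager` role named in the violation message. `Credentials`, `Target`, `allow` and `violations` are hypothetical names.

```rust
/// Hypothetical subset of the policy input used by the sample policy.
pub struct Credentials<'a> {
    pub roles: &'a [&'a str],
    pub domain_id: Option<&'a str>,
}

pub struct Target<'a> {
    /// `None` means a global mapping (not bound to any domain).
    pub domain_id: Option<&'a str>,
}

fn has_role(c: &Credentials, role: &str) -> bool {
    c.roles.iter().any(|r| *r == role)
}

/// Mirrors `own_mapping`: the mapping belongs to the domain the user is scoped to.
fn own_mapping(c: &Credentials, t: &Target) -> bool {
    t.domain_id.is_some() && t.domain_id == c.domain_id
}

/// Mirrors `allow`: admin may always update; manager only in their own domain.
pub fn allow(c: &Credentials, t: &Target) -> bool {
    has_role(c, "admin") || (own_mapping(c, t) && has_role(c, "manager"))
}

/// Mirrors the `violation` rules and returns the messages that would be produced.
pub fn violations(c: &Credentials, t: &Target) -> Vec<&'static str> {
    let mut out = Vec::new();
    if has_role(c, "admin") {
        return out;
    }
    match t.domain_id {
        None => out.push("updating global mapping requires `admin` role."),
        Some(_) if !own_mapping(c, t) => {
            out.push("updating mapping for other domain requires `admin` role.")
        }
        Some(_) => {
            if !has_role(c, "manager") {
                out.push("updating mapping requires `manager` role.");
            }
        }
    }
    out
}
```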

Another improvement over the legacy Keystone concerns when the policies are evaluated and which data they receive. For list operations, the policy input is populated with the credentials and all query parameters. For show operations, the input additionally contains the previously fetched target object, so that the policy can also consider the current resource attributes. Create operations also receive the complete input. Update operations first fetch the target resource and pass it as the target, while the updated properties are passed into the policy as the "update" object. Delete operations also fetch the object to be deleted and pass it into the policy. This approach allows advanced cases where operations need to be prohibited based on certain resource attributes.
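The per-operation inputs described above can be summarized as a data structure. This is a hypothetical Rust model in which resources are plain strings standing in for serialized objects; the field names credentials, target and update follow the text, while `query`, `Operation`, `PolicyInput` and `build_input` are illustrative assumptions. Passing the resource to be created as the target is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical operation kinds, each carrying what the handler already has.
/// Resources are plain strings standing in for serialized JSON objects.
pub enum Operation {
    List { query: BTreeMap<String, String> },
    Show { current: String },
    Create { new: String },
    Update { current: String, changes: String },
    Delete { current: String },
}

/// Hypothetical shape of the policy input.
#[derive(Debug, Default, PartialEq)]
pub struct PolicyInput {
    pub credentials: String,
    pub query: BTreeMap<String, String>,
    /// Resource the operation applies to (fetched before evaluation if it exists).
    pub target: Option<String>,
    /// Only for updates: the properties being changed.
    pub update: Option<String>,
}

pub fn build_input(credentials: &str, op: Operation) -> PolicyInput {
    let mut input = PolicyInput { credentials: credentials.to_string(), ..Default::default() };
    match op {
        Operation::List { query } => input.query = query,
        // Show and delete both pass the previously fetched object.
        Operation::Show { current } | Operation::Delete { current } => input.target = Some(current),
        // Assumption: the resource to be created is passed as the target.
        Operation::Create { new } => input.target = Some(new),
        Operation::Update { current, changes } => {
            input.target = Some(current);
            input.update = Some(changes);
        }
    }
    input
}
```

Keeping the current state (target) and the requested change (update) apart is what lets a policy forbid, for instance, changing one attribute only while another attribute has a particular value.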

Federation support

Python Keystone does not implement federation natively (neither SAML2 nor OIDC). It relies on a proxy server for the authentication protocol specifics and tries to map the resulting users into the local database. This leads to a fairly large number of limitations, including (but not limited to):

  • Identity providers can only be configured by cloud administrators

  • Pretty much any change to the IdP configuration requires a restart of the service

  • Certain protocol specifics cannot be implemented at all (e.g. backend-initiated logout)

  • It forces the deployment of a proxy service in front of Keystone that relies on modules implementing SAML2 and/or OIDC (such modules may be abandoned or removed).

  • Client authentication is currently complex and error-prone (every public provider has implementation specifics that are often not even cross-compatible)

To address these challenges, a complete reimplementation is being done here. This leads to a completely different design that opens the door to new features.

  • Federation is controlled at the domain level by the domain managers. This means that the domain manager is responsible for configuring how users are federated from external IdPs.

  • Keystone serves as the relying party in the OIDC authentication flow. This moves the complex logic from the client to the Keystone side, which makes client applications much simpler and more reliable.

Authentication using the Authorization Code flow and Keystone serving as RP

sequenceDiagram

    Actor Human
    Human ->> Cli: Initiate auth
    Cli ->> Keystone: Fetch the OP auth url
    Keystone --> Keystone: Initialize authorization request
    Keystone ->> Cli: Returns authURL of the IdP with cli as redirect_uri
    Cli ->> User-Agent: Go to authURL
    User-Agent -->> IdP: opens authURL
    IdP -->> User-Agent: Ask for consent
    Human -->> User-Agent: give consent
    User-Agent -->> IdP: Proceed
    IdP ->> Cli: callback with Authorization code
    Cli ->> Keystone: Exchange Authorization code for Keystone token
    Keystone ->> IdP: Exchange Authorization code for Access token
    IdP ->> Keystone: Return Access token
    Keystone ->> Cli: return Keystone token
    Cli ->> Human: Authorized
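On the Keystone side, the code exchange step depends on the authorization request state stored when the flow was initialized (the federated_auth_state table listed under DB changes holds state, nonce, redirect_uri, pkce_verifier and expires_at). Below is a minimal std-only sketch of checking that state, assuming it is looked up by the `state` parameter, consumed on first use, and rejected once expired or when the redirect URI differs. `AuthState`, `CallbackError`, `take_auth_state` and the in-memory `HashMap` standing in for the table are hypothetical; the token exchange with the IdP that would then use the PKCE verifier and nonce is not shown.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory stand-in for a row of `federated_auth_state`.
#[derive(Debug, Clone)]
pub struct AuthState {
    pub idp_id: String,
    pub mapping_id: String,
    pub nonce: String,
    pub redirect_uri: String,
    pub pkce_verifier: String,
    pub expires_at: SystemTime,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CallbackError {
    UnknownState,
    Expired,
    RedirectMismatch,
}

/// Validates the `state` sent back by the client and consumes it, so the same
/// authorization response cannot be replayed.
pub fn take_auth_state(
    store: &mut HashMap<String, AuthState>,
    state: &str,
    redirect_uri: &str,
    now: SystemTime,
) -> Result<AuthState, CallbackError> {
    // Remove first: a state value is single-use whether or not validation passes.
    let entry = store.remove(state).ok_or(CallbackError::UnknownState)?;
    if now >= entry.expires_at {
        return Err(CallbackError::Expired);
    }
    if entry.redirect_uri != redirect_uri {
        return Err(CallbackError::RedirectMismatch);
    }
    Ok(entry)
}

/// Convenience constructor with a caller-chosen lifetime (hypothetical TTL).
pub fn expires_in(now: SystemTime, ttl: Duration) -> SystemTime {
    now + ttl
}
```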

Authenticating with the JWT

This is a work in progress and is not implemented yet.

API changes

A series of brand-new API endpoints has been added to the Keystone API:

  • /v3/federation/identity_providers (manage the identity providers)

  • /v3/federation/mappings (manage the mappings tied to the identity provider)

  • /v3/federation/auth (initiate the authentication and get the IdP url)

  • /v3/federation/oidc/callback (exchange the authorization code for the Keystone token)

DB changes

The following tables are added:

  • federated_identity_provider
//! `SeaORM` Entity, @generated by sea-orm-codegen 1.1.7

use sea_orm::entity::prelude::*;

#[derive(Clone, Debug, PartialEq, DeriveEntityModel, Eq)]
#[sea_orm(table_name = "federated_identity_provider")]
pub struct Model {
    // String primary keys must not be treated as auto-incrementing.
    #[sea_orm(primary_key, auto_increment = false)]
    pub id: String,
    pub name: String,
    pub domain_id: Option<String>,
    pub oidc_discovery_url: Option<String>,
    pub oidc_client_id: Option<String>,
    pub oidc_client_secret: Option<String>,
    pub oidc_response_mode: Option<String>,
    pub oidc_response_types: Option<String>,
    #[sea_orm(column_type = "Text", nullable)]
    pub jwt_validation_pubkeys: Option<String>,
    pub bound_issuer: Option<String>,
    pub default_mapping_name: Option<String>,
    pub provider_config: Option<Json>,
}

#[derive(Copy, Clone, Debug, EnumIter, DeriveRelation)]
pub enum Relation {
    // Keystone stores domains as rows of the `project` table, so the owning
    // domain is a single-column foreign key onto `project.id`.
    #[sea_orm(
        belongs_to = "super::project::Entity",
        from = "Column::DomainId",
        to = "super::project::Column::Id",
        on_update = "NoAction",
        on_delete = "Cascade"
    )]
    Project,
}

impl Related<super::project::Entity> for Entity {
    fn to() -> RelationDef {
        Relation::Project.def()
    }
}

impl ActiveModelBehavior for ActiveModel {}
  • federated_mapping
pub struct Model {
    pub id: String,
    pub name: String,
    pub idp_id: String,
    pub domain_id: Option<String>,
    pub allowed_redirect_uris: Option<String>,
    pub user_id_claim: String,
    pub user_name_claim: String,
    pub domain_id_claim: Option<String>,
    pub groups_claim: Option<String>,
    pub bound_audiences: Option<String>,
    pub bound_subject: Option<String>,
    pub bound_claims: Option<Json>,
    pub oidc_scopes: Option<String>,
    pub token_user_id: Option<String>,
    pub token_role_ids: Option<String>,
    pub token_project_id: Option<String>,
}
  • federated_auth_state
#[derive(Clone, Debug, PartialEq, DeriveEntityModel, Eq)]
#[sea_orm(table_name = "federated_auth_state")]
pub struct Model {
    pub idp_id: String,
    pub mapping_id: String,
    #[sea_orm(primary_key, auto_increment = false)]
    pub state: String,
    pub nonce: String,
    pub redirect_uri: String,
    pub pkce_verifier: String,
    pub expires_at: DateTime,
    pub requested_scope: Option<Json>,
}
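The federated_mapping columns suggest how claims received from the IdP are turned into a Keystone user. The sketch below is an assumption about their semantics, not Keystone's actual mapping code: the user ID and name are read from the claims named by user_id_claim and user_name_claim, and bound_subject, when set, must equal the `sub` claim. Claims are modeled as a flat string map to stay std-only; `Mapping`, `MappedUser`, `MappingError` and `apply_mapping` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical subset of a `federated_mapping` row.
pub struct Mapping<'a> {
    pub user_id_claim: &'a str,
    pub user_name_claim: &'a str,
    pub bound_subject: Option<&'a str>,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MappedUser {
    pub id: String,
    pub name: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MappingError {
    MissingClaim(String),
    SubjectMismatch,
}

/// Applies the mapping to claims of an already validated ID token.
pub fn apply_mapping(m: &Mapping, claims: &HashMap<String, String>) -> Result<MappedUser, MappingError> {
    // Reject tokens whose subject does not match the bound one, if configured.
    if let Some(expected) = m.bound_subject {
        if claims.get("sub").map(String::as_str) != Some(expected) {
            return Err(MappingError::SubjectMismatch);
        }
    }
    // Look up a claim by the name configured in the mapping.
    let get = |name: &str| {
        claims
            .get(name)
            .cloned()
            .ok_or_else(|| MappingError::MissingClaim(name.to_string()))
    };
    Ok(MappedUser { id: get(m.user_id_claim)?, name: get(m.user_name_claim)? })
}
```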

Compatibility notes

Since federation is implemented very differently from how it was done before, certain compatibility measures are implemented:

  • Each identity provider is "mirrored" into the existing identity_provider table with a subset of its attributes

  • For every identity provider, an "oidc" protocol entry is created in the federation_protocol table, pointing to the "<>" mapping.

Testing

Federation is very complex and needs to be tested with every supported public provider. Only this ensures that issues with OIDC implementations that are not fully compliant are identified early enough.

The Authorization Code flow requires a browser. Because of that, the tests rely on Selenium.

At the moment, the following integrations are tested automatically:

  • Keycloak (login using browser)

PassKey (WebAuthN)

A new authentication method using a security device (a type of passkey) is being added to allow authenticating users more securely.

It is important to mention that operating system passkeys (Apple keychain passkey, Google passkey, Microsoft ???) require a running browser. This makes them unsuitable for remote access. It would be possible to implement client authentication similar to the OIDC login, which also requires a browser, but this is not implemented yet. Therefore, only authentication with a bare security device (YubiKey or similar) is implemented.

Authenticate with Security Device

sequenceDiagram

    participant Authenticator
    Client->>Server: Authentication request
    Server->>Client: Challenge to be signed
    Client->>Authenticator: Challenge
    Authenticator->>+Authenticator: Sign with the private key and verify user presence
    Authenticator->>Client: Signed Challenge
    Client->>Server: Signed Challenge
    Server->>Server: Verify signature
    Server->>Client: Token

API changes

A few dedicated API resources are added to control the necessary aspects:

  • /users/{user_id}/passkeys/register_start (initialize registering of the security device of the user)

  • /users/{user_id}/passkeys/register_finish (complete the security key registration)

  • /users/{user_id}/passkeys/login_start (initialize login of the security device of the user)

  • /users/{user_id}/passkeys/login_finish (complete the security key login)
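The start/finish pairs imply server-side state between the two calls, which is what the webauthn_state table (user_id, state, type, created_at) listed under DB changes holds. Below is a minimal sketch of that bookkeeping, assuming the serialized ceremony state is stored per user and ceremony type at start and consumed at finish, with a caller-chosen maximum age. `Ceremony`, `StateStore` and the in-memory map are hypothetical; challenge generation and signature verification (normally done by a WebAuthn library) are out of scope.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical ceremony kinds, matching the register/login endpoint pairs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Ceremony {
    Register,
    Login,
}

/// In-memory stand-in for `webauthn_state`: (user_id, type) -> (state, created_at).
#[derive(Default)]
pub struct StateStore {
    entries: HashMap<(String, Ceremony), (String, SystemTime)>,
}

impl StateStore {
    /// Called by `*_start`: remembers the serialized ceremony state, replacing
    /// any earlier unfinished attempt of the same kind.
    pub fn start(&mut self, user_id: &str, kind: Ceremony, state: String, now: SystemTime) {
        self.entries.insert((user_id.to_string(), kind), (state, now));
    }

    /// Called by `*_finish`: returns the stored state at most once, and only if
    /// it is not older than `max_age`.
    pub fn finish(&mut self, user_id: &str, kind: Ceremony, now: SystemTime, max_age: Duration) -> Option<String> {
        let (state, created_at) = self.entries.remove(&(user_id.to_string(), kind))?;
        match now.duration_since(created_at) {
            Ok(age) if age <= max_age => Some(state),
            // Too old, or created "in the future" (clock skew): reject.
            _ => None,
        }
    }
}
```

Consuming the state on finish means a signed challenge cannot be submitted twice.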

DB changes

The following DB tables are added:

  • webauthn_credential
pub struct Model {
    pub id: i32,
    pub user_id: String,
    pub credential_id: String,
    pub passkey: String,
    pub r#type: String,
    pub aaguid: Option<String>,
    pub created_at: DateTime,
    pub last_used_at: Option<DateTime>,
    pub last_updated_at: Option<DateTime>,
}
  • webauthn_state
pub struct Model {
    pub user_id: String,
    pub state: String,
    pub r#type: String,
    pub created_at: DateTime,
}

Performance comparison

TODO