Kubeflow Chart Plugins
Deploy Kubeflow plugins in Alauda AI 2.0 or later. The following plugins are covered:
- kfbase: Kubeflow Base components, including authentication and authorization, central dashboard, notebook, pvc-viewer, tensorboards, volumes, model registry ui, kserve endpoints ui, model catalog API service, etc.
- chart-kubeflow-model-registry: Kubeflow Model Registry instance (Helm Chart)
- kfp: Kubeflow Pipeline
- kftraining: Kubeflow Training Operator (deprecated)
- kubeflow-trainer: Kubeflow Training job management plugin, also known as Kubeflow Trainer v2 (replaces kftraining)
Environment Preparation
- A running ACP environment
- Ensure Alauda AI has been deployed (requires Alauda AI version >= 2.0)
- Deploy ASM in the business cluster where Kubeflow will be deployed, if it was not already deployed in the previous step. Only ASM v1 is supported for now; ASM v2 support is expected in the future.
- Deploy LWS (Alauda Build of LeaderWorkerSet) plugin, which is a dependency of Kubeflow Trainer v2.
- Configure the oauth2-proxy plugin as described below
Configure oauth2-proxy Plugin
Obtain the platform dex CA certificate for later use:
Then, in the global cluster or in ACP under Platform Management -> Resource Management, update the ServiceMesh resource and add the following content under the spec field:
Note: If spec.values.pilot.jwksResolverExtraRootCA has already been configured, configure only spec.meshConfig.extensionProviders. Add new entries only; do not delete the existing spec.meshConfig.extensionProviders entries.
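As a rough sketch of where these two settings sit, the fragment below shows only the two field paths named above. Everything else is an assumption made for illustration: the provider name, the service address, the port, and the header lists follow the general shape of an Istio envoyExtAuthzHttp extension provider and are not taken from the platform's own configuration. Use the values supplied with your installation.

```yaml
spec:
  values:
    pilot:
      # Paste the dex CA certificate obtained earlier (placeholder content).
      jwksResolverExtraRootCA: |
        -----BEGIN CERTIFICATE-----
        (certificate body goes here)
        -----END CERTIFICATE-----
  meshConfig:
    extensionProviders:
      # Keep any existing providers; only append a new entry.
      - name: oauth2-proxy                  # hypothetical provider name
        envoyExtAuthzHttp:
          service: oauth2-proxy.example-namespace.svc.cluster.local  # hypothetical service
          port: 80                          # hypothetical port
          includeRequestHeadersInCheck:
            - authorization
            - cookie
          headersToDownstreamOnDeny:
            - content-type
            - set-cookie
```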
Component Onboarding
Obtain the installation packages for the following plugins and use the violet tool to complete onboarding.
- kfbase: Kubeflow Base functionality
- chart-kubeflow-model-registry: Kubeflow Model Registry
- kfp: Kubeflow Pipeline functionality
- kftraining: Kubeflow Training Operator (deprecated)
- kubeflow-trainer: Kubeflow Training job management plugin (replaces kftraining)
Note: To enable volcano scheduler support for the kftraining plugin, deploy volcano first and then deploy kftraining.
Deployment Steps
1. Deploy kfbase (Kubeflow Base)
In Cluster Plugins, find the kfbase (Kubeflow Base) plugin, fill in the configuration according to the page prompts, and wait for the component deployment to complete.
After deployment, you need to perform the following operations to configure dex redirection:
In Administrator - Clusters - Resources, select Global cluster,
find the ConfigMap resource in the cpaas-system namespace, and click the
edit button to add the following configuration under redirectURIs:
Note: The redirect host and port must be the same as the oidcRedirectURL configured when installing the "Kubeflow Base" plugin.
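For orientation, a redirectURIs entry generally looks like the sketch below. The host, port, and callback path are hypothetical placeholders (the path shown is the oauth2-proxy default and is only an assumption here); replace the whole URL with the oidcRedirectURL value you entered during plugin installation.

```yaml
redirectURIs:
  # Leave the existing entries in place and append the new one.
  - "https://kubeflow.example.com:443/oauth2/callback"  # hypothetical; must match oidcRedirectURL
```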
After deployment, you can find the Kubeflow menu item under the Advanced navigation in AML. Click to enter the Kubeflow interface.
2. Create Kubeflow User and Bind to Namespace
Before logging in to Kubeflow for the first time, you need to bind the ACP user to a namespace. The following example creates the namespace kubeflow-admin-cpaas-io and binds the user admin@cpaas.io as its owner.
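A minimal sketch of such a binding, assuming the standard upstream Kubeflow Profile resource is used: the Profile name becomes the namespace name, and spec.owner identifies the user. The apiVersion shown is the upstream one and is an assumption about this platform; any additional fields your installation requires are not shown.

```yaml
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  # The profile controller creates a namespace with this name.
  name: kubeflow-admin-cpaas-io
spec:
  owner:
    kind: User
    # ACP user that becomes the owner of the namespace.
    name: admin@cpaas.io
```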
Note: If this Profile resource was already deployed during AML deployment, you can skip this step
Note: You may need to lower the Pod Security Admission level of the user namespace to create Notebook instances, etc.
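Pod Security Admission levels are set through namespace labels. The fragment below is a sketch of what a lowered level could look like; the choice of baseline is an assumption, since the level your workloads actually need depends on the Notebook images and volumes in use.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow-admin-cpaas-io
  labels:
    # Assumed level; use "privileged" only if baseline is still too strict.
    pod-security.kubernetes.io/enforce: baseline
```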
3. If binding user to an already created namespace, you also need to complete the following configuration:
If AML has already been deployed, and both the kubeflow-admin-cpaas-io namespace and the Profile resource already exist, but the namespace still cannot be selected in Kubeflow, you can create the account's role binding using the following resource as a reference.
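A sketch of such a role binding, modeled on the owner binding that the upstream Kubeflow profile controller normally creates. The binding name, the annotations, and the kubeflow-admin ClusterRole follow upstream conventions and are assumptions about this platform; verify them against a working namespace before applying.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: namespaceAdmin              # upstream naming convention (assumed)
  namespace: kubeflow-admin-cpaas-io
  annotations:
    # Upstream Kubeflow reads these annotations to list a user's namespaces.
    role: admin
    user: admin@cpaas.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubeflow-admin              # upstream ClusterRole name (assumed)
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: admin@cpaas.io
```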
4. Deploy kfp (Kubeflow Pipeline) and kftraining (Kubeflow Training Operator)
As above, in Cluster Plugins, find kfp (Kubeflow Pipeline) and kftraining (Kubeflow Training Operator) and install them.
Note: After Kubeflow Pipeline is deployed, Pipeline-related functions can be used in the Kubeflow interface.
Note: Kubeflow Training Operator is a background task scheduler and does not appear in the UI menus or functions.
5. Deploy chart-kubeflow-model-registry (Kubeflow Model Registry)
In Catalog or Administrator - Marketplace - Chart Repositories, find chart-kubeflow-model-registry, click the "Create" button, and fill in the deployment name, project, namespace (example deployment location), and Chart Version.
Note: The chart must be installed in a namespace that is already bound to a Kubeflow user Profile; otherwise the Model Registry UI will not be displayed.
Then copy the values.yaml configuration from the right side to the left side, and modify the following fields according to your cluster:
- global.registry.address: The image registry address used by the current platform
- mysqlStorageClass: The mysql storage class used by Model Registry. Needs to be a storage class supported by the target deployment cluster.
- mysqlStorageSize: The mysql storage size used by Model Registry.
- mysqlDataBase: Database name (will be created automatically).
- modelRegistryDisplayName: The name of the Model Registry instance to be deployed
- modelRegistryDescription: Brief description of the Model Registry instance to be deployed
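Put together, the edited portion of values.yaml might look like the sketch below. The key names come from the list above; every value is a hypothetical placeholder, and placing the keys other than global.registry.address at the top level is an assumption. Keep whatever structure the chart's own values.yaml uses.

```yaml
global:
  registry:
    address: registry.example.com        # hypothetical registry address

mysqlStorageClass: example-storage-class  # hypothetical; must exist in the target cluster
mysqlStorageSize: 10Gi                    # hypothetical size
mysqlDataBase: model_registry             # hypothetical name; created automatically
modelRegistryDisplayName: "Team A Model Registry"
modelRegistryDescription: "Model Registry instance for team A"
```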
Note: After the Model Registry instance starts, refresh the Model Registry menu in the left navigation of the Kubeflow page to see the instance deployed in the steps above. Until the first instance is deployed, the Kubeflow Model Registry interface is empty.
Note: The Model Registry instance restricts network requests from other namespaces. To allow access from more namespaces, run kubectl -n <your-namespace> edit authorizationpolicy model-registry-service and, following the Istio documentation, add the namespaces that are allowed access.
Note: You can install multiple Model Registry instances in different namespaces. Each instance is independent of the others.
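When editing the model-registry-service AuthorizationPolicy, additional namespaces are typically listed under a rule's source.namespaces field. The fragment below is a sketch of that Istio pattern only; the namespace names are hypothetical, and the shape of the existing rules in your policy is an assumption. Merge the extra namespace into the existing rule rather than replacing the spec.

```yaml
spec:
  rules:
    - from:
        - source:
            namespaces:
              - my-registry-namespace   # hypothetical: namespace where the instance runs
              - team-b                  # hypothetical: additional namespace allowed access
```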
6. Deploy kubeflow-trainer (Kubeflow Trainer v2)
Note: If you have already deployed kftraining (Kubeflow Training Operator), you need to uninstall it before deploying kubeflow-trainer.
Note: Make sure to install the LWS (LeaderWorkerSet) plugin before deploying kubeflow-trainer, as LWS is a dependency of kubeflow-trainer.
In Cluster Plugins, find kubeflow-trainer (Kubeflow Trainer v2),
click the "Install" button, choose whether to enable JobSet,
and click the "Install" button to complete the deployment.