Data Orchestration Hub (Preview)

Data Orchestration Hub, or the Hub, is a management service for managing multiple Alluxio clusters and connecting them to multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, and wizard-based curation of deployment workflows.

  • Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, single cloud or on-premises using guided wizards.
  • Monitor Your Alluxio Cluster: Monitor the state of the processes on your Alluxio cluster, and start or stop cluster components.
  • Manage Configuration: Set and distribute configuration for a cluster.

When to Use

Data Orchestration Hub agents are co-deployed on the cluster running Alluxio. The Hub can connect to multiple Alluxio clusters across environments. Further instructions can be found in the deployment section below.

Once connected to an Alluxio cluster, the Hub can be used to modify the state of the Alluxio cluster, such as updating configuration and restarting processes. The following scenarios illustrate usage of the Hub web interface.

Scenario A: Managing an Alluxio cluster

The Hub provides a dashboard for monitoring the state of processes on the cluster. From the same interface you can update configuration and restart processes.

hub_cluster_page

Monitor the status of an Alluxio cluster anywhere. You can start or stop cluster components from an intuitive UI.

Scenario B: Connecting to data sources across regions

Alluxio is used to connect a compute cluster with data sources across private data centers and public clouds, potentially over a wide area network. The Hub uses self-guided wizards to let users connect to data sources and catalogs in the same or remote data centers. Each wizard guides the user through the required configuration steps and validates the connection.

These wizards apply to multiple scenarios, including hybrid cloud, cross-data-center, single cloud, and private data center deployments.

hub_data_storage

Connect Alluxio to all your data sources across multiple clouds, single cloud or on-premises using self-guided wizards.

Further usage scenarios and descriptions of the available toolset can be found later on this page.

Deployment

The Hub consists of the following components deployed on your Alluxio cluster.

  • Hub Manager: A process that runs on the same node as an Alluxio Master by default and provides the REST endpoints to serve UI requests. The Hub Manager registers and communicates with the hosted Hub UI, and serves requests to Alluxio processes via Hub Agents. When using multiple Alluxio masters, any node can be chosen to deploy the Hub Manager. To gain access to the Hub UI, please contact product@alluxio.com.
  • Hub Agent: Hub Agents are deployed on both Alluxio Masters and Alluxio Workers. These agent processes serve requests from the Hub Manager to make changes to the cluster without SSH access.

The following diagram illustrates the Hub architecture:

hub_architecture

Hub Agents must be present on all managed nodes whereas the Hub Manager is a single instance.

Choose your compute environment to see how to deploy Data Orchestration Hub.


To get more information and get access to the Hub, please contact product@alluxio.com.

Getting Started

The Hub web interface is a hosted service that gives users a single access point to connect and interact with their Alluxio clusters via Hub Manager/Agents.

Generating an API key

An API/secret key pair is required to authenticate the Hub Manager with the Hosted Hub. Before you start the Hub Manager and Hub Agents, you will need to access the Hub UI and generate an API/secret key pair. Once generated, you must set alluxio.hub.authentication.apiKey and alluxio.hub.authentication.secretKey in the Hub Manager’s alluxio-site.properties.

hub_api_keys_page

Click on the "New API Key" button and follow the prompts to generate an API/secret key pair.
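As a minimal sketch of the step above, the following appends the two authentication properties to the Hub Manager's site properties file. It assumes the file lives at ${ALLUXIO_HOME}/conf/alluxio-site.properties; the HUB_API_KEY and HUB_SECRET_KEY variables and their placeholder values are hypothetical and should be replaced with the pair generated in the Hub UI.

```shell
# Assumption: ALLUXIO_HOME points at the Alluxio installation on the Hub Manager node.
# If it is unset, a scratch directory is used so the sketch can be tried safely.
ALLUXIO_HOME="${ALLUXIO_HOME:-$(mktemp -d)}"
SITE_PROPS="${ALLUXIO_HOME}/conf/alluxio-site.properties"
mkdir -p "$(dirname "${SITE_PROPS}")"

# Hypothetical variables holding the key pair generated in the Hub UI.
HUB_API_KEY="${HUB_API_KEY:-REPLACE_WITH_API_KEY}"
HUB_SECRET_KEY="${HUB_SECRET_KEY:-REPLACE_WITH_SECRET_KEY}"

# Append the two properties named in this section to the site properties file.
cat >> "${SITE_PROPS}" <<EOF
alluxio.hub.authentication.apiKey=${HUB_API_KEY}
alluxio.hub.authentication.secretKey=${HUB_SECRET_KEY}
EOF
```

Because the secret key is stored in plain text, it may be worth restricting read access to this file.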

Starting the Hub

To start the Hub Manager on the primary master node only:

$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_manager

To start the Hub Agents on all nodes, execute the following on each node:

$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_agent
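Running the agent start command on every node by hand can be tedious on larger clusters. The sketch below loops over a list of hosts and issues the same command over SSH. The host names, the REMOTE_ALLUXIO_HOME path, and the DRY_RUN switch are all assumptions made for illustration and are not part of the Hub itself; by default the commands are only printed.

```shell
# Hypothetical host list; replace with the masters and workers of your cluster.
HUB_NODES="${HUB_NODES:-master-1 worker-1 worker-2}"
# Assumption: Alluxio is installed at the same path on every node.
REMOTE_ALLUXIO_HOME="${REMOTE_ALLUXIO_HOME:-/opt/alluxio}"
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them over SSH.
DRY_RUN="${DRY_RUN:-1}"

start_hub_agents() {
  for node in ${HUB_NODES}; do
    cmd="${REMOTE_ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_agent"
    if [ "${DRY_RUN}" = "1" ]; then
      # Show what would be executed on this node.
      echo "ssh ${node} ${cmd}"
    else
      # Requires SSH access from this machine to each node.
      ssh "${node}" "${cmd}"
    fi
  done
}

start_hub_agents
```

SSH is only used here to launch the agents; once they are running, the Hub Manager communicates with them directly.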

Stopping the Hub

To stop the Hub, execute the following on the node where the Hub Manager was started:

$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_manager
$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent

Execute the following on all other nodes:

$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent

Configuration

For a complete list of properties applicable to the Hub, please search for properties prefixed with alluxio.hub on this page.

Note:

  • alluxio.hub.hosted.rpc.hostname (required) specifies the address of the Hosted Hub that the Hub Manager registers with.
  • alluxio.hub.authentication.apiKey (required) is the API key used, together with the secret key, to authenticate the Hub Manager.
  • alluxio.hub.authentication.secretKey (required) is the secret key used, together with the API key, to authenticate the Hub Manager.
  • alluxio.hub.cluster.label (optional) labels the cluster to help identify it when managing multiple clusters.

All other properties are optional. Set these properties in alluxio-site.properties before starting the Hub processes. The mechanism for doing so varies with the compute environment chosen in the deployment section above.
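Since the Hub processes depend on the three required properties, a quick check before starting them can catch omissions early. The following is a sketch only: check_hub_props is a hypothetical helper, not an Alluxio command, and it assumes properties are written in the key=value form.

```shell
# Hypothetical helper: reports required Hub properties missing from a
# properties file and returns non-zero if any are absent.
check_hub_props() {
  props_file="$1"
  missing=0
  for key in \
    alluxio.hub.hosted.rpc.hostname \
    alluxio.hub.authentication.apiKey \
    alluxio.hub.authentication.secretKey; do
    # Escape the dots so they match literally in the regular expression.
    pattern="$(printf '%s' "${key}" | sed 's/\./\\./g')"
    # Accept only "key=value" lines with a non-empty value; commented-out lines do not count.
    if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*[^[:space:]]" "${props_file}"; then
      echo "missing required property: ${key}"
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, check_hub_props "${ALLUXIO_HOME}/conf/alluxio-site.properties" could be run on the Hub Manager node before alluxio-start.sh, assuming the site properties file is in the conf directory.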

What next

Once deployed, you can visit the Hub at the URL provided by Alluxio (the same as alluxio.hub.hosted.rpc.hostname). Sign in using the configured username and password.

hub_cluster_page

Sign in using the admin credentials. Default: Username = 'alluxio', Password = 'alluxio'.

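In the console you have access to the features described at the top of this page: connecting data sources, monitoring the cluster, and managing configuration.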

If you have multiple Alluxio clusters, you can connect all of them to the Hub and have access to the features listed above for each cluster.

hub_multi_cluster_page

Click on a cluster to access the selected cluster's processes dashboard, configuration wizard, and much more.