Data Orchestration Hub (Preview)
Data Orchestration Hub, or the Hub, is a management service for managing multiple Alluxio clusters and connecting them to multiple data sources to unify data lakes. The service provides an easy-to-use, unified management view for configuration and monitoring, along with wizard-based curation of deployment workflows.
- Connect Your Data Sources: Connect Alluxio to data storage and catalogs across multiple clouds, single cloud or on-premises using guided wizards.
- Monitor Your Alluxio Cluster: View the state of the processes running on your Alluxio cluster.
- Manage Configuration: Set and distribute configuration for a cluster.
When to Use
Data Orchestration Hub agents are co-deployed on the cluster running Alluxio. The Hub can connect to multiple Alluxio clusters across environments. Further instructions can be found in the deployment section below.
Once connected to an Alluxio cluster, the Hub can be used to modify the state of the Alluxio cluster, such as updating configuration and restarting processes. The following scenarios illustrate usage of the Hub web interface.
Scenario A: Managing an Alluxio cluster
The Hub can be used to view a dashboard to monitor the state of processes on the cluster, as well as update configuration and restart processes.
Monitor the status of an Alluxio cluster anywhere. You can start or stop cluster components from an intuitive UI.
Scenario B: Connecting to data sources across regions
Alluxio is used to connect a compute cluster with data sources across private data centers and public clouds, potentially over a wide area network. The Hub uses a self-guided, wizard-based approach to let users connect to data sources and catalogs in the same or remote data centers. The user is guided through the required configuration steps, and the connection is validated along the way.
These wizards apply to multiple scenarios, including hybrid cloud, cross-data center, single cloud, and private data center deployments.
Connect Alluxio to all your data sources across multiple clouds, single cloud or on-premises using self-guided wizards.
Further usage scenarios and descriptions of the available tools can be found in the sections below.
Deployment
The Hub consists of the following components deployed on your Alluxio cluster.
- Hub Manager: The Hub Manager serves requests to Alluxio processes via Hub Agents, and registers and communicates with the hosted Hub UI. It is a process that runs on the same node as an Alluxio Master by default and provides the REST endpoints that serve UI requests. When using multiple Alluxio masters, any node can be chosen to deploy the Hub Manager. To gain access to the Hub UI, please contact product@alluxio.com.
- Hub Agent: The Hub agents are deployed on both Alluxio Masters and Alluxio Workers. These agent processes serve requests from the Hub Manager to make changes to the cluster without SSH access.
The following diagram illustrates the Hub architecture:
Hub Agents must be present on all managed nodes whereas the Hub Manager is a single instance.
Choose your compute environment to see how to deploy Data Orchestration Hub.
To get more information and get access to the Hub, please contact [product@alluxio.com](mailto:product@alluxio.com).
Getting Started
The Hub web interface is a hosted service that gives users a single access point to connect and interact with their Alluxio clusters via Hub Manager/Agents.
Generating an API key
An API/secret key pair is required to authenticate the Hub Manager with the Hosted Hub. Before you start the Hub Manager and Hub Agents, you will need to access the Hub UI and generate an API/secret key pair. Once generated, you must set alluxio.hub.authentication.apiKey and alluxio.hub.authentication.secretKey in the Hub Manager's alluxio-site.properties.
Click on the "New API Key" button and follow the prompts to generate an API and secret keypair.
Starting the Hub
To start the Hub Manager on the primary master node only:
$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_manager
To start the Hub Agents on all nodes, execute the following on each node:
$ ${ALLUXIO_HOME}/bin/alluxio-start.sh -a hub_agent
Stopping the Hub
To stop the Hub, execute the following on the node where the Hub Manager was started:
$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_manager
$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent
Execute the following on all other nodes:
$ ${ALLUXIO_HOME}/bin/alluxio-stop.sh -a hub_agent
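The per-node start and stop steps above can be collected into a small helper. The following is a minimal sketch, not part of Alluxio: the function name hub_commands, its arguments, and the role name "manager" are hypothetical, and only the alluxio-start.sh and alluxio-stop.sh invocations are taken from this page. It prints the commands for a node rather than running them, so the output can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the Hub commands for one node, following the steps above.
# "hub_commands", its arguments, and the role name "manager" are hypothetical;
# only the alluxio-start.sh / alluxio-stop.sh invocations come from this page.
hub_commands() {
  action="$1"   # "start" or "stop"
  role="$2"     # "manager" for the node hosting the Hub Manager; anything else otherwise

  case "$action" in
    start|stop) ;;
    *) echo "usage: hub_commands start|stop [manager]" >&2; return 1 ;;
  esac

  # Keep ${ALLUXIO_HOME} literal so the printed line matches the documented command.
  script="\${ALLUXIO_HOME}/bin/alluxio-${action}.sh"

  # Only the node hosting the Hub Manager starts or stops the manager process.
  if [ "$role" = "manager" ]; then
    echo "${script} -a hub_manager"
  fi

  # Every managed node runs a Hub Agent.
  echo "${script} -a hub_agent"
}
```

For example, running hub_commands stop manager prints the two stop commands for the Hub Manager node, while hub_commands start worker prints only the agent start command. If ALLUXIO_HOME is exported, the output could be piped to sh once it has been checked.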
Configuration
For a complete list of properties applicable to the Hub, please search for properties prefixed with alluxio.hub on this page.
Note:
- alluxio.hub.hosted.rpc.hostname (required) specifies the address of the Hosted Hub for the Hub Manager to register with.
- alluxio.hub.authentication.apiKey (required) is the API key used (with the secret key) to authenticate the Hub Manager.
- alluxio.hub.authentication.secretKey (required) is the secret key used (with the API key) to authenticate the Hub Manager.
- alluxio.hub.cluster.label (optional) can be set to label the cluster to help identify it when managing multiple clusters.
All other properties are optional. These properties should be set in alluxio-site.properties before starting the Hub processes. The mechanism for setting them varies with the compute environment chosen in the Deployment section above.
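Before starting the Hub processes, it can help to confirm that the required properties listed above are present. The following is a minimal sketch under stated assumptions: the function name missing_hub_props is hypothetical, the file is assumed to use plain key=value lines, and the sample values in angle brackets are placeholders rather than real settings.

```shell
#!/bin/sh
# Sketch: print each required Hub property that is not set in a properties file.
# The property names come from the list above; "missing_hub_props" is a
# hypothetical helper that assumes simple "key=value" lines.
missing_hub_props() {
  props_file="$1"
  for key in \
    alluxio.hub.hosted.rpc.hostname \
    alluxio.hub.authentication.apiKey \
    alluxio.hub.authentication.secretKey
  do
    # A property counts as set when an uncommented line has exactly this key
    # and a non-empty value after the first "=".
    if ! awk -F= -v k="$key" '
      { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }
      name == k {
        val = $0; sub(/^[^=]*=/, "", val); gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (val != "") found = 1
      }
      END { exit (found ? 0 : 1) }' "$props_file"
    then
      echo "$key"
    fi
  done
}

# Usage example with a throwaway file; the values are placeholders.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
alluxio.hub.hosted.rpc.hostname=<hosted-hub-address>
alluxio.hub.authentication.apiKey=<api-key>
alluxio.hub.cluster.label=my-cluster
EOF
missing_hub_props "$sample"   # prints alluxio.hub.authentication.secretKey
rm -f "$sample"
```

Pointing the function at the Hub Manager's alluxio-site.properties should print nothing once all three required properties are set.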
What next
Once deployed, you can visit the Hub at the URL provided by Alluxio (the same as alluxio.hub.hosted.rpc.hostname).
Sign in using the configured username and password. The default admin credentials are Username = 'alluxio', Password = 'alluxio'.
In the console you have access to the following:
- Process Management: Monitor the status of each process in the Alluxio cluster, and start or stop processes.
- Connect Data Storage: Connect Alluxio to your data sources across a hybrid cloud, single cloud or on-premises.
- Connect Data Catalog: Configure structured data catalogs for OLAP on Alluxio.
- Advanced Configuration: Customize your Alluxio cluster with advanced options.
If you have multiple Alluxio clusters, you can connect all of them to the Hub and have access to the features listed above for each cluster.
Click on a cluster to access the selected cluster's processes dashboard, configuration wizard, and much more.