- Module Description - What the module does and why it is useful
- Setup - The basics of getting started with hue
- Usage - Configuration options and additional functionality
- Reference - An under-the-hood peek at what the module is doing and how
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
Installs Apache Hue, a web user interface for the Hadoop environment.
- Alternatives:
- alternatives are used for /etc/hue/conf in Cloudera
- this module switches to the new alternative by default, so the original configuration can be kept intact
- Files modified:
- /etc/hue/conf/hue.ini
- /etc/security/keytab/hue.service.keytab: ownership changed (only when security is enabled; the path can be changed by the keytab_hue parameter)
- /etc/grid-security/hostcert.pem copied to /etc/hue/conf/ (only with https)
- /etc/grid-security/hostkey.pem copied to /etc/hue/conf/ (only with https)
- /etc/security/keytab/hue-http.service.keytab copied to /etc/hue/conf/ (only with SPNEGO authentication)
- when using an external database, the import logs in /var/lib/hue/logs/ are kept
- Helper files:
- /var/lib/hue/.puppet-init1-syncdb
- /var/lib/hue/.puppet-init1-migrate
- Packages: hue, python-psycopg2 (when postgres DB is used)
- Services: hue
- Users and groups:
- the hue::hdfs and hue::user classes create the user hue and the group hue
- When using an external database: data is imported using the Hue tools
- Hadoop cluster with WebHDFS or httpfs (httpfs is required for HDFS HA)
- HBase with thrift server (optional)
- Hive Server2 (optional)
- Oozie Server (optional)
- Authorizations set
- the 'hue' user must be enabled in security.client.protocol.acl; the default '*' is OK
- Proxy users set for Hadoop core, Hadoop httpfs, Oozie server, and Hive
- for cesnet::hadoop puppet module: parameters hue_hostnames and httpfs_hostnames
- for cesnet::oozie puppet module: parameter hue_hostnames
- for cesnet::hive puppet module (see the sketch after this list):
  - add 'oozie' and 'hive' to hadoop.proxyuser.hive.groups
  - also set hadoop.proxyuser.hive.hosts as needed
- with security:
- secured cluster
- Oozie property (among others): oozie.credentials.credentialclasses
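The Hive proxy-user settings above end up in Hadoop's core-site.xml. A minimal hedged sketch of one way to set them, assuming the cesnet-hadoop module accepts a raw properties hash the same way cesnet-hue does (the Hive Server2 hostname is illustrative):

```puppet
class { '::hadoop':
  # ... other cluster parameters ...
  properties => {
    # allow the 'hive' service user to impersonate the listed groups
    'hadoop.proxyuser.hive.groups' => 'hive,oozie',
    # limit impersonation to the Hive Server2 machine, or use '*'
    'hadoop.proxyuser.hive.hosts'  => 'hive.example.com',
  },
}
```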
$master_hostname = 'hdfs.example.com'
$hue_hostname = 'hue.example.com'
$secret = 'I sleep with my cats.'
class { '::hadoop':
...
hue_hostnames => ['hue.example.com'],
#oozie_hostnames => [...],
}
node 'hdfs.example.com' {
include ::hadoop::namenode
...
include ::hue::hdfs
}
node 'hue.example.com' {
class { '::hue':
hdfs_hostname => $master_hostname,
#yarn_hostname => ...,
#oozie_hostname => ...,
secret => $secret,
}
}
$cluster_name = 'cluster'
$master_hostnames = [
'master1.example.com',
'master2.example.com',
]
$hue_hostname = 'hue.example.com'
$secret = "Trump's real name is Drumpf."
class { '::hadoop':
...
cluster_name => $cluster_name,
hdfs_hostname => $master_hostnames[0],
hdfs_hostname2 => $master_hostnames[1],
hue_hostnames => [$hue_hostname],
httpfs_hostnames => [$hue_hostname],
yarn_hostname => $master_hostnames[0],
yarn_hostname2 => $master_hostnames[1],
#oozie_hostnames => [...],
}
node 'master1.example.com' {
include ::hadoop::namenode
...
include ::hue::hdfs
}
node 'master2.example.com' {
include ::hadoop::namenode
...
include ::hue::user
}
node 'hue.example.com' {
include ::hadoop::httpfs
class { '::hue':
defaultFS => "hdfs://${cluster_name}",
httpfs_hostname => $hue_hostname,
yarn_hostname => $master_hostnames[0],
yarn_hostname2 => $master_hostnames[1],
#oozie_hostname => ...,
secret => $secret,
}
}
The hue::hdfs class is also needed on all HDFS namenodes for authorization to work properly. You can use hue::user instead, or install the hue-common package.
It is recommended to set the properties hadoop.yarn_clusters.default.logical_name and hadoop.yarn_clusters.ha.logical_name according to yarn.resourcemanager.ha.rm-ids from Hadoop YARN. The cesnet-hue module uses the values 'rm1' and 'rm2', which are the cesnet-hadoop puppet module defaults.
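For example, a minimal sketch using the properties parameter documented in the Parameters section, with the default 'rm1'/'rm2' values:

```puppet
class { '::hue':
  # ... other parameters ...
  properties => {
    'hadoop.yarn_clusters.default.logical_name' => 'rm1',
    'hadoop.yarn_clusters.ha.logical_name'      => 'rm2',
  },
}
```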
Use the realm parameter to set the Kerberos realm and enable security. The https parameter enables SSL support.
Useful parameters:
Default credential file locations:
- /etc/security/keytab/hue.service.keytab
- /etc/grid-security/hostcert.pem
- /etc/grid-security/hostkey.pem
- /etc/hue/cacerts.pem (system default)
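A minimal sketch enabling both, assuming the default credential file locations above are in place (the realm value is illustrative):

```puppet
class { '::hue':
  # ... other parameters ...
  realm => 'EXAMPLE.COM',  # a non-empty realm enables security
  https => true,           # uses the certificate/key paths listed above
}
```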
By default, strict-transport-security is explicitly disabled by this puppet module so that other services are not disrupted. Use the desktop.secure_hsts_seconds property to enable it (the default value used by Hue is 31536000).
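A short sketch re-enabling HSTS via the properties parameter, using the stock Hue value quoted above:

```puppet
class { '::hue':
  # ... other parameters ...
  properties => {
    # one year, the default used by Hue itself
    'desktop.secure_hsts_seconds' => '31536000',
  },
}
```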
You can authenticate over HTTPS using a Kerberos ticket.
This requires a Kerberos keytab placed in /etc/security/keytab/hue-http.service.keytab with these principals (replace HOSTNAME and REALM with real values):
- hue/HOSTNAME@REALM
- HTTP/HOSTNAME@REALM
You will need to set the auth parameter to spnego (or set everything manually: the KRB5_KTNAME environment variable and the desktop.auth.backend property).
Example (hiera yaml format):
hue::auth: spnego
#(default)hue::keytab_hue: /etc/security/keytab/hue.service.keytab
#(default)hue::keytab_spnego: /etc/security/keytab/hue-http.service.keytab
Example (manually) (hiera yaml format):
hue::environment:
KRB5_KTNAME: /etc/security/keytab/hue-http.service.keytab
hue::keytab_hue: /etc/security/keytab/hue.service.keytab
hue::properties:
desktop.auth.backend: desktop.auth.backend.SpnegoDjangoBackend
SAML is an SSO authentication method used, for example, in federated environments.
SAML requires:
- a metadata file from the Identity Provider (libsaml.metadata_file property)
- SSL certificates (see Enable Security)
- permitted redirection for the used IdP (desktop.redirect_whitelist property)
- the xmlsec1 utility on ALL nodes in the Hadoop cluster
Example (hiera yaml format):
hue::auth: saml
hue::properties:
desktop.redirect_whitelist: ^\/.*$,^https:\/\/idp.example.com\/.*$
libsaml.metadata_file: /opt/my-idp-saml-metadata.xml
Example (manually) (hiera yaml format):
hue::properties:
  desktop.redirect_whitelist: ^/.*$,^https://idp.example.com/.*$
  libsaml.metadata_file: /opt/my-idp-saml-metadata.xml
  libsaml.cert_file: /etc/hue/conf/hostcert.pem
  libsaml.key_file: /etc/hue/conf/hostkey.pem
  libsaml.xmlsec_binary: /usr/bin/xmlsec1
It is recommended to use a full database instead of sqlite.
Example of using MySQL with puppetlabs-mysql puppet module:
node 'hue.example.com' {
...
class{'::hue':
...
db => 'mysql',
db_password => 'huepassword',
}
class { '::mysql::server':
root_password => 'strongpassword',
}
mysql::db { 'hue':
user => 'hue',
password => 'huepassword',
grant => ['ALL'],
}
# the database is imported in hue::service; it must exist before the service starts
Mysql::Db['hue'] -> Class['hue::service']
}
It is recommended to use a full database instead of sqlite.
Example of using PostgreSQL with puppetlabs-postgresql puppet module:
node 'hue.example.com' {
...
class{'::hue':
...
db => 'postgresql',
db_password => 'huepassword',
}
class { '::postgresql::server':
postgres_password => 'strongpassword',
}
postgresql::server::db { 'hue':
user => 'hue',
password => postgresql_password('hue', 'huepassword'),
}
# the database is imported in hue::service; it must exist before the service starts
Postgresql::Server::Db['hue'] -> Class['hue::service']
}
hue
: The main configuration class

hue::common::postinstall
: Preparation steps after installation

hue::config
: Configuration of Apache Hue

hue::hdfs
: HDFS initialization

hue::install
: Installation of Apache Hue

hue::params

hue::service
: Ensure the Apache Hue is running

hue::user
: Create hue system user, if needed
### Class `hue`

The main deployment class.
####alternatives
Switches the alternatives used for the configuration. Default: 'cluster' (Debian) or undef.
It can be used only when supported (for example with Cloudera distribution).
####auth
Authentication backend. Default: undef.
Values:
- undef (default): default Hue authentication, local passwords
- saml: SAML authentication backend
- spnego: GSS-API negotiation mechanism
See also:
- keytab_spnego
- properties: libsaml.*
####db
Database backend for Hue. Default: undef.
The default is the sqlite database, but it is recommended to use a full database.
Values:
- sqlite (default): database in a file
- mariadb, mysql: MySQL/MariaDB
- oracle: Oracle database
- postgresql: PostgreSQL
It can be overridden by the desktop.database.engine property.
####db_host
Database hostname for mariadb, mysql, postgresql. Default: 'localhost'.
It can be overridden by the desktop.database.host property.
####db_name
The file path for sqlite; the database name for mariadb, mysql, and postgresql; or the database name or SID for oracle. Default: undef.
Default values:
- sqlite: /var/lib/hue/desktop.db
- mariadb, mysql, postgresql: hue
- oracle: XE
It can be overridden by the desktop.database.name property.
####db_user
Database user for mariadb, mysql, and postgresql. Default: 'hue'.
####db_password
Database password for mariadb, mysql, and postgresql. Default: undef.
####defaultFS
HDFS defaultFS. Default: undef ("hdfs://${hdfs_hostname}:8020").
The value is required for an HA HDFS cluster. For a non-HA cluster, the automatic value derived from the hdfs_hostname parameter is fine.
####environment
Environment variables to set for the Hue daemon. Default: undef. Example:
environment => {
'KRB5_KTNAME' => '/var/lib/hue/hue.keytab',
}
####group
Default user group for newly created users. Default: 'users'.
####hdfs_hostname
Hadoop HDFS hostname. Default: undef.
The value is required for a non-HA HDFS cluster (for HDFS HA, the httpfs_hostname and defaultFS parameters must be used instead).
####historyserver_hostname
Hadoop MapReduce Job History hostname. Default: undef.
By default, the value is yarn_hostname2, or yarn_hostname.
####httpfs_hostname
HTTPFS proxy hostname, if available. Default: undef.
It is required with HDFS High Availability. We recommend running it on the same machine as Apache Hue.
####hive_server2_hostname
Hive Server2 hostname. Default: undef.
####impala_hostname
Impala server hostname. Default: undef.
Use one of the impalad servers.
####https
Enable support for https. Default: false.
####https_cachain
CA chain file in PEM format. Default: undef.
System default is /etc/hue/cacerts.pem.
####https_certificate
Certificate file in PEM format. Default: '/etc/grid-security/hostcert.pem'.
The certificate file is copied into the Hue configuration directory.
####https_hue
Enable support for https, but only for the Hue web interface. Default: undef.
If specified, this parameter takes precedence over the https parameter for the Hue web interface. All remote services are still used according to the https parameter.
With https_hue set to true, it is possible to have an unsecured Hadoop cluster but a secured Hue web GUI endpoint. See also the desktop.secure_hsts_seconds Hue property in Enable Security.
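A minimal sketch of that setup:

```puppet
class { '::hue':
  # ... other parameters ...
  https     => false,  # remote services are contacted without SSL
  https_hue => true,   # the Hue web interface itself is served over HTTPS
}
```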
####https_private_key
Private key file in PEM format. Default: '/etc/grid-security/hostkey.pem'.
The key file is copied into the Hue configuration directory.
####https_passphrase
Passphrase for the private key file. Default: undef.
####keytab_hue
Hue keytab file with the hue/HOSTNAME@REALM principal. Default: '/etc/security/keytab/hue.service.keytab'.
####oozie_hostname
Oozie server hostname. Default: undef.
####properties
"Raw" properties for hadoop cluster. Default: undef.
"::undef" value will remove given property set automatically by this module, empty string sets the empty value.
####package_name
Hue package name. Default: 'hue'.
####service_name
Hue service name. Default: 'hue'.
####realm
Kerberos realm. Default: undef.
A non-empty value enables security.
####yarn_hostname
Hadoop YARN Resourcemanager hostname. Default: undef.
####yarn_hostname2
Hadoop YARN Second Resourcemanager hostname, when high availability is used. Default: undef.
####zookeeper_hostnames
List of zookeeper hostnames. Default: [].
####zookeeper_rest_hostname
Zookeeper REST server hostname. Default: undef.
Not available in Cloudera. Sources are available at https://github.com/apache/zookeeper.
### Class `hue::hdfs`

HDFS initialization. Actions necessary to launch on the HDFS namenode: create the hue user, if needed.
This class (or the hue::user class) needs to be applied on all HDFS namenodes.

### Class `hue::user`

Creates the hue system user, if needed. The hue user is required on all HDFS namenodes for authorization to work properly, and this way Hue does not need to be installed just for the user.
It is better to let the packages handle creating the user, so we recommend a dependency on the installation classes or Hue packages (see the sketch below).
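A minimal sketch of such a dependency, reusing class names from the examples above (adapt to your deployment):

```puppet
node 'hdfs.example.com' {
  include ::hadoop::namenode
  include ::hue::user
  # let the Hadoop packages create system users first, then hue::user
  Class['hadoop::namenode'] -> Class['hue::user']
}
```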
No Java is installed and no software repository is set up by this module (you can use other puppet modules for that: cesnet-java_ng, puppetlabs::java, cesnet::site_hadoop, razorsedge/cloudera, ...).
- Repository: https://github.com/MetaCenterCloudPuppet/cesnet-hue
- Tests:
- basic: see .travis.yml