marvi/cesnet-hue

Apache Hue Web Interface


Table of Contents

  1. Module Description - What the module does and why it is useful
  2. Setup - The basics of getting started with hue
  3. Usage - Configuration options and additional functionality
  4. Reference - An under-the-hood peek at what the module is doing and how
  5. Limitations - OS compatibility, etc.
  6. Development - Guide for contributing to the module

Module Description

Installs Apache Hue, the web user interface for the Hadoop environment.

Setup

What hue affects

  • Alternatives:
    • alternatives are used for /etc/hue/conf in Cloudera
    • this module switches to the new alternative by default, so the original configuration can be kept intact
  • Files modified:
    • /etc/hue/conf/hue.ini
    • /etc/security/keytab/hue.service.keytab: ownership changed (only when security is enabled; the path can be changed by the keytab_hue parameter)
    • /etc/grid-security/hostcert.pem copied to /etc/hue/conf/ (only with https)
    • /etc/grid-security/hostkey.pem copied to /etc/hue/conf/ (only with https)
    • /etc/security/keytab/hue-http.service.keytab copied to /etc/hue/conf/ (only with SPNEGO authentication)
    • when using an external database, the import logs in /var/lib/hue/logs/ are kept
  • Helper files:
    • /var/lib/hue/.puppet-init1-syncdb
    • /var/lib/hue/.puppet-init1-migrate
  • Packages: hue, python-psycopg2 (when a PostgreSQL database is used)
  • Services: hue
  • Users and groups:
    • the hue::hdfs and hue::user classes create the user hue and the group hue
  • When using an external database: data are imported using the Hue tools

Setup Requirements

  • Hadoop cluster with WebHDFS or httpfs (httpfs is required for HDFS HA)
  • HBase with the Thrift server (optional)
  • Hive Server2 (optional)
  • Oozie Server (optional)
  • Authorization set up:
    • the 'hue' user must be enabled in security.client.protocol.acl (the default '*' is OK)
    • proxy users set for Hadoop core, Hadoop httpfs, the Oozie server and Hive:
      • for the cesnet::hadoop puppet module: parameters hue_hostnames and httpfs_hostnames
      • for the cesnet::oozie puppet module: parameter hue_hostnames
      • for the cesnet::hive puppet module:
        • add 'oozie' and 'hive' to hadoop.proxyuser.hive.groups
        • also set hadoop.proxyuser.hive.hosts as needed
  • With security:
    • a secured cluster
    • the Oozie property (among others): oozie.credentials.credentialclasses
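With the cesnet modules, the proxy-user requirements above might be expressed as in the following sketch. It assumes that the cesnet::hadoop class accepts a `properties` hash of raw Hadoop properties (as the cesnet-hadoop module documents) and uses hypothetical hostnames; merge the Hive proxy-user values with whatever you already allow.

```puppet
# Sketch only: all hostnames are hypothetical examples.
$hue_hostname  = 'hue.example.com'
$hive_hostname = 'hive.example.com'   # hypothetical Hive Server2 host

class { '::hadoop':
  # ... other cluster parameters ...
  hue_hostnames    => [$hue_hostname],
  httpfs_hostnames => [$hue_hostname],
  # assumption: raw Hadoop properties are passed through the 'properties' hash
  properties       => {
    # 'hive' and 'oozie' added alongside any groups you already allow
    'hadoop.proxyuser.hive.groups' => 'hive,oozie',
    'hadoop.proxyuser.hive.hosts'  => $hive_hostname,
  },
}

class { '::oozie':
  # ... other Oozie parameters ...
  hue_hostnames => [$hue_hostname],
}
```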

Usage

Basic cluster usage

$master_hostname = 'hdfs.example.com'
$hue_hostname = 'hue.example.com'
$secret = 'I sleep with my cats.'

class { '::hadoop':
  ...
  hue_hostnames => ['hue.example.com'],
  #oozie_hostnames => [...],
}

node 'hdfs.example.com' {
  include ::hadoop::namenode
  ...
  include ::hue::hdfs
}

node 'hue.example.com' {
  class { '::hue':
    hdfs_hostname => $master_hostname,
    #yarn_hostname  => ...,
    #oozie_hostname => ...,
    secret        => $secret,
  }
}

High availability cluster usage

$cluster_name = 'cluster'
$master_hostnames = [
  'master1.example.com',
  'master2.example.com',
]
$hue_hostname = 'hue.example.com'
$secret = "Trump's real name is Drumpf."

class { '::hadoop':
  ...
  cluster_name     => $cluster_name,
  hdfs_hostname    => $master_hostnames[0],
  hdfs_hostname2   => $master_hostnames[1],
  hue_hostnames    => [$hue_hostname],
  httpfs_hostnames => [$hue_hostname],
  yarn_hostname    => $master_hostnames[0],
  yarn_hostname2   => $master_hostnames[1],
  #oozie_hostnames => [...],
}

node 'master1.example.com' {
  include ::hadoop::namenode
  ...
  include ::hue::hdfs
}

node 'master2.example.com' {
  include ::hadoop::namenode
  ...
  include ::hue::user
}

node 'hue.example.com' {
  include ::hadoop::httpfs
  class { '::hue':
    defaultFS       => "hdfs://${cluster_name}",
    httpfs_hostname => $hue_hostname,
    yarn_hostname   => $master_hostnames[0],
    yarn_hostname2  => $master_hostnames[1],
    #oozie_hostname => ...,
    secret          => $secret,
  }
}

The hue::hdfs class is also needed on all HDFS NameNodes for authorization to work properly. You can use hue::user instead, or install the hue-common package.

It is recommended to set the properties hadoop.yarn_clusters.default.logical_name and hadoop.yarn_clusters.ha.logical_name according to yarn.resourcemanager.ha.rm-ids from Hadoop YARN. The cesnet-hue module uses the values 'rm1' and 'rm2', which are the cesnet-hadoop puppet module defaults.
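A possible way to set these through the properties parameter is sketched below. It assumes the ResourceManager IDs are the cesnet-hadoop defaults 'rm1' and 'rm2', and that the default cluster entry should point to rm1 and the ha entry to rm2; check both against your YARN configuration.

```puppet
class { '::hue':
  # ... other parameters as in the HA example above ...
  yarn_hostname  => 'master1.example.com',
  yarn_hostname2 => 'master2.example.com',
  properties     => {
    # assumption: values must match yarn.resourcemanager.ha.rm-ids in YARN
    'hadoop.yarn_clusters.default.logical_name' => 'rm1',
    'hadoop.yarn_clusters.ha.logical_name'      => 'rm2',
  },
}
```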

Enable security

Use the realm parameter to set the Kerberos realm and enable security. The https parameter enables SSL support.

Useful parameters:

Default credential file locations:

  • /etc/security/keytab/hue.service.keytab
  • /etc/grid-security/hostcert.pem
  • /etc/grid-security/hostkey.pem
  • /etc/hue/cacerts.pem (system default)

By default, strict transport security (HSTS) is explicitly disabled by this puppet module, so that other services are not disrupted. Use the desktop.secure_hsts_seconds property to enable it (the default value used by Hue is 31536000).
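A minimal sketch of a secured Hue instance using the default credential file locations listed above. The realm, hostname and secret are hypothetical placeholders, and re-enabling HSTS is optional.

```puppet
class { '::hue':
  hdfs_hostname => 'hdfs.example.com',
  secret        => 'change-me',     # hypothetical placeholder
  realm         => 'EXAMPLE.COM',   # hypothetical realm; a non-empty value enables security
  https         => true,            # SSL with the default certificate and key paths
  properties    => {
    # optional: re-enable HSTS with the value Hue uses by default (one year)
    'desktop.secure_hsts_seconds' => 31536000,
  },
}
```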

SPNEGO authentication

You can authenticate over HTTPS using Kerberos ticket.

This requires a Kerberos keytab placed in /etc/security/keytab/hue-http.service.keytab with the following principals (replace HOSTNAME and REALM by the real values):

  • hue/HOSTNAME@REALM
  • HTTP/HOSTNAME@REALM

You will need to set the auth parameter to spnego (or set everything manually: the KRB5_KTNAME environment variable and the desktop.auth.backend property).

Example (hiera yaml format):

hue::auth: spnego
#(default)hue::keytab_hue: /etc/security/keytab/hue.service.keytab
#(default)hue::keytab_spnego: /etc/security/keytab/hue-http.service.keytab

Example (manually) (hiera yaml format):

hue::environment:
 KRB5_KTNAME:  /etc/security/keytab/hue-http.service.keytab
hue::keytab_hue: /etc/security/keytab/hue.service.keytab
hue::properties:
 desktop.auth.backend: desktop.auth.backend.SpnegoDjangoBackend
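The same SPNEGO setup written as a Puppet class declaration instead of hiera, as a sketch. It assumes security and HTTPS are enabled as described in Enable security; the realm is a hypothetical placeholder and the keytab paths are the defaults from the hiera example.

```puppet
class { '::hue':
  # ... hostnames and secret as in the usage examples ...
  realm         => 'EXAMPLE.COM',   # hypothetical realm; Kerberos security enabled
  https         => true,            # SPNEGO authentication over HTTPS
  auth          => 'spnego',
  # the default keytab locations, shown explicitly
  keytab_hue    => '/etc/security/keytab/hue.service.keytab',
  keytab_spnego => '/etc/security/keytab/hue-http.service.keytab',
}
```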

SAML authentication

SAML is an SSO authentication mechanism used, for example, in federated environments.

For SAML, you need:

  • the metadata file from the Identity Provider (libsaml.metadata_file property)
  • SSL certificates (see Enable security)
  • permitted redirection for the used IdP (desktop.redirect_whitelist property)
  • the xmlsec1 utility on ALL nodes in the Hadoop cluster

Example (hiera yaml format):

hue::auth: saml
hue::properties:
  desktop.redirect_whitelist: ^\/.*$,^https:\/\/idp.example.com\/.*$
  libsaml.metadata_file: /opt/my-idp-saml-metadata.xml

Example (manually) (hiera yaml format):

hue::properties:
  desktop.redirect_whitelist: ^/.*$,^https://idp.example.com/.*$
  libsaml.metadata_file: /opt/my-idp-saml-metadata.xml
  libsaml.cert_file: /etc/hue/conf/hostcert.pem
  libsaml.key_file: /etc/hue/conf/hostkey.pem
  libsaml.xmlsec_binary: /usr/bin/xmlsec1
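The first SAML hiera example can also be written as a class declaration. This is a sketch: the IdP hostname and metadata path are the same placeholders as above, and https is enabled on the assumption that the default certificates are used for SAML.

```puppet
class { '::hue':
  # ... hostnames and secret as in the usage examples ...
  https      => true,   # SAML needs the SSL certificates
  auth       => 'saml',
  properties => {
    # single quotes keep the backslashes and '$' literal
    'desktop.redirect_whitelist' => '^\/.*$,^https:\/\/idp.example.com\/.*$',
    'libsaml.metadata_file'      => '/opt/my-idp-saml-metadata.xml',
  },
}
```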

MySQL backend

It is recommended to use a full database instead of sqlite.

Example of using MySQL with puppetlabs-mysql puppet module:

node 'hue.example.com' {
  ...

  class{'::hue':
    ...
    db          => 'mysql',
    db_password => 'huepassword',
  }

  class { '::mysql::server':
    root_password  => 'strongpassword',
  }

  mysql::db { 'hue':
    user     => 'hue',
    password => 'huepassword',
    grant    => ['ALL'],
  }

  # the database import runs in hue::service, and Hue itself also requires the database
  Mysql::Db['hue'] -> Class['hue::service']
}

PostgreSQL backend

It is recommended to use a full database instead of sqlite.

Example of using PostgreSQL with puppetlabs-postgresql puppet module:

node 'hue.example.com' {
  ...

  class{'::hue':
    ...
    db          => 'postgresql',
    db_password => 'huepassword',
  }

  class { '::postgresql::server':
    postgres_password  => 'strongpassword',
  }

  postgresql::server::db { 'hue':
    user     => 'hue',
    password => postgresql_password('hue', 'huepassword'),
  }

  # the database import runs in hue::service, and Hue itself also requires the database
  Postgresql::Server::Db['hue'] -> Class['hue::service']
}

Reference

### Classes
  • hue: The main configuration class
  • hue::common::postinstall: Preparation steps after installation
  • hue::config: Configuration of Apache Hue
  • hue::hdfs: HDFS initialization
  • hue::install: Installation of Apache Hue
  • hue::params
  • hue::service: Ensure the Apache Hue is running
  • hue::user: Create hue system user, if needed

### Class `hue`

The main deployment class.

#### alternatives

Switches the alternatives used for the configuration. Default: 'cluster' (Debian) or undef.

It can be used only when supported (for example, with the Cloudera distribution).

#### auth

Authentication backend. Default: undef.

Values:

  • undef (default): default Hue authentication, local passwords
  • saml: SAML authentication backend
  • spnego: GSS-API negotiation mechanism

See also:

  • keytab_spnego
  • properties: libsaml.*

#### db

Database backend for Hue. Default: undef.

The default is the sqlite database, but it is recommended to use a full database.

Values:

  • sqlite (default): database in a local file
  • mariadb, mysql: MySQL/MariaDB
  • oracle: Oracle database
  • postgresql: PostgreSQL

It can be overridden by the desktop.database.engine property.

#### db_host

Database hostname for mariadb, mysql and postgresql. Default: 'localhost'.

It can be overridden by the desktop.database.host property.

#### db_name

The file for sqlite, the database name for mariadb, mysql and postgresql, or the database name or SID for oracle. Default: undef.

Default values:

  • sqlite: /var/lib/hue/desktop.db
  • mariadb, mysql, postgresql: hue
  • oracle: XE

It can be overridden by the desktop.database.name property.

#### db_user

Database user for mariadb, mysql, and postgresql. Default: 'hue'.

#### db_password

Database password for mariadb, mysql, and postgresql. Default: undef.
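For a database running on a separate host, the db_* parameters can be combined as in this sketch. The hostname is hypothetical, and the database and user are assumed to exist already and to accept connections from the Hue machine (for example, created as in the PostgreSQL backend example, but on the database server).

```puppet
class { '::hue':
  # ... other parameters ...
  db          => 'postgresql',
  db_host     => 'db.example.com',   # hypothetical remote database server
  db_name     => 'hue',              # the default for postgresql
  db_user     => 'hue',              # the default
  db_password => 'huepassword',
}
```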

#### defaultFS

HDFS defaultFS. Default: undef ("hdfs://${hdfs_hostname}:8020").

The value is required for an HA HDFS cluster. For a non-HA cluster, the value derived automatically from the hdfs_hostname parameter is fine.

#### environment

Environment variables to set for the Hue daemon. Default: undef.

environment => {
  'KRB5_KTNAME' => '/var/lib/hue/hue.keytab',
}

#### group

Default user group for newly created users. Default: 'users'.

#### hdfs_hostname

Hadoop HDFS hostname. Default: undef.

The value is required for a non-HA HDFS cluster (with HDFS HA, the httpfs_hostname and defaultFS parameters must be used instead).

#### historyserver_hostname

Hadoop MapReduce Job History Server hostname. Default: undef.

By default, yarn_hostname2 is used if set, otherwise yarn_hostname.

#### httpfs_hostname

HttpFS proxy hostname, if available. Default: undef.

It is required with HDFS High Availability. We recommend running it on the same machine as Apache Hue.

#### hive_server2_hostname

Hive Server2 hostname. Default: undef.

#### impala_hostname

Impala server hostname. Default: undef.

Use one of the impalad nodes.

#### https

Enable support for https. Default: false.

#### https_cachain

CA chain file in PEM format. Default: undef.

System default is /etc/hue/cacerts.pem.

#### https_certificate

Certificate file in PEM format. Default: '/etc/grid-security/hostcert.pem'.

The certificate file is copied into Hue configuration directory.

#### https_hue

Enable support for https, but only in the Hue web interface. Default: undef.

If specified, this parameter takes precedence over the https parameter for the Hue web interface. All remote services are still accessed according to the https parameter.

With https_hue set to true, it is possible to have an unsecured Hadoop cluster with a secured Hue web GUI endpoint. See also the desktop.secure_hsts_seconds Hue property in Enable security.
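A sketch of that combination: plain HTTP towards the Hadoop services, HTTPS only for the Hue web GUI. The hostname and secret are hypothetical placeholders, and the default certificate and key paths are assumed to exist.

```puppet
class { '::hue':
  hdfs_hostname => 'hdfs.example.com',
  secret        => 'change-me',   # hypothetical placeholder
  https         => false,         # remote Hadoop services stay on plain HTTP
  https_hue     => true,          # only the Hue web GUI is served over HTTPS
}
```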

#### https_private_key

Private key file in PEM format. Default: '/etc/grid-security/hostkey.pem'.

The key file is copied into Hue configuration directory.
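When the certificates are not in the default /etc/grid-security/ locations, the paths can be overridden as in this sketch. The paths shown are hypothetical examples only.

```puppet
class { '::hue':
  # ... other parameters ...
  https             => true,
  # hypothetical locations; the certificate and key are copied
  # into the Hue configuration directory
  https_certificate => '/etc/pki/tls/certs/hue.example.com.pem',
  https_private_key => '/etc/pki/tls/private/hue.example.com.key',
  https_cachain     => '/etc/pki/tls/certs/ca-chain.pem',
}
```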

#### https_passphrase

Passphrase for the HTTPS private key. Default: undef.

#### keytab_hue

Default: '/etc/security/keytab/hue.service.keytab'.

Hue keytab file with the hue/HOSTNAME@REALM principal.

#### oozie_hostname

Oozie server hostname. Default: undef.

#### properties

"Raw" properties for Hue. Default: undef.

The "::undef" value removes a property that this module would otherwise set automatically; an empty string sets an empty value.
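A sketch of both kinds of entries. Removing desktop.secure_hsts_seconds is assumed here to drop the module's explicit HSTS setting, so that Hue presumably falls back to its own default; the metadata path is the placeholder from the SAML section.

```puppet
class { '::hue':
  # ... other parameters ...
  properties => {
    # remove the value this module sets automatically (HSTS explicitly disabled)
    'desktop.secure_hsts_seconds' => '::undef',
    # an ordinary value is passed through as given
    'libsaml.metadata_file'       => '/opt/my-idp-saml-metadata.xml',
  },
}
```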

#### package_name

Hue package name. Default: 'hue'.

#### service_name

Hue service name. Default: 'hue'.

#### realm

Kerberos realm. Default: undef.

A non-empty value enables security.

#### yarn_hostname

Hadoop YARN ResourceManager hostname. Default: undef.

#### yarn_hostname2

Second Hadoop YARN ResourceManager hostname, when high availability is used. Default: undef.

#### zookeeper_hostnames

List of ZooKeeper hostnames. Default: [].

#### zookeeper_rest_hostname

ZooKeeper REST server hostname. Default: undef.

Not available in Cloudera. Sources are available at https://github.com/apache/zookeeper.
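The optional service parameters above can be combined as in this sketch. All hostnames and the secret are hypothetical placeholders; leave out any service you do not run.

```puppet
class { '::hue':
  hdfs_hostname           => 'hdfs.example.com',
  secret                  => 'change-me',            # hypothetical placeholder
  # hypothetical hostnames for the optional services
  hive_server2_hostname   => 'hive.example.com',
  impala_hostname         => 'impala1.example.com',  # one of the impalad nodes
  oozie_hostname          => 'oozie.example.com',
  zookeeper_hostnames     => ['zoo1.example.com', 'zoo2.example.com', 'zoo3.example.com'],
  zookeeper_rest_hostname => 'zoo1.example.com',     # not available in Cloudera
}
```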

### Class `hue::hdfs`

HDFS initialization. Actions necessary on the HDFS namenode: create the hue user, if needed.

This class or the hue::user class needs to be applied on all HDFS namenodes.

### Class `hue::user`

Creates the hue system user, if needed. The hue user is required on all HDFS namenodes for authorization to work properly, and this class avoids installing Hue just to get the user.

It is better to let the packages create the user, so we recommend adding a dependency on the installation classes or Hue packages.
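A minimal sketch of such a dependency, assuming the hue-common package (mentioned in the usage section) is installed on the namenode anyway and may create the user itself; the node name is taken from the HA example.

```puppet
node 'master2.example.com' {
  include ::hadoop::namenode

  # assumption: the Hue package may create the hue user itself
  package { 'hue-common':
    ensure => installed,
  }
  include ::hue::user

  # install the package first, so hue::user only creates the user if still missing
  Package['hue-common'] -> Class['hue::user']
}
```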

Limitations

No Java is installed and no software repository is set up (you can use other puppet modules for that: cesnet-java_ng, puppetlabs::java, cesnet::site_hadoop, razorsedge/cloudera, ...).

Development
