A detailed overview can be found here.
Tested with:
- HDP 3.1.0 / Ambari 2.7.3.0
- HDP 3.0.1 / Ambari 2.7.1.0
- HDP 3.0.0 / Ambari 2.7.0.0
How to quickly set up the HDP Security/Governance/GDPR (HortoniaBank) demo for a secondary filesystem. The demo sets up:
- HortoniaBank artifacts
- Demo Hive tables
- Demo tags/attributes and lineage in Atlas
- Demo Zeppelin notebooks to walk through demo scenario
- Ranger policies across HDFS, Hive, HBase, Kafka, and Atlas to showcase:
  - Tag-based access policies across HDFS/Hive/HBase/Kafka
  - Row-level filtering on co-mingled datasets in Hive
  - Dynamic tag-based masking of Hive columns
  - Time-bound Ranger policies
- Classifications (tags) in Atlas
- Tag propagation
- Data lineage in Atlas for HDFS, Hive, and HBase
- GDPR scenarios around consent and data erasure
- Passwordless SSH across all nodes with the gateway/Ambari/appliance node (passwordless SSH to self is also needed); see the sketch below.
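A minimal sketch of one way to set this up, assuming root access from the gateway/Ambari node and placeholder hostnames node1, node2, node3:

  # generate a key pair on the gateway/Ambari node (skip if one already exists)
  ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  # copy the public key to every node, including this node itself
  for host in localhost node1 node2 node3; do
      ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
  done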
- Add the below components if they are not already present:
  - Ranger (password to be set as Rangerpassword@123)
  - Atlas (password to be set as admin123)
  - Hive
  - HBase
  - Kafka (one broker on the same node as the appliance)
  - Zeppelin (on the same node as the appliance)
- Set up a KDC server (with default REALM EXAMPLE.COM), enable Kerberos on the cluster, and create an admin principal:
  Principal : admin/admin@EXAMPLE.COM
  Password : admin
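On an MIT KDC this can be done roughly as follows (a sketch; kadmin.local runs unauthenticated on the KDC host, and the kadm5.acl path is the RHEL/CentOS default):

  # create the admin principal with the password "admin"
  kadmin.local -q "addprinc -pw admin admin/admin@EXAMPLE.COM"
  # check that */admin@EXAMPLE.COM has admin rights (path may differ per OS)
  grep admin /var/kerberos/krb5kdc/kadm5.acl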
- Enable all the Ranger plugins; this step creates the policy repo for each component. Before running the setup, disable the plugins again (to avoid permission issues enforced by Ranger).
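To confirm the policy repos were created, you can query the Ranger admin's public REST API; a sketch assuming the default port 6080 and the admin password set earlier:

  # list all Ranger service repos; expect one entry per enabled plugin
  curl -s -u admin:Rangerpassword@123 \
      'http://<ranger_host>:6080/service/public/v2/api/service'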
- Populate the secondary FS URL in /grid/0/hadoopqe/conf/suite.conf and do not put a "/" at the end of the URI:
  Format : [fs-acronym]://[server]:[port]
  E.g.:
  [secondaryfs]
  USE_SECONDARY_FS = True
  SECONDARY_FS_URL = nfs://hdpserver:1228
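A quick sanity check for the trailing-slash rule (a sketch using the suite.conf path from above):

  # prints a warning if SECONDARY_FS_URL ends with "/"
  grep -E '^SECONDARY_FS_URL.*/[[:space:]]*$' /grid/0/hadoopqe/conf/suite.conf \
      && echo "WARNING: remove the trailing slash from SECONDARY_FS_URL"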
- Data and media files needed for the test are hosted in the qe-repo bucket and referenced from the same suite.conf file:
  [hortoniabank]
  NOTEBOOK_MEDIA_URL = http://qe-repo.s3.amazonaws.com/partener-test-data/notebook-media.tgz
  RANGER_ATLAS_DATA_URL = http://qe-repo.s3.amazonaws.com/partener-test-data/ranger-atlas-data-csv.tgz
  HORTONIA_MUNICH_DATA_URL = http://qe-repo.s3.amazonaws.com/partener-test-data/HortoniaMunichSetup-data-csv.tgz
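Since these URLs are read from suite.conf, the setup script should fetch them itself; to pull one archive manually for inspection (URL copied from above):

  # download and unpack one of the demo archives
  wget http://qe-repo.s3.amazonaws.com/partener-test-data/ranger-atlas-data-csv.tgz
  tar -xzf ranger-atlas-data-csv.tgz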
- To trigger the script, SSH into the Ambari/appliance node as root and run the below:
  cd /grid/0/tools/hortoniabank
  chmod +x setup.sh
  ./setup.sh
- Accessing the Zeppelin UI:
  http://<appliance_host>:9995/
  User list for Zeppelin, with the default password for each:
ivanna_eu_hr / admin
joe_analyst / admin
etl_user / admin
scott_intern / admin
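A login can also be sanity-checked from a shell via Zeppelin's REST API (a sketch, assuming Shiro form authentication as used by this demo):

  # log in as ivanna_eu_hr; a JSON response with a session ticket indicates success
  curl -s -X POST -d 'userName=ivanna_eu_hr&password=admin' \
      'http://<appliance_host>:9995/api/login'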
- Once services are up, log in to Zeppelin (http://<appliance_host>:9995/) as ivanna_eu_hr.
- Find her notebook by searching for "hortonia" using the text field under the 'Notebook' section.
- Select the notebook called "HortoniaBank - Ivana EU HR". Once selected, it will ask for the interpreters to run the job; select all listed and click "Save".
- Once you are logged in and the mentioned notebook is selected, the notebook page will be displayed.
- The notebook lists the scenarios that test data lineage and security. When you run a job (with the Play button), its output should adhere to the header statement. E.g., the first job says "Permission Denied", as the job is expected to throw an Access Control Violation on accessing US customers.
- Repeat the jobs as each of the mentioned users, referring to the matching notebook for each user (a scripted alternative is sketched below):
  ivanna_eu_hr
  joe_analyst
  etl_user
  scott_intern
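If you prefer to drive a run from a shell instead of clicking Play, Zeppelin's REST API can run every paragraph of a note; <note_id> is a placeholder you can read off the notebook's URL:

  # authenticate, capture the session cookie, then run all paragraphs of the note
  curl -s -c /tmp/zsession -X POST -d 'userName=ivanna_eu_hr&password=admin' \
      'http://<appliance_host>:9995/api/login'
  curl -s -b /tmp/zsession -X POST \
      'http://<appliance_host>:9995/api/notebook/job/<note_id>'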