feat(v1.0): first steps toward v1.0, with first alpha version
RISCH Francois committed Sep 5, 2024
1 parent aeb95b2 commit 14655d7
Showing 177 changed files with 28,598 additions and 2,360 deletions.
16 changes: 15 additions & 1 deletion .gitignore
@@ -22,4 +22,18 @@ dev-support/deployment/repository/launch*.sh
.classpath
.settings
.project
config.properties
config.properties
node_modules/
node_modules/**
# The following files are generated/updated by vaadin-maven-plugin
src/main/frontend/generated/
pnpmfile.js
vite.generated.ts

# Browser drivers for local integration tests
drivers/
# Error screenshots generated by TestBench for failed integration tests
error-screenshots/
webpack.generated.js


9 changes: 6 additions & 3 deletions README.adoc
@@ -12,12 +12,12 @@ Public Documentation is available here link:https://frischhwc.github.io/datagen[

Datagen is made with:

- Java 11
- Java 17
- Maven 2.5

It is built for:

- CDP 7.1.7, 7.1.8, 7.1.9
- CDP 7.1.9 +

Datagen is made as a Spring Boot project with dependencies on SDK to interact with various services.

@@ -60,7 +60,7 @@ Then Package the program:


[source,bash]
java -Dnashorn.args=--no-deprecation-warning --add-opens java.base/jdk.internal.ref=ALL-UNNAMED -jar random-datagen.jar
java -Dnashorn.args=--no-deprecation-warning --add-opens java.base/jdk.internal.ref=ALL-UNNAMED -jar datagen.jar


=== How to deploy it on CDP ?
@@ -84,6 +84,9 @@ It also provides already set up commands directly available from Cloudera Manage

image:dev-support/images/datagen_in_cm.png[Datagen Actions in CM]

__Note: If you deploy to a CDP cluster with Auto-TLS enabled and want to use external services (such as S3 or GCP), run this command on the host where Datagen runs:__

keytool -importkeystore -srckeystore /etc/pki/java/cacerts -destkeystore /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_truststore.jks -srcstorepass changeit -deststorepass Cloudera1234

=== NOT RECOMMENDED: How to run it on a cluster in a non-integrated way ?

63 changes: 55 additions & 8 deletions dev-support/csd/aux/templates/properties.j2
@@ -1,6 +1,13 @@
# WARNING: These two properties are required to make the spring boot work with all dependencies
# WARNING: required property to start the server
server.tomcat.additional-tld-skip-patterns=*.jar
spring.mvc.pathmatch.matching-strategy=ANT_PATH_MATCHER

# Vaadin
logging.level.org.atmosphere=warn
spring.mustache.check-template-location=false
vaadin.allowed-packages=com.vaadin,org.vaadin,dev.hilla,com.datagen
vaadin.exclude-urls=/swagger-ui/**,/api/v1/**
spring.jpa.defer-datasource-initialization=true
vaadin.productionMode=true

# When deployed to CDP, should be set to 'cdp'
spring.profiles.active=cdp
@@ -19,9 +26,12 @@ datagen.home.directory={{ globals['datagen_home_dir'] }}
datagen.model.path=${DATAGEN_MODELS_DIR}
datagen.model.received.path={{ globals['data_model_received'] }}
datagen.model.generated.path={{ globals['data_model_generated'] }}
datagen.model.default={{ globals['data_model'] }}
datagen.custom.model={{ globals['custom_model_path'] }}
datagen.model.store.path={{ globals['data_model_store'] }}
datagen.credentials.path={{ globals['credentials_file_path'] }}
datagen.analysis.path={{ globals['analysis_file_path'] }}
datagen.scheduler.file.path={{ globals['scheduler_file_path'] }}
datagen.commands.path={{ globals['commands_file_path'] }}
datagen.load.default.models={{ globals['load_default_models'] }}

{% if globals['tls_enabled'] == 'true' %}
# Spring internal TLS settings
@@ -88,8 +98,8 @@ hbase.zookeeper.port={{ globals['hbase_zk_quorum_port'] }}
hbase.zookeeper.znode={{ globals['hbase_znode_parent'] }}
hbase.auth.kerberos=#{kerberos.enabled}
# It is not needed to fill below configuration if KERBEROS is not activated
hbase.security.user=#{kerberos.user}
hbase.security.keytab=#{kerberos.keytab}
hbase.auth.kerberos.user=#{kerberos.user}
hbase.auth.kerberos.keytab=#{kerberos.keytab}


# OZONE
@@ -143,7 +153,7 @@ kafka.auth.kerberos.user=#{kerberos.user}


# KUDU
kudu.master.server={{ globals['kudu_url'] }}
kudu.url={{ globals['kudu_url'] }}
kudu.auth.kerberos=#{kerberos.enabled}
# It is not needed to fill below configuration if KERBEROS is not activated
kudu.security.user=#{kerberos.user}
@@ -165,4 +175,41 @@ adls.sas.token={{ globals['adls_sas_token'] }}
# GCS
gcs.project.id={{ globals['gcs_project_id'] }}
# Only if using a service account key, otherwise use any other ADC login
gcs.accountkey.path={{ globals['gcs_account_key'] }}
gcs.account.key.path={{ globals['gcs_account_key'] }}
gcs.region={{ globals['gcs_region'] }}

# OLLAMA
spring.ai.ollama.base-url={{ globals['ollama_base_url'] }}
spring.ai.ollama.chat.enabled=true
spring.ai.ollama.chat.options.format=json
ollama.model.default={{ globals['ollama_model'] }}
ollama.temperature.default={{ globals['ollama_temperature'] }}
ollama.frequency_penalty.default={{ globals['ollama_frequency_penalty'] }}
ollama.presence_penalty.default={{ globals['ollama_presence_penalty'] }}
ollama.top_p.default={{ globals['ollama_top_p'] }}

# BEDROCK
bedrock.region={{ globals['bedrock_region'] }}
bedrock.model.default={{ globals['bedrock_model'] }}
bedrock.temperature.default={{ globals['bedrock_temperature'] }}
bedrock.max_tokens.default={{ globals['bedrock_max_tokens'] }}
bedrock.access_key.id={{ globals['bedrock_access_key_id'] }}
bedrock.access_key.secret={{ globals['bedrock_access_key_secret'] }}

# OPEN AI
# Leave the API key set to 'test' here (prevents the client auto-configuration from failing)
spring.ai.openai.api-key={{ globals['openai_key_name'] }}
openai.api.key={{ globals['openai_key'] }}
openai.model.default={{ globals['openai_model'] }}
openai.temperature.default={{ globals['openai_temperature'] }}
openai.frequency_penalty.default={{ globals['openai_frequency_penalty'] }}
openai.presence_penalty.default={{ globals['openai_presence_penalty'] }}
openai.max_tokens.default={{ globals['openai_max_tokens'] }}
openai.top_p.default={{ globals['openai_top_p'] }}

#Local LLM
local.llm.temperature.default={{ globals['local_llm_temperature'] }}
local.llm.frequency_penalty.default={{ globals['local_llm_frequence_penalty'] }}
local.llm.presence_penalty.default={{ globals['local_llm_presence_penalty'] }}
local.llm.max_tokens.default={{ globals['local_llm_max_tokens'] }}
local.llm.top_p.default={{ globals['local_llm_top_p'] }}
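
Reviewer note: several keys in this template appear to have been renamed rather than added (`hbase.security.*` to `hbase.auth.kerberos.*`, `kudu.master.server` to `kudu.url`, `gcs.accountkey.path` to `gcs.account.key.path`). For anyone carrying a hand-edited properties file across this change, a minimal sketch of a key migration could look like the following. It assumes each old/new pair shown in the diff is a pure rename with unchanged value semantics; the helper name `migrate_properties` and the `RENAMED_KEYS` table are hypothetical and not part of the commit.

```python
# Hypothetical helper, not part of the commit.
# Assumption: each old/new pair in the properties.j2 diff is a pure rename.
RENAMED_KEYS = {
    "hbase.security.user": "hbase.auth.kerberos.user",
    "hbase.security.keytab": "hbase.auth.kerberos.keytab",
    "kudu.master.server": "kudu.url",
    "gcs.accountkey.path": "gcs.account.key.path",
}


def migrate_properties(text: str) -> str:
    """Return `text` with renamed keys replaced.

    Comment lines (starting with '#' or '!'), blank lines, and lines
    without '=' are passed through unchanged.
    """
    migrated = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped and not stripped.startswith(("#", "!")) and "=" in line:
            # Split on the first '=' only, so values containing '=' survive.
            key, value = line.split("=", 1)
            new_key = RENAMED_KEYS.get(key.strip())
            if new_key is not None:
                line = f"{new_key}={value}"
        migrated.append(line)
    return "\n".join(migrated)


if __name__ == "__main__":
    old = "kudu.master.server=host:7051\n# comment\nkafka.brokers=a:9092"
    print(migrate_properties(old))
```

The sketch only touches the key side of each line, so placeholders such as `#{kerberos.user}` in the value are preserved as written. Keys that the diff shows as removed without an obvious successor are deliberately left out of the table.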
4 changes: 2 additions & 2 deletions dev-support/csd/descriptor/service.mdl
@@ -1,10 +1,10 @@
{
"version": "0.5.1",
"version": "0.8.0",
"name": "DATAGEN",
"nameForCrossEntityAggregateMetrics": "datagen",
"compatibility": {
"cdhVersion": {
"min": "7.1.7",
"min": "7.1.9",
"max": "8.0.0"
}
},