The LLAP service runs continuously and never releases its resources. A cluster can run a single LLAP service or several of them. When submitting an LLAP service, you specify which queue it runs in; each LLAP service has a unique name, and users specify which LLAP service their jobs are submitted to, as sketched below.
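For example (a minimal sketch; the names, queues, and sizing values here are made up for illustration, and the flags are explained in the next section), two independent LLAP services could be generated like this:

```bash
# Hypothetical: one LLAP service per workload, each with a unique name
# and its own YARN queue (flag values are illustrative only).
hive --service llap --name llap-etl --queue root.etl --instances 2 \
     --cache 128m --executors 3 --iothreads 2 --size 1024m --xmx 512m --loglevel INFO
hive --service llap --name llap-bi --queue root.bi --instances 4 \
     --cache 128m --executors 3 --iothreads 2 --size 1024m --xmx 512m --loglevel INFO
```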
## Generating the LLAP Service Program

Any user can run the generator for the LLAP service program. Running it only produces, from the given parameters, the program files and configuration needed to run LLAP; it does not start the service itself.
```bash
hive --service llap --name llap-demo --instances 1 --cache 128m --executors 3 --iothreads 2 --size 1024m --xmx 512m --queue default --loglevel INFO
```

The key parameters:
Parameter | Description |
---|---|
service | llap: invokes Hive's LLAP service; this value is fixed. |
name | The name of the LLAP service; it must be unique (every LLAP service must use a different name). Because LLAP uses ZooKeeper for service discovery, the service registers itself under the corresponding ZooKeeper path when it starts. |
instances | The number of containers. |
cache | The cache size. |
executors | The number of executor threads per container; each executor thread processes one Task. |
iothreads | I/O threads are separate from the executor threads: they read the data and prepare it in the columnar format the executor threads consume. |
size | The container memory size, i.e. the container size requested from the ResourceManager. |
xmx | The container's heap size. |
queue | The queue this LLAP service is submitted to. |
loglevel | The log level of the containers. |
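As a rough sanity check on these values (our reading of how they relate, not an official sizing rule): the heap (xmx) plus the cache should fit inside the container size requested from the ResourceManager, leaving headroom for other off-heap use:

```bash
# Values from the command above (all in MB).
xmx=512; cache=128; size=1024
echo $(( size - xmx - cache ))   # => 384 MB of headroom inside the container
```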
For reference, the full usage of the llap service (output truncated):

```
usage: llap
 -a,--args                      java arguments to the llap instance
 -auxhive,--auxhive             whether to package the Hive aux jars (true by default)
 -b,--service-am-container-mb   The size of the service AppMaster container in MB
 -c,--cache                     cache size per instance
 -d,--directory                 Temp directory for jars etc.
 -e,--executors                 executor per instance
 -H,--help                      Print help information
 -h,--auxhbase                  whether to package the Hbase jars (true by default)
    --health-init-delay-secs    Delay in seconds after which health percentage is
                                monitored (Default: 400)
    --health-percent            Percentage of running containers after which LLAP
                                application is considered healthy (Default: 80)
    --health-time-window-secs   Time window in seconds (after initial delay) for which
                                LLAP application is allowed to be in unhealthy state
                                before being killed (Default: 300)
    --hiveconf                  Use value for given property. Overridden by explicit
                                parameters
 -i,--instances                 Specify the number of instances to run this on
 -j,--auxjars                   additional jars to package (by default, JSON SerDe
                                jar is packaged if available)
    --javaHome                  Path to the JRE/JDK. This should be installed at the
                                same location on all cluster nodes ($JAVA_HOME,
                                java.home by default)
 -l,--loglevel                  log levels for the llap instance
    --logger                    logger for llap instance ([RFA], query-routing, console)
 -n,--name                      Cluster name for YARN registry
    --output
```
## Generated Files

After execution, a directory such as `llap-yarn-29Sep2021` is generated, suffixed with the current date. It contains three files:

- Yarnfile: the definition file of the YARN Service.
- run.sh: execute this script to start the LLAP service.
- llap-29Sep2021.tar.gz: the jars used by the LLAP service.
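A quick look at the generated directory (a sketch; the date suffix will differ on your machine):

```bash
ls llap-yarn-29Sep2021/
# Yarnfile  llap-29Sep2021.tar.gz  run.sh
```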
The content of the Yarnfile is as follows:
{ "name": "llap-demo", "version": "1.0.0", "queue": "", "configuration": { "properties": { "yarn.service.rolling-log.include-pattern": ".*\.done", "yarn.component.placement.policy" : "4", "yarn.container.health.threshold.percent": "80", "yarn.container.health.threshold.window.secs": "300", "yarn.container.health.threshold.init.delay.secs": "400" } }, "components": [ { "name": "llap", "number_of_containers": 1, "launch_command": "$LLAP_DAEMON_BIN_HOME/llapDaemon.sh start &> $LLAP_DAEMON_TMP_DIR/shell.out", "artifact": { "id": ".yarn/package/LLAP/llap-29Sep2021.tar.gz", "type": "TARBALL" }, "resource": { "cpus": 1, "memory": "1024" }, "configuration": { "env": { "JAVA_HOME": "/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home", "LLAP_DAEMON_HOME": "$PWD/lib/", "LLAP_DAEMON_TMP_DIR": "$PWD/tmp/", "LLAP_DAEMON_BIN_HOME": "$PWD/lib/bin/", "LLAP_DAEMON_CONF_DIR": "$PWD/lib/conf/", "LLAP_DAEMON_LOG_DIR": "run.sh", "LLAP_DAEMON_LOGGER": "query-routing", "LLAP_DAEMON_LOG_LEVEL": "INFO", "LLAP_DAEMON_HEAPSIZE": "512", "LLAP_DAEMON_PID_DIR": "$PWD/lib/app/run/", "LLAP_DAEMON_LD_PATH": "/usr/local/hadoop/lib/native", "LLAP_DAEMON_OPTS": " -Dhttp.maxConnections=4 ", "APP_ROOT": " /app/install/", "APP_TMP_DIR": " /tmp/" } } } ], "kerberos_principal" : { "principal_name" : "", "keytab" : "" }, "quicklinks": { "LLAP Daemon JMX Endpoint": "http://llap-0.${SERVICE_NAME}.${USER}.${DOMAIN}:15002/jmx" } }
run.sh first stops the service, then destroys it, and then launches it again:
```bash
#!/bin/bash -e
baseDIR=$(dirname $0)
yarn app -stop llap-demo
yarn app -destroy llap-demo
hdfs dfs -mkdir -p .yarn/package/LLAP
hdfs dfs -copyFromLocal -f $baseDIR/llap-29Sep2021.tar.gz .yarn/package/LLAP
yarn app -launch llap-demo $baseDIR/Yarnfile
```
## llap-${CREATE_DATE}.tar.gz

Extracting llap-${CREATE_DATE}.tar.gz shows the following contents.
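For example (a sketch, using the tarball name generated above):

```bash
mkdir -p llap-pkg
tar -zxf llap-yarn-29Sep2021/llap-29Sep2021.tar.gz -C llap-pkg
ls llap-pkg    # bin  conf  lib ...
```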
### bin

Contains the commands that run the service:
- llap-daemon-env.sh
- llapDaemon.sh
- runLlapDaemon.sh
The parameters used when generating the service are all stored in this file in JSON format.
### conf

The generated configuration directory, containing core-site.xml, hive-site.xml, llap-udfs.lst, hadoop-metrics2.properties, llap-daemon-log4j2.properties, tez-site.xml, hdfs-site.xml, llap-daemon-site.xml, and yarn-site.xml. Among these, llap-daemon-site.xml holds the LLAP parameters.
As llap-daemon-site.xml shows, the parameters we entered on the command line became the LLAP service's configuration properties:
```xml
<property>
  <name>hive.llap.daemon.service.hosts</name>
  <value>@llap-demo</value>
  <final>false</final>
</property>
<property>
  <name>hive.llap.io.memory.size</name>
  <value>134217728</value>
  <final>false</final>
</property>
<property>
  <name>hive.llap.daemon.yarn.container.mb</name>
  <value>1024</value>
  <final>false</final>
</property>
<property>
  <name>hive.llap.io.threadpool.size</name>
  <value>2</value>
  <final>false</final>
</property>
<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>3</value>
  <final>false</final>
</property>
<property>
  <name>hive.llap.daemon.memory.per.instance.mb</name>
  <value>512</value>
  <final>false</final>
</property>
```
### lib

The lib directory contains the jars used to run LLAP.
## Running the Service

Execute the run.sh script; an Application then appears in the ResourceManager.
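A quick way to confirm the service is up (a sketch using the standard YARN service CLI; the JSON status output is omitted here):

```bash
yarn app -status llap-demo    # prints the service's status as JSON
```

Once running, the LLAP daemon listens on the following ports: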
Port | Parameter | Meaning |
---|---|---|
15002 | hive.llap.daemon.web.port | LLAP daemon web UI port. |
15003 | hive.llap.daemon.output.service.port | LLAP daemon output service port |
15004 | hive.llap.management.rpc.port | RPC port for LLAP daemon management service. |
15551 | hive.llap.daemon.yarn.shuffle.port | YARN shuffle port for LLAP-daemon-hosted shuffle. |
0 | hive.llap.daemon.rpc.port | The LLAP daemon RPC port (0 means a random free port is chosen, as in the worker znode below). |
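The fixed ports can be probed directly; for instance, the JMX endpoint referenced by the Yarnfile quicklink (a sketch; the host name is a placeholder for whichever node runs the llap-0 container):

```bash
curl -s http://llap-host.example.com:15002/jmx | head
```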
As the following shows, each LLAP service has a directory under /llap-unsecure for the current user. Under its workers directory there are two kinds of znodes, a slot znode and a worker znode, one of each per container. Opening a slot znode shows a UUID. Opening a worker znode shows the LLAP container's details, including "registry.unique.id":"34850c09-d8b1-415b-8572-139456d476fc", which matches the content of the slot znode.
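Listing the workers path first (a sketch assuming the same paths as the output below):

```
[zk: localhost:2181(CONNECTED) 5] ls /llap-unsecure/user-houzhizhen/llap-demo/workers
[slot-0000000000, worker-0000000026]
```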
```
[zk: localhost:2181(CONNECTED) 6] get /llap-unsecure/user-houzhizhen/llap-demo/workers/slot-0000000000
34850c09-d8b1-415b-8572-139456d476fc
[zk: localhost:2181(CONNECTED) 7] get /llap-unsecure/user-houzhizhen/llap-demo/workers/worker-0000000026
{"type":"JSONServiceRecord","external":[{"api":"services","addressType":"uri","protocolType":"webui","addresses":[{"uri":"http://localhost:15002"}]}],"internal":[{"api":"llap","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"46480"}]},{"api":"llapmng","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"15004"}]},{"api":"shuffle","addressType":"host/port","protocolType":"tcp","addresses":[{"host":"localhost","port":"15551"}]},{"api":"llapoutputformat","addressType":"host/port","protocolType":"hadoop/IPC","addresses":[{"host":"localhost","port":"15003"}]}],"hive.llap.daemon.container.id":"container_1632897605333_0007_01_000002","hive.llap.daemon.yarn.container.mb":"2048","hive.llap.auto.auth":"false","hive.llap.io.allocator.mmap":"false","hive.llap.io.use.lrfu":"true","hive.llap.io.memory.size":"134217728","hive.llap.management.rpc.port":"15004","hive.llap.allow.permanent.fns":"true","hive.llap.daemon.rpc.port":"46480","hive.llap.daemon.web.ssl":"false","hive.llap.auto.max.input.size":"10737418240","hive.llap.io.lrfu.lambda":"1.0E-6","hive.llap.daemon.nm.address":"localhost:38742","llap.daemon.metrics.sessionid":"40fc27da-f0d3-458b-9059-d46c8dc32132","hive.llap.auto.enforce.vectorized":"true","hive.llap.daemon.service.refresh.interval.sec":"60s","hive.llap.io.orc.time.counters":"true","hive.llap.auto.max.output.size":"1073741824","hive.llap.io.allocator.direct":"true","registry.unique.id":"34850c09-d8b1-415b-8572-139456d476fc","hive.llap.daemon.web.port":"15002","hive.llap.object.cache.enabled":"true","hive.llap.execution.mode":"all","hive.llap.daemon.yarn.shuffle.port":"15551","hive.llap.daemon.output.service.port":"15003","hive.llap.daemon.download.permanent.fns":"false","hive.llap.io.memory.mode":"cache","hive.llap.daemon.task.scheduler.wait.queue.size":"10","hive.llap.daemon.memory.per.instance.mb":"1024","hive.llap.auto.enforce.tree":"true","hive.llap.io.threadpool.size":"2","hive.llap.daemon.service.hosts":"@llap-demo","hive.llap.auto.enforce.stats":"true","hive.llap.auto.allow.uber":"false","hive.llap.daemon.num.executors":"1"}
```
## Hive Test

Add the following configuration to hive-site.xml. Note that hive.llap.daemon.service.hosts must be "@" + ${LLAP_SERVICE_NAME}:
```xml
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
<property>
  <name>hive.llap.execution.mode</name>
  <value>all</value>
</property>
<property>
  <name>hive.execution.mode</name>
  <value>llap</value>
</property>
<property>
  <name>hive.llap.daemon.service.hosts</name>
  <value>@llap-demo</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zk_ip:zk_port</value>
</property>
<property>
  <name>hive.llap.daemon.memory.per.instance.mb</name>
  <value>2048</value>
</property>
<property>
  <name>hive.llap.daemon.num.executors</name>
  <value>2</value>
</property>
<property>
  <name>hive.server2.tez.default.queues</name>
  <value>root.default</value>
</property>
<property>
  <name>hive.server2.tez.initialize.default.sessions</name>
  <value>true</value>
</property>
<property>
  <name>hive.server2.tez.sessions.per.default.queue</name>
  <value>2</value>
</property>
```
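The same switches can also be toggled per session instead of editing hive-site.xml (a sketch; the comparison test below uses session-level `set` the same way):

```bash
hive -e "
set hive.execution.engine=tez;
set hive.llap.execution.mode=all;
set hive.execution.mode=llap;
set hive.llap.daemon.service.hosts=@llap-demo;
select 1;
"
```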
## LLAP Test

We test with TPC-DS, executing query1.sql three times:
```sql
use tpcds_bin_partitioned_orc_2;
source query1.sql;
source query1.sql;
source query1.sql;
```
We find that the first run takes 7.47 seconds, the second 4.31 seconds, and the third 4.05 seconds. This is because after the first run, LLAP has cached some of the raw data in off-heap memory.
For comparison, we switch execution back to containers and disable LLAP, then run the same queries:

```sql
set hive.execution.mode=container;
set hive.llap.execution.mode=none;
use tpcds_bin_partitioned_orc_2;
source query1.sql;
source query1.sql;
source query1.sql;
```
For fairness, we kill the LLAP service's resources before this test, so none of its cache can be used.
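For example, using the same commands run.sh uses:

```bash
yarn app -stop llap-demo
yarn app -destroy llap-demo
```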
Run 1, run 2, and run 3 (screenshots omitted): as can be seen, each run takes about 11 seconds.
Finally, some limitations:

- There is no way to specify the number of CPU vcores for the containers. The executors parameter only controls how many compute threads run inside each started container; it does not control how much CPU is requested from the ResourceManager. The CPU actually requested from the ResourceManager is controlled by the following parameters in the generated Yarnfile (a possible workaround is sketched after the snippet):
"resource": { "cpus": 1, "memory": "1024" },
- Two LLAP services cannot be started on the same server, because most daemon ports (15002, 15003, 15004, and 15551 in the table above) are fixed, so a second daemon on the same host would hit port conflicts.
- User-defined jar packages (cf. the --auxjars option above): extra jars must be packaged into the service when it is generated.