SET Statement
Specifies values for query options that control the runtime behavior of other statements within the same session.
In CDH 5.7 / Impala 2.5 and higher, SET also defines user-specified substitution variables for the impala-shell interpreter. This feature uses the SET command built into impala-shell instead of the SQL SET statement. Therefore the substitution mechanism only works with queries processed by impala-shell, not with queries submitted through JDBC or ODBC.
In CDH 5.14 / Impala 2.11 and higher, the output of the SET statement changes in some important ways:
-
The options are divided into groups: Regular Query Options, Advanced Query Options, Development Query Options, and Deprecated Query Options.
-
The advanced options are intended for use in specific kinds of performance tuning and debugging scenarios. The development options are related to internal development of Impala or features that are not yet finalized; these options might be changed or removed without notice. The deprecated options are related to features that are removed or changed so that the options no longer have any purpose; these options might be removed in future versions.
-
By default, only the first two groups (regular and advanced) are displayed by the SET command. Use the syntax SET ALL to see all groups of options.
-
impala-shell options and user-specified variables are always displayed at the end of the list of query options, after all appropriate option groups.
-
When the SET command is run through the JDBC or ODBC interfaces, the result set has a new third column, level, indicating which group each option belongs to. The same distinction of SET returning the regular and advanced options, and SET ALL returning all option groups, applies to JDBC and ODBC also.
Syntax:
SET [query_option=option_value]
SET ALL
SET and SET ALL with no arguments return a result set consisting of all the applicable query options and their current values.
The query option name and any string argument values are case-insensitive.
Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and disabled by assigning values of either true and false, or 1 and 0. Some numeric options accept a final character signifying the unit, such as 2g for 2 gigabytes or 100m for 100 megabytes. See Query Options for the SET Statement for the details of each query option.
Setting query options during impala-shell invocation:
In CDH 5.14 / Impala 2.11 and higher, you can use one or more command-line options of the form --query_option=option=value when running the impala-shell command. The corresponding query option settings take effect for that impala-shell session.
User-specified substitution variables:
In CDH 5.7 / Impala 2.5 and higher, you can specify your own names and string substitution values within the impala-shell interpreter. Once a substitution variable is set up, its value is inserted into any SQL statement in that same impala-shell session that contains the notation ${var:varname}. Using SET in an interactive impala-shell session overrides any value for that same variable passed in through the --var=varname=value command-line option.
For example, to set up some default parameters for report queries, but then override those default within an impala-shell session, you might issue commands and statements such as the following:
-- Initial setup for this example. create table staging_table (s string); insert into staging_table values ('foo'), ('bar'), ('bletch'); create table production_table (s string); insert into production_table values ('North America'), ('EMEA'), ('Asia'); quit; -- Start impala-shell with user-specified substitution variables, -- run a query, then override the variables with SET and run the query again. $ impala-shell --var=table_name=staging_table --var=cutoff=2 ... banner message ... [localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; Query: select s from staging_table order by s limit 2 +--------+ | s | +--------+ | bar | | bletch | +--------+ Fetched 2 row(s) in 1.06s [localhost:21000] > set var:table_name=production_table; Variable TABLE_NAME set to production_table [localhost:21000] > set var:cutoff=3; Variable CUTOFF set to 3 [localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; Query: select s from production_table order by s limit 3 +---------------+ | s | +---------------+ | Asia | | EMEA | | North America | +---------------+
The following example shows how SET ALL with no parameters displays all user-specified substitution variables, and how UNSET removes the substitution variable entirely:
[localhost:21000] > set all; Query options (defaults shown in []): ABORT_ON_ERROR: [0] COMPRESSION_CODEC: [] DISABLE_CODEGEN: [0] ... Advanced Query Options: APPX_COUNT_DISTINCT: [0] BUFFER_POOL_LIMIT: [] DEFAULT_JOIN_DISTRIBUTION_MODE: [0] ... Development Query Options: BATCH_SIZE: [0] DEBUG_ACTION: [] DECIMAL_V2: [0] ... Deprecated Query Options: ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0] ALLOW_UNSUPPORTED_FORMATS: [0] DEFAULT_ORDER_BY_LIMIT: [-1] ... Shell Options LIVE_PROGRESS: False LIVE_SUMMARY: False Variables: CUTOFF: 3 TABLE_NAME: staging_table [localhost:21000] > unset var:cutoff; Unsetting variable CUTOFF [localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff}; Error: Unknown variable CUTOFF
See Running Commands and SQL Statements in impala-shell for more examples of using the --var, SET, and ${var:varname} substitution technique in impala-shell.
Usage notes:
MEM_LIMIT is probably the most commonly used query option. You can specify a high value to allow a resource-intensive query to complete. For testing how queries would work on memory-constrained systems, you might specify an artificially low value.
Complex type considerations:
Examples:
The following example sets some numeric and some Boolean query options to control usage of memory, disk space, and timeout periods, then runs a query whose success could depend on the options in effect:
set mem_limit=64g; set DISABLE_UNSAFE_SPILLS=true; set parquet_file_size=400m; set RESERVATION_REQUEST_TIMEOUT=900000; insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2, c3;
Added in: CDH 5.2.0 / Impala 2.0.0
SET has always been available as an impala-shell command. Promoting it to a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
Cancellation: Cannot be cancelled.
HDFS permissions: This statement does not touch any HDFS files or directories, therefore no HDFS permissions are required.
See Query Options for the SET Statement for the query options you can adjust using this statement.
<< DISTINCT Operator | ©2016 Cloudera, Inc. All rights reserved | Query Options for the SET Statement >> |
Terms and Conditions Privacy Policy |