# HDFS Configuration Reference

This reference page describes the HDFS configuration values that you configure for HAWQ in hdfs-site.xml, core-site.xml, or hdfs-client.xml.

## HDFS Site Configuration (hdfs-site.xml and core-site.xml)

This topic provides a reference of the HDFS site configuration values recommended for HAWQ installations. These parameters are located in either hdfs-site.xml or core-site.xml of your HDFS deployment.

The following table describes the configuration parameters and values recommended for HAWQ installations. Only HDFS parameters that must be modified or customized for HAWQ are listed.

| Parameter | Description | Recommended Value for HAWQ Installs | Comments |
|-----------|-------------|-------------------------------------|----------|
| dfs.allow.truncate | Allows truncate. | true | HAWQ requires that you enable dfs.allow.truncate. The HAWQ service fails to start if dfs.allow.truncate is not set to true. |
| dfs.block.access.token.enable | If true, access tokens are used as capabilities for accessing DataNodes. If false, no access tokens are checked when accessing DataNodes. | false for an unsecured HDFS cluster, or true for a secure cluster | |
| dfs.block.local-path-access.user | Comma-separated list of the users allowed to open block files on legacy short-circuit local read. | gpadmin | |
| dfs.client.read.shortcircuit | Turns on short-circuit local reads. | true | In Ambari, this parameter corresponds to HDFS Short-circuit read. The value for this parameter should be the same in hdfs-site.xml and HAWQ's hdfs-client.xml. |
| dfs.client.socket-timeout | The amount of time, in milliseconds, before a client connection times out when establishing a connection or reading. | 300000000 | |
| dfs.client.use.legacy.blockreader.local | Setting this value to false specifies that the new version of the short-circuit reader is used. Setting this value to true means that the legacy short-circuit reader is used. | false | |
| dfs.datanode.data.dir.perm | Permissions for the directories on the local filesystem where the DFS DataNode stores its blocks. The permissions can be either octal or symbolic. | 750 | In Ambari, this parameter corresponds to DataNode directories permission. |
| dfs.datanode.handler.count | The number of server threads for the DataNode. | 60 | |
| dfs.datanode.max.transfer.threads | The maximum number of threads to use for transferring data in and out of the DataNode. | 40960 | In Ambari, this parameter corresponds to DataNode max data transfer threads. |
| dfs.datanode.socket.write.timeout | The amount of time, in milliseconds, before a write operation times out. | 7200000 | |
| dfs.domain.socket.path | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it is replaced by the TCP port of the DataNode. | | If set, the value for this parameter should be the same in hdfs-site.xml and HAWQ's hdfs-client.xml. |
| dfs.namenode.accesstime.precision | The access time for an HDFS file is precise up to this value. A value of 0 disables access times for HDFS. | 0 | In Ambari, this parameter corresponds to Access time precision. |
| dfs.namenode.handler.count | The number of server threads for the NameNode. | 600 | |
| dfs.support.append | Whether HDFS is allowed to append to files. | true | |
| ipc.client.connection.maxidletime | The maximum time, in milliseconds, after which a client brings down the connection to the server. | 3600000 | In core-site.xml. |
| ipc.client.connect.timeout | The number of milliseconds a client waits for the socket to establish a server connection. | 300000 | In core-site.xml. |
| ipc.server.listen.queue.size | The length of the listen queue for servers accepting client connections. | 3300 | In core-site.xml. |
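As an illustration, recommended values such as those above are set as `<property>` entries in the corresponding configuration file. The following excerpt is a minimal sketch of two of the hdfs-site.xml settings from the table; an actual deployment's file contains many other properties:

```xml
<!-- hdfs-site.xml excerpt: two of the HAWQ-recommended settings -->
<property>
  <name>dfs.allow.truncate</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>40960</value>
</property>
```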

## HDFS Client Configuration (hdfs-client.xml)

This topic provides a reference of the HAWQ configuration values located in $GPHOME/etc/hdfs-client.xml.

This table describes the configuration parameters and their default values:

| Parameter | Description | Default Value | Comments |
|-----------|-------------|---------------|----------|
| dfs.client.failover.max.attempts | The maximum number of times that the DFS client retries issuing an RPC call when multiple NameNodes are configured. | 15 | |
| dfs.client.log.severity | The minimal log severity level. Valid values are FATAL, ERROR, INFO, DEBUG1, DEBUG2, and DEBUG3. | INFO | |
| dfs.client.read.shortcircuit | Determines whether the DataNode is bypassed when reading file blocks, if the block and client are on the same node. The default value, true, bypasses the DataNode. | true | The value for this parameter should be the same in hdfs-site.xml and HAWQ's hdfs-client.xml. |
| dfs.client.use.legacy.blockreader.local | Determines whether the legacy short-circuit reader implementation, based on HDFS-2246, is used. Set this property to true on non-Linux platforms that do not have the new implementation based on HDFS-347. | false | |
| dfs.default.blocksize | Default block size, in bytes. | 134217728 | Default is equivalent to 128 MB. |
| dfs.default.replica | The default number of replicas. | 3 | |
| dfs.domain.socket.path | (Optional.) The path to a UNIX domain socket to use for communication between the DataNode and local HDFS clients. If the string "_PORT" is present in this path, it is replaced by the TCP port of the DataNode. | | If set, the value for this parameter should be the same in hdfs-site.xml and HAWQ's hdfs-client.xml. |
| dfs.prefetchsize | The number of blocks for which information is pre-fetched. | 10 | |
| hadoop.security.authentication | Specifies the type of RPC authentication to use. A value of simple indicates no authentication. A value of kerberos enables authentication by Kerberos. | simple | |
| input.connect.timeout | The timeout interval, in milliseconds, for when the input stream is setting up a connection to a DataNode. | 600000 | Default is equal to 10 minutes. |
| input.localread.blockinfo.cachesize | The size of the file block path information cache, in bytes. | 1000 | |
| input.localread.default.buffersize | The size of the buffer, in bytes, used to hold data from the file block and verify the checksum. This value is used only when dfs.client.read.shortcircuit is set to true. | 1048576 | Default is equal to 1 MB. Used only when dfs.client.read.shortcircuit is set to true. If an older version of hdfs-client.xml is retained during upgrade, set input.localread.default.buffersize to 2097152 to avoid performance degradation. |
| input.read.getblockinfo.retry | The maximum number of times the client retries getting block information from the NameNode. | 3 | |
| input.read.timeout | The timeout interval, in milliseconds, for when the input stream is reading from a DataNode. | 3600000 | Default is equal to 1 hour. |
| input.write.timeout | The timeout interval, in milliseconds, for when the input stream is writing to a DataNode. | 3600000 | Default is equal to 1 hour. |
| output.close.timeout | The timeout interval for closing an output stream, in milliseconds. | 900000 | Default is equal to 15 minutes. |
| output.connect.timeout | The timeout interval, in milliseconds, for when the output stream is setting up a connection to a DataNode. | 600000 | Default is equal to 10 minutes. |
| output.default.chunksize | The chunk size of the pipeline, in bytes. | 512 | |
| output.default.packetsize | The packet size of the pipeline, in bytes. | 65536 | Default is equal to 64 KB. |
| output.default.write.retry | The maximum number of times that the client reattempts to set up a failed pipeline. | 10 | |
| output.packetpool.size | The maximum number of packets in a file's packet pool. | 1024 | |
| output.read.timeout | The timeout interval, in milliseconds, for when the output stream is reading from a DataNode. | 3600000 | Default is equal to 1 hour. |
| output.replace-datanode-on-failure | Determines whether the client adds a new DataNode to the pipeline if the number of nodes in the pipeline is less than the specified number of replicas. | false if the cluster has 4 or fewer nodes, otherwise true | When you deploy a HAWQ cluster, the hawq init utility detects the number of nodes in the cluster and updates this configuration parameter accordingly. However, when expanding an existing cluster to 4 or more nodes, you must manually set this value to true. Set it to false if you remove existing nodes and fall under 4 nodes. |
| output.write.timeout | The timeout interval, in milliseconds, for when the output stream is writing to a DataNode. | 3600000 | Default is equal to 1 hour. |
| rpc.client.connect.retry | The maximum number of times to retry a connection if the RPC client fails to connect to the server. | 10 | |
| rpc.client.connect.tcpnodelay | Determines whether TCP_NODELAY is used when connecting to the RPC server. | true | |
| rpc.client.connect.timeout | The timeout interval for establishing the RPC client connection, in milliseconds. | 600000 | Default is equal to 10 minutes. |
| rpc.client.max.idle | The maximum idle time for an RPC connection, in milliseconds. | 10000 | Default is equal to 10 seconds. |
| rpc.client.ping.interval | The interval, in milliseconds, at which the RPC client sends a heartbeat to the server. A value of 0 disables the heartbeat. | 10000 | Default is equal to 10 seconds. |
| rpc.client.read.timeout | The timeout interval, in milliseconds, for when the RPC client is reading from the server. | 3600000 | Default is equal to 1 hour. |
| rpc.client.socket.linger.timeout | The value to set for the SO_LINGER socket option when connecting to the RPC server. | -1 | |
| rpc.client.timeout | The timeout interval of an RPC invocation, in milliseconds. | 3600000 | Default is equal to 1 hour. |
| rpc.client.write.timeout | The timeout interval, in milliseconds, for when the RPC client is writing to the server. | 3600000 | Default is equal to 1 hour. |
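Several parameters above (dfs.client.read.shortcircuit and dfs.domain.socket.path) must hold the same value in hdfs-site.xml and HAWQ's hdfs-client.xml. The following is a minimal sketch of how such a consistency check could be scripted with Python's standard `xml.etree` parser; the `MUST_MATCH` list and the inline XML excerpts are illustrative assumptions, not part of HAWQ itself:

```python
# Sketch: verify that parameters which must match between hdfs-site.xml
# and HAWQ's hdfs-client.xml actually hold the same value.
import xml.etree.ElementTree as ET

# Parameters the reference tables say must agree across the two files
# (illustrative list; extend as needed).
MUST_MATCH = ["dfs.client.read.shortcircuit", "dfs.domain.socket.path"]

def load_properties(xml_text):
    """Parse a Hadoop-style configuration document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name] = prop.findtext("value")
    return props

def find_mismatches(site_props, client_props):
    """Return parameters set in both files whose values differ."""
    return [
        name for name in MUST_MATCH
        if name in site_props and name in client_props
        and site_props[name] != client_props[name]
    ]

# Inline excerpts stand in for reading the real files:
site_xml = """<configuration>
  <property><name>dfs.client.read.shortcircuit</name><value>true</value></property>
</configuration>"""
client_xml = """<configuration>
  <property><name>dfs.client.read.shortcircuit</name><value>false</value></property>
</configuration>"""

mismatches = find_mismatches(load_properties(site_xml), load_properties(client_xml))
print(mismatches)  # reports the mismatch on dfs.client.read.shortcircuit
```

In a real deployment you would read the two files from disk (for example, hdfs-site.xml from your HDFS configuration directory and hdfs-client.xml from $GPHOME/etc) rather than using inline strings.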