Help wanted!

The following content of this documentation page has been machine-translated. But unlike other websites, it is not done on the fly. This translated text lives on GitHub repository alongside main ClickHouse codebase and waits for fellow native speakers to make it more human-readable. You can also use the original English version as a reference.

Help ClickHouse documentation by editing this page

外部字典的来源

外部字典可以从许多不同的来源连接。

如果使用xml-file配置字典,则配置如下所示:

  1. <yandex>
  2. <dictionary>
  3. ...
  4. <source>
  5. <source_type>
  6. <!-- Source configuration -->
  7. </source_type>
  8. </source>
  9. ...
  10. </dictionary>
  11. ...
  12. </yandex>

在情况下 DDL-查询,相等的配置将看起来像:

  1. CREATE DICTIONARY dict_name (...)
  2. ...
  3. SOURCE(SOURCE_TYPE(param1 val1 ... paramN valN)) -- Source configuration
  4. ...

源配置在 source 科。

对于源类型 本地文件, 可执行文件, HTTP(s), ClickHouse
可选设置:

  1. <source>
  2. <file>
  3. <path>/opt/dictionaries/os.tsv</path>
  4. <format>TabSeparated</format>
  5. </file>
  6. <settings>
  7. <format_csv_allow_single_quotes>0</format_csv_allow_single_quotes>
  8. </settings>
  9. </source>

  1. SOURCE(FILE(path '/opt/dictionaries/os.tsv' format 'TabSeparated'))
  2. SETTINGS(format_csv_allow_single_quotes = 0)

来源类型 (source_type):

本地文件

设置示例:

  1. <source>
  2. <file>
  3. <path>/opt/dictionaries/os.tsv</path>
  4. <format>TabSeparated</format>
  5. </file>
  6. </source>

  1. SOURCE(FILE(path '/opt/dictionaries/os.tsv' format 'TabSeparated'))

设置字段:

  • path – The absolute path to the file.
  • format – The file format. All the formats described in “格式” 支持。

可执行文件

使用可执行文件取决于 字典如何存储在内存中. 如果字典存储使用 cachecomplex_key_cache,ClickHouse通过向可执行文件的STDIN发送请求来请求必要的密钥。 否则,ClickHouse将启动可执行文件并将其输出视为字典数据。

设置示例:

  1. <source>
  2. <executable>
  3. <command>cat /opt/dictionaries/os.tsv</command>
  4. <format>TabSeparated</format>
  5. </executable>
  6. </source>

  1. SOURCE(EXECUTABLE(command 'cat /opt/dictionaries/os.tsv' format 'TabSeparated'))

设置字段:

  • command – The absolute path to the executable file, or the file name (if the program directory is written to PATH).
  • format – The file format. All the formats described in “格式” 支持。

Http(s)

使用HTTP(s)服务器取决于 字典如何存储在内存中. 如果字典存储使用 cachecomplex_key_cache,ClickHouse通过通过发送请求请求必要的密钥 POST 方法。

设置示例:

  1. <source>
  2. <http>
  3. <url>http://[::1]/os.tsv</url>
  4. <format>TabSeparated</format>
  5. <credentials>
  6. <user>user</user>
  7. <password>password</password>
  8. </credentials>
  9. <headers>
  10. <header>
  11. <name>API-KEY</name>
  12. <value>key</value>
  13. </header>
  14. </headers>
  15. </http>
  16. </source>

  1. SOURCE(HTTP(
  2. url 'http://[::1]/os.tsv'
  3. format 'TabSeparated'
  4. credentials(user 'user' password 'password')
  5. headers(header(name 'API-KEY' value 'key'))
  6. ))

为了让ClickHouse访问HTTPS资源,您必须 配置openSSL 在服务器配置中。

设置字段:

  • url – The source URL.
  • format – The file format. All the formats described in “格式” 支持。
  • credentials – Basic HTTP authentication. Optional parameter.
    • user – Username required for the authentication.
    • password – Password required for the authentication.
  • headers – All custom HTTP headers entries used for the HTTP request. Optional parameter.
    • header – Single HTTP header entry.
    • name – Identifiant name used for the header send on the request.
    • value – Value set for a specific identifiant name.

ODBC

您可以使用此方法连接具有ODBC驱动程序的任何数据库。

设置示例:

  1. <source>
  2. <odbc>
  3. <db>DatabaseName</db>
  4. <table>ShemaName.TableName</table>
  5. <connection_string>DSN=some_parameters</connection_string>
  6. <invalidate_query>SQL_QUERY</invalidate_query>
  7. </odbc>
  8. </source>

  1. SOURCE(ODBC(
  2. db 'DatabaseName'
  3. table 'SchemaName.TableName'
  4. connection_string 'DSN=some_parameters'
  5. invalidate_query 'SQL_QUERY'
  6. ))

设置字段:

  • db – Name of the database. Omit it if the database name is set in the <connection_string> 参数。
  • table – Name of the table and schema if exists.
  • connection_string – Connection string.
  • invalidate_query – Query for checking the dictionary status. Optional parameter. Read more in the section 更新字典.

ClickHouse接收来自ODBC-driver的引用符号,并将查询中的所有设置引用到driver,因此有必要根据数据库中的表名大小写设置表名。

如果您在使用Oracle时遇到编码问题,请参阅相应的 FAQ 文章.

ODBC字典功能的已知漏洞

注意

通过ODBC驱动程序连接参数连接到数据库时 Servername 可以取代。 在这种情况下,值 USERNAMEPASSWORDodbc.ini 被发送到远程服务器,并且可能会受到损害。

不安全使用示例

让我们为PostgreSQL配置unixODBC。 的内容 /etc/odbc.ini:

  1. [gregtest]
  2. Driver = /usr/lib/psqlodbca.so
  3. Servername = localhost
  4. PORT = 5432
  5. DATABASE = test_db
  6. #OPTION = 3
  7. USERNAME = test
  8. PASSWORD = test

如果然后进行查询,例如

  1. SELECT * FROM odbc('DSN=gregtest;Servername=some-server.com', 'test_db');

ODBC驱动程序将发送的值 USERNAMEPASSWORDodbc.inisome-server.com.

连接Postgresql的示例

Ubuntu操作系统。

为PostgreSQL安装unixODBC和ODBC驱动程序:

  1. $ sudo apt-get install -y unixodbc odbcinst odbc-postgresql

配置 /etc/odbc.ini (或 ~/.odbc.ini):

  1. [DEFAULT]
  2. Driver = myconnection
  3. [myconnection]
  4. Description = PostgreSQL connection to my_db
  5. Driver = PostgreSQL Unicode
  6. Database = my_db
  7. Servername = 127.0.0.1
  8. UserName = username
  9. Password = password
  10. Port = 5432
  11. Protocol = 9.3
  12. ReadOnly = No
  13. RowVersioning = No
  14. ShowSystemTables = No
  15. ConnSettings =

ClickHouse中的字典配置:

  1. <yandex>
  2. <dictionary>
  3. <name>table_name</name>
  4. <source>
  5. <odbc>
  6. <!-- You can specify the following parameters in connection_string: -->
  7. <!-- DSN=myconnection;UID=username;PWD=password;HOST=127.0.0.1;PORT=5432;DATABASE=my_db -->
  8. <connection_string>DSN=myconnection</connection_string>
  9. <table>postgresql_table</table>
  10. </odbc>
  11. </source>
  12. <lifetime>
  13. <min>300</min>
  14. <max>360</max>
  15. </lifetime>
  16. <layout>
  17. <hashed/>
  18. </layout>
  19. <structure>
  20. <id>
  21. <name>id</name>
  22. </id>
  23. <attribute>
  24. <name>some_column</name>
  25. <type>UInt64</type>
  26. <null_value>0</null_value>
  27. </attribute>
  28. </structure>
  29. </dictionary>
  30. </yandex>

  1. CREATE DICTIONARY table_name (
  2. id UInt64,
  3. some_column UInt64 DEFAULT 0
  4. )
  5. PRIMARY KEY id
  6. SOURCE(ODBC(connection_string 'DSN=myconnection' table 'postgresql_table'))
  7. LAYOUT(HASHED())
  8. LIFETIME(MIN 300 MAX 360)

您可能需要编辑 odbc.ini 使用驱动程序指定库的完整路径 DRIVER=/usr/local/lib/psqlodbcw.so.

连接MS SQL Server的示例

Ubuntu操作系统。

安装驱动程序: :

  1. $ sudo apt-get install tdsodbc freetds-bin sqsh

配置驱动程序:

  1. $ cat /etc/freetds/freetds.conf
  2. ...
  3. [MSSQL]
  4. host = 192.168.56.101
  5. port = 1433
  6. tds version = 7.0
  7. client charset = UTF-8
  8. $ cat /etc/odbcinst.ini
  9. ...
  10. [FreeTDS]
  11. Description = FreeTDS
  12. Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
  13. Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
  14. FileUsage = 1
  15. UsageCount = 5
  16. $ cat ~/.odbc.ini
  17. ...
  18. [MSSQL]
  19. Description = FreeTDS
  20. Driver = FreeTDS
  21. Servername = MSSQL
  22. Database = test
  23. UID = test
  24. PWD = test
  25. Port = 1433

在ClickHouse中配置字典:

  1. <yandex>
  2. <dictionary>
  3. <name>test</name>
  4. <source>
  5. <odbc>
  6. <table>dict</table>
  7. <connection_string>DSN=MSSQL;UID=test;PWD=test</connection_string>
  8. </odbc>
  9. </source>
  10. <lifetime>
  11. <min>300</min>
  12. <max>360</max>
  13. </lifetime>
  14. <layout>
  15. <flat />
  16. </layout>
  17. <structure>
  18. <id>
  19. <name>k</name>
  20. </id>
  21. <attribute>
  22. <name>s</name>
  23. <type>String</type>
  24. <null_value></null_value>
  25. </attribute>
  26. </structure>
  27. </dictionary>
  28. </yandex>

  1. CREATE DICTIONARY test (
  2. k UInt64,
  3. s String DEFAULT ''
  4. )
  5. PRIMARY KEY k
  6. SOURCE(ODBC(table 'dict' connection_string 'DSN=MSSQL;UID=test;PWD=test'))
  7. LAYOUT(FLAT())
  8. LIFETIME(MIN 300 MAX 360)

DBMS

Mysql

设置示例:

  1. <source>
  2. <mysql>
  3. <port>3306</port>
  4. <user>clickhouse</user>
  5. <password>qwerty</password>
  6. <replica>
  7. <host>example01-1</host>
  8. <priority>1</priority>
  9. </replica>
  10. <replica>
  11. <host>example01-2</host>
  12. <priority>1</priority>
  13. </replica>
  14. <db>db_name</db>
  15. <table>table_name</table>
  16. <where>id=10</where>
  17. <invalidate_query>SQL_QUERY</invalidate_query>
  18. </mysql>
  19. </source>

  1. SOURCE(MYSQL(
  2. port 3306
  3. user 'clickhouse'
  4. password 'qwerty'
  5. replica(host 'example01-1' priority 1)
  6. replica(host 'example01-2' priority 1)
  7. db 'db_name'
  8. table 'table_name'
  9. where 'id=10'
  10. invalidate_query 'SQL_QUERY'
  11. ))

设置字段:

  • port – The port on the MySQL server. You can specify it for all replicas, or for each one individually (inside <replica>).

  • user – Name of the MySQL user. You can specify it for all replicas, or for each one individually (inside <replica>).

  • password – Password of the MySQL user. You can specify it for all replicas, or for each one individually (inside <replica>).

  • replica – Section of replica configurations. There can be multiple sections.

    1. - `replica/host` The MySQL host.
    2. - `replica/priority` The replica priority. When attempting to connect, ClickHouse traverses the replicas in order of priority. The lower the number, the higher the priority.
  • db – Name of the database.

  • table – Name of the table.

  • where – The selection criteria. The syntax for conditions is the same as for WHERE 例如,mysql中的子句, id > 10 AND id < 20. 可选参数。

  • invalidate_query – Query for checking the dictionary status. Optional parameter. Read more in the section 更新字典.

MySQL可以通过套接字在本地主机上连接。 要做到这一点,设置 hostsocket.

设置示例:

  1. <source>
  2. <mysql>
  3. <host>localhost</host>
  4. <socket>/path/to/socket/file.sock</socket>
  5. <user>clickhouse</user>
  6. <password>qwerty</password>
  7. <db>db_name</db>
  8. <table>table_name</table>
  9. <where>id=10</where>
  10. <invalidate_query>SQL_QUERY</invalidate_query>
  11. </mysql>
  12. </source>

  1. SOURCE(MYSQL(
  2. host 'localhost'
  3. socket '/path/to/socket/file.sock'
  4. user 'clickhouse'
  5. password 'qwerty'
  6. db 'db_name'
  7. table 'table_name'
  8. where 'id=10'
  9. invalidate_query 'SQL_QUERY'
  10. ))

ClickHouse

设置示例:

  1. <source>
  2. <clickhouse>
  3. <host>example01-01-1</host>
  4. <port>9000</port>
  5. <user>default</user>
  6. <password></password>
  7. <db>default</db>
  8. <table>ids</table>
  9. <where>id=10</where>
  10. </clickhouse>
  11. </source>

  1. SOURCE(CLICKHOUSE(
  2. host 'example01-01-1'
  3. port 9000
  4. user 'default'
  5. password ''
  6. db 'default'
  7. table 'ids'
  8. where 'id=10'
  9. ))

设置字段:

  • host – The ClickHouse host. If it is a local host, the query is processed without any network activity. To improve fault tolerance, you can create a 分布 表并在后续配置中输入它。
  • port – The port on the ClickHouse server.
  • user – Name of the ClickHouse user.
  • password – Password of the ClickHouse user.
  • db – Name of the database.
  • table – Name of the table.
  • where – The selection criteria. May be omitted.
  • invalidate_query – Query for checking the dictionary status. Optional parameter. Read more in the section 更新字典.

Mongodb

设置示例:

  1. <source>
  2. <mongodb>
  3. <host>localhost</host>
  4. <port>27017</port>
  5. <user></user>
  6. <password></password>
  7. <db>test</db>
  8. <collection>dictionary_source</collection>
  9. </mongodb>
  10. </source>

  1. SOURCE(MONGO(
  2. host 'localhost'
  3. port 27017
  4. user ''
  5. password ''
  6. db 'test'
  7. collection 'dictionary_source'
  8. ))

设置字段:

  • host – The MongoDB host.
  • port – The port on the MongoDB server.
  • user – Name of the MongoDB user.
  • password – Password of the MongoDB user.
  • db – Name of the database.
  • collection – Name of the collection.

Redis

设置示例:

  1. <source>
  2. <redis>
  3. <host>localhost</host>
  4. <port>6379</port>
  5. <storage_type>simple</storage_type>
  6. <db_index>0</db_index>
  7. </redis>
  8. </source>

  1. SOURCE(REDIS(
  2. host 'localhost'
  3. port 6379
  4. storage_type 'simple'
  5. db_index 0
  6. ))

设置字段:

  • host – The Redis host.
  • port – The port on the Redis server.
  • storage_type – The structure of internal Redis storage using for work with keys. simple 适用于简单源和散列单键源, hash_map 用于具有两个键的散列源。 不支持具有复杂键的范围源和缓存源。 可以省略,默认值为 simple.
  • db_index – The specific numeric index of Redis logical database. May be omitted, default value is 0.

原始文章