Creating a local table

There are 2 different approaches to deal with tables in Manticore:

Online schema management (RT mode)

Real-time mode requires no table definition in the configuration file, but presence of data_dir directive in searchd section is mandatory. Index files are stored inside the data_dir.

Replication is available only in this mode.

In this mode you can use SQL commands like CREATE TABLE, ALTER TABLE and DROP TABLE to create and change table schema and drop it. This mode is especially useful for real-time and percolate tables.

Table names are case insensitive in the RT mode.

Defining table schema in config (Plain mode)

In this mode you can specify table schema in config which will be read on Manticore start and if the table doesn’t exist yet it will be created. This mode is especially useful for plain tables that are built upon indexing data from an external storage.

Dropping tables is only possible by removing them from the configuration file or by removing the path setting and sending a HUP signal to the server or restarting it.

Table names are case sensitive in this mode.

All table types are supported in this mode.

Table types and modes

Index typeRT modePlain mode
Real-timesupportedsupported
Plainnot supportedsupported
Percolatesupportedsupported
Distributedsupportedsupported
Templatenot supportedsupported

Real-time table

Real-time table is the main type of tables in Manticore. It allows adding, updating and deleting documents with immediate availability of the changes. Real-time table settings can be defined in a configuration file or online via CREATE/UPDATE/DELETE/ALTER commands.

Real-time table internally consists of one or multiple plain tables called chunks. There can be:

  • multiple disk chunks. They are stored on disk with the same structure as any plain table
  • single ram chunk. Stored in memory and used as an accumulator of changes

RAM chunk size is controlled by rt_mem_limit. Once the limit is exceeded the RAM chunk is flushed to disk in a form of a disk chunk. When there are too many disk chunks they can be merged into one for better performance using command OPTIMIZE.

  • SQL
  • JSON
  • PHP
  • Python
  • Javascript
  • Java
  • CONFIG

SQL JSON PHP Python Javascript Java CONFIG

  1. CREATE TABLE products(title text, price float) morphology='stem_en';
  1. POST /cli -d "CREATE TABLE products(title text, price float) morphology='stem_en'"
  1. $index = new \Manticoresearch\Index($client);
  2. $index->setName('products');
  3. $index->create([
  4. 'title'=>['type'=>'text'],
  5. 'price'=>['type'=>'float'],
  6. ]);
  1. utilsApi.sql('CREATE TABLE forum(title text, price float)')
  1. res = await utilsApi.sql('CREATE TABLE forum(title text, price float)');
  1. utilsApi.sql("CREATE TABLE forum(title text, price float)");
  1. table products {
  2. type = rt
  3. path = tbl
  4. rt_field = title
  5. rt_attr_uint = price
  6. stored_fields = title
  7. }

Response

  1. Query OK, 0 rows affected (0.00 sec)
  1. {
  2. "total":0,
  3. "error":"",
  4. "warning":""
  5. }
Creating a real-time table via JSON over HTTP:

👍 What you can do with a real-time table:

⛔ What you cannot do with a real-time table:

  • Index data with help of indexer
  • Link it with sources for easy indexing from external storages
  • Update it’s killlist_target, it’s just not needed as the real-time table takes controls of it automatically

Real-time table files structure

ExtensionDescription
.locklock file
.ramRAM chunk
.metaRT table headers
..spdisk chunks (see plain table format)

Plain table

Plain table is a basic element for non-percolate searching. It can be specified only in a configuration file in the Plain mode). It’s not supported in the RT mode). It’s normally used together with a source to process data from an external storage and afterwards can be attached to a real-time table.

👍 What you can do with a plain table:

⛔ What you cannot do with a plain table:

  • insert more data into a table after it’s built
  • delete data from it
  • create/delete/alter a plain table online (you need to define it in a configuration file)
  • use UUID for automatic ID generation. When you fetch data from an external storage it must include a unique identifier for each document

Except numeric attributes (including MVA)), the rest of the data in a plain table is immutable. If you need to update/add new records you need to rebuild the table. While table is being rebuilt, existing table is still available for serving requests. When a new version of the table is ready, a process called rotation is performed which puts the new version online and discards the old one.

  • Plain table example

Plain table example

A plain table can be only defined in a configuration file. It’s not supported by command CREATE TABLE

  1. source source {
  2. type = mysql
  3. sql_host = localhost
  4. sql_user = myuser
  5. sql_pass = mypass
  6. sql_db = mydb
  7. sql_query = SELECT id, title, description, category_id from mytable
  8. sql_attr_uint = category_id
  9. sql_field_string = title
  10. }
  11. table tbl {
  12. type = plain
  13. source = source
  14. path = /path/to/table
  15. }

Plain table building performance

Speed of plain indexing depends on several factors:

  • how fast the source can be providing the data
  • tokenization settings
  • your hardware (CPU, amount of RAM, disk performance)

Plain table building scenarios

Rebuild fully when needed

In the simplest usage scenario, we would use a single plain table which we just fully rebuild from time to time. It works fine for smaller data sets and if you are ready that:

  • the table will be not as fresh as data in the source
  • indexing duration grows with the data, the more data you have in the source the longer it will take to build the table
Main+delta

If you have a bigger data set and still want to use a plain table rather than Real-Time what you can do is:

  • make another smaller table for incremental indexing
  • combine the both using a distributed table

What it can give is you can rebuild the bigger table seldom (say once per week), save the position of the freshest indexed document and after that use the smaller table to process anything new or updated from your source. Since you will only need to fetch the updates from your storage you can do it much more frequently (say once per minute or even each few seconds).

But after a while the smaller indexing duration will become too high and that will be the moment when you need to rebuild the bigger table and empty the smaller one.

This is called main+delta schema and you can learn more about it in this interactive course.

When you build a smaller “delta” table it can get documents that are already in the “main” table. To let Manticore know that documents from the current table should take precedence there’s a mechanism called kill list and corresponding directive killlist_target.

More information on this topic can be found here.

Plain table files structure

ExtensionDescription
.spastores document attributes in row-wise mode
.spbstores blob attributes in row-wise mode: strings, MVA, json
.spcstores document attributes in columnar mode
.spdstores matching document ID lists for each word ID
.sphstores table header information
.sphistores histograms of attribute values
.spistores word lists (word IDs and pointers to .spd file)
.spidxstores secondary indexes data
.spkstores kill-lists
.spllock file
.spmstores a bitmap of killed documents
.sppstores hit (aka posting, aka word occurrence) lists for each word ID
.sptstores additional data structures to speed up lookups by document ids
.spestores skip-lists to speed up doc-list filtering
.spdsstores document texts
.tmptemporary files during index_settings_and_status
.new.spnew version of a plain table before rotation
.old.sp*old version of a plain table after rotation