Reporting bugs

Unfortunately, Manticore is not yet 100% bug-free (although we are working hard towards that goal). You may occasionally encounter some issues.

It is very important to report as much information as possible about each bug. To fix a bug, we need to either reproduce it or deduce its cause from the information you provide. The instructions below explain what to collect and how.

Bug-tracker

We track bugs and feature requests on GitHub. Feel free to create a new ticket and describe your bug in detail, so that both you and the developers save time.

Documentation updates

Updates to the documentation (what you are reading now) are also made on GitHub.

Crashes

Manticore is written in C++, a low-level programming language that works close to the hardware for faster performance. The drawback is that, in rare cases, a bug cannot be handled gracefully by writing an error to the log and skipping the command that caused the problem. Instead, the program may crash, which means it stops completely and needs to be restarted.

When Manticore Search crashes, let the Manticore team know by making a bug report on GitHub or, if you use Manticore’s professional services, in your private helpdesk. The Manticore team needs the following information:

  1. Searchd log
  2. Coredump
  3. Query log

It would also help a lot if you do the following:

  1. Run gdb to inspect the coredump:

    gdb /usr/bin/searchd </path/to/coredump>

  2. Find the ID of the crashed thread in the coredump file name (make sure you have %p in /proc/sys/kernel/core_pattern), e.g. core.work_6.29050.server_name.1637586599 means thread_id=29050.

  3. In gdb, run:

    set pagination off
    info threads
    # find the thread number by its ID (e.g. `LWP 29050` might be thread number 8)
    thread apply all bt
    thread <thread number>
    bt full
    info locals
    quit

  4. Provide all of the outputs in your report.
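If your core_pattern produces names like the one in step 2 (core.%e.%p.%h.%t), the number can be pulled out with a small POSIX shell helper. This is a sketch: core_thread_id is a hypothetical name, and it assumes the thread name (%e) contains no dots. The trailing comment shows gdb's -batch and -ex options, which run the same gdb commands without an interactive session once you know the thread number N.

```shell
# Print the %p field from a core file name created with the pattern
# core.%e.%p.%h.%t; the steps above treat this number as the crashed thread ID.
# Assumption: the thread name (%e) contains no dots (the hostname %h may).
core_thread_id() {
    name=${1##*/}                  # drop any leading directory
    rest=${name#core.}             # drop the "core." prefix
    rest=${rest#*.}                # drop the %e field (thread name)
    printf '%s\n' "${rest%%.*}"    # keep what precedes the next dot: %p
}

core_thread_id /cores/core.work_6.29050.server_name.1637586599   # prints 29050

# Non-interactive variant of the gdb session, with N taken from "info threads":
#   gdb -batch -ex 'set pagination off' -ex 'info threads' \
#       -ex 'thread apply all bt' -ex 'thread N' -ex 'bt full' \
#       -ex 'info locals' /usr/bin/searchd /path/to/core > gdb-output.txt 2>&1
```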

What do I do when Manticore Search hangs?

You need to run gdb manually and collect some information that may help explain why it’s hanging:

  1. Run show threads option format=all through a VIP port.

  2. Collect the lsof output, since hanging can be caused by too many connections or open file descriptors:

    lsof -p `cat /var/run/manticore/searchd.pid`

  3. Dump the core:

    gcore `cat /var/run/manticore/searchd.pid`

    (this saves the dump to the current directory)

  4. Install and run gdb:

    gdb /usr/bin/searchd `cat /var/run/manticore/searchd.pid`

    Note that this will halt your running searchd, but if it’s already hanging, that shouldn’t be a problem.

  5. In gdb, run:

    set pagination off
    info threads
    thread apply all bt
    quit

  6. Collect all the outputs and files and provide them in a bug report.

For experts: the macros added in this commit can be helpful for debugging.
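Steps 2 to 5 of the hang checklist can also be scripted. The sketch below is a hypothetical helper (run and collect_hang_info are illustrative names) that assumes lsof, gcore and gdb are installed and that you may attach to searchd; it uses gcore's -o prefix option and gdb's -batch/-ex options instead of an interactive session. Step 1 (show threads over a VIP port) still has to be done from an SQL client. Setting DRY_RUN=1 prints the commands into the output files instead of executing them.

```shell
# Run a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Collect hang diagnostics for a running searchd.
#   $1 - searchd PID, e.g. $(cat /var/run/manticore/searchd.pid)
#   $2 - output directory (defaults to the current one)
collect_hang_info() {
    pid=$1
    out=${2:-.}
    mkdir -p "$out"

    # open files and connections
    run lsof -p "$pid" > "$out/lsof.txt" 2>&1
    # core dump of the live process, written as $out/core.<pid>
    run gcore -o "$out/core" "$pid" > "$out/gcore.log" 2>&1
    # backtraces of all threads; gdb detaches from the process when it exits
    run gdb -batch \
        -ex 'set pagination off' \
        -ex 'info threads' \
        -ex 'thread apply all bt' \
        /usr/bin/searchd "$pid" > "$out/gdb.txt" 2>&1
}
```

Attach everything in the output directory to the bug report, together with the show threads output.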

How to enable saving coredumps on crash?

Start searchd with the --coredump flag. If you run Manticore under systemd, you can pass the flag through the _ADDITIONAL_SEARCHD_PARAMS environment variable and then check that it was applied:

    [root@srv lib]# systemctl set-environment _ADDITIONAL_SEARCHD_PARAMS='--coredump'
    [root@srv lib]# systemctl restart manticore
    [root@srv lib]# ps aux|grep searchd
    mantico+ 1955 0.0 0.0 61964 1580 ? S 11:02 0:00 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --coredump
    mantico+ 1956 0.6 0.0 392744 2664 ? Sl 11:02 0:00 /usr/bin/searchd --config /etc/manticoresearch/manticore.conf --coredump

  • Make sure that your OS allows saving coredumps: /proc/sys/kernel/core_pattern must be non-empty, since it defines where they are saved. For example:

    echo "/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern

    instructs the kernel to save them to files like core.searchd.1773.centos-4gb-hel1-1.1636454937 in /cores (the directory must already exist).

  • searchd must be started with ulimit -c unlimited. If you start Manticore via systemctl, this is done for you, because the unit file sets:

    [root@srv lib]# grep CORE /lib/systemd/system/manticore.service
    LimitCORE=infinity
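Both conditions can be sanity-checked before the next crash. The sketch below is a hypothetical helper (check_core_pattern) that takes the pattern as an argument so it can be tried offline. Note that a pattern starting with | makes the kernel pipe the core to a program (for example systemd-coredump) instead of writing a file, and that /proc/<pid>/limits shows the core size limit actually applied to the running searchd.

```shell
# Report how core dumps will be handled for a given core_pattern value.
# Usage on a live system:
#   check_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
#   grep 'Max core file size' /proc/"$(cat /var/run/manticore/searchd.pid)"/limits
check_core_pattern() {
    pattern=$1
    case $pattern in
        '')   echo "empty: core_pattern must be non-empty"; return 1 ;;
        '|'*) echo "piped to: ${pattern#"|"}" ;;
        *%p*) echo "file: $pattern (includes %p)" ;;
        *)    echo "file: $pattern (no %p, the crashed thread ID will be missing)" ;;
    esac
}
```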

How do I install debug symbols?

Manticore Search and Manticore Columnar Library are written in C++, which means they are compiled into compact binary files that execute optimally on your operating system. However, these binaries do not include the names of the variables, functions, methods, and classes they implement. That information is provided separately in so-called “debuginfo” or “symbol packages.”

Debug symbols are useful for troubleshooting: when a binary crashes, they let you see the state of the system at the moment of the crash, including function names. Manticore Search writes a backtrace to the searchd log and also generates a coredump if it was run with the --coredump flag. Without symbols, all you get is internal offsets, which can be difficult or impossible to decode. Therefore, if you report a crash, the Manticore team will often need debug symbols to help you.

To install Manticore Search/Manticore Columnar Library debug symbols, you need the *debuginfo* package for CentOS, the *dbgsym* package for Ubuntu and Debian, or the *dbgsymbols* package for Windows and macOS. These packages must be of the same version as the Manticore you are running. For example, if you’ve installed Manticore Search in CentOS 8 from the package https://repo.manticoresearch.com/repository/manticoresearch/release/centos/8/x86_64/manticore-4.0.2_210921.af497f245-1.el8.x86_64.rpm , the corresponding package with symbols would be https://repo.manticoresearch.com/repository/manticoresearch/release/centos/8/x86_64/manticore-debuginfo-4.0.2_210921.af497f245-1.el8.x86_64.rpm

Note that both packages have the same commit id af497f245, which corresponds to the commit that this version was built from.
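The naming in this example is regular enough to derive one name from the other. The sketch below assumes the manticore-<version>_<date>.<commit>-<release> layout seen in the URLs above and only handles the main manticore package; debuginfo_rpm and rpm_commit_id are hypothetical helpers, and the repository listing remains the authoritative source.

```shell
# Derive the matching debuginfo RPM file name from a main-package RPM name or URL.
debuginfo_rpm() {
    base=${1##*/}                                   # drop the URL/directory part
    printf '%s\n' "manticore-debuginfo-${base#manticore-}"
}

# Print the commit id embedded in the version (works for both package kinds).
rpm_commit_id() {
    base=${1##*/}
    ver=${base#manticore-}                          # 4.0.2_210921.af497f245-1.el8...
    ver=${ver#debuginfo-}                           # same, for the debuginfo package
    ver=${ver%%-*}                                  # 4.0.2_210921.af497f245
    printf '%s\n' "${ver##*.}"                      # af497f245
}

rpm_commit_id manticore-4.0.2_210921.af497f245-1.el8.x86_64.rpm   # prints af497f245
```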

If you have installed Manticore from a Manticore APT/YUM repository, you can use one of the following tools to find the debug symbols package for you:

  • debuginfo-install in CentOS 7
  • dnf debuginfo-install in CentOS 8
  • find-dbgsym-packages in Debian and Ubuntu

How to check that the debug symbols exist?

  1. Find the build ID in the output of file /usr/bin/searchd:

    [root@srv lib]# file /usr/bin/searchd
    /usr/bin/searchd: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=2c582e9f564ea1fbeb0c68406c271ba27034a6d3, stripped

In this case it’s 2c582e9f564ea1fbeb0c68406c271ba27034a6d3.

  2. Find the symbols in /usr/lib/debug/.build-id like this:

    [root@srv ~]# ls -la /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3*
    lrwxrwxrwx. 1 root root 23 Nov 9 10:42 /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3 -> ../../../../bin/searchd
    lrwxrwxrwx. 1 root root 27 Nov 9 10:42 /usr/lib/debug/.build-id/2c/582e9f564ea1fbeb0c68406c271ba27034a6d3.debug -> ../../usr/bin/searchd.debug
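The lookup above follows a fixed rule: the first two hex digits of the build ID become a directory name, and the remaining digits plus .debug become the file name. A small POSIX shell sketch of that rule, which parses the BuildID[sha1]= field out of the file output (build_id_debug_path is a hypothetical helper):

```shell
# Print the expected debug-symbols path for the BuildID found in `file` output.
# Usage on a live system:
#   ls -la "$(build_id_debug_path "$(file /usr/bin/searchd)")"
build_id_debug_path() {
    id=$(printf '%s\n' "$1" | sed -n 's/.*BuildID\[sha1]=\([0-9a-f]*\).*/\1/p')
    [ -n "$id" ] || return 1                     # no BuildID in the input
    dir=$(printf '%s\n' "$id" | cut -c1-2)       # first two hex digits
    rest=$(printf '%s\n' "$id" | cut -c3-)       # remaining hex digits
    printf '/usr/lib/debug/.build-id/%s/%s.debug\n' "$dir" "$rest"
}
```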

Uploading your data

To fix your bug, developers often need to reproduce it locally. To do this, they need your configuration file, table files, binlog (if present), and sometimes source data (such as data from external storage or XML/CSV files) and queries.

Attach your data when you create a ticket on GitHub. If it is too large or sensitive, you can upload it to our write-only S3 storage s3://s3.manticoresearch.com/write-only/. Here’s how to do it with the MinIO client:

  1. Install the client: https://min.io/docs/minio/linux/reference/minio-mc.html#install-mc
  2. Add our S3 host: mc config host add manticore http://s3.manticoresearch.com:9000 manticore manticore
  3. Copy your files with mc cp -r issue-1234/ manticore/write-only/issue-1234 and make sure the folder name is unique; ideally, it should match the GitHub issue where you described the bug.
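Because the bucket is write-only, you cannot check whether a folder name is already taken, so a timestamp suffix is a simple way to keep it unique. A minimal sketch, assuming the manticore alias from step 2 (upload_dest is a hypothetical helper):

```shell
# Print a unique destination in the write-only bucket for a GitHub issue number.
upload_dest() {
    printf 'manticore/write-only/issue-%s-%s\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Example:
#   mc cp -r issue-1234/ "$(upload_dest 1234)"
```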

DEBUG

    DEBUG [ subcommand ]

The DEBUG statement calls various internal or VIP commands for development and testing purposes. It is not intended for production automation, since the syntax of the subcommand part may change freely in any build.

Call DEBUG without parameters to show a list of useful commands (in general) and DEBUG subcommands available in the current context.

    mysql> debug;
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    | command | meaning |
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    | flush logs | emulate USR1 signal |
    | reload indexes | emulate HUP signal |
    | debug token <password> | calculate token for password |
    | debug malloc_stats | perform 'malloc_stats', result in searchd.log |
    | debug malloc_trim | pefrorm 'malloc_trim' call |
    | debug sleep <N> | sleep for <N> seconds |
    | debug tasks | display global tasks stat (use select from @@system.tasks instead) |
    | debug sched | display task manager schedule (use select from @@system.sched instead) |
    | debug merge <TBL> [chunk] <X> [into] [chunk] <Y> [option sync=1,byid=0] | For RT table <TBL> merge disk chunk X into disk chunk Y |
    | debug drop [chunk] <X> [from] <TBL> [option sync=1] | For RT table <TBL> drop disk chunk X |
    | debug files <TBL> [option format=all|external] | list files belonging to <TBL>. 'all' - including external (wordforms, stopwords, etc.) |
    | debug close | ask server to close connection from it's side |
    | debug compress <TBL> [chunk] <X> [option sync=1] | Compress disk chunk X of RT table <TBL> (wipe out deleted documents) |
    | debug split <TBL> [chunk] <X> on @<uservar> [option sync=1] | Split disk chunk X of RT table <TBL> using set of DocIDs from @uservar |
    | debug wait <cluster> [like 'xx'] [option timeout=3] | wait <cluster> ready, but no more than 3 secs. |
    | debug wait <cluster> status <N> [like 'xx'] [option timeout=13] | wait <cluster> commit achieve <N>, but no more than 13 secs |
    | debug meta | Show max_matches/pseudo_shards. Needs set profiling=1 |
    | debug trace OFF|'path/to/file' [<N>] | trace flow to file until N bytes written, or 'trace OFF' |
    | debug curl <URL> | request given url via libcurl |
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    19 rows in set (0.00 sec)

The same command over a VIP connection:

    mysql> debug;
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    | command | meaning |
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    | flush logs | emulate USR1 signal |
    | reload indexes | emulate HUP signal |
    | debug shutdown <password> | emulate TERM signal |
    | debug crash <password> | crash daemon (make SIGSEGV action) |
    | debug token <password> | calculate token for password |
    | debug malloc_stats | perform 'malloc_stats', result in searchd.log |
    | debug malloc_trim | pefrorm 'malloc_trim' call |
    | debug procdump | ask watchdog to dump us |
    | debug setgdb on|off | enable or disable potentially dangerous crash dumping with gdb |
    | debug setgdb status | show current mode of gdb dumping |
    | debug sleep <N> | sleep for <N> seconds |
    | debug tasks | display global tasks stat (use select from @@system.tasks instead) |
    | debug sched | display task manager schedule (use select from @@system.sched instead) |
    | debug merge <TBL> [chunk] <X> [into] [chunk] <Y> [option sync=1,byid=0] | For RT table <TBL> merge disk chunk X into disk chunk Y |
    | debug drop [chunk] <X> [from] <TBL> [option sync=1] | For RT table <TBL> drop disk chunk X |
    | debug files <TBL> [option format=all|external] | list files belonging to <TBL>. 'all' - including external (wordforms, stopwords, etc.) |
    | debug close | ask server to close connection from it's side |
    | debug compress <TBL> [chunk] <X> [option sync=1] | Compress disk chunk X of RT table <TBL> (wipe out deleted documents) |
    | debug split <TBL> [chunk] <X> on @<uservar> [option sync=1] | Split disk chunk X of RT table <TBL> using set of DocIDs from @uservar |
    | debug wait <cluster> [like 'xx'] [option timeout=3] | wait <cluster> ready, but no more than 3 secs. |
    | debug wait <cluster> status <N> [like 'xx'] [option timeout=13] | wait <cluster> commit achieve <N>, but no more than 13 secs |
    | debug meta | Show max_matches/pseudo_shards. Needs set profiling=1 |
    | debug trace OFF|'path/to/file' [<N>] | trace flow to file until N bytes written, or 'trace OFF' |
    | debug curl <URL> | request given url via libcurl |
    +-------------------------------------------------------------------------+----------------------------------------------------------------------------------------+
    24 rows in set (0.00 sec)

All debug XXX commands should be regarded as unstable and may change at any time. The example output above may not match the commands available on your instance, so run debug on your system to see what is actually there. There is no detailed documentation for them beyond the short ‘meaning’ column.

As a quick illustration, here are two commands available only to VIP clients: shutdown and crash. Both require a token, which can be generated with the debug token subcommand and added to the shutdown_token parameter in the searchd section of the config file. If there is no such parameter, or if the hash of the provided password does not match the token stored in the config, these subcommands do nothing.

    mysql> debug token hello;
    +-------------+------------------------------------------+
    | command     | result                                   |
    +-------------+------------------------------------------+
    | debug token | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
    +-------------+------------------------------------------+
    1 row in set (0,00 sec)
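The token in this example equals the hex SHA-1 digest of the password hello, so it can apparently be reproduced locally with sha1sum. This is an observation from the sample output, not documented behavior; treat the value returned by debug token on your own server as authoritative. shutdown_token_for is a hypothetical helper.

```shell
# Compute a candidate shutdown token for a password (sketch).
# Assumption: the token is the hex SHA-1 of the password, as in the
# "debug token hello" example; verify it against debug token.
shutdown_token_for() {
    printf '%s' "$1" | sha1sum | cut -d ' ' -f 1
}

shutdown_token_for hello   # prints aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

# The result then goes into the searchd section of the config file:
#   searchd {
#       ...
#       shutdown_token = aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
#   }
```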

The subcommand shutdown sends a TERM signal to the server, causing it to shut down. This can be dangerous, as nobody wants to accidentally stop a production service. Therefore, it requires both a VIP connection and the password.

The subcommand crash literally causes a crash. It may be used for testing purposes, for example, to check how the system manager keeps the service alive, or to verify that coredumps are actually being saved.

If some commands are found to be useful in a more general context, they may be moved from the debug subcommands to a more stable and generic location (as exemplified by the debug tasks and debug sched in the table).

References

SQL commands

Schema management
Data management
  • INSERT - Adds new documents
  • REPLACE - Replaces existing documents with new ones
  • UPDATE - Does in-place update in documents
  • DELETE - Deletes documents
  • TRUNCATE TABLE - Deletes all documents from table
Backup
  • BACKUP - Backs up your tables
SELECT
Flushing misc things
Real-time table optimization
Importing to a real-time table
  • ATTACH TABLE - Moves data from a plain table to a real-time table
  • IMPORT TABLE - Imports previously created RT or PQ table into a server running in the RT mode
Replication
Plain table rotate
Transactions
  • BEGIN - Begins a transaction
  • COMMIT - Finishes a transaction
  • ROLLBACK - Rolls back a transaction
CALL
Plugins
Server status

HTTP endpoints

  • /sql - Allows running an SQL statement over HTTP
  • /cli - HTTP command line interface
  • /insert - Inserts a document into a real-time table
  • /pq/tbl_name/doc - Inserts a PQ rule into a percolate table
  • /update - Updates a document in a real-time table
  • /replace - Replaces a document in a real-time table
  • /pq/tbl_name/doc/N?refresh=1 - Replaces a PQ rule in a percolate table
  • /delete - Deletes a document in a table
  • /bulk - Performs several insert, update or delete operations in a single call
  • /search - Performs search
  • /pq/tbl_name/search - Performs reverse search in a percolate table
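As an illustration of the /sql endpoint, here is a minimal shell wrapper around curl. It assumes searchd accepts HTTP on 127.0.0.1:9308 and that the statement is sent as a form-encoded query parameter; manticore_sql, MANTICORE_URL and CURL are hypothetical names, and CURL only exists so the command can be swapped out for testing.

```shell
# Base URL of the searchd HTTP listener (assumption: adjust to your listen setting).
MANTICORE_URL=${MANTICORE_URL:-http://127.0.0.1:9308}

# Send one SQL statement to the /sql endpoint and print the JSON response.
manticore_sql() {
    # --data-urlencode sends query=<statement> as a form-encoded POST body
    ${CURL:-curl} -s "$MANTICORE_URL/sql" --data-urlencode "query=$1"
}

# Example:
#   manticore_sql 'SHOW TABLES'
```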

Common things

Common table settings
Plain table settings
Distributed table settings
RT table settings

Full-text search operators

Functions

Mathematical
  • ABS() - Returns absolute value
  • ATAN2() - Returns arctangent function of two arguments
  • BITDOT() - Returns the sum of products of each bit of a mask multiplied by its weight
  • CEIL() - Returns the smallest integer value greater than or equal to the argument
  • COS() - Returns cosine of the argument
  • CRC32() - Returns CRC32 value of the argument
  • EXP() - Returns exponent of the argument
  • FIBONACCI() - Returns the N-th Fibonacci number, where N is the integer argument
  • FLOOR() - Returns the largest integer value less than or equal to the argument
  • GREATEST() - Takes JSON/MVA array as the argument and returns the greatest value in that array
  • IDIV() - Returns result of an integer division of the first argument by the second argument
  • LEAST() - Takes JSON/MVA array as the argument and returns the least value in that array
  • LN() - Returns natural logarithm of the argument
  • LOG10() - Returns common logarithm of the argument
  • LOG2() - Returns binary logarithm of the argument
  • MAX() - Returns the bigger of two arguments
  • MIN() - Returns the smaller of two arguments
  • POW() - Returns the first argument raised to the power of the second argument
  • RAND() - Returns random float between 0 and 1
  • SIN() - Returns sine of the argument
  • SQRT() - Returns square root of the argument
Searching and ranking
  • BM25F() - Returns precise BM25F formula value
  • EXIST() - Replaces non-existing columns with default values
  • GROUP_CONCAT() - Produces a comma-separated list of the attribute values of all documents in the group
  • HIGHLIGHT() - Highlights search results
  • MIN_TOP_SORTVAL() - Returns sort key value of the worst found element in the current top-N matches
  • MIN_TOP_WEIGHT() - Returns weight of the worst found element in the current top-N matches
  • PACKEDFACTORS() - Outputs weighting factors
  • REMOVE_REPEATS() - Removes repeated adjusted rows with the same ‘column’ value
  • WEIGHT() - Returns fulltext match score
  • ZONESPANLIST() - Returns pairs of matched zone spans
  • QUERY() - Returns current full-text query
Type casting
  • BIGINT() - Forcibly promotes the integer argument to 64-bit type
  • DOUBLE() - Forcibly promotes given argument to floating point type
  • INTEGER() - Forcibly promotes given argument to 64-bit signed type
  • TO_STRING() - Forcibly promotes the argument to string type
  • UINT() - Forcibly reinterprets given argument to 64-bit unsigned type
  • SINT() - Interprets 32-bit unsigned integer as signed 64-bit integer
Arrays and conditions
  • ALL() - Returns 1 if condition is true for all elements in the array
  • ANY() - Returns 1 if condition is true for any element in the array
  • CONTAINS() - Checks whether the (x,y) point is within the given polygon
  • IF() - Returns the 2nd argument if the 1st argument is not equal to 0.0, or the 3rd argument if it is
  • IN() - Returns 1 if the first argument is equal to any of the other arguments, or 0 otherwise
  • INDEXOF() - Iterates through all elements in the array and returns index of the first matching element
  • INTERVAL() - Returns the number of points (the 2nd and following arguments, in ascending order) that are less than or equal to the first argument, i.e. 0 if it is below the first point
  • LENGTH() - Returns number of elements in MVA
  • REMAP() - Allows you to make exceptions to expression values depending on the condition values
Date and time
  • NOW() - Returns current timestamp as an INTEGER
  • CURTIME() - Returns current time in local timezone
  • UTC_TIME() - Returns current time in UTC timezone
  • UTC_TIMESTAMP() - Returns current date/time in UTC timezone
  • SECOND() - Returns integer second from the timestamp argument
  • MINUTE() - Returns integer minute from the timestamp argument
  • HOUR() - Returns integer hour from the timestamp argument
  • DAY() - Returns integer day from the timestamp argument
  • MONTH() - Returns integer month from the timestamp argument
  • YEAR() - Returns integer year from the timestamp argument
  • YEARMONTH() - Returns integer year and month code from the timestamp argument
  • YEARMONTHDAY() - Returns integer year, month and day code from the timestamp argument
  • TIMEDIFF() - Returns the difference between the timestamps
Geo-spatial
  • GEODIST() - Computes geosphere distance between two given points
  • GEOPOLY2D() - Creates a polygon that takes into account the Earth’s curvature
  • POLY2D() - Creates a simple polygon in plain space
String
  • CONCAT() - Concatenates two or more strings
  • REGEX() - Returns 1 if the regular expression matches the attribute string, and 0 otherwise
  • SNIPPET() - Highlights search results
  • SUBSTRING_INDEX() - Returns the substring of the string before the specified number of delimiter occurrences
Other
  • LAST_INSERT_ID() - Returns ids of documents inserted or replaced by last statement in the current session
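To make the INTERVAL() and BITDOT() descriptions concrete, here are plain-shell models of both on integers. They only illustrate the semantics described in the list above (interval and bitdot are hypothetical helpers written for this page, not Manticore code) and they ignore floats and JSON/MVA arguments.

```shell
# INTERVAL(x, p1, p2, ...): number of points (in ascending order) that are
# less than or equal to x, i.e. 0 if x < p1, 1 if p1 <= x < p2, and so on.
interval() {
    x=$1; shift
    n=0
    for p in "$@"; do
        [ "$x" -ge "$p" ] || break
        n=$((n + 1))
    done
    echo "$n"
}

# BITDOT(mask, w0, w1, ...): bit0*w0 + bit1*w1 + ...
bitdot() {
    mask=$1; shift
    sum=0
    for w in "$@"; do
        sum=$((sum + (mask & 1) * w))   # add the weight if the lowest bit is set
        mask=$((mask >> 1))             # move on to the next bit
    done
    echo "$sum"
}

interval 15 10 100 1000   # prints 1
bitdot 5 10 20 30         # prints 40 (bits 0 and 2 are set: 10 + 30)
```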

Common settings in configuration file

Settings to put in the common {} section of the configuration file:

Indexer

indexer is a tool for creating plain tables.

Indexer settings in configuration file

Settings to put in the indexer {} section of the configuration file:

Indexer start parameters
    indexer [OPTIONS] [indexname1 [indexname2 [...]]]
  • --all - Rebuilds all tables from the config
  • --buildstops - Reviews the table source, as if it were indexing the data, and produces a list of the terms that are being indexed
  • --buildfreqs - Adds the number of occurrences of each term in the table to the --buildstops list
  • --config, -c - Path to configuration file
  • --dump-rows - Dumps rows fetched by SQL source(s) into the specified file
  • --help - Lists all the parameters
  • --keep-attrs - Reuses existing attributes on reindexing
  • --keep-attrs-names - Specifies which attributes to reuse from the existing table
  • --merge-dst-range - Applies the given filter range to the destination table when merging
  • --merge-killlists - Changes the way kill lists are processed when merging tables
  • --merge - Merges two plain tables into one
  • --nohup - Indexer won’t send SIGHUP if this option is on
  • --noprogress - Prevents displaying progress details
  • --print-queries - Prints out SQL queries that indexer sends to the database
  • --print-rt - Outputs data fetched from SQL source(s) as INSERTs to a real-time table
  • --quiet - Prevents displaying anything
  • --rotate - Forces tables rotation after all the tables are built
  • --sighup-each - Forces rotation of each table after it’s built
  • -v - Shows indexer version

Table converter from Manticore v2 / Sphinx v2

index_converter is a tool for converting tables created with Sphinx/Manticore Search 2.x to Manticore Search 3.x table format.

    index_converter {--config /path/to/config|--path}
Table converter start parameters
  • --config, -c - Path to tables configuration file
  • --index - Specifies which table should be converted
  • --path - Defines path containing table(s) instead of the configuration file
  • --strip-path - Strips path from filenames referenced by table
  • --large-docid - Allows converting documents with IDs larger than 2^63
  • --output-dir - Writes the new files in a chosen folder
  • --all - Converts all tables from the configuration file / path
  • --killlist-target - Sets the target tables for which kill-lists will be applied

Searchd

searchd is the Manticore server.

Searchd settings in a configuration file

Settings to put in the searchd {} section of the configuration file:

Searchd start parameters
    searchd [OPTIONS]
  • --config, -c - Path to configuration file
  • --console - Forces running in console mode
  • --coredump - Enables saving core dump on crash
  • --cpustats - Enables CPU time reporting
  • --delete - Removes Manticore service from Microsoft Management Console and other places where the services are registered
  • --force-preread - Forbids the server to serve any incoming connection until pre-reading of the table files completes
  • --help, -h - Lists all the parameters
  • --table (--index) - Forces serving only the specified table
  • --install - Installs searchd as a service into Microsoft Management Console
  • --iostats - Enables input/output reporting
  • --listen, -l - Overrides listen from the configuration file
  • --logdebug, --logdebugv, --logdebugvv - Enables additional debug output in the server log
  • --logreplication - Enables additional replication debug output in the server log
  • --new-cluster - Bootstraps a replication cluster and makes the server a reference node with cluster restart protection
  • --new-cluster-force - Bootstraps a replication cluster and makes the server a reference node bypassing cluster restart protection
  • --nodetach - Leaves searchd in foreground
  • --ntservice - Passed by Microsoft Management Console to searchd to invoke it as a service on Windows platforms
  • --pidfile - Overrides pid_file from the configuration file
  • --port, -p - Specifies the port searchd should listen on, disregarding the port specified in the configuration file
  • --replay-flags - Specifies extra binary log replay options
  • --servicename - Applies the given name to searchd when installing or deleting the service, as would appear in Microsoft Management Console
  • --status - Queries the running searchd to return its status
  • --stop - Stops Manticore server
  • --stopwait - Stops Manticore server gracefully
  • --strip-path - Strips path names from all the file names referenced from the table
  • -v - Shows version information
Searchd environment variables

Indextool

Miscellaneous table maintenance functionality useful for troubleshooting.

    indextool <command> [options]
Indextool start parameters

Used to dump miscellaneous debug information about the physical table

    indextool <command> [options]
  • --config, -c - Path to configuration file
  • --quiet, -q - Keeps indextool quiet: it will not output the banner, etc.
  • --help, -h - Lists all the parameters
  • -v - Shows version information
  • Indextool - Verifies configuration file
  • --buildidf - Builds IDF file from one or several dictionary dumps
  • --build-infixes - Builds infixes for an existing dict=keywords table
  • --dumpheader - Quickly dumps the provided table header file
  • --dumpconfig - Dumps table definition from the given table header file in almost compliant manticore.conf file format
  • --dumpheader - Dumps table header by table name, looking up the header path in the configuration file
  • --dumpdict - Dumps table dictionary
  • --dumpdocids - Dumps document IDs by table name
  • --dumphitlist - Dumps all occurrences of the given keyword/id in the given table
  • --docextract - Runs a check pass over the whole dictionary/docs/hits and collects all the words and hits belonging to the requested document
  • --fold - Tests tokenization based on table’s settings
  • --htmlstrip - Filters STDIN using HTML stripper settings for the given table
  • --mergeidf - Merges several .idf files into a single one
  • --morph - Applies morphology to the given STDIN and prints the result to stdout
  • --check - Checks the table data files for consistency
  • --check-id-dups - Checks if there are duplicate ids
  • --check-disk-chunk - Checks one disk chunk of an RT table
  • --strip-path - Strips path names from all the file names referenced from the table
  • --rotate - Used with --check: defines whether to check the table waiting for rotation
  • --apply-killlists - Applies kill-lists for all tables listed in the configuration file

Wordbreaker

Splits compound words into components.

    wordbreaker [-dict path/to/dictionary_file] {split|test|bench}
Wordbreaker start parameters

Spelldump

Used to extract the contents of a dictionary file that uses ispell or MySpell format.

    spelldump [options] <dictionary> <affix> [result] [locale-name]
  • dictionary - Dictionary’s main file
  • affix - Dictionary’s affix file
  • result - Specifies where the dictionary data should be output to
  • locale-name - Specifies the locale details to use

List of reserved keywords

A complete alphabetical list of keywords that are currently reserved in Manticore SQL syntax (and therefore cannot be used as identifiers).

    AND, AS, BY, COLUMNARSCAN, DISTINCT, DIV, DOCIDINDEX, EXPLAIN, FACET, FALSE, FORCE, FROM, IGNORE, IN, INDEXES, IS, LIMIT, MOD, NOT, NO_COLUMNARSCAN, NO_DOCIDINDEX, NO_SECONDARYINDEX, NULL, OFFSET, OR, ORDER, REGEX, RELOAD, SECONDARYINDEX, SELECT, SYSFILTERS, TRUE, USE
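When generating table or attribute names automatically, it can help to check them against this list first. A minimal sketch that copies the list above and compares case-insensitively, since SQL keywords are matched regardless of case (RESERVED and is_reserved are hypothetical names; refresh the list when upgrading, as it may change between versions):

```shell
# Reserved keywords, copied from the list above.
RESERVED='AND AS BY COLUMNARSCAN DISTINCT DIV DOCIDINDEX EXPLAIN FACET FALSE
FORCE FROM IGNORE IN INDEXES IS LIMIT MOD NOT NO_COLUMNARSCAN NO_DOCIDINDEX
NO_SECONDARYINDEX NULL OFFSET OR ORDER REGEX RELOAD SECONDARYINDEX SELECT
SYSFILTERS TRUE USE'

# Return success if $1 is a reserved keyword (case-insensitive).
is_reserved() {
    word=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    for kw in $RESERVED; do
        [ "$kw" = "$word" ] && return 0
    done
    return 1
}

is_reserved Limit && echo "Limit is reserved"
```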

Documentation for old Manticore versions

References

SQL commands

Schema management
Data management
  • INSERT - Adds new documents
  • REPLACE - Replaces existing documents with new ones
  • UPDATE - Does in-place update in documents
  • DELETE - Deletes documents
  • TRUNCATE TABLE - Deletes all documents from table
Backup
  • BACKUP - Backs up your tables
SELECT
Flushing misc things
Real-time table optimization
Importing to a real-time table
  • ATTACH TABLE - Moves data from a plain table to a real-time table
  • IMPORT TABLE - Imports previously created RT or PQ table into a server running in the RT mode
Replication
Plain table rotate
Transactions
  • BEGIN - Begins a transaction
  • COMMIT - Finishes a transaction
  • ROLLBACK - Rolls back a transaction
CALL
Plugins
Server status

HTTP endpoints

  • /sql - Allows running an SQL statement over HTTP
  • /cli - HTTP command line interface
  • /insert - Inserts a document into a real-time table
  • /pq/tbl_name/doc - Inserts a PQ rule into a percolate table
  • /update - Updates a document in a real-time table
  • /replace - Replaces a document in a real-time table
  • /pq/tbl_name/doc/N?refresh=1 - Replaces a PQ rule in a percolate table
  • /delete - Deletes a document in a table
  • /bulk - Perform several insert, update or delete operations in a single call. More about bulk inserts here.
  • /search - Performs search
  • /pq/tbl_name/search - Performs reverse search in a percolate table

Common things

Common table settings
Plain table settings
Distributed table settings
RT table settings

Full-text search operators

Functions

Mathematical
  • ABS() - Returns the absolute value of the argument
  • ATAN2() - Returns the arctangent function of two arguments
  • BITDOT() - Returns the sum of products of each bit of a mask multiplied by its weight
  • CEIL() - Returns the smallest integer value greater than or equal to the argument
  • COS() - Returns the cosine of the argument
  • CRC32() - Returns the CRC32 value of the argument
  • EXP() - Returns e raised to the power of the argument
  • FIBONACCI() - Returns the N-th Fibonacci number, where N is the integer argument
  • FLOOR() - Returns the largest integer value less than or equal to the argument
  • GREATEST() - Takes a JSON/MVA array as the argument and returns the greatest value in that array
  • IDIV() - Returns the result of an integer division of the first argument by the second argument
  • LEAST() - Takes a JSON/MVA array as the argument and returns the least value in that array
  • LN() - Returns the natural logarithm of the argument
  • LOG10() - Returns the common (base-10) logarithm of the argument
  • LOG2() - Returns the binary (base-2) logarithm of the argument
  • MAX() - Returns the larger of two arguments
  • MIN() - Returns the smaller of two arguments
  • POW() - Returns the first argument raised to the power of the second argument
  • RAND() - Returns a random float between 0 and 1
  • SIN() - Returns the sine of the argument
  • SQRT() - Returns the square root of the argument
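Two of the less obvious descriptions above, BITDOT() and FIBONACCI(), can be modeled in a few lines of Python. This is only an illustration of the documented semantics, not Manticore code: BITDOT(mask, w0, w1, ...) is read as bit0*w0 + bit1*w1 + ..., and FIBONACCI() is assumed to follow the sequence 0, 1, 1, 2, 3, 5, ... for N = 0, 1, 2, .... The function names are hypothetical.

```python
def bitdot(mask: int, *weights: float) -> float:
    """Model of BITDOT(mask, w0, w1, ...): bit0*w0 + bit1*w1 + ..."""
    total = 0
    for i, w in enumerate(weights):
        bit = (mask >> i) & 1  # i-th bit of the mask, either 0 or 1
        total += bit * w
    return total


def fibonacci(n: int) -> int:
    """Model of FIBONACCI(n), assuming the sequence 0, 1, 1, 2, 3, 5, ... for n = 0, 1, 2, ..."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(bitdot(0b101, 10, 20, 30))  # bits 0 and 2 are set: 10 + 30 = 40
print(fibonacci(7))               # 13
```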
Searching and ranking
  • BM25F() - Returns the precise BM25F formula value
  • EXIST() - Replaces non-existing columns with default values
  • GROUP_CONCAT() - Produces a comma-separated list of the attribute values of all documents in the group
  • HIGHLIGHT() - Highlights search results
  • MIN_TOP_SORTVAL() - Returns the sort key value of the worst-ranked element in the current top-N matches
  • MIN_TOP_WEIGHT() - Returns the weight of the worst-ranked element in the current top-N matches
  • PACKEDFACTORS() - Outputs weighting factors
  • REMOVE_REPEATS() - Removes repeated adjusted rows with the same ‘column’ value
  • WEIGHT() - Returns the full-text match score
  • ZONESPANLIST() - Returns pairs of matched zone spans
  • QUERY() - Returns the current full-text query
Type casting
  • BIGINT() - Forcibly promotes the integer argument to 64-bit type
  • DOUBLE() - Forcibly promotes the given argument to floating-point type
  • INTEGER() - Forcibly promotes the given argument to 64-bit signed type
  • TO_STRING() - Forcibly promotes the argument to string type
  • UINT() - Forcibly reinterprets the given argument as 64-bit unsigned type
  • SINT() - Interprets a 32-bit unsigned integer as a signed 64-bit integer
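To make the SINT() description concrete: when 1-2 is computed in 32-bit unsigned arithmetic it wraps around to 4294967295, and reinterpreting that bit pattern as signed gives -1. The Python functions below model only this bit-level reinterpretation; they are not Manticore code, and the helper names `as_uint32` and `sint` are hypothetical.

```python
def as_uint32(value: int) -> int:
    """Wrap an integer into the 32-bit unsigned range, as 32-bit unsigned arithmetic would."""
    return value & 0xFFFFFFFF


def sint(value: int) -> int:
    """Model of SINT(): reinterpret a 32-bit unsigned value as signed, widened to 64 bits."""
    v = as_uint32(value)
    # If the top (sign) bit is set, the signed value is v - 2^32.
    return v - (1 << 32) if v & 0x80000000 else v


wrapped = as_uint32(1 - 2)
print(wrapped)        # 4294967295
print(sint(wrapped))  # -1
```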
Arrays and conditions
  • ALL() - Returns 1 if the condition is true for all elements in the array
  • ANY() - Returns 1 if the condition is true for any element in the array
  • CONTAINS() - Checks whether the (x,y) point is within the given polygon
  • IF() - Returns the 2nd argument if the 1st argument is non-zero, or the 3rd argument if the 1st argument equals 0.0
  • IN() - Returns 1 if the first argument is equal to any of the other arguments, or 0 otherwise
  • INDEXOF() - Iterates through all elements in the array and returns the index of the first matching element
  • INTERVAL() - Returns the index of the interval the first argument falls into: 0 if it is less than the 2nd argument, 1 if it is greater than or equal to the 2nd argument but less than the 3rd, and so on (the 2nd and subsequent arguments must be in ascending order)
  • LENGTH() - Returns the number of elements in an MVA
  • REMAP() - Allows making exceptions to expression values depending on condition values
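The conditional functions IF(), IN() and INTERVAL() can be modeled in Python as shown below. This is a sketch of the semantics described in the list, not Manticore's implementation; the trailing underscores in `if_` and `in_` only avoid clashes with Python keywords. For INTERVAL(), `bisect_right` counts how many of the (ascending) points are less than or equal to the value, which is exactly the returned index.

```python
from bisect import bisect_right


def if_(cond: float, a, b):
    """Model of IF(cond, a, b): returns a when cond is non-zero, b when it equals 0."""
    return a if cond != 0 else b


def in_(value, *candidates) -> int:
    """Model of IN(value, c1, c2, ...): 1 if value equals any candidate, else 0."""
    return 1 if value in candidates else 0


def interval(value, *points) -> int:
    """Model of INTERVAL(value, p1, p2, ...) for ascending points:
    0 if value < p1, 1 if p1 <= value < p2, and so on."""
    # Number of points <= value is the index of the interval containing value.
    return bisect_right(points, value)


print(if_(0.0, "yes", "no"))     # no
print(in_(3, 1, 2, 3))           # 1
print(interval(5, 10, 20, 30))   # 0
print(interval(20, 10, 20, 30))  # 2
```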
Date and time
  • NOW() - Returns the current timestamp as an INTEGER
  • CURTIME() - Returns the current time in the local timezone
  • UTC_TIME() - Returns the current time in the UTC timezone
  • UTC_TIMESTAMP() - Returns the current date/time in the UTC timezone
  • SECOND() - Returns the integer second from the timestamp argument
  • MINUTE() - Returns the integer minute from the timestamp argument
  • HOUR() - Returns the integer hour from the timestamp argument
  • DAY() - Returns the integer day from the timestamp argument
  • MONTH() - Returns the integer month from the timestamp argument
  • YEAR() - Returns the integer year from the timestamp argument
  • YEARMONTH() - Returns an integer year and month code from the timestamp argument
  • YEARMONTHDAY() - Returns an integer year, month and day code from the timestamp argument
  • TIMEDIFF() - Returns the difference between the timestamps
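The "code" returned by YEARMONTH() and YEARMONTHDAY() packs the date parts into a single integer in YYYYMM and YYYYMMDD form. The Python sketch below reproduces that packing. Note one assumption: the server evaluates these functions in its configured timezone, whereas this sketch always uses UTC to stay deterministic, so results near midnight may differ from what searchd returns.

```python
from datetime import datetime, timezone


def yearmonth(ts: int) -> int:
    """Model of YEARMONTH(ts): YYYYMM code (computed in UTC in this sketch)."""
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return d.year * 100 + d.month


def yearmonthday(ts: int) -> int:
    """Model of YEARMONTHDAY(ts): YYYYMMDD code (computed in UTC in this sketch)."""
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return d.year * 10000 + d.month * 100 + d.day


# 1700000000 is 2023-11-14 22:13:20 UTC.
print(yearmonth(1700000000))     # 202311
print(yearmonthday(1700000000))  # 20231114
```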
Geo-spatial
  • GEODIST() - Computes the geosphere distance between two given points
  • GEOPOLY2D() - Creates a polygon that takes into account the Earth’s curvature
  • POLY2D() - Creates a simple polygon in plain space
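GEODIST() measures distance over a sphere rather than in plain space. The haversine formula below is one common way to compute such a great-circle distance and is shown only to illustrate the idea. Manticore's own method, Earth radius and expected argument units may differ (check the GEODIST() reference for whether it takes radians or degrees). The constant `EARTH_RADIUS_M` and the function name are assumptions of this sketch.

```python
import math

# Mean Earth radius in meters: an assumption of this sketch, not necessarily
# the constant GEODIST() uses internally.
EARTH_RADIUS_M = 6371000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111.2 km.
print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))  # 111195
```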
String
  • CONCAT() - Concatenates two or more strings
  • REGEX() - Returns 1 if the regular expression matches the string attribute, and 0 otherwise
  • SNIPPET() - Highlights search results
  • SUBSTRING_INDEX() - Returns the substring of the string before the specified number of delimiter occurrences
Other
  • LAST_INSERT_ID() - Returns the IDs of documents inserted or replaced by the last statement in the current session
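SUBSTRING_INDEX() is easiest to understand by example. The Python model below assumes MySQL-like semantics: a positive count keeps everything to the left of the count-th delimiter counted from the left, and a negative count keeps everything to the right of the count-th delimiter counted from the right. Treat it as an illustration of those assumed semantics rather than a description of Manticore internals.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of SUBSTRING_INDEX(s, delim, count), assuming MySQL-like semantics."""
    parts = s.split(delim)
    if count > 0:
        # Keep the first `count` pieces, i.e. everything before the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Keep the last `-count` pieces, i.e. everything after the count-th delimiter from the right.
        return delim.join(parts[count:])
    return ""


print(substring_index("www.manticoresearch.com", ".", 2))   # www.manticoresearch
print(substring_index("www.manticoresearch.com", ".", -2))  # manticoresearch.com
```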

Common settings in configuration file

To be put in the common {} section of the configuration file:

Indexer

indexer is a tool for creating plain tables.

Indexer settings in configuration file

To be put in the indexer {} section of the configuration file:

Indexer start parameters
    indexer [OPTIONS] [indexname1 [indexname2 [...]]]
  • --all - Rebuilds all tables from the config
  • --buildstops - Reviews the table source as if it were indexing the data and produces a list of the terms being indexed
  • --buildfreqs - Adds the number of occurrences of each term in the table to the --buildstops output
  • --config, -c - Path to configuration file
  • --dump-rows - Dumps rows fetched by SQL source(s) into the specified file
  • --help - Lists all the parameters
  • --keep-attrs - Allows reusing existing attributes when reindexing
  • --keep-attrs-names - Specifies which attributes to reuse from the existing table
  • --merge-dst-range - Applies the given filter range when merging
  • --merge-killlists - Changes the way kill lists are processed when merging tables
  • --merge - Merges two plain tables into one
  • --nohup - With this option on, indexer won’t send SIGHUP
  • --noprogress - Prevents displaying progress details
  • --print-queries - Prints out SQL queries that indexer sends to the database
  • --print-rt - Outputs data fetched from SQL source(s) as INSERTs to a real-time table
  • --quiet - Suppresses all output
  • --rotate - Forces table rotation after all the tables are built
  • --sighup-each - Forces rotation of each table after it’s built
  • -v - Shows indexer version

Table converter from Manticore v2 / Sphinx v2

index_converter is a tool for converting tables created with Sphinx/Manticore Search 2.x to the Manticore Search 3.x table format.

    index_converter {--config /path/to/config|--path}
Table converter start parameters
  • --config, -c - Path to tables configuration file
  • --index - Specifies which table should be converted
  • --path - Defines path containing table(s) instead of the configuration file
  • --strip-path - Strips path from filenames referenced by table
  • --large-docid - Allows converting documents with IDs larger than 2^63
  • --output-dir - Writes the new files to a chosen folder
  • --all - Converts all tables from the configuration file / path
  • --killlist-target - Sets the target tables for which kill-lists will be applied

Searchd

searchd is the Manticore server.

Searchd settings in a configuration file

To be put in the searchd {} section of the configuration file:

Searchd start parameters
    searchd [OPTIONS]
  • --config, -c - Path to configuration file
  • --console - Forces running in console mode
  • --coredump - Enables saving core dump on crash
  • --cpustats - Enables CPU time reporting
  • --delete - Removes Manticore service from Microsoft Management Console and other places where the services are registered
  • --force-preread - Forbids the server from serving any incoming connections until pre-reading of the table files completes
  • --help, -h - Lists all the parameters
  • --table (--index) - Forces serving only the specified table
  • --install - Installs searchd as a service into Microsoft Management Console
  • --iostats - Enables input/output reporting
  • --listen, -l - Overrides listen from the configuration file
  • --logdebug, --logdebugv, --logdebugvv - Enables additional debug output in the server log
  • --logreplication - Enables additional replication debug output in the server log
  • --new-cluster - Bootstraps a replication cluster and makes the server a reference node with cluster restart protection
  • --new-cluster-force - Bootstraps a replication cluster and makes the server a reference node bypassing cluster restart protection
  • --nodetach - Leaves searchd in foreground
  • --ntservice - Passed by Microsoft Management Console to searchd to invoke it as a service on Windows platforms
  • --pidfile - Overrides pid_file from the configuration file
  • --port, -p - Specifies the port searchd should listen on, disregarding the port specified in the configuration file
  • --replay-flags - Specifies extra binary log replay options
  • --servicename - Applies the given name to searchd when installing or deleting the service, as it would appear in Microsoft Management Console
  • --status - Queries the running searchd instance to return its status
  • --stop - Stops Manticore server
  • --stopwait - Stops Manticore server gracefully
  • --strip-path - Strips path names from all the file names referenced from the table
  • -v - Shows version information
Searchd environment variables

Indextool

Miscellaneous table maintenance functionality useful for troubleshooting.

    indextool <command> [options]
Indextool start parameters

Used to dump miscellaneous debug information about a physical table.

  • --config, -c - Path to configuration file
  • --quiet, -q - Keeps indextool quiet: it will not output the banner, etc.
  • --help, -h - Lists all the parameters
  • -v - Shows version information
  • --checkconfig - Verifies the configuration file
  • --buildidf - Builds IDF file from one or several dictionary dumps
  • --build-infixes - Builds infixes for an existing dict=keywords table
  • --dumpheader - Quickly dumps the provided table header file
  • --dumpconfig - Dumps the table definition from the given table header file in an almost compliant manticore.conf format
  • --dumpheader - Dumps the table header by table name, looking up the header path in the configuration file
  • --dumpdict - Dumps table dictionary
  • --dumpdocids - Dumps document IDs by table name
  • --dumphitlist - Dumps all occurrences of the given keyword/id in the given table
  • --docextract - Runs a table check pass over the whole dictionary/docs/hits and collects all the words and hits belonging to the requested document
  • --fold - Tests tokenization based on table’s settings
  • --htmlstrip - Filters STDIN using HTML stripper settings for the given table
  • --mergeidf - Merges several .idf files into a single one
  • --morph - Applies morphology to the given STDIN and prints the result to stdout
  • --check - Checks the table data files for consistency
  • --check-id-dups - Checks if there are duplicate ids
  • --check-disk-chunk - Checks one disk chunk of an RT table
  • --strip-path - Strips path names from all the file names referenced from the table
  • --rotate - Defines whether to check a table that is waiting for rotation when using --check
  • --apply-killlists - Applies kill-lists for all tables listed in the configuration file

Wordbreaker

Splits compound words into components.

    wordbreaker [-dict path/to/dictionary_file] {split|test|bench}
Wordbreaker start parameters

Spelldump

Used to extract the contents of a dictionary file in ispell or MySpell format.

    spelldump [options] <dictionary> <affix> [result] [locale-name]
  • dictionary - Dictionary’s main file
  • affix - Dictionary’s affix file
  • result - Specifies where the dictionary data should be written
  • locale-name - Specifies the locale details to use

List of reserved keywords

A complete alphabetical list of keywords that are currently reserved in Manticore SQL syntax (and therefore cannot be used as identifiers).

    AND, AS, BY, COLUMNARSCAN, DISTINCT, DIV, DOCIDINDEX, EXPLAIN, FACET, FALSE, FORCE, FROM, IGNORE, IN, INDEXES, IS, LIMIT, MOD, NOT, NO_COLUMNARSCAN, NO_DOCIDINDEX, NO_SECONDARYINDEX, NULL, OFFSET, OR, ORDER, REGEX, RELOAD, SECONDARYINDEX, SELECT, SYSFILTERS, TRUE, USE
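One way to vet table or column names against this list before using them in SQL is a case-insensitive membership check, since SQL keywords are not case-sensitive. The sketch below copies the keywords listed above into a Python set. The names `RESERVED` and `is_reserved` are hypothetical helpers, and the set has to be kept in sync with the list for your Manticore version.

```python
# Reserved keywords copied from the list above.
RESERVED = frozenset("""
AND AS BY COLUMNARSCAN DISTINCT DIV DOCIDINDEX EXPLAIN FACET FALSE FORCE FROM
IGNORE IN INDEXES IS LIMIT MOD NOT NO_COLUMNARSCAN NO_DOCIDINDEX
NO_SECONDARYINDEX NULL OFFSET OR ORDER REGEX RELOAD SECONDARYINDEX SELECT
SYSFILTERS TRUE USE
""".split())


def is_reserved(identifier: str) -> bool:
    """Return True if the identifier clashes with a reserved keyword.
    Keywords are matched case-insensitively, so compare in upper case."""
    return identifier.upper() in RESERVED


for name in ("title", "facet", "Order"):
    print(name, is_reserved(name))
```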

Documentation for old Manticore versions