String and text data types

Introduction

Strings, character data types, or text. What you want to call it is up to you. Manipulating and outputting text is a very important topic that will be required for many different types of systems that you work with. The YugabyteDB SQL API offers extensive text capability that will be demonstrated here.

About character data types

Character data types

For character data types, see Data types. Note that YugabyteDB implements the data type aliases and that is what is used here.

With PostgreSQL, the use of different character data types has a historical aspect. YugabyteDB — being a more recent implementation — has no such history. Consider keeping your use of character data types simple, ideally just ‘text’, or ‘varchar(n)’ if you require a restricted length. Although it’s your choice, using text and then verifying the length of a character string will allow you to develop your own approach to managing this scenario, rather than encountering errors by exceeding some arbitrary length.

NoteIf you use char(n), character(n), or varchar(n), then the limitation will be the number you assign, which cannot exceed 10,485,760. For unlimited length, use a character data type without a length description, such as ‘text’. However, if you have specific requirements to ignore trailing spaces, then you may wish to consider using char(n).

The following example shows a few ways to work with the different data types.

  1. ./bin/ysqlsh
  2. ysqlsh (11.2)
  3. Type "help" for help.
  4. yugabyte=# create table text_columns(a_text text, a_varchar varchar, a_char char, b_varchar varchar(10), b_char char(10));
  5. CREATE TABLE
  6. yugabyte=# insert into text_columns values('abc ', 'abc ', 'abc ', 'abc ', 'abc ');
  7. ERROR: value too long for type character(1)
  8. yugabyte=# insert into text_columns values('abc ', 'abc ', 'a', 'abc ', 'abc ');
  9. INSERT 0 1
  10. yugabyte=# select * from text_columns
  11. where a_text like 'ab__' and a_varchar like 'ab__'
  12. and b_varchar like 'ab__';
  13. a_text | a_varchar | a_char | b_varchar | b_char
  14. --------+-----------+--------+-----------+------------
  15. abc | abc | a | abc | abc
  16. yugabyte=# select * from text_columns
  17. where a_text like 'ab__' and a_varchar like 'ab__'
  18. and b_varchar like 'ab__' and b_char like 'ab__';
  19. a_text | a_varchar | a_char | b_varchar | b_char
  20. --------+-----------+--------+-----------+--------
  21. (0 rows)
  22. yugabyte=# select length(a_text) as a_text, length(a_varchar) as a_varchar, length(a_char) as a_char,
  23. length(b_varchar) as b_varchar, length(b_char) as b_char
  24. from text_columns;
  25. a_text | a_varchar | a_char | b_varchar | b_char
  26. --------+-----------+--------+-----------+--------
  27. 4 | 4 | 1 | 4 | 3

In the example above, notice that the column b_char does not contain a trailing space and this could impact your SQL. And, if you specify a maximum length on the column definition, the SQL can also generate errors, so you will have to either manually truncate your input values or introduce error handling.

Casting

When you are working with text that has been entered by users through an application, ensure that YugabyteDB understands that it is working with a text input. All values should be cast unless they can be trusted due to other validation measures that have already occurred.

Start YSQL and you can see the impacts of casting.

  1. ./bin/ysqlsh
  2. ysqlsh (11.2)
  3. Type "help" for help.
  4. yugabyte=# select cast(123 AS TEXT), cast('123' AS TEXT), 123::text, '123'::text;
  5. text | text | text | text
  6. ------+------+------+------
  7. 123 | 123 | 123 | 123
  8. yugabyte=# select tablename, hasindexes AS nocast, hasindexes::text AS casted
  9. from pg_catalog.pg_tables
  10. where tablename in('pg_default_acl', 'sql_features');
  11. tablename | nocast | casted
  12. ----------------+--------+--------
  13. pg_default_acl | t | true
  14. sql_features | f | false

In the last example above, the column ‘hasindexes’ is a Boolean data type and by casting it to text, you will receive a text result of true or false.

Manipulating text

There are a lot of functions that can be applied to text. Below the functions are classified into logical groupings - in many cases the capability of the functions overlap and personal choice will determine how you approach solving the problem.

The focus here was to quickly show how each of the functions could be used, along with some examples. It is assumed that you have the yb_demo database installed.

Altering the appearance of text

  1. yugabyte=# \c yb_demo
  2. You are now connected to database "yb_demo" as user "yugabyte".
  3. yb_demo =# select lower('hELLO world') AS LOWER,
  4. upper('hELLO world') AS UPPER,
  5. initcap('hELLO world') AS INITCAP;
  6. lower | upper | initcap
  7. -------------+-------------+-------------
  8. hello world | HELLO WORLD | Hello World
  9. yb_demo =# select quote_ident('ok') AS EASY, quote_ident('I am OK') AS QUOTED, quote_ident('I''m not OK') AS DOUBLE_QUOTED, quote_ident('') AS EMPTY_STR, quote_ident(null) AS NULL_QUOTED;
  10. easy | quoted | double_quoted | empty_str | null_quoted
  11. ------+-----------+---------------+-----------+-------------
  12. ok | "I am OK" | "I'm not OK" | "" |
  13. yb_demo =# select quote_literal('ok') AS EASY, quote_literal('I am OK') AS QUOTED, quote_literal('I''m not OK') AS DOUBLE_QUOTED, quote_literal('') AS EMPTY_STR, quote_literal(null) AS NULL_QUOTED;
  14. easy | quoted | double_quoted | empty_str | null_quoted
  15. ------+-----------+---------------+-----------+-------------
  16. 'ok' | 'I am OK' | 'I''m not OK' | '' |
  17. yb_demo =# select quote_nullable('ok') AS EASY, quote_nullable('I am OK') AS QUOTED, quote_nullable('I''m not OK') AS DOUBLE_QUOTED, quote_nullable('') AS EMPTY_STR, quote_nullable(null) AS NULL_QUOTED;
  18. easy | quoted | double_quoted | empty_str | null_quoted
  19. ------+-----------+---------------+-----------+-------------
  20. 'ok' | 'I am OK' | 'I''m not OK' | '' | NULL

Use quote_ident to parse identifiers in SQL like column names and quote_nullable as a string literal that may also be a null.

You can use “dollar sign quoting” to parse raw text — any text contained within dollar sign quotations are treated as a raw literal. The starting and ending markers do not need to be identical, but must start and end with a dollar sign ($). See the examples below.

  1. yugabyte=# select $$%&*$&$%7'\67458\''""""';;'\//\/\/\""'/'''''"""""'''''''''$$;
  2. ?column?
  3. -----------------------------------------------------------
  4. %&*$&$%7'\67458\''""""';;'\//\/\/\""'/'''''"""""'''''''''
  5. yugabyte=# select $__unique_$ Lots of space
  6. yugabyte=# and multi-line too $__unique_$;
  7. ?column?
  8. ----------------------------------------------
  9. Lots of space +
  10. and multi-line too
  11. yugabyte=# select $$first$$ AS "F1", $$second$$ AS "F2";
  12. F1 | F2
  13. -------+--------
  14. first | second

Some values need to be padded for formatting purposes, and LPAD and RPAD are meant for this purpose. They mean ‘left pad’ and ‘right pad’ respectively. They are normally used to fill with spaces but you could specify anything, including more than a single character. So you could pad with underscores (_) or spaced dots . . ., or anything you wish. You do not specify how much to pad, but the maximum length to pad. Therefore, if your value is already as long as your maximum length, then no padding is required. Note that this can cause a truncation if your field is longer than the maximum length specified.

The reverse of padding is trimming, which will remove spaces if found. Below are examples of using padding and trimming to achieve the results required.

  1. yb_demo=# select name, lpad(name, 10), rpad(name, 15) from users order by name limit 5;
  2. name | lpad | rpad
  3. -------------------+------------+-----------------
  4. Aaron Hand | Aaron Hand | Aaron Hand
  5. Abbey Satterfield | Abbey Satt | Abbey Satterfie
  6. Abbie Parisian | Abbie Pari | Abbie Parisian
  7. Abbie Ryan | Abbie Ryan | Abbie Ryan
  8. Abby Larkin | Abby Larki | Abby Larkin
  9. yb_demo=# select name, lpad(name, 20), rpad(name, 20) from users order by name limit 5;
  10. name | lpad | rpad
  11. -------------------+----------------------+----------------------
  12. Aaron Hand | Aaron Hand | Aaron Hand
  13. Abbey Satterfield | Abbey Satterfield | Abbey Satterfield
  14. Abbie Parisian | Abbie Parisian | Abbie Parisian
  15. Abbie Ryan | Abbie Ryan | Abbie Ryan
  16. Abby Larkin | Abby Larkin | Abby Larkin
  17. yb_demo=# select name, lpad(name, 20, '. '), rpad(name, 20, '.') from users order by name limit 5;
  18. name | lpad | rpad
  19. -------------------+----------------------+----------------------
  20. Aaron Hand | . . . . . Aaron Hand | Aaron Hand..........
  21. Abbey Satterfield | . .Abbey Satterfield | Abbey Satterfield...
  22. Abbie Parisian | . . . Abbie Parisian | Abbie Parisian......
  23. Abbie Ryan | . . . . . Abbie Ryan | Abbie Ryan..........
  24. Abby Larkin | . . . . .Abby Larkin | Abby Larkin.........
  25. yb_demo=# select repeat(' ', ((x.maxlen-length(u.name))/2)::int) || rpad(u.name, x.maxlen) AS "cname"
  26. from users u,
  27. (select max(length(a.name))::int AS maxlen from users a) AS x;
  28. cname
  29. ------------------------------
  30. Stewart Marks
  31. Regan Corkery
  32. Domenic Daugherty
  33. Winfield Donnelly
  34. Theresa Kertzmann
  35. Terrence Emmerich
  36. Hudson Jacobi
  37. Aidan Hagenes
  38. Virgil Schowalter
  39. Rahul Kreiger
  40. Wilhelmine Erdman
  41. Elwin Okuneva
  42. Maximillian Dickinson
  43. Lucie Cormier
  44. Alexandrine Rosenbaum
  45. Jayne Breitenberg
  46. Alexandria Schowalter
  47. Augustine Runolfsdottir
  48. Mathilde Weissnat
  49. Theresa Grant
  50. ...
  51. yb_demo=# select x.RawDay, length(x.RawDay) AS RawLen, x.TrimDay, length(x.TrimDay) AS TrimLen,
  52. x.LTrimDay, length(x.LTrimDay) AS LTrimLen, x.RTrimDay, length(x.RTrimDay) AS RTrimLen
  53. from (select to_char(generate_series, 'Day') AS RawDay,
  54. trim(to_char(generate_series, 'Day')) AS TrimDay,
  55. ltrim(to_char(generate_series, 'Day')) AS LTrimDay,
  56. rtrim(to_char(generate_series, 'Day')) AS RTrimDay
  57. from generate_series(current_date, current_date+6, '1 day')) AS x;
  58. rawday | rawlen | trimday | trimlen | ltrimday | ltrimlen | rtrimday | rtrimlen
  59. -----------+--------+-----------+---------+-----------+----------+-----------+----------
  60. Wednesday | 9 | Wednesday | 9 | Wednesday | 9 | Wednesday | 9
  61. Thursday | 9 | Thursday | 8 | Thursday | 9 | Thursday | 8
  62. Friday | 9 | Friday | 6 | Friday | 9 | Friday | 6
  63. Saturday | 9 | Saturday | 8 | Saturday | 9 | Saturday | 8
  64. Sunday | 9 | Sunday | 6 | Sunday | 9 | Sunday | 6
  65. Monday | 9 | Monday | 6 | Monday | 9 | Monday | 6
  66. Tuesday | 9 | Tuesday | 7 | Tuesday | 9 | Tuesday | 7

The final padding example above shows how you can center text and the trim example shows the impacts of the different trims on a value that is padded. Note that the ‘Day’ value is right-padded to 9 characters which is why a left-trim has no impact upon the field length at all, only the right-trim or a ‘full’ trim will remove spaces.

You can also state that a text value is ‘escaped’ by prefixing with an ‘e’ or ‘E’. Take a look at this example.

  1. yugabyte=# select E'I''ve told YugabyteDB that this is an escaped string\n\tso I can specify escapes safely' as escaped_text;
  2. escaped_text
  3. ---------------------------------------------------
  4. I've told YugabyteDB that this is an escaped string+
  5. so I can specify escapes safely
  6. yugabyte=# select E'a\\b/c\u00B6' as escaped_txt, 'a\\b/c\u00B6' as raw_txt;
  7. escaped_txt | raw_txt
  8. -------------+--------------
  9. a\b/c | a\\b/c\u00B6

Note\n refers to a new line, and \t is a tab, hence the formatted result.

YugabyteDB also has DECODE and ENCODE for decoding and encoding from, or to, binary data. It caters for ‘base64’, ‘hex’ and ‘escape’ representations. Decode will give the output in BYTEA data type. Additionally, you can use the TO_HEX command to convert an ascii number to its digital representation.

Joining strings

You can concatenate strings of text in several different ways. For robustness, you should ensure that everything being passed is interpreted as text (by casting) so that unexpected results do not appear in edge cases. Here are some examples that show that YugabyteDB is leniant in passing in variables, but you should implement more robust casting for proper treatment of strings.

  1. yb_demo=# select 'one' || '-' || 2 || '-one' AS "121";
  2. 121
  3. -----------
  4. one-2-one
  5. yb_demo=# select 2 || '-one-one' AS "211";
  6. 211
  7. -----------
  8. 2-one-one
  9. yb_demo=# select 1 || '-one' || repeat('-two', 2) AS "1122";
  10. 1122
  11. ---------------
  12. 1-one-two-two
  13. yb_demo=# select 1::text || 2::text || 3::text AS "123";
  14. 123
  15. -----
  16. 123
  17. yb_demo=# select 1 || 2 || 3 AS "123";
  18. ERROR: operator does not exist: integer || integer
  19. LINE 1: select 1 || 2 || 3 AS "123";
  20. ^
  21. HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
  22. yb_demo=# select concat(1,2,3) AS "123";
  23. 123
  24. -----
  25. 123
  26. yb_demo=# select concat_ws(':', 1,2,3) AS "123 WS";
  27. 123 WS
  28. --------
  29. 1:2:3
  30. (1 row)
  31. yb_demo =# select left(vendor,1) AS V, string_agg(distinct(category), ', ' ORDER BY category) AS CATEGORIES
  32. from products group by left(vendor,1) order by 1;
  33. v | categories
  34. ---+----------------------------------
  35. A | Doohickey, Gadget, Gizmo
  36. B | Doohickey, Gadget, Gizmo, Widget
  37. C | Doohickey, Gadget, Gizmo, Widget
  38. D | Gadget, Gizmo, Widget
  39. E | Gadget, Gizmo, Widget
  40. F | Doohickey, Gadget, Gizmo, Widget
  41. G | Doohickey, Gadget, Widget
  42. H | Doohickey, Gadget, Gizmo, Widget
  43. I | Gizmo, Widget
  44. J | Doohickey, Gadget, Gizmo, Widget
  45. K | Doohickey, Gadget, Gizmo, Widget
  46. L | Doohickey, Gadget, Gizmo, Widget
  47. M | Doohickey, Gadget, Gizmo, Widget
  48. N | Doohickey, Gadget, Widget
  49. O | Doohickey, Gadget, Gizmo, Widget
  50. P | Doohickey, Gadget, Gizmo, Widget
  51. Q | Doohickey
  52. R | Doohickey, Gadget, Gizmo, Widget
  53. S | Doohickey, Gadget, Gizmo, Widget
  54. T | Gizmo, Widget
  55. U | Gadget
  56. V | Doohickey, Widget
  57. W | Doohickey, Gadget, Gizmo, Widget
  58. Z | Gizmo

In the example above, we explore the LEFT function, but the string_agg function is best used by an input of a series or a set of data as done in SQL rows. The example shows how the aggregated string has its own order by compared to the outer SQL which is the vendors being classified A-Z.

There is also the REVERSE function that reverses the contents of text in a simple manner as shown in the next example.

  1. yb_demo=# select reverse(to_char(current_date, 'DD-MON-YYYY'));
  2. reverse
  3. -------------
  4. 9102-LUJ-92

You can use the FORMAT function parse user input as parameters to a SQL statement in order to minimise the impact of unexpected data that is typical of a SQL injection attack. The most popular method is to use the EXECUTE command within a procedure as this is not available at the YSQL command prompt, only within the YSQL plpgsql environment. The FORMAT command is used to finalise the complete SQL statement and passed to EXECUTE to run. As we are not simulating YSQL plpgsql here, let’s illustrate how to use the FORMAT function only.

  1. yb_demo=# select format('Hello %s, today''s date is %s', 'Jono', to_char(current_date, 'DD-MON-YYYY'), 'discarded');
  2. format
  3. -----------------------------------------
  4. Hello Jono, today's date is 29-JUL-2019
  5. yb_demo=# select format('On this day, %2$s, %1$s was here', 'Jono', to_char(current_date, 'DD-MON-YYYY'));
  6. format
  7. -----------------------------------------
  8. On this day, 29-JUL-2019, Jono was here
  9. yb_demo=# select format('SELECT %2$I, %3$I from %1$I where name = %4$L', 'users', 'birth_date', 'email', 'Brody O''Reilly');
  10. format
  11. --------------------------------------------------------------------
  12. SELECT birth_date, email from users where name = 'Brody O''Reilly'

Substituting text

Substituting text with other text can be a complex task as you need to fully understand the scope of the data that the functions can be subject to. A common occurrence is failure due to an unexpected value being passed through, like NULL, an empty string '', or a value that YugabyteDB would interpret as a different data type like true or 3.

The treatment of nulls in mathematical operations is often problematic, as is string joins as joining a null to a value results in a null. Coalescing the inputs will avoid these issues as shown in the examples below.

  1. yb_demo=# select trunc(avg(coalesce(discount,0))::numeric,3) AS "COALESCED", trunc(avg(discount)::numeric,3) AS "RAW" from orders;
  2. COALESCED | RAW
  3. -----------+-------
  4. 0.530 | 5.195
  5. yb_demo=# select 'Hello ' || null AS GREETING, 'Goodbye ' || coalesce(null, 'Valued Customer') AS GOODBYE;
  6. greeting | goodbye
  7. ----------+-------------------------
  8. | Goodbye Valued Customer

The above shows how substituting when null can have a significant impact upon the results you achieve or even the behaviour of your application. Below concentrates on changing existing text with other text.

  1. yb_demo=# select overlay(password placing 'XXXXXXXXXXXXXXX' from 1 for length(password)) AS SCRAMBLED from users limit 5;
  2. scrambled
  3. -----------------
  4. XXXXXXXXXXXXXXX
  5. XXXXXXXXXXXXXXX
  6. XXXXXXXXXXXXXXX
  7. XXXXXXXXXXXXXXX
  8. XXXXXXXXXXXXXXX
  9. yb_demo=# select regexp_replace('Hi my number is +999 9996-1234','[[:alpha:]]','','g');
  10. regexp_replace
  11. --------------------
  12. +999 9996-1234
  13. yb_demo=# select 'I think I can hear an ' || repeat('echo.. ', 3) AS CLICHE;
  14. cliche
  15. ---------------------------------------------
  16. I think I can hear an echo.. echo.. echo..
  17. yb_demo=# select replace('Gees I love Windows', 'Windows', 'Linux') AS OBVIOUS;
  18. obvious
  19. -------------------
  20. Gees I love Linux

The REGEXP_REPLACE function along with the other REGEX functions require an entire chapter to themselves with the sophistication that can be achieved - which is well beyond this scope of this introductory walk through. The example above strips out all characters of the alphabet and replaces them with an empty string. The ‘g’ flag is ‘global’ that results in the replace to occur throughout the entire string, without the ‘g’ flag the replace will stop after the first substitution. Note that the result contains spaces which is why it appears odd. You might think that this example shows an extraction of non-alphabetical characters, but it is just replacing them with an empty string.

Extracting text

There are several ways of extracting text from text, in some cases it might be part of ‘cleaning’ the text, note that removing leading or trailing spaces is covered by the trim functions shown above. The remaining functions here show how parts of text can be manipulated.

  1. yb_demo=# select left('123456', 3);
  2. left
  3. ------
  4. 123
  5. yb_demo=# select right('123456', 3);
  6. right
  7. -------
  8. 456
  9. yb_demo=# select substr('123456', 3);
  10. substr
  11. --------
  12. 3456
  13. yb_demo=# select substr('123456', 3, 2);
  14. substr
  15. --------
  16. 34
  17. yb_demo=# select substr('123456', position('4' in '123456')+1, 2);
  18. substr
  19. --------
  20. 56
  21. yb_demo=# select substring('123456', position('4' in '123456')+1, 2);
  22. substring
  23. -----------
  24. 56
  25. yb_demo=# select replace(substr(email, position('@' in email)+1, (length(email)
  26. -position('.' in substr(email, position('@' in email)+1)))), '.com', '') AS "Domain", count(*)
  27. from users
  28. group by 1;
  29. Domain | count
  30. ---------+-------
  31. hotmail | 813
  32. yahoo | 838
  33. gmail | 849

NoteThe command SUBSTRING has overloaded equivalents that accept POSIX expressions. The above example shows you the simple use of SUBSTRING which can also be used as SUBSTR. therefore it is recommended to only use the full SUBSTRING command when using POSIX.

As stated above for REGEXP_REPLACE, the full explanation of regular expressions requires its own comprehensive documentation that is not covered here. Here is an example illustrating its use.

  1. yb_demo=# select name as Fullname, regexp_match(name, '(.*)(\s+)(.*)') AS "REGEXED Name",
  2. (regexp_match(name, '(.*)(\s+)(.*)'))[1] AS "First Name",
  3. (regexp_match(name, '(.*)(\s+)(.*)'))[3] AS "Last Name"
  4. from users limit 5;
  5. fullname | REGEXED Name | First Name | Last Name
  6. ----------------+----------------------+------------+-----------
  7. Jacinthe Rowe | {Jacinthe," ",Rowe} | Jacinthe | Rowe
  8. Walter Mueller | {Walter," ",Mueller} | Walter | Mueller
  9. Fatima Murphy | {Fatima," ",Murphy} | Fatima | Murphy
  10. Paxton Mayer | {Paxton," ",Mayer} | Paxton | Mayer
  11. Mellie Wolf | {Mellie," ",Wolf} | Mellie | Wolf

NoteIn the example abov, we are asking the ‘name’ column to be segmented by the existence of a space (\s) and then reporting the first and third set of text reported by the match. The regular expression returns a text array, not a text value, and thus you have to reference the array index to access the value as text. Note that this SQL would be very vulnerable to errors caused by data entry, including a middle name or missing either a first or last name would cause errors.

Now, let’s look at some manipulation and splitting of text so that you can process it in pieces. For this example, I will be using a sample extract from a bank file that is used for processing payments. This example could apply if the entire file was uploaded as a single text entry into a table and you select it and then process it.

  1. yb_demo=# create table bank_payments(bank_file text);
  2. CREATE TABLE
  3. yb_demo=# insert into bank_payments values($$"CMGB","1.0","95012141352105","999999","30128193018492","20","","GBP","B","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","","909170/1","AB"
  4. "CMGB","1.0","95012141352105","999999","95012113864863","10.00","","GBP","B","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","Remitters name 18","Tech ref for automatic processing5","AT","/t.x",
  5. "CMGB","1.0","95012141352105","","30128193018492","21","","GBP","C","Beneficiary name18","Txt on senders acc","","Txt for credit acc","","","","","909175/0","AB"
  6. "CMGB","1.0","95012141352105","","30128193018492","22","","GBP","I","Beneficiary name18","Txt on senders acc","text","","","","","","909175/1","AB"
  7. "CMGB","1.0","95012141352105","","30128193018492","23","","GBP","F","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","","909171/0","AB"$$);
  8. INSERT 0 1
  9. yb_demo=# select regexp_split_to_table(bank_file, chr(10)) from bank_payments;
  10. regexp_split_to_table
  11. --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  12. "CMGB","1.0","95012141352105","999999","30128193018492","20","","GBP","B","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","","909170/1","AB"
  13. "CMGB","1.0","95012141352105","999999","95012113864863","10.00","","GBP","B","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","Remitters name 18","Tech ref for automatic processing5","AT","/t.x",
  14. "CMGB","1.0","95012141352105","","30128193018492","21","","GBP","C","Beneficiary name18","Txt on senders acc","","Txt for credit acc","","","","","909175/0","AB"
  15. "CMGB","1.0","95012141352105","","30128193018492","22","","GBP","I","Beneficiary name18","Txt on senders acc","text","","","","","","909175/1","AB"
  16. "CMGB","1.0","95012141352105","","30128193018492","23","","GBP","F","Beneficiary name18","Txt on senders acc","Txt for credit acc","","","","","","909171/0","AB"
  17. yb_demo=# select split_part(f.line, ',' , 8) AS "currency",
  18. split_part(f.line, ',' , 5) AS "Account"
  19. from (select regexp_split_to_table(bank_file, chr(10)) AS "line" from bank_payments) AS f;
  20. currency | Account
  21. ----------+------------------
  22. "GBP" | "30128193018492"
  23. "GBP" | "95012113864863"
  24. "GBP" | "30128193018492"
  25. "GBP" | "30128193018492"
  26. "GBP" | "30128193018492"

Remember to drop the table ‘bank_payments’ if it is no longer required.

  1. yb_demo=# select reverse(translate(replace(lower(i.input), ' ', ''),
  2. 'abcdefghijklmnopqrstuvwxyz',
  3. 'A8Cd349h172!mN0pQr$TuVw*yZ')) AS "simplePWD"
  4. from (select 'type a word here' AS "input") AS i;
  5. simplePWD
  6. ---------------
  7. 3r3hdr0wA3pyT

The TRANSLATE command above will replace multiple different characters in a single command which can be useful. In the example above, the ‘a’ is replaced with a ‘A’, and ‘b’ is replaced with the number ‘8’, and so forth.

Obtaining information of text

Rather than format or change the contents of text, you often might want to understand particular attributes of the text. Below are some examples of using commands to return information of the text.

  1. yb_demo=# select x.c AS CHAR, ascii(x.c) AS ASCII
  2. from (select regexp_split_to_table(i.input, '') AS "c"
  3. from (select 'hello' AS input) AS i) AS x;
  4. char | ascii
  5. ------+-------
  6. h | 104
  7. e | 101
  8. l | 108
  9. l | 108
  10. o | 111
  11. yb_demo=# select bit_length('hello'), char_length('hello'), octet_length('hello');
  12. bit_length | char_length | octet_length
  13. ------------+-------------+--------------
  14. 40 | 5 | 5
  15. yb_demo=# select array_agg(chr(ascii(x.c))) AS "CHAR"
  16. from (select regexp_split_to_table(i.input, '') AS "c"
  17. from (select 'hello' AS input) AS i) AS x;
  18. CHAR
  19. -------------
  20. {h,e,l,l,o}
  21. yb_demo=# select avg(length(name))::int AS AVG_LENGTH from users;
  22. avg_length
  23. ------------
  24. 14
  25. yb_demo=# select name from users
  26. where position('T' in name) > 2
  27. and position('p' in name) = length(name)
  28. order by name;
  29. name
  30. -----------------
  31. Cory Tromp
  32. Demario Tromp
  33. Demetris Tromp
  34. Deon Tromp
  35. Emelia Tromp
  36. Ivah Tromp
  37. Jany Torp
  38. Jared Tromp
  39. Judd Tromp
  40. Larue Torp
  41. Magdalen Torp
  42. Margarita Tromp
  43. Marjolaine Torp
  44. Patrick Torp
  45. Porter Tromp
  46. Rebeka Tromp
  47. yb_demo=# select name, position('ar' in name) AS posn, strpos(name, 'ar') as strpos
  48. from users
  49. where strpos(name, 'ark') > 0
  50. order by name desc limit 10;
  51. name | posn | strpos
  52. ----------------+------+--------
  53. Yasmin Stark | 10 | 10
  54. Veronica Stark | 12 | 12
  55. Tamia Larkin | 8 | 8
  56. Stewart Marks | 5 | 5
  57. Ryann Parker | 8 | 8
  58. Rudy Larkin | 7 | 7
  59. Rodolfo Larkin | 10 | 10
  60. Novella Marks | 10 | 10
  61. Markus Hirthe | 2 | 2
  62. Mark Klein | 2 | 2
  63. yb_demo=# select m.name
  64. from (select to_char(generate_series, 'Month') AS name
  65. from generate_series(current_date-364, current_date, '1 month')) AS m
  66. where starts_with(m.name, 'J');
  67. name
  68. -----------
  69. January
  70. June
  71. July

Something a bit more advanced

If you like a bit of a challenge, below is an example that URL escapes a string. There is still some more room for tweaking in its current form, that is left for you to do.

  1. yugabyte=# select string_agg(case
  2. when to_hex(ascii(x.arr::text))::text
  3. in('20','23','24','25','26','40','60','2b','2c','2f','3a','3b','3c','3d','3e','3f',
  4. '5b','5c','5d','5e','7b','7c','7d') then '%' || to_hex(ascii(x.arr::text))::text
  5. else x.arr
  6. end, '') AS "url_escaped"
  7. from (select regexp_split_to_table('www.url.com/form?name="My name"&dob="1/1/2000"&email="[email protected]"', '')) AS x (arr);
  8. url_escaped
  9. ------------------------------------------------------------------------------------------------
  10. www.url.com%2fform%3fname%3d"My%20name"%26dob%3d"1%2f1%2f2000"%26email%3d"hello%40example.com"

Conclusion

Text or strings are part of every conceivable system. YugabyteDB provides you with comprehensive capabilities to manage and manipulate all your text within the database.