Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for null values in Elastic-like bulk requests #2363

Closed
5 tasks done
Nick-S-2018 opened this issue Jul 2, 2024 · 10 comments
Closed
5 tasks done

Adding support for null values in Elastic-like bulk requests #2363

Nick-S-2018 opened this issue Jul 2, 2024 · 10 comments
Assignees

Comments

@Nick-S-2018
Copy link
Collaborator

Nick-S-2018 commented Jul 2, 2024

Bug Description:

As of now, Manticore is unable to process fields with null values that can be passed in _bulk requests. E.g.:

curl -sX POST http://localhost:9308/_bulk 
-H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "test1" } }
{ "title" : "123"}
'
{"items":[{"index":{"_index":"test1","_type":"doc","_id":"0","_version":1,"result":"created","_shards":{"total":1,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1,"status":201}}],"errors":false,"took":1}

curl localhost:9308/_bulk -H "Content-Type: application/x-ndjson" -d '
{ "index" : { "_index" : "test2" } }
{ "title" : null}
'
{"error":"unsupported value type 'null' in field 'title'"}

We need to add support for them since it's necessary for integration with different Elastic tools, like Logstash.

Manticore Search Version:

Manticore 6.3.3 1de3cb364@24062617 dev

Operating System Version:

Ununtu 22.04 LTS

Have you tried the latest development version?

Yes

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@tomatolog
Copy link
Contributor

fixed at 8193943 now HTTP JSON POST requests that populate data accepts null as full-text or string value

@zhangsanhuo
Copy link

fixed at 8193943 now HTTP JSON POST requests that populate data accepts null as full-text or string value

Could you please add support for manticore fields with bigint attributes and timestamp with Null value?

@tomatolog
Copy link
Contributor

could be better to write your suggestions there along with requests examples #2312

@sanikolaev
Copy link
Collaborator

support for manticore fields with bigint attributes

Manticore supports bigint attributes - https://manual.manticoresearch.com/Creating_a_table/Data_types#Big-Integer

Is there any problem with it?

timestamp with Null value

The related issue seems to be #2236

@zhangsanhuo
Copy link

zhangsanhuo commented Jul 3, 2024

could be better to write your suggestions there along with requests examples #2312

Yes, my test data is as follows, create a RT table:

drop table t1;CREATE TABLE t1 (
id bigint,
up_date text,
name text,
@version text,
@timestamp timestamp,
unix_ts_in_secs integer,
number bigint,
pub_date timestamp
) charset_table='non_cjk,cjk' morphology='icu_chinese,stem_en';

logstash synchronization file:

input {
  jdbc {
    type => "t1"
    jdbc_driver_library => "/root/mysql-connector-j-8.0.32/mysql-connector-j-8.0.32.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_paging_enabled => true
    jdbc_page_size => "10"
    tracking_column => "unix_ts_in_secs"
    #record_last_run => "true"
    record_last_run => "false"
    last_run_metadata_path => "/root/logstash_jdbc_last_run"
    clean_run => "false"
    use_column_value => true
    tracking_column_type => "numeric"
    schedule => "*/5 * * * * *"
    statement => "SELECT *, UNIX_TIMESTAMP(up_date) AS unix_ts_in_secs FROM t1 WHERE (UNIX_TIMESTAMP(up_date) > :sql_last_value AND up_date < NOW()) ORDER BY up_date ASC"
  }
}

output {
  stdout { codec =>  "rubydebug"}
    elasticsearch {
      index => "t1"
      hosts => ["http://localhost:9308"]
    }
}

MySQL table structure and data:

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT '',
  `number` varchar(11) DEFAULT NULL,
  `pub_date` datetime DEFAULT current_timestamp(),
  `up_date` datetime DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=14 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT INTO `test`.`t1` (`id`, `name`, `number`, `pub_date`, `up_date`) VALUES (1, 'kevin1', '1000001', NULL, '2024-07-03 16:47:48');
INSERT INTO `test`.`t1` (`id`, `name`, `number`, `pub_date`, `up_date`) VALUES (2, 'kevin2', NULL, '2024-07-03 16:51:11', '2024-07-03 16:53:18');

I created two examples, one with number(bigint in RT table) as Null value and the other with pub_date(timestamp in RT table) as Null value.

When using logstash to synchronize data to manticore, an error is reported:

[2024-07-03T08:47:50,171][ERROR][logstash.outputs.elasticsearch][main][30a15ecf5e6cbe42f11b334f8da41fd6cfa5267d46f7b11973efecaf11a13318] Encountered a retryable error (will retry with exponential backoff) {:code=>409, :url=>"http://localhost:9308/_bulk", :content_length=>233}
[2024-07-03T08:53:34,394][ERROR][logstash.outputs.elasticsearch][main][30a15ecf5e6cbe42f11b334f8da41fd6cfa5267d46f7b11973efecaf11a13318] Encountered a retryable error (will retry with exponential backoff) {:code=>409, :url=>"http://localhost:9308/_bulk", :content_length=>250}
Data synchronization cannot be performed.

@zhangsanhuo
Copy link

Because in MySQL, the content of a datetime type field cannot be set to an empty string, but it can be set to Null. Therefore, when I try to synchronize the datetime field (part of the content is Null) in the MySQL database to manticore through logstash, an error occurs and the data cannot be synchronized.

@sanikolaev
Copy link
Collaborator

@zhangsanhuo did you close this issue as a duplicate of #2236?

@zhangsanhuo
Copy link

@zhangsanhuo did you close this issue as a duplicate of #2236?

I'm sorry, I don't understand what this means. I don't know much about Githup. Could you explain what it means? Thank you for your patience.

@sanikolaev
Copy link
Collaborator

Nevermind, @zhangsanhuo. My apologies. I thought you had closed this issue and was wondering why, since there seem to be unresolved matters. However, I now see it was initially created for a different topic and was closed after the initial request was addressed. If there's another known problem left can you please create a separate issue about it, so we don't get confused about different topics in the same issue?

@zhangsanhuo
Copy link

Nevermind, @zhangsanhuo. My apologies. I thought you had closed this issue and was wondering why, since there seem to be unresolved matters. However, I now see it was initially created for a different topic and was closed after the initial request was addressed. If there's another known problem left can you please create a separate issue about it, so we don't get confused about different topics in the same issue?

Thanks for your help, I created a new issue of #2380

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants