fix: change default similarity_metric for milvus #1000
e7217 wants to merge 1 commit into Marker-Inc-Korea:main

Hi, I thought Milvus supports the cosine similarity metric, according to the Milvus docs.

@e7217 But did it solve your Milvus issue? If so, I will be happy to merge it.

@vkehfdl1 I made an assumption based on the log messages. From the log, it appears that it shows

@vkehfdl1 Sorry, I found that the default metric type is COSINE in the pymilvus repo.

@vkehfdl1 This issue occurred because the Milvus server version was lower than 2.3.0. For those experiencing the same issue, I hope this information helps. Thank you.
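
A minimal sketch of the version check described above, assuming the server version is available as a string such as `"v2.2.16"` (pymilvus can report it, e.g. via `utility.get_server_version()`). The helpers `parse_version` and `default_metric_type` are hypothetical, not part of AutoRAG or pymilvus; the sketch picks COSINE only on servers at 2.3.0 or later and falls back to L2 otherwise.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse strings like 'v2.2.16' or '2.3.0-rc1' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Milvus version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def default_metric_type(server_version: str) -> str:
    """Hypothetical helper: use COSINE only when the server supports it.

    COSINE is accepted from Milvus 2.3.0 on; older servers only accept
    L2 and IP for float vectors, so this falls back to L2 there.
    """
    if parse_version(server_version) >= (2, 3, 0):
        return "COSINE"
    return "L2"


if __name__ == "__main__":
    for version in ["v2.2.16", "v2.3.0", "2.4.2"]:
        print(version, "->", default_metric_type(version))
```

Falling back to L2 rather than IP is an arbitrary choice in this sketch; both are accepted by pre-2.3.0 servers.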

This fix might introduce another error as a side effect, related to the default metric for Milvus search. I am trying to use milvus-lite (supported by pymilvus 2.4.2). I can ingest my data correctly using the following config:

```yaml
vectordb:
  - name: bge_m3_milvus
    db_type: milvus
    embedding_model: huggingface_bge_m3
    collection_name: huggingface_bge_m3
    uri: local_milvus.db
    embedding_batch: 50
    index_type: FLAT
    params:
      metric_type: ""
```

But when the search module kicks in:

```
[12/03/24 00:51:04] INFO [_client.py:1038] >> HTTP Request: GET https://api.gradio.app/gradio-messaging/en "HTTP/1.1 200 OK"
[12/03/24 00:51:08] INFO [evaluator.py:228] >> Embedding BM25 corpus...
                    INFO [evaluator.py:248] >> BM25 corpus embedding complete.
                    INFO [SentenceTransformer.py:218] >> Load pretrained SentenceTransformer: BAAI/bge-m3
[12/03/24 00:51:10] INFO [SentenceTransformer.py:357] >> 2 prompts are loaded, with the keys: ['query', 'text']
                    INFO [connections.py:381] >> Pass in the local path local_milvus.db, and run it using milvus-lite
[12/03/24 00:51:12] INFO [evaluator.py:205] >> Running node line retrieve_node_line...
                    INFO [node.py:55] >> Running node retrieval...
                    INFO [run.py:165] >> Running retrieval node - semantic retrieval module...
                    INFO [base.py:18] >> Initialize retrieval node - VectorDB
                    INFO [connections.py:381] >> Pass in the local path local_milvus.db, and run it using milvus-lite
                    INFO [base.py:31] >> Running retrieval node - VectorDB module...
Batches: 100%|###############################################################################################| 1/1 [00:00<00:00, 10.55it/s]
Batches: 100%|##############################################################################################| 1/1 [00:00<00:00, 137.38it/s]
Batches: 100%|##############################################################################################| 1/1 [00:00<00:00, 174.38it/s]
Batches: 100%|##############################################################################################| 1/1 [00:00<00:00, 187.40it/s]
Batches: 100%|##############################################################################################| 1/1 [00:00<00:00, 182.19it/s]
                    ERROR [decorators.py:147] >> RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=L2]: invalid parameter)>, <Time:{'RPC start': '2024-12-03 00:51:12.248113', 'RPC error': '2024-12-03 00:51:12.248668'}>
                    ERROR [decorators.py:147] >> RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=L2]: invalid parameter)>, <Time:{'RPC start': '2024-12-03 00:51:12.251065', 'RPC error': '2024-12-03 00:51:12.251423'}>
                    ERROR [decorators.py:147] >> RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=L2]: invalid parameter)>, <Time:{'RPC start': '2024-12-03 00:51:12.253529', 'RPC error': '2024-12-03 00:51:12.253798'}>
                    ERROR [decorators.py:147] >> RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=L2]: invalid parameter)>, <Time:{'RPC start': '2024-12-03 00:51:12.255825', 'RPC error': '2024-12-03 00:51:12.256095'}>
                    ERROR [decorators.py:147] >> RPC error: [search], <MilvusException: (code=1100, message=fail to search: metric type not match: invalid [expected=][actual=L2]: invalid parameter)>, <Time:{'RPC start': '2024-12-03 00:51:12.258249', 'RPC error': '2024-12-03 00:51:12.258537'}>
Ingesting VectorDB... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:03
Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0% 0/4 0:00:00
```

As you can see in the error message, the L2 metric is passed for search even though the `metric_type` param is set to an empty string.
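
The error reports `[expected=][actual=L2]`: the index appears to have been created with an empty metric type (matching `metric_type: ""` in the config), while the search request carried L2. A minimal sketch of a pre-search consistency check, assuming the index metric and the requested metric are both known as strings. `build_search_params` is a hypothetical helper, and the returned dict only mirrors the `{"metric_type": ..., "params": {...}}` shape commonly used for pymilvus search params.

```python
from typing import Optional

SUPPORTED_METRICS = {"L2", "IP", "COSINE"}


def build_search_params(index_metric: str, requested_metric: Optional[str] = None) -> dict:
    """Hypothetical helper: build search params whose metric matches the index.

    - An empty or unknown index metric is rejected up front, since no
      search metric can match it (the `[expected=]` case in the log).
    - If no metric is requested, the index metric is reused rather than
      assuming a fixed default such as L2.
    """
    index_metric = (index_metric or "").strip().upper()
    if index_metric not in SUPPORTED_METRICS:
        raise ValueError(
            f"index metric {index_metric!r} is not one of {sorted(SUPPORTED_METRICS)}; "
            "set metric_type explicitly when creating the index"
        )
    metric = (requested_metric or index_metric).strip().upper()
    if metric != index_metric:
        raise ValueError(f"metric type not match: expected={index_metric}, actual={metric}")
    return {"metric_type": metric, "params": {}}


if __name__ == "__main__":
    print(build_search_params("COSINE"))
```

The design choice is to fail loudly on the client before the RPC, instead of letting the server reject every query with code 1100.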

@elsatch Hi! Which distance metric do you want to use? You can configure it through the `similarity_metric` key in the config YAML file, like this. If you want to use an already-ingested vectordb, simply use the config below.

@elsatch

```yaml
vectordb:
  - name: autorag_2024_xx_xx
    db_type: milvus
    embedding_model: openai
    collection_name: autorag_2024_xx_xx
    uri: http://192.xxx.xxx.xxx:19530
    embedding_batch: 50
    similarity_metric: l2
    index_type: IVF_FLAT
    params:
      nlist: 16384
```

Please refer to the following PR for details:

@vkehfdl1
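
As a rough illustration of how a `similarity_metric` value like the one in the YAML example above could be turned into Milvus index parameters: the mapping `SIMILARITY_TO_MILVUS` and the helper `milvus_index_params` are assumptions for illustration, not AutoRAG's actual code. The output mirrors the `index_type` / `metric_type` / `params` dict commonly passed to pymilvus's `Collection.create_index`, with `nlist` as the IVF_FLAT parameter.

```python
# Hypothetical mapping from config values to Milvus metric names (not AutoRAG's code).
SIMILARITY_TO_MILVUS = {"l2": "L2", "ip": "IP", "cosine": "COSINE"}


def milvus_index_params(vectordb_config: dict) -> dict:
    """Hypothetical helper: build index params from one `vectordb` config entry.

    Expects `similarity_metric` and `index_type` keys, plus an optional
    `params` dict (for IVF_FLAT, e.g. {"nlist": 16384}).
    """
    similarity = str(vectordb_config["similarity_metric"]).lower()
    if similarity not in SIMILARITY_TO_MILVUS:
        raise ValueError(f"unsupported similarity_metric: {similarity!r}")
    return {
        "index_type": vectordb_config["index_type"],
        "metric_type": SIMILARITY_TO_MILVUS[similarity],
        # Shallow copy so the result does not share the config's params dict.
        "params": dict(vectordb_config.get("params", {})),
    }


if __name__ == "__main__":
    entry = {
        "name": "autorag_2024_xx_xx",
        "db_type": "milvus",
        "similarity_metric": "l2",
        "index_type": "IVF_FLAT",
        "params": {"nlist": 16384},
    }
    print(milvus_index_params(entry))
```

Deriving the metric from a single config key keeps the index and the later search on the same metric, which is exactly what the mismatch error above is about.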

description

Milvus does not support `similarity_metric: cosine`; only L2 and IP are supported. Therefore, I updated the default similarity_metric.

log
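
For readers comparing the metrics named here, a small pure-Python sketch of L2 distance, inner product (IP), and cosine similarity. It also checks a standard identity that is not stated in this PR: on unit-length vectors, IP equals cosine similarity and squared L2 equals 2 - 2 * cosine, which is why normalizing embeddings is a common way to get cosine-like ranking on servers that only accept L2 and IP. Milvus documents its L2 result as the value before the square root, so the sketch computes the squared distance. All function names are illustrative.

```python
import math
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (smaller = more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product (larger = more similar)."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (larger = more similar)."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


def normalize(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(inner_product(v, v))
    return [x / norm for x in v]


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]
    na, nb = normalize(a), normalize(b)
    # On unit vectors: IP equals cosine, and squared L2 equals 2 - 2 * cosine.
    print(cosine(a, b), inner_product(na, nb), squared_l2(na, nb))
```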