The Unstructured open-source library does not offer built-in support for calling embedding providers to obtain embeddings for pieces of text. However, the Unstructured Ingest CLI and the Unstructured Ingest Python library do offer built-in support for calling embedding providers as part of an ingest pipeline. You can also use common third-party tools and libraries to get embeddings for the text of document elements within JSON files that are produced by calling the Unstructured open-source library. For example, the following sample Python script:
- Takes an Unstructured open-source library-generated JSON file as input.
- Reads in the JSON file’s contents as a JSON object.
- Uses the `sentence-transformers/all-MiniLM-L6-v2` model on Hugging Face to generate embeddings for each `text` field of each document element in the JSON file.
- Adds the generated embeddings next to each corresponding `text` field in the original JSON.
- Saves the results back to the original JSON file.
Python
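The steps listed above can be sketched roughly as follows. This is a minimal illustration, not necessarily the script the documentation refers to. It assumes the JSON file holds a top-level list of element objects, that the `sentence-transformers` package is installed, and that the embedding is stored under an `embeddings` key. The key name and the helper names `add_embeddings`, `embed_json_file`, and `load_encoder` are hypothetical choices made for this example.

```python
import json
from pathlib import Path


def add_embeddings(elements, encode, key="embeddings"):
    """Attach an embedding to every element that has a non-empty "text" field.

    `encode` is any callable mapping a string to a sequence of numbers.
    The "embeddings" key name is an assumption made for this sketch.
    """
    for element in elements:
        text = element.get("text")
        if text:
            # Convert to plain floats so the result is JSON-serializable.
            element[key] = [float(x) for x in encode(text)]
    return elements


def embed_json_file(path, encode):
    """Read the JSON file, add embeddings, and write back to the same file."""
    path = Path(path)
    # Assumes the file contains a top-level list of element dictionaries.
    elements = json.loads(path.read_text(encoding="utf-8"))
    add_embeddings(elements, encode)
    path.write_text(json.dumps(elements, indent=2), encoding="utf-8")


def load_encoder(model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return the model's encode method (downloads the model on first use)."""
    # Imported lazily so the helpers above work without the package installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode
```

To run it against a file, call `embed_json_file("elements.json", load_encoder())`, where `elements.json` is a placeholder for the path to your own Unstructured output. Passing the encoder in as a callable keeps the file handling independent of the embedding model, so a different model or provider can be swapped in without touching the rest of the code.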

