Getting Started with the Search API

This guide walks you through the basics of the Zeta Alpha Search API. You'll learn how to perform simple searches, filter results on your document fields and on the documents' access rights, request facet results, use the keyword, neural and hybrid retrieval methods, and apply a reranker.

Prerequisites

Before you begin, make sure you have:

An active Zeta Alpha account with API access
An API key with the necessary permissions
The Search API endpoint: https://api.zeta-alpha.com/v0/service/documents/search

Step 1: Performing a Simple Search

Let's start with a basic search query. We'll use the mixed retrieval method. It is a good default because it performs a hybrid search that combines keyword and neural search.

First, set up the following environment variables:

TENANT=<tenant>
ZA_API_KEY=<your-api-key>

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "semantic search",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10
  }'

This request returns the first 10 documents related to the query "semantic search". By design, only documents that the user has access to are returned.
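If you send many queries, it can help to build the request body in one place instead of repeating the quote juggling around $TENANT. The sketch below is a minimal illustration. It uses three hypothetical shell helpers (json_escape, za_search_payload and za_search, none of which are part of any Zeta Alpha tooling) to produce the same body as the request above and post it with curl. The body is read from standard input through curl's --data @-. The sketch assumes a POSIX shell with sed and curl available.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Zeta Alpha tooling) for sending
# the simple search request shown above.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print the request body for a simple search.
# Usage: za_search_payload QUERY [RETRIEVAL_METHOD] [PAGE] [PAGE_SIZE]
za_search_payload() {
  printf '{"tenant": "%s", "search_engine": "zeta_alpha", "query_string": "%s", "retrieval_method": "%s", "retrieval_unit": "document", "page": %d, "page_size": %d}\n' \
    "$(json_escape "$TENANT")" "$(json_escape "$1")" "${2:-mixed}" "${3:-1}" "${4:-10}"
}

# Send the request; stops with an error if TENANT or ZA_API_KEY is unset.
za_search() {
  : "${TENANT:?set TENANT first}" "${ZA_API_KEY:?set ZA_API_KEY first}"
  za_search_payload "$@" | curl -sS -X POST https://api.zeta-alpha.com/v0/service/documents/search \
    --header 'Content-Type: application/json' \
    --header "X-Auth: $ZA_API_KEY" \
    --data @-
}

# Example (requires network access and valid credentials):
# za_search "semantic search"
```

Escaping the query matters once user input reaches the request. A query containing a double quote would otherwise produce invalid JSON.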

The response lists the matching documents under the hits field. Each hit contains a set of system-defined fields and a set of tenant-specific metadata fields, as configured when the tenant's index was created. The tenant-specific fields are placed under the hit's custom_metadata field, together with the representations field, which holds the data extracted from the document during ingestion.

For example, for a tenant index with the following custom metadata fields:

source: The source of the document
authors: The authors of the document
date: The publication date of the document
category: The category of the document

a hit object looks like this (the // comments are explanatory annotations, not part of the JSON):

{
	"document_id": "abcd01abc0ab0a012a01234a01a0a0123a012345",
	"_id": "abcd01abc0ab0a012a01234a01a0a0123a012345_0",
	"uri": "https://example.com/document/abcd01abc0ab0a012a01234a01a0a0123a012345",
	"uri_hash": "abcd01abc0ab0a012a01234a01a0a0123a012345_0",
	"content_source_id": "123e4567-e89b-12d3-a456-426614174000", // identifier of the content source of the document
	"content_ingestion_job_id": "340c2441-fg8d-5099-bg13-d2bfe3dg3349", // identifier of the job that ingested the document to the content source
	"highlight": "",
	"document_content": [
		{
			"content_type": "application/pdf",
			"content_path": {
				"url_content": {
					"url": "document-assets/abcd01abc0ab0a012a01234a01a0a0123a012345/pdf_url"
				}
			}
		},
		{
			"content_type": "text/plain",
			"content_path": {
				"url_content": {
					"url": "document-assets/abcd01abc0ab0a012a01234a01a0a0123a012345/text_url"
				}
			}
		},
		{
			"content_type": "image/jpeg",
			"content_path": {
				"url_content": {
					"url": "document-assets/abcd01abc0ab0a012a01234a01a0a0123a012345/image_url"
				}
			}
		}
	],
	"custom_metadata": {
		"source": "Towards Data Science",
		"authors": ["John Doe", "Bob Smith"],
		"date": "2022-01-01",
		"category": "Artificial Intelligence",
		"representations": { // the data extracted from the ingested document
			"text": "Semantic search uses the power of neural networks to understand the context and meaning of user queries, providing semantically relevant results, as opposed to keyword-based search, which relies on exact keyword matches."
		}
	},
	"share_uri": "https://example.com/document/abcd01abc0ab0a012a01234a01a0a0123a012345",
	"organize_doc_id": "abcd01abc0ab0a012a01234a01a0a0123a012345_0",
	"score": 18.429789
}

Step 2: Access Rights and Document Privacy

Access rights are at the core of the Zeta Alpha platform: they are attached to the indexed documents during the ingestion process. At retrieval time, the user's permissions, which are defined by the API key, are checked against the access rights of each document. This ensures that each user can only access the documents they are authorized to see.

By default, a search response can include any document the user has access to. You can further restrict the results by setting the visibility parameter in the search request. It accepts the following values:

own_private: Returns only the private documents of the user
private: Returns the private documents of the user and the user's organization
public: Returns only the public documents

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "semantic search",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10,
    "visibility": ["private"]
  }'
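When the visibility value comes from a script argument or user input, it is worth rejecting typos before sending the request. The sketch below uses a hypothetical helper, visibility_json, that accepts only the three values listed above and prints the list to place after "visibility":. The guide shows only a one-element list. Allowing several values in one list is an assumption based on the list syntax, so check the API reference before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the "visibility" list for the search request,
# rejecting values other than own_private, private and public.
# Assumption: several values may be combined in one list.
visibility_json() {
  if [ "$#" -eq 0 ]; then
    echo "usage: visibility_json VALUE..." >&2
    return 1
  fi
  out='['
  sep=''
  for v in "$@"; do
    case "$v" in
      own_private|private|public) ;;
      *) echo "unknown visibility value: $v" >&2; return 1 ;;
    esac
    out="$out$sep\"$v\""
    sep=','
  done
  printf '%s]\n' "$out"
}

# Prints ["private"], the value used in the request above.
visibility_json private
```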

Step 3: Adding filters to the search request

You can narrow down your search results by adding filters on the fields of the index configured for your tenant. For example, with the index described above, you can keep only the documents whose source is Towards Data Science:

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "semantic search",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10,
    "filters": {
      "equals_to": {
        "field_path": "custom_metadata.source",
        "field_value": "Towards Data Science"
      }
    }
  }'
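Filter objects follow a regular shape: an operator name wrapping a field_path and a value. The sketch below generates an equals_to filter with a hypothetical helper, equals_to_filter, and escapes quotes in the value. It always emits the value as a JSON string, which matches the examples in this guide. Numeric or boolean fields would need a different variant.

```shell
#!/bin/sh
# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: print an equals_to filter object in the shape used above.
# Usage: equals_to_filter FIELD_PATH VALUE
equals_to_filter() {
  printf '{"equals_to": {"field_path": "%s", "field_value": "%s"}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Prints the filter from the request above.
equals_to_filter custom_metadata.source "Towards Data Science"
```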

You can combine multiple filters using and_operator or or_operator:

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "semantic search",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "filters": {
      "and_operator": [
        {
          "equals_to": {
            "field_path": "custom_metadata.source",
            "field_value": "Towards Data Science"
          }
        },
        {
          "is_in": {
            "field_path": "custom_metadata.category",
            "field_values": ["Artificial Intelligence", "Machine Learning"]
          }
        },
        {
          "greater_than": {
            "field_path": "custom_metadata.date",
            "field_value": "2021-01-01"
          }
        }
      ]
    },
    "page": 1,
    "page_size": 10
  }'
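The and_operator and or_operator both take a list of filter objects. The sketch below joins filter objects, passed as JSON strings, under one of these two operators. It uses a hypothetical helper, combine_filters, and checks only the operator name, not the filters themselves.

```shell
#!/bin/sh
# Hypothetical helper: wrap filter objects (passed as JSON strings) in an
# "and_operator" or "or_operator" list, matching the request above.
# Usage: combine_filters and_operator|or_operator FILTER_JSON...
combine_filters() {
  op="$1"
  shift
  case "$op" in
    and_operator|or_operator) ;;
    *) echo "operator must be and_operator or or_operator" >&2; return 1 ;;
  esac
  out=''
  sep=''
  for f in "$@"; do
    out="$out$sep$f"
    sep=', '
  done
  printf '{"%s": [%s]}\n' "$op" "$out"
}

# Combines two of the filters from the request above.
combine_filters and_operator \
  '{"equals_to": {"field_path": "custom_metadata.source", "field_value": "Towards Data Science"}}' \
  '{"greater_than": {"field_path": "custom_metadata.date", "field_value": "2021-01-01"}}'
```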

Note: For the full list of available filters, please refer to the full API documentation.

Step 4: Getting Facet Results

Facets allow you to get aggregated information about your search results. For example, with the index described above, this is how to request facet results for the source field:

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  -H "Content-Type: application/json" \
  -H "X-Auth: $ZA_API_KEY" \
  -d '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "semantic search",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "facets": [
      {
        "field_path": "custom_metadata.source",
        "max_facet_terms": 5,
        "facet_terms_order": {
          "by": "term_count",
          "direction": "desc"
        }
      }
    ],
    "page": 1,
    "page_size": 10
  }'

The request above returns the 5 most frequent sources across all results that match this query, ordered by the number of matching documents per source, from most to least.
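Facet specifications have a fixed shape, so requesting facets on several fields mostly means repeating that shape. The sketch below prints one facet object, as used in the request above, for a given field and number of terms. It uses a hypothetical helper, facet_spec. Ordering is fixed to term_count, descending, because that is the only ordering shown in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print one facet specification in the shape used above.
# Usage: facet_spec FIELD_PATH [MAX_FACET_TERMS]   (default: 5 terms)
facet_spec() {
  printf '{"field_path": "%s", "max_facet_terms": %d, "facet_terms_order": {"by": "term_count", "direction": "desc"}}\n' \
    "$1" "${2:-5}"
}

# Prints the facet object for the source field; wrap one or more of these
# in [ ] as the value of "facets".
facet_spec custom_metadata.source
```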

Note: For the full list of available facets, please refer to the full API documentation.

Step 5: Picking the correct retrieval method

The Search API supports a variety of retrieval methods, which can be further extended and fine-tuned to meet the quality and performance requirements of your tenant.

The retrieval methods that are offered out-of-the-box are:

keyword: Keyword-based search which relies on term frequency, document frequency, field weights and boosting functions to measure the relevance score of the document for a given query.
knn: Neural search using the K-Nearest Neighbors retrieval method. This method is based on the similarity of the embeddings of the query and the documents.
mixed: Hybrid search that combines keyword and knn retrieval methods. It is useful when you want to combine the benefits of both methods.
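A practical way to pick a method is to run the same query with each one and compare the results. The sketch below does this with two hypothetical helpers, is_retrieval_method and compare_methods. It sends one request per method, using the same quoting style as the curl examples in this guide, and saves each response to results_<method>.json. It assumes TENANT and ZA_API_KEY are set as in Step 1. It does no JSON escaping, so the query must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the three out-of-the-box methods.
is_retrieval_method() {
  case "$1" in
    keyword|knn|mixed) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: run the same query once per method and save each
# response to results_<method>.json for side-by-side comparison.
# Requires network access, TENANT and ZA_API_KEY; no JSON escaping is done.
compare_methods() {
  query="$1"
  shift
  for method in "$@"; do
    if ! is_retrieval_method "$method"; then
      echo "skipping unsupported retrieval_method: $method" >&2
      continue
    fi
    curl -sS -X POST https://api.zeta-alpha.com/v0/service/documents/search \
      --header 'Content-Type: application/json' \
      --header "X-Auth: $ZA_API_KEY" \
      --data '{"tenant": "'"$TENANT"'", "search_engine": "zeta_alpha", "query_string": "'"$query"'", "retrieval_method": "'"$method"'", "retrieval_unit": "document", "page": 1, "page_size": 10}' \
      > "results_$method.json"
  done
}

# Example (network call, so commented out here):
# compare_methods "What is semantic search?" keyword knn mixed
```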

Fine-tuning on your domain

The search relevance configuration for keyword search, including field weights, boosting functions and more, can be tuned per tenant and index.

The embedding model used by the knn retrieval method can also be fine-tuned on your domain.

Finally, it is also possible to configure the way the two methods are combined in the mixed retrieval method.
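This guide does not say how the mixed method merges the keyword and knn result lists, only that the combination is configurable. As an illustration only, the sketch below shows reciprocal rank fusion (RRF), a common way to merge two rankings. It is not necessarily what Zeta Alpha uses. Each document receives the sum of 1/(k + rank) over the lists it appears in, with k = 60 as a conventional default. The helper name rrf_fuse and the document IDs are made up.

```shell
#!/bin/sh
# Reciprocal rank fusion of two ranked lists of document IDs (one ID per line).
# Illustration only: not necessarily the fusion used by the mixed method.
# Usage: rrf_fuse KEYWORD_FILE KNN_FILE [K]
rrf_fuse() {
  awk -v k="${3:-60}" '
    { score[$1] += 1 / (k + FNR) }   # FNR is the 1-based rank within each file
    END { for (id in score) printf "%s %.6f\n", id, score[id] }
  ' "$1" "$2" | sort -k2,2 -rn
}

# Example with made-up document IDs.
kw=$(mktemp)
kn=$(mktemp)
printf 'a\nb\nc\n' > "$kw"   # keyword ranking
printf 'b\nc\nd\n' > "$kn"   # knn ranking
rrf_fuse "$kw" "$kn"
rm -f "$kw" "$kn"
```

Here b ranks first because it appears near the top of both lists, even though a is the top keyword hit. Rewarding agreement between the two methods in this way is the general motivation for hybrid search.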

KNN (K-Nearest Neighbors) Retrieval - Neural Search

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  -H "Content-Type: application/json" \
  -H "X-Auth: $ZA_API_KEY" \
  -d '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "What is semantic search?",
    "retrieval_method": "knn",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10
  }'

Keyword Search

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  -H "Content-Type: application/json" \
  -H "X-Auth: $ZA_API_KEY" \
  -d '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "What is semantic search?",
    "retrieval_method": "keyword",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10
  }'

Mixed Retrieval (Keyword + KNN)

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  -H "Content-Type: application/json" \
  -H "X-Auth: $ZA_API_KEY" \
  -d '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "What is semantic search?",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10
  }'

Step 6: Applying a Reranker

On top of your retrieval method, you can apply a reranker to further refine the search results and improve search quality. The reranker is a cross-encoder model trained to rerank the top search results: it scores each query and document pair jointly, rather than comparing separately computed embeddings of the query and the documents.

Note: The reranker can be configured and fine-tuned to meet the quality and performance requirements of your tenant.

curl -X POST https://api.zeta-alpha.com/v0/service/documents/search \
  -H "Content-Type: application/json" \
  -H "X-Auth: $ZA_API_KEY" \
  -d '{
    "tenant": "'"$TENANT"'",
    "search_engine": "zeta_alpha",
    "query_string": "What is semantic search?",
    "retrieval_method": "mixed",
    "retrieval_unit": "document",
    "page": 1,
    "page_size": 10,
    "reranker": "true",
    "rerank_top_n": 30
  }'
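Conceptually, reranking is a second pass over a short list. The retriever's top candidates are rescored and reordered, and the requested page is cut from the reordered list. The sketch below simulates that pipeline locally with made-up scores and does not call the API. It assumes that rerank_top_n is the number of top retrieved results handed to the reranker; this guide does not state that explicitly. The helper name rerank is hypothetical.

```shell
#!/bin/sh
# Simulated retrieve-then-rerank over made-up scores (no API call).
# Input lines: DOC_ID RETRIEVAL_SCORE RERANK_SCORE
# Usage: rerank TOP_N PAGE_SIZE < hits.txt
rerank() {
  sort -k2,2 -rn |     # first stage: order by retrieval score
    head -n "$1" |     # keep only the top N candidates
    sort -k3,3 -rn |   # second stage: reorder by reranker score
    head -n "$2" |     # cut the first page
    awk '{ print $1 }'
}

# d4 has the best reranker score but is not among the top 3 retrieved
# candidates, so it never reaches the reranker.
printf '%s\n' 'd1 18.4 0.20' 'd2 17.9 0.95' 'd3 15.0 0.70' 'd4 9.1 0.99' |
  rerank 3 2
```

The example shows the trade-off behind a setting like rerank_top_n. A larger candidate pool gives the reranker more chances to promote relevant documents, at the cost of scoring more query and document pairs.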

Conclusion

This guide has introduced you to the basics of using the Zeta Alpha Search API. You've learned how to perform simple searches, apply filters and facets, use different retrieval methods and apply a reranker. For more advanced usage and a complete list of available parameters, please refer to the full API documentation.

If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.
