How to Create a Custom Index
This guide walks you through creating a custom index tailored to your specific document structure and search requirements. By defining custom fields, search profiles, and display settings, you can optimize how documents are indexed, searched, and presented to users.
For comprehensive configuration options and detailed specifications, refer to the Index Reference.
This guide follows a step-by-step approach to building a complete index configuration in JSON format. By the end, you'll have a working example index configuration that you can adapt to your needs.
Prerequisites
Before creating a custom index, ensure you have:
- Access to the Zeta Alpha Platform UI with appropriate permissions
- An existing tenant (see Create a Tenant if you haven't created one)
Step 1: Define General Index Settings
Start by identifying the basic configuration for your index:
- name: Unique identifier for the index (e.g., "product-catalog")
- description: Brief explanation of the index's purpose
- tenant: The tenant this index belongs to
- default: Set to false if your tenant already has a default index (only one default index is allowed per tenant)
- features: Enable specific index capabilities
  - neural_search: Enable semantic search powered by vector embeddings
    - model_serving_url: URL of your embeddings service
    - embedding_dimension: Dimension of the embedding vectors (e.g., 768 for many transformer models)
  - tags: Enable document tagging by including an empty object
Example configuration:
{
  "name": "product-catalog",
  "description": "Index for e-commerce product listings",
  "tenant": "acme-corp",
  "default": false,
  "features": {
    "neural_search": {
      "model_serving_url": "http://sentence-encoder-api.production.svc.cluster.local:8080",
      "embedding_dimension": 768
    },
    "tags": {}
  }
}
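Before moving on, the general settings can be sanity-checked locally. The following is a minimal Python sketch and not part of the platform: the function name check_general_settings and the specific rules (non-empty name and tenant, boolean default, positive integer embedding_dimension) are assumptions inferred from the field descriptions above.

```python
import json

GENERAL_SETTINGS = """
{
  "name": "product-catalog",
  "description": "Index for e-commerce product listings",
  "tenant": "acme-corp",
  "default": false,
  "features": {
    "neural_search": {
      "model_serving_url": "http://sentence-encoder-api.production.svc.cluster.local:8080",
      "embedding_dimension": 768
    },
    "tags": {}
  }
}
"""


def check_general_settings(config: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    The rules below are assumptions for illustration, not the platform's
    own validation logic.
    """
    problems = []
    for key in ("name", "tenant"):
        value = config.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{key}' must be a non-empty string")
    if not isinstance(config.get("default"), bool):
        problems.append("'default' must be true or false")
    neural = config.get("features", {}).get("neural_search")
    if neural is not None:
        url = str(neural.get("model_serving_url", ""))
        if not url.startswith(("http://", "https://")):
            problems.append("'model_serving_url' must be an http(s) URL")
        dim = neural.get("embedding_dimension")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(dim, int) or isinstance(dim, bool) or dim <= 0:
            problems.append("'embedding_dimension' must be a positive integer")
    return problems


config = json.loads(GENERAL_SETTINGS)
problems = check_general_settings(config)
print(problems or "general settings look consistent")
```

A check like this catches typos early, but the platform remains the authority on what is accepted.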
Step 2: Configure Cluster Connection
Specify how to connect to your OpenSearch cluster. Currently, OpenSearch is the only supported backend. If you require a different backend, contact your support team.
Required configurations:
- backend: Index backend type (currently only "opensearch" is supported)
- host: OpenSearch cluster hostname
- port: OpenSearch port (typically 9200)
- settings (optional): Connection-specific settings
  - use_ssl: Whether to use SSL/TLS for connections
  - http_auth: Authentication credentials as ["username", "password"]
  - verify_certs: Whether to verify SSL certificates
Example configuration:
{
  "cluster_connection": {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
      "use_ssl": true,
      "http_auth": ["admin", "secure_password"],
      "verify_certs": true
    }
  }
}
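To see which endpoint these settings describe, the connection block can be turned into a base URL and checked for inconsistent options. This is a small Python sketch under stated assumptions: the helper names cluster_base_url and connection_warnings are hypothetical, and the warning rules are illustrative rather than anything the platform enforces.

```python
from urllib.parse import urlunsplit

cluster_connection = {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
        "use_ssl": True,
        "http_auth": ["admin", "secure_password"],
        "verify_certs": True,
    },
}


def cluster_base_url(conn: dict) -> str:
    """Derive scheme://host:port from a cluster_connection block."""
    settings = conn.get("settings", {})
    scheme = "https" if settings.get("use_ssl") else "http"
    return urlunsplit((scheme, f"{conn['host']}:{conn['port']}", "", "", ""))


def connection_warnings(conn: dict) -> list[str]:
    """Flag combinations that are usually mistakes (illustrative rules only)."""
    settings = conn.get("settings", {})
    warnings = []
    if conn.get("backend") != "opensearch":
        warnings.append("only the 'opensearch' backend is documented as supported")
    auth = settings.get("http_auth")
    if auth is not None and len(auth) != 2:
        warnings.append("http_auth should be [username, password]")
    if settings.get("use_ssl") and not settings.get("verify_certs", False):
        warnings.append("use_ssl is on but certificates are not verified")
    return warnings


print(cluster_base_url(cluster_connection))
print(connection_warnings(cluster_connection))
```

The resulting URL is the address the ingestion and search services would need to reach from inside your network.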
Step 3: Configure Storage Settings
Define where document data is stored at different stages of the ingestion pipeline:
- Ingestion storage: Stores the initial document requests as they arrive
- Processing storage: Stores intermediate transformations and processing artifacts
Storage Backend Options
You can use either AWS S3 or Azure Blob Storage as your storage backend. Choose the configuration that matches your infrastructure.
Option 1: AWS S3 Storage
Configure S3 storage with the following settings:
- backend: Set to "s3"
- s3_bucket_name: S3 bucket for storing documents
- s3_key_prefix: Prefix for organizing objects within the bucket
- max_file_size: Maximum file size in bytes (documents exceeding this will be rejected)
AWS credentials (optional):
- aws_access_key_id: AWS access key (prefer IAM roles when possible)
- aws_secret_access_key: AWS secret key (prefer IAM roles when possible)
- aws_region: AWS region (e.g., "us-east-1")
- aws_endpoint_url: Custom endpoint for S3-compatible services (MinIO, LocalStack, etc.)
When running on AWS, use IAM roles for service accounts (IRSA) instead of static credentials. This approach:
- Eliminates the need to store sensitive keys in configuration
- Provides automatic credential rotation
- Enables fine-grained permission management
- Improves overall security posture
Only use static AWS credentials when IRSA is unavailable or when connecting to non-AWS S3-compatible services.
Example S3 configuration:
{
  "storage_settings": {
    "ingesting": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "ingestion",
        "aws_region": "us-east-1"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "processing",
        "aws_region": "us-east-1"
      }
    }
  }
}
Option 2: Azure Blob Storage
Configure Azure Blob Storage with the following settings:
- backend: Set to "azure"
- azure_account_url: Azure Blob Storage account URL (format: https://your-account.blob.core.windows.net)
- azure_container_name: Container name in Azure Blob Storage
- azure_blob_prefix: Prefix for organizing blobs within the container
- max_file_size: Maximum file size in bytes (documents exceeding this will be rejected)
Azure credentials (optional):
- azure_credential: Azure credential string (prefer managed identities when possible)
When running on Azure, use managed identities (workload identity for AKS) instead of static credentials. This approach:
- Eliminates the need to store sensitive keys in configuration
- Provides automatic credential rotation
- Enables fine-grained permission management
- Improves overall security posture
Only use the azure_credential field when managed identities are unavailable or when connecting from non-Azure environments.
Example Azure Blob Storage configuration:
{
  "storage_settings": {
    "ingesting": {
      "backend": "azure",
      "azure": {
        "azure_account_url": "https://acmestorage.blob.core.windows.net",
        "azure_container_name": "documents",
        "azure_blob_prefix": "ingestion"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "azure",
      "azure": {
        "azure_account_url": "https://acmestorage.blob.core.windows.net",
        "azure_container_name": "documents",
        "azure_blob_prefix": "processing"
      }
    }
  }
}
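The max_file_size value in both examples, 209715200, is 200 MiB expressed in bytes (200 × 1024 × 1024). The Python sketch below shows that arithmetic and generates the S3 variant of storage_settings so the two stages stay in sync. The helper names mib and s3_storage_settings are hypothetical; the emitted keys are the ones shown in the examples above.

```python
import json
from typing import Optional


def mib(n: int) -> int:
    """Convert mebibytes to bytes."""
    return n * 1024 * 1024


def s3_storage_settings(bucket: str, max_file_size: int,
                        region: Optional[str] = None) -> dict:
    """Build storage_settings for the ingesting and processing stages.

    Both stages share one bucket and differ only in key prefix, mirroring
    the example configuration in this guide.
    """
    def s3_block(prefix: str) -> dict:
        block = {"s3_bucket_name": bucket, "s3_key_prefix": prefix}
        if region:  # aws_region is optional, so emit it only when given
            block["aws_region"] = region
        return block

    return {
        "storage_settings": {
            "ingesting": {
                "backend": "s3",
                "s3": s3_block("ingestion"),
                "max_file_size": max_file_size,
            },
            "processing": {
                "backend": "s3",
                "s3": s3_block("processing"),
            },
        }
    }


settings = s3_storage_settings("acme-documents", mib(200), region="us-east-1")
print(json.dumps(settings, indent=2))
```

No credentials appear in the generated block, which matches the recommendation to rely on IAM roles or managed identities.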
Step 4: Define Document Fields
Define the schema for your documents by specifying fields, their types, and how they behave during indexing and search.
Field Configuration Properties
For each field, configure:
- name: Field identifier (e.g., "product_id", "brand")
- type: Data type; common types include:
  - "string": Text content
  - "date": Timestamps and dates
  - "number": Numeric values (integers or floats)
  - "geolocation": Geographic coordinates
- alias (optional): Alternative field name with special system behaviors:
  - "metadata.DCMI.title": Marks this field as the document title (used for display and embedding generation)
  - "metadata.DCMI.abstract": Designates the document description (used for display and embedding generation)
  - "metadata.DCMI.created": Identifies the creation date field (defaults to ingestion date if not provided; used for sorting)
- search_options: Controls field behavior during indexing and retrieval:
  - is_sort_field: Allow sorting search results by this field
  - is_facet_field: Enable faceted search (return value counts for filtering)
  - is_filter_field: Allow filtering documents by this field
  - is_returned_in_search_results: Include this field in search API responses
  - is_used_in_search: Include this field in full-text search queries
Example: E-commerce Product Index
For a product catalog index, you might define:
{
  "document_fields_configuration": [
    {
      "name": "product_id",
      "type": "string",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": false,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_name",
      "type": "string",
      "alias": "metadata.DCMI.title",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_description",
      "type": "string",
      "alias": "metadata.DCMI.abstract",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "brand",
      "type": "string",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "price",
      "type": "number",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    }
  ]
}
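Because every field repeats the same five search_options flags, it can be easier to generate the list than to write it by hand. The Python sketch below rebuilds the product catalog fields with a small helper. The helper name field, its keyword arguments, and its defaults are assumptions made for this example, and the uniqueness check on names and aliases is a sanity check rather than a documented platform rule.

```python
import json
from collections import Counter
from typing import Optional


def field(name: str, type_: str, *, alias: Optional[str] = None,
          sort: bool = False, facet: bool = False, filter_: bool = False,
          returned: bool = True, searchable: bool = True) -> dict:
    """Build one entry of document_fields_configuration."""
    definition = {"name": name, "type": type_}
    if alias is not None:
        definition["alias"] = alias
    definition["search_options"] = {
        "is_sort_field": sort,
        "is_facet_field": facet,
        "is_filter_field": filter_,
        "is_returned_in_search_results": returned,
        "is_used_in_search": searchable,
    }
    return definition


def duplicates(fields: list, key: str) -> list:
    """Return values of `key` that occur in more than one field."""
    counts = Counter(f[key] for f in fields if key in f)
    return sorted(value for value, n in counts.items() if n > 1)


fields = [
    field("product_id", "string", sort=True, filter_=True),
    field("product_name", "string", alias="metadata.DCMI.title"),
    field("product_description", "string", alias="metadata.DCMI.abstract"),
    field("brand", "string", facet=True, filter_=True),
    field("price", "number", sort=True, facet=True, filter_=True),
]

# Two fields sharing a name or a special alias is almost certainly a mistake.
assert not duplicates(fields, "name") and not duplicates(fields, "alias")
print(json.dumps({"document_fields_configuration": fields}, indent=2))
```

The printed JSON has the same structure as the hand-written example, so it can be used in its place.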
Step 5: Define Search Profiles (Optional)
Search profiles allow you to fine-tune search relevance by configuring how documents are ranked for different query types. You can boost specific fields, adjust scoring weights, and optimize for your use case.
Example: Boosting Brand Matches
This profile increases the relevance of documents when the search query matches the brand name:
{
  "search_profiles_configuration": {
    "profiles": [
      {
        "name": "boost_brand",
        "keyword_settings": {
          "query_settings": {
            "field_search_configs": [
              {
                "field_path": "brand",
                "boosting_score": {
                  "weight": 2.0
                }
              }
            ]
          }
        }
      }
    ]
  }
}
This configuration doubles the relevance score when a query term matches the brand field, making branded searches more effective.
For comprehensive search relevance options including semantic search tuning, reranking, and advanced boosting strategies, refer to the search_profiles_configuration section in the API reference. Additional boosting capabilities can be configured upon request.
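To make the effect of the weight in the boost_brand profile concrete, here is a toy Python sketch. It assumes that each field contributes its own match score and that the contributions are multiplied by their weight and summed. That formula is a simplification chosen for illustration; this guide does not specify the backend's actual ranking function, and the scores used are invented.

```python
def combined_score(field_scores: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Toy relevance model: weighted sum of per-field match scores.

    Fields without an explicit weight count with weight 1.0.
    """
    return sum(score * weights.get(name, 1.0)
               for name, score in field_scores.items())


# Weight taken from the boost_brand profile.
weights = {"brand": 2.0}

# Hypothetical per-field scores for the same query against two documents.
matches_brand = {"brand": 1.5, "product_description": 0.0}
matches_description = {"brand": 0.0, "product_description": 1.5}

print(combined_score(matches_brand, weights))
print(combined_score(matches_description, weights))
```

Under this model a brand match counts twice as much as an equally strong match in an unweighted field, so the first document ranks above the second.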
Step 6: Configure Client Display Settings
Client settings control how documents appear in the Zeta Alpha Navigator UI, including card layouts, filters, and sorting options.
Display Configuration
Map document fields to UI display elements:
- title_field: Field to display as the document title (e.g., "product_name")
- description_field: Field to display as the document description (e.g., "product_description")
- source_field: Field to display as the document source (e.g., "brand")
- date_field: Field to display as the document date
- url_field: Field containing the link to the source content
- image_url_field: Field containing the document's image URL
Search Filter Configuration
Define which filters appear in the UI and how they behave. When this list is defined, the search filters are limited to the ones it contains:
- field_name: Field to filter by (must match a field in document_fields_configuration)
- display_name: User-friendly name shown in the UI
- filter_type: Widget type for the filter (e.g., "checkbox", "faceted_checkbox", "range")
- url_param: URL parameter name for sharing filtered search URLs
- filter_type_settings: Type-specific configuration:
  - checkbox: Static list of filter options
    - values: Array of options with label (display name) and value (filter value)
Example configuration:
{
  "client_settings": {
    "display_configuration": {
      "title_field": "product_name",
      "description_field": "product_description",
      "source_field": "brand"
    },
    "search_filters_configuration": [
      {
        "field_name": "brand",
        "display_name": "Brand",
        "filter_type": "checkbox",
        "url_param": "brand",
        "filter_type_settings": {
          "checkbox": {
            "values": [
              {
                "label": "Apple",
                "value": "apple"
              },
              {
                "label": "Samsung",
                "value": "samsung"
              },
              {
                "label": "Google",
                "value": "google"
              }
            ]
          }
        }
      }
    ]
  }
}
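Since display fields and filters refer to fields by name, a misspelled name is an easy mistake to make. The Python sketch below cross-checks client_settings against document_fields_configuration. The function name undefined_field_references is hypothetical, and the deliberately wrong "colour" filter exists only to show what a reported problem looks like.

```python
def undefined_field_references(config: dict) -> list[str]:
    """List client_settings entries that name a field that is not defined."""
    defined = {f["name"] for f in config.get("document_fields_configuration", [])}
    client = config.get("client_settings", {})
    problems = []
    for key, value in client.get("display_configuration", {}).items():
        if value not in defined:
            problems.append(f"display_configuration.{key} -> {value!r}")
    filters = client.get("search_filters_configuration", [])
    for i, flt in enumerate(filters):
        if flt.get("field_name") not in defined:
            problems.append(
                f"search_filters_configuration[{i}].field_name -> {flt.get('field_name')!r}"
            )
    return problems


# Abbreviated config: only the keys this check reads are included.
config = {
    "document_fields_configuration": [
        {"name": "product_name", "type": "string"},
        {"name": "product_description", "type": "string"},
        {"name": "brand", "type": "string"},
    ],
    "client_settings": {
        "display_configuration": {
            "title_field": "product_name",
            "description_field": "product_description",
            "source_field": "brand",
        },
        "search_filters_configuration": [
            {"field_name": "colour", "display_name": "Colour"},  # not defined above
        ],
    },
}

print(undefined_field_references(config))
```

Running the same function on the full configuration from this guide should return an empty list.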
Step 7: Create the Index
Once you've assembled all configuration sections, create your index using the Platform UI:
- Navigate to the Indexes section in the Platform UI
- Click View to see the list of existing indexes
- Click Create Index
- Paste your complete JSON configuration into the editor
- Review and submit
Your new index will appear in the indexes list and be ready to accept documents.
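If you keep each step's section in its own file or variable, the sections can be merged and serialized before pasting. The following Python sketch shows one way to do that. The function name assemble and the EXPECTED_SECTIONS list are assumptions based on the sections this guide builds (search profiles are left out because that step is optional), and the section contents are abbreviated placeholders, not a complete configuration.

```python
import json

# Top-level keys built in this guide, excluding the optional search profiles.
EXPECTED_SECTIONS = (
    "name", "tenant", "features", "cluster_connection",
    "storage_settings", "document_fields_configuration", "client_settings",
)


def assemble(*sections: dict) -> dict:
    """Merge per-step sections into one config, rejecting key collisions."""
    merged = {}
    for section in sections:
        overlap = sorted(merged.keys() & section.keys())
        if overlap:
            raise ValueError(f"duplicate top-level keys: {overlap}")
        merged.update(section)
    missing = [key for key in EXPECTED_SECTIONS if key not in merged]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return merged


# Abbreviated placeholders: substitute the full sections from the steps above.
general = {"name": "product-catalog", "tenant": "acme-corp",
           "default": False, "features": {"tags": {}}}
cluster = {"cluster_connection": {"backend": "opensearch"}}
storage = {"storage_settings": {}}
fields = {"document_fields_configuration": []}
client = {"client_settings": {}}

text = json.dumps(assemble(general, cluster, storage, fields, client), indent=2)
json.loads(text)  # round-trip to confirm the output is valid JSON
print(text)
```

The printed text is what you would paste into the editor in the Create Index dialog.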
Next Steps
- Create a custom connector: Build a content source to ingest documents into your new index
- Configure search profiles: Fine-tune search relevance for your use case
Example JSON Index Configuration
Here's the complete configuration we built throughout this guide, ready to use as a template:
{
  "name": "product-catalog",
  "description": "Index for e-commerce product listings",
  "tenant": "acme-corp",
  "default": false,
  "features": {
    "neural_search": {
      "model_serving_url": "http://sentence-encoder-api.production.svc.cluster.local:8080",
      "embedding_dimension": 768
    },
    "tags": {}
  },
  "cluster_connection": {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
      "use_ssl": true,
      "http_auth": ["admin", "secure_password"],
      "verify_certs": true
    }
  },
  "storage_settings": {
    "ingesting": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "ingestion"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "processing"
      }
    }
  },
  "document_fields_configuration": [
    {
      "name": "product_id",
      "type": "string",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": false,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_name",
      "type": "string",
      "alias": "metadata.DCMI.title",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_description",
      "type": "string",
      "alias": "metadata.DCMI.abstract",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "brand",
      "type": "string",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "price",
      "type": "number",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    }
  ],
  "search_profiles_configuration": {
    "profiles": [
      {
        "name": "boost_brand",
        "keyword_settings": {
          "query_settings": {
            "field_search_configs": [
              {
                "field_path": "brand",
                "boosting_score": {
                  "weight": 2.0
                }
              }
            ]
          }
        }
      }
    ]
  },
  "client_settings": {
    "display_configuration": {
      "title_field": "product_name",
      "description_field": "product_description",
      "source_field": "brand"
    },
    "search_filters_configuration": [
      {
        "field_name": "brand",
        "display_name": "Brand",
        "filter_type": "checkbox",
        "url_param": "brand",
        "filter_type_settings": {
          "checkbox": {
            "values": [
              {
                "label": "Apple",
                "value": "apple"
              },
              {
                "label": "Samsung",
                "value": "samsung"
              },
              {
                "label": "Google",
                "value": "google"
              }
            ]
          }
        }
      }
    ]
  }
}