How to Create a Custom Index

This guide walks you through creating a custom index tailored to your specific document structure and search requirements. By defining custom fields, search profiles, and display settings, you can optimize how documents are indexed, searched, and presented to users.

Note: For comprehensive configuration options and detailed specifications, refer to the Index Reference.

This guide follows a step-by-step approach to building a complete index configuration in JSON format. By the end, you'll have a working example index configuration that you can adapt to your needs.

Prerequisites

Before creating a custom index, ensure you have:

  1. Access to the Zeta Alpha Platform UI with appropriate permissions
  2. An existing tenant (see Create a Tenant if you haven't created one)

Step 1: Define General Index Settings

Start by identifying the basic configuration for your index:

  • name: Unique identifier for the index (e.g., "product-catalog")
  • description: Brief explanation of the index's purpose
  • tenant: The tenant this index belongs to
  • default: Whether this is the tenant's default index. Only one default index is allowed per tenant, so set this to false if the tenant already has one
  • features: Enable specific index capabilities
    • neural_search: Enable semantic search powered by vector embeddings
      • model_serving_url: URL of your embeddings service
      • embedding_dimension: Dimension of the embedding vectors. It must match the output size of the model served at model_serving_url (e.g., 768 for many transformer models)
    • tags: Enable document tagging by including an empty object

Example configuration:

```json
{
  "name": "product-catalog",
  "description": "Index for e-commerce product listings",
  "tenant": "acme-corp",
  "default": false,
  "features": {
    "neural_search": {
      "model_serving_url": "http://sentence-encoder-api.production.svc.cluster.local:8080",
      "embedding_dimension": 768
    },
    "tags": {}
  }
}
```
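Before you paste a configuration into the Platform UI, you may want to catch simple mistakes locally. The following Python sketch checks the general settings from this step against the rules stated above: a non-empty name and tenant, at most one default index per tenant, and a positive embedding dimension when neural_search is enabled. The function check_general_settings and its existing_indexes argument (configurations already present in the tenant) are hypothetical. The platform applies its own validation, which may be stricter.

```python
from typing import Any


def check_general_settings(config: dict[str, Any],
                           existing_indexes: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in the general index settings.

    Hypothetical local pre-check; not the platform's own validator.
    """
    problems: list[str] = []

    for key in ("name", "tenant"):
        if not isinstance(config.get(key), str) or not config[key]:
            problems.append(f"'{key}' must be a non-empty string")

    # Only one default index is allowed per tenant.
    if config.get("default") is True:
        for other in existing_indexes:
            if other.get("tenant") == config.get("tenant") and other.get("default") is True:
                problems.append(
                    f"tenant '{config.get('tenant')}' already has default index "
                    f"'{other.get('name')}'"
                )

    # The dimension must match the embeddings model; locally we can only
    # check that it is a positive integer and that a URL is present.
    neural = config.get("features", {}).get("neural_search")
    if neural is not None:
        dim = neural.get("embedding_dimension")
        if not isinstance(dim, int) or isinstance(dim, bool) or dim <= 0:
            problems.append("'embedding_dimension' must be a positive integer")
        if not neural.get("model_serving_url"):
            problems.append("'model_serving_url' is required for neural_search")

    return problems
```

For the configuration above, loaded with json.loads, check_general_settings(config, []) returns an empty list.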

Step 2: Configure Cluster Connection

Specify how to connect to your OpenSearch cluster. Currently, OpenSearch is the only supported backend. If you require a different backend, contact your support team.

Required configurations:

  • backend: Index backend type (currently only "opensearch" is supported)
  • host: OpenSearch cluster hostname
  • port: OpenSearch port (typically 9200)
  • settings (optional): Connection-specific settings
    • use_ssl: Whether to use SSL/TLS for connections
    • http_auth: Authentication credentials as ["username", "password"]
    • verify_certs: Whether to verify SSL certificates

Example configuration:

```json
{
  "cluster_connection": {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
      "use_ssl": true,
      "http_auth": ["admin", "secure_password"],
      "verify_certs": true
    }
  }
}
```
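The example above hardcodes credentials for readability. If you generate the configuration with a script, a safer pattern is to read the username and password from the environment when you generate the file, so they never end up in version control. The following is a minimal Python sketch. The environment variable names OPENSEARCH_USER and OPENSEARCH_PASSWORD and the helper build_cluster_connection are assumptions; the platform itself does not read these variables. The generated JSON still contains the password, so treat it as a secret.

```python
import os
from collections.abc import Mapping


def build_cluster_connection(host: str, port: int = 9200,
                             env: Mapping[str, str] = os.environ) -> dict:
    """Build the cluster_connection section, taking credentials from `env`."""
    settings: dict = {"use_ssl": True, "verify_certs": True}
    user = env.get("OPENSEARCH_USER")          # hypothetical variable name
    password = env.get("OPENSEARCH_PASSWORD")  # hypothetical variable name
    if user and password:
        # http_auth takes the form ["username", "password"].
        settings["http_auth"] = [user, password]
    return {
        "cluster_connection": {
            "backend": "opensearch",  # currently the only supported backend
            "host": host,
            "port": port,
            "settings": settings,
        }
    }
```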

Step 3: Configure Storage Settings

Define where document data is stored at each stage of the ingestion pipeline. The storage_settings object has one entry per stage:

  • ingesting: Stores the initial document requests as they arrive
  • processing: Stores intermediate transformations and processing artifacts

Storage Backend Options

You can use either AWS S3 or Azure Blob Storage as your storage backend. Choose the configuration that matches your infrastructure.

Option 1: AWS S3 Storage

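Configure S3 storage with the following settings. The backend and max_file_size keys sit directly under each stage (ingesting or processing), while the s3_* keys go inside that stage's s3 object:

  • backend: Set to "s3"
  • s3_bucket_name: S3 bucket for storing documents
  • s3_key_prefix: Prefix for organizing objects within the bucket
  • max_file_size: Maximum file size in bytes. Documents that exceed this limit are rejected (e.g., 209715200 for 200 MiB)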

AWS connection settings (optional; place them in the s3 object, where aws_region appears in the example below):

  • aws_access_key_id: AWS access key (prefer IAM roles when possible)
  • aws_secret_access_key: AWS secret key (prefer IAM roles when possible)
  • aws_region: AWS region (e.g., "us-east-1")
  • aws_endpoint_url: Custom endpoint for S3-compatible services (MinIO, LocalStack, etc.)

Security Best Practice

When running on Amazon EKS, use IAM roles for service accounts (IRSA) instead of static credentials. This approach:

  • Eliminates the need to store sensitive keys in configuration
  • Provides automatic credential rotation
  • Enables fine-grained permission management
  • Improves overall security posture

Only use static AWS credentials when IRSA is unavailable or when connecting to non-AWS S3-compatible services.
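In practice, following this advice means leaving aws_access_key_id and aws_secret_access_key out of the s3 object entirely. The Python sketch below builds an s3 object that includes static keys only when you pass them in explicitly, for example when you target MinIO or LocalStack through aws_endpoint_url. It assumes, as the guidance above implies, that omitting the keys lets the service use its IRSA-provided role. The helper name s3_block is hypothetical.

```python
from typing import Optional


def s3_block(bucket: str, prefix: str, region: Optional[str] = None,
             endpoint_url: Optional[str] = None,
             access_key_id: Optional[str] = None,
             secret_access_key: Optional[str] = None) -> dict:
    """Build the "s3" object for one storage stage (hypothetical helper)."""
    block = {"s3_bucket_name": bucket, "s3_key_prefix": prefix}
    if region:
        block["aws_region"] = region
    if endpoint_url:
        # Custom endpoint for S3-compatible services such as MinIO or LocalStack.
        block["aws_endpoint_url"] = endpoint_url
    # Static keys are only meaningful as a pair; omit both to rely on IRSA.
    if (access_key_id is None) != (secret_access_key is None):
        raise ValueError("provide both access key id and secret access key, or neither")
    if access_key_id is not None:
        block["aws_access_key_id"] = access_key_id
        block["aws_secret_access_key"] = secret_access_key
    return block
```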

Example S3 configuration:

```json
{
  "storage_settings": {
    "ingesting": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "ingestion",
        "aws_region": "us-east-1"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "processing",
        "aws_region": "us-east-1"
      }
    }
  }
}
```
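The max_file_size value is in bytes: 209715200 is 200 MiB (200 × 1024 × 1024). If you upload documents from your own tooling, checking sizes on the client side avoids sending files that will be rejected anyway. The following is a minimal Python sketch. MAX_FILE_SIZE mirrors the example value, and fits_limit is a hypothetical helper.

```python
import os

MAX_FILE_SIZE = 200 * 1024 * 1024  # 209715200 bytes, as in the example above


def fits_limit(path: str, max_file_size: int = MAX_FILE_SIZE) -> bool:
    """Return True if the file at `path` does not exceed the ingestion size limit."""
    return os.path.getsize(path) <= max_file_size
```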

Option 2: Azure Blob Storage

Configure Azure Blob Storage with the following settings. The backend and max_file_size keys sit directly under each stage, while the azure_* keys go inside that stage's azure object:

  • backend: Set to "azure"
  • azure_account_url: Azure Blob Storage account URL (format: https://your-account.blob.core.windows.net)
  • azure_container_name: Container name in Azure Blob Storage
  • azure_blob_prefix: Prefix for organizing blobs within the container
  • max_file_size: Maximum file size in bytes. Documents that exceed this limit are rejected

Azure credentials (optional; place them in the azure object):

  • azure_credential: Azure credential string (prefer managed identities when possible)

Security Best Practice

When running on Azure, use managed identities (workload identity for AKS) instead of static credentials. This approach:

  • Eliminates the need to store sensitive keys in configuration
  • Provides automatic credential rotation
  • Enables fine-grained permission management
  • Improves overall security posture

Only use the azure_credential field when managed identities are unavailable or when connecting from non-Azure environments.

Example Azure Blob Storage configuration:

```json
{
  "storage_settings": {
    "ingesting": {
      "backend": "azure",
      "azure": {
        "azure_account_url": "https://acmestorage.blob.core.windows.net",
        "azure_container_name": "documents",
        "azure_blob_prefix": "ingestion"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "azure",
      "azure": {
        "azure_account_url": "https://acmestorage.blob.core.windows.net",
        "azure_container_name": "documents",
        "azure_blob_prefix": "processing"
      }
    }
  }
}
```
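A typo in azure_account_url is an easy mistake to make. Azure storage account names are 3 to 24 characters long and use only lowercase letters and digits, and the blob endpoint has the form shown above. A small Python check can therefore catch malformed URLs before you submit the configuration. This sketch only covers the public-cloud blob.core.windows.net endpoint. Sovereign clouds and custom domains use other host names, and the helper name azure_account_name is hypothetical.

```python
import re

# https://<account>.blob.core.windows.net, where <account> is 3-24
# lowercase letters or digits; a single trailing slash is tolerated.
_ACCOUNT_URL = re.compile(r"https://([a-z0-9]{3,24})\.blob\.core\.windows\.net/?")


def azure_account_name(url: str) -> str:
    """Return the storage account name from a blob endpoint URL, or raise ValueError."""
    match = _ACCOUNT_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"not a valid Azure Blob Storage account URL: {url!r}")
    return match.group(1)
```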

Step 4: Define Document Fields

Define the schema for your documents by specifying fields, their types, and how they behave during indexing and search.

Field Configuration Properties

For each field, configure:

  • name: Field identifier (e.g., "product_id", "brand")

  • type: Data type. Common types include:

    • "string": Text content
    • "date": Timestamps and dates
    • "number": Numeric values (integers or floats)
    • "geolocation": Geographic coordinates
  • alias (optional): An alternative name for the field. The following aliases trigger special system behavior:

    • "metadata.DCMI.title": Marks this field as the document title (used for display and embedding generation)
    • "metadata.DCMI.abstract": Designates the document description (used for display and embedding generation)
    • "metadata.DCMI.created": Identifies the creation date field (defaults to ingestion date if not provided; used for sorting)
  • search_options: Controls field behavior during indexing and retrieval:

    • is_sort_field: Allow sorting search results by this field
    • is_facet_field: Enable faceted search (return value counts for filtering)
    • is_filter_field: Allow filtering documents by this field
    • is_returned_in_search_results: Include this field in search API responses
    • is_used_in_search: Include this field in full-text search queries
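Because every field repeats the same five search_options flags, it can help to generate the entries programmatically. The Python sketch below produces field entries in the shape used by this guide. The helper name make_field and its default flag values (everything off except is_returned_in_search_results) are choices made for this example, not platform defaults. With these defaults, make_field("brand", "string", facet=True, filter_=True, searched=True) reproduces the brand entry in the example below.

```python
from typing import Optional


def make_field(name: str, type_: str, alias: Optional[str] = None, *,
               sort: bool = False, facet: bool = False, filter_: bool = False,
               returned: bool = True, searched: bool = False) -> dict:
    """Build one entry for document_fields_configuration (hypothetical helper)."""
    field: dict = {"name": name, "type": type_}
    if alias is not None:
        field["alias"] = alias  # e.g. "metadata.DCMI.title"
    field["search_options"] = {
        "is_sort_field": sort,
        "is_facet_field": facet,
        "is_filter_field": filter_,
        "is_returned_in_search_results": returned,
        "is_used_in_search": searched,
    }
    return field
```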

Example: E-commerce Product Index

For a product catalog index, you might define:

```json
{
  "document_fields_configuration": [
    {
      "name": "product_id",
      "type": "string",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": false,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_name",
      "type": "string",
      "alias": "metadata.DCMI.title",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_description",
      "type": "string",
      "alias": "metadata.DCMI.abstract",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "brand",
      "type": "string",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "price",
      "type": "number",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    }
  ]
}
```
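A quick consistency check over the field list can catch duplicate names and misspelled types. In this Python sketch, the set of accepted types is just the common types listed above; the Index Reference may define more. The rule that each special metadata.DCMI.* alias is used by at most one field is an assumption based on its meaning (one title, one description, and one creation date per document). The function name check_fields is hypothetical.

```python
from collections import Counter

COMMON_TYPES = {"string", "date", "number", "geolocation"}
SPECIAL_ALIASES = {"metadata.DCMI.title", "metadata.DCMI.abstract", "metadata.DCMI.created"}


def check_fields(fields: list[dict]) -> list[str]:
    """Return problems found in a document_fields_configuration list."""
    problems: list[str] = []

    names = Counter(f.get("name") for f in fields)
    problems += [f"duplicate field name '{n}'" for n, c in names.items() if c > 1]

    for f in fields:
        if f.get("type") not in COMMON_TYPES:
            problems.append(
                f"field '{f.get('name')}' has type {f.get('type')!r}, "
                "which is not one of the common types"
            )

    # Assumption: each special alias should identify exactly one field.
    aliases = Counter(f["alias"] for f in fields if f.get("alias") in SPECIAL_ALIASES)
    problems += [f"alias '{a}' is used by {c} fields" for a, c in aliases.items() if c > 1]
    return problems
```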

Step 5: Define Search Profiles (Optional)

Search profiles allow you to fine-tune search relevance by configuring how documents are ranked for different query types. You can boost specific fields, adjust scoring weights, and optimize for your use case.

Example: Boosting Brand Matches

This profile increases the relevance of documents when the search query matches the brand name:

```json
{
  "search_profiles_configuration": {
    "profiles": [
      {
        "name": "boost_brand",
        "keyword_settings": {
          "query_settings": {
            "field_search_configs": [
              {
                "field_path": "brand",
                "boosting_score": {
                  "weight": 2.0
                }
              }
            ]
          }
        }
      }
    ]
  }
}
```

This profile affects keyword search only. The weight of 2.0 doubles the contribution of brand-field matches relative to an unboosted field, so queries that name a brand rank that brand's products higher.
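If you maintain several boosted fields, generating the profile avoids hand-editing the nested structure. This Python sketch rebuilds the boost_brand profile above from a simple field-to-weight mapping. The helper name keyword_boost_profile is hypothetical, and the sketch only covers the keyword_settings options shown in this guide.

```python
def keyword_boost_profile(name: str, weights: dict[str, float]) -> dict:
    """Build a search profile that boosts keyword matches per field (hypothetical helper)."""
    return {
        "name": name,
        "keyword_settings": {
            "query_settings": {
                "field_search_configs": [
                    {"field_path": field, "boosting_score": {"weight": float(weight)}}
                    for field, weight in weights.items()
                ]
            }
        },
    }
```

For example, {"search_profiles_configuration": {"profiles": [keyword_boost_profile("boost_brand", {"brand": 2.0})]}} reproduces the configuration above.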

Advanced Search Configuration

For comprehensive search relevance options including semantic search tuning, reranking, and advanced boosting strategies, refer to the search_profiles_configuration section in the API reference. Additional boosting capabilities can be configured upon request.

Step 6: Configure Client Display Settings

Client settings control how documents appear in the Zeta Alpha Navigator UI, including card layouts, filters, and sorting options.

Display Configuration

Map document fields to UI display elements. Each value must be the name of a field defined in document_fields_configuration:

  • title_field: Field to display as the document title (e.g., "product_name")
  • description_field: Field to display as the document description (e.g., "product_description")
  • source_field: Field to display as the document source (e.g., "brand")
  • date_field: Field to display as the document date
  • url_field: Field containing the link to the source content
  • image_url_field: Field containing the document's image URL

Search Filter Configuration

Define which filters appear in the UI and how they behave. When search_filters_configuration is defined, the UI shows only the filters listed in it:

  • field_name: Field to filter by (must match a field in document_fields_configuration)
  • display_name: User-friendly name shown in the UI
  • filter_type: Widget type for the filter (e.g., "checkbox", "faceted_checkbox", "range")
  • url_param: URL parameter name for sharing filtered search URLs
  • filter_type_settings: Type-specific configuration:
    • checkbox: Static list of filter options
      • values: Array of options with label (display name) and value (filter value)
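Most of a checkbox filter entry is fixed structure around its list of options. The hypothetical helper in this minimal Python sketch builds one entry from (label, value) pairs. Calling it with the brand options Apple, Samsung, and Google yields the same filter as the example configuration in this step.

```python
def checkbox_filter(field_name: str, display_name: str, url_param: str,
                    options: list[tuple[str, str]]) -> dict:
    """Build one search_filters_configuration entry using a checkbox widget."""
    return {
        "field_name": field_name,      # must match a field in document_fields_configuration
        "display_name": display_name,  # name shown in the UI
        "filter_type": "checkbox",
        "url_param": url_param,        # parameter used in shareable search URLs
        "filter_type_settings": {
            "checkbox": {
                "values": [{"label": label, "value": value} for label, value in options]
            }
        },
    }
```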

Example configuration:

```json
{
  "client_settings": {
    "display_configuration": {
      "title_field": "product_name",
      "description_field": "product_description",
      "source_field": "brand"
    },
    "search_filters_configuration": [
      {
        "field_name": "brand",
        "display_name": "Brand",
        "filter_type": "checkbox",
        "url_param": "brand",
        "filter_type_settings": {
          "checkbox": {
            "values": [
              { "label": "Apple", "value": "apple" },
              { "label": "Samsung", "value": "samsung" },
              { "label": "Google", "value": "google" }
            ]
          }
        }
      }
    ]
  }
}
```
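Client settings refer to fields by name, so a renamed field can silently break a filter or a card layout. This Python sketch checks that every *_field entry in display_configuration and every filter's field_name refer to a field in document_fields_configuration. It also flags a url_param shared by two filters; that last rule is an assumption, made because shared parameters would make filtered URLs ambiguous. The function name check_client_settings is hypothetical.

```python
def check_client_settings(config: dict) -> list[str]:
    """Check that client_settings only refer to fields defined in the index."""
    defined = {f.get("name") for f in config.get("document_fields_configuration", [])}
    client = config.get("client_settings", {})
    problems: list[str] = []

    for key, field in client.get("display_configuration", {}).items():
        if key.endswith("_field") and field not in defined:
            problems.append(f"display_configuration.{key} refers to unknown field '{field}'")

    seen_params: set = set()
    for flt in client.get("search_filters_configuration", []):
        if flt.get("field_name") not in defined:
            problems.append(
                f"filter '{flt.get('display_name')}' refers to unknown field "
                f"'{flt.get('field_name')}'"
            )
        param = flt.get("url_param")
        if param in seen_params:  # assumption: url_param should be unique
            problems.append(f"url_param '{param}' is used by more than one filter")
        seen_params.add(param)
    return problems
```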

Step 7: Create the Index

Once you've assembled all configuration sections, create your index using the Platform UI:

  1. Navigate to the Indexes section in the Platform UI

  2. Click View to see the list of existing indexes

  3. Click Create Index

  4. Paste your complete JSON configuration into the editor

  5. Review and submit

Your new index will appear in the indexes list and be ready to accept documents.
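If you keep each step's JSON in a separate file, you can merge the files into the single document the editor expects. The Python sketch below merges the fragments recursively and returns pretty-printed JSON for pasting. The file names in the usage line are hypothetical. Later fragments override scalar values and lists from earlier ones rather than combining them.

```python
import json
from pathlib import Path


def deep_merge(base: dict, extra: dict) -> dict:
    """Recursively merge `extra` into a copy of `base`; later values win."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced, not combined
    return merged


def assemble(paths: list[str]) -> str:
    """Load JSON fragments in order and return one pretty-printed configuration."""
    config: dict = {}
    for path in paths:
        config = deep_merge(config, json.loads(Path(path).read_text()))
    return json.dumps(config, indent=2)
```

For example, print(assemble(["general.json", "cluster.json", "storage.json", "fields.json", "profiles.json", "client.json"])) prints a configuration ready to paste.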

Example JSON Index Configuration

Here's the complete configuration we built throughout this guide, ready to use as a template:

```json
{
  "name": "product-catalog",
  "description": "Index for e-commerce product listings",
  "tenant": "acme-corp",
  "default": false,
  "features": {
    "neural_search": {
      "model_serving_url": "http://sentence-encoder-api.production.svc.cluster.local:8080",
      "embedding_dimension": 768
    },
    "tags": {}
  },
  "cluster_connection": {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
      "use_ssl": true,
      "http_auth": ["admin", "secure_password"],
      "verify_certs": true
    }
  },
  "storage_settings": {
    "ingesting": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "ingestion",
        "aws_region": "us-east-1"
      },
      "max_file_size": 209715200
    },
    "processing": {
      "backend": "s3",
      "s3": {
        "s3_bucket_name": "acme-documents",
        "s3_key_prefix": "processing",
        "aws_region": "us-east-1"
      }
    }
  },
  "document_fields_configuration": [
    {
      "name": "product_id",
      "type": "string",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": false,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_name",
      "type": "string",
      "alias": "metadata.DCMI.title",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "product_description",
      "type": "string",
      "alias": "metadata.DCMI.abstract",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": false,
        "is_filter_field": false,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "brand",
      "type": "string",
      "search_options": {
        "is_sort_field": false,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    },
    {
      "name": "price",
      "type": "number",
      "search_options": {
        "is_sort_field": true,
        "is_facet_field": true,
        "is_filter_field": true,
        "is_returned_in_search_results": true,
        "is_used_in_search": true
      }
    }
  ],
  "search_profiles_configuration": {
    "profiles": [
      {
        "name": "boost_brand",
        "keyword_settings": {
          "query_settings": {
            "field_search_configs": [
              {
                "field_path": "brand",
                "boosting_score": {
                  "weight": 2.0
                }
              }
            ]
          }
        }
      }
    ]
  },
  "client_settings": {
    "display_configuration": {
      "title_field": "product_name",
      "description_field": "product_description",
      "source_field": "brand"
    },
    "search_filters_configuration": [
      {
        "field_name": "brand",
        "display_name": "Brand",
        "filter_type": "checkbox",
        "url_param": "brand",
        "filter_type_settings": {
          "checkbox": {
            "values": [
              { "label": "Apple", "value": "apple" },
              { "label": "Samsung", "value": "samsung" },
              { "label": "Google", "value": "google" }
            ]
          }
        }
      }
    ]
  }
}
```
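Before pasting, it is worth confirming that the file is valid JSON and that no section was lost while you copied fragments around. This Python sketch parses the configuration and reports which of the top-level sections used in this guide are missing. The list reflects this guide, not the platform's required fields, so consult the Index Reference for what is actually mandatory. The names GUIDE_SECTIONS and missing_sections are hypothetical.

```python
import json

# Top-level sections used in this guide (not necessarily all required).
GUIDE_SECTIONS = (
    "name", "description", "tenant", "default", "features",
    "cluster_connection", "storage_settings", "document_fields_configuration",
    "search_profiles_configuration", "client_settings",
)


def missing_sections(config_text: str) -> list[str]:
    """Parse a JSON configuration and list the guide's sections it lacks."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on invalid JSON
    return [key for key in GUIDE_SECTIONS if key not in config]
```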
