Concept Reference
Index configuration
A new index is created by posting to the /v1/indexes endpoint. It has the following configuration:
Query params
tenant
: The name of the tenant that owns the index.
Request body
name
: (string, required) Human-friendly name given to the index. For example "Zeta Alpha".
default
: (boolean, optional) Whether the index is used at query time if no other index is specified. Only one index can be set as default per tenant.
description
: (string, optional) A description of the index. For example "Main index for the research navigator app".
cluster_connection
: (object, required) Specifies which index backend to use and how to access it.
backend
: (string, required) The type of backend to use. Possible values are "opensearch".
host
: (string, required) The hostname of the index. For example "opensearch-cluster-master-headless.opensearch.svc.cluster.local".
port
: (integer, required) The port of the index. For example 9200.
settings
: (object, optional) Connection settings.
use_ssl
: (boolean, optional) Whether to use SSL when connecting to the index.
http_auth
: (array of strings, optional) Credentials to use when connecting to the index. For example ["username", "password"].
verify_certs
: (boolean, optional) Whether to verify the SSL certificate when connecting to the index.
ssl_show_warn
: (boolean, optional) Whether to show a warning when connecting to the index with verify_certs disabled.
ca_certs
: (string, optional) The path to the CA certificate to use when connecting to the index.
client_cert
: (string, optional) The path to the client certificate to use when connecting to the index.
client_key
: (string, optional) The path to the client key to use when connecting to the index.
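As an illustration, the cluster_connection fields might be assembled as follows. This is a minimal sketch: the placement of use_ssl, http_auth, verify_certs and ssl_show_warn under settings is an assumption inferred from the field descriptions, and the credentials are placeholders.

```python
import json

# Only the field names and the example host/port come from the reference;
# the nesting under "settings" and the credential values are assumptions.
cluster_connection = {
    "backend": "opensearch",
    "host": "opensearch-cluster-master-headless.opensearch.svc.cluster.local",
    "port": 9200,
    "settings": {
        "use_ssl": True,
        "http_auth": ["username", "password"],  # placeholder credentials
        "verify_certs": False,
        "ssl_show_warn": False,  # silence the warning for verify_certs=False
    },
}

# Serialize to the JSON that would be embedded in the request body.
print(json.dumps(cluster_connection, indent=2))
```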
storage_settings
: (object, required) Specifies how the auxiliary data for this index is stored.
ingesting
: (object, required) Specifies how large data ingested by the pipeline is stored. When data is ingested into the document_content or document_content_path.base64_content fields, it is stored in the backend specified here.
backend
: (enum, required) The backend to use for storing the ingested data. Possible values are "s3", "azure", "disk".
s3
: (object, optional) The configuration for the S3 backend.
s3_bucket_name
: (string, required) The name of the S3 bucket.
s3_key_prefix
: (string, optional) The prefix to use when storing the data in the S3 bucket.
azure
: (object, optional) The configuration for the Azure backend.
azure_account_url
: (string, required) The URL of the Azure account.
azure_container_name
: (string, required) The name of the Azure container.
azure_blob_prefix
: (string, optional) The prefix to use when storing the data in the Azure container.
azure_credential
: (string, optional) The storage account shared key (account key or access key) to use when connecting to the Azure account.
disk
: (object, optional) The configuration for the disk backend.
disk_location
: (string, optional) The path to the directory where the data will be stored.
max_file_size
: (integer, optional) The maximum size, in bytes, of the files to store in the backend. The default value is 1024**3 (1 GiB).
processing
: (object, required) Specifies how the data used by the pipeline is stored.
backend
: (enum, required) The backend to use for storing the data. Possible values are "s3", "azure", "disk".
s3
: (object, optional) The configuration for the S3 backend.
s3_bucket_name
: (string, required) The name of the S3 bucket.
s3_key_prefix
: (string, optional) The prefix to use when storing the data in the S3 bucket.
azure
: (object, optional) The configuration for the Azure backend.
azure_account_url
: (string, required) The URL of the Azure account.
azure_container_name
: (string, required) The name of the Azure container.
azure_blob_prefix
: (string, optional) The prefix to use when storing the data in the Azure container.
azure_credential
: (string, optional) The storage account shared key (account key or access key) to use when connecting to the Azure account.
disk
: (object, optional) The configuration for the disk backend.
disk_location
: (string, optional) The path to the directory where the data will be stored.
compression
: (object, optional) The configuration for the compression of the data.
compression_algorithm
: (enum, required) The algorithm to use for compressing the data. Possible values are "bz2", "gzip", "lzma", "snappy", "zlib", "zstd". The recommended value is "zlib".
level
: (integer, optional) The level of compression to use. This is an integer between 0 and 9, where 0 is no compression and 9 is the maximum compression.
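The storage_settings object could be built along these lines. This is a sketch under stated assumptions: storage_section is a hypothetical helper, the bucket name and paths are placeholders, and the placement of max_file_size under ingesting and of compression under processing is inferred from the order of the field descriptions rather than confirmed.

```python
ALLOWED_BACKENDS = {"s3", "azure", "disk"}

def storage_section(backend, config, **extra):
    """Build an 'ingesting' or 'processing' section for a single backend.

    The backend-specific configuration is stored under a key named after
    the backend, as the reference describes ("s3", "azure" or "disk").
    """
    if backend not in ALLOWED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    return {"backend": backend, backend: config, **extra}

storage_settings = {
    # Large ingested content goes to S3 (placeholder bucket and prefix).
    "ingesting": storage_section(
        "s3",
        {"s3_bucket_name": "example-bucket", "s3_key_prefix": "ingest/"},
        max_file_size=1024**3,  # in bytes
    ),
    # Pipeline data is kept on local disk and compressed with zlib.
    "processing": storage_section(
        "disk",
        {"disk_location": "/data/processing"},
        compression={"compression_algorithm": "zlib", "level": 6},
    ),
}
```

Keying the backend configuration by the backend name keeps the selected backend and its settings in sync, which makes a mismatch such as backend "s3" with only a disk block easy to avoid.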
features
: (object, optional) Configures the available features for this index.
neural_search
: (object, optional) Configures the neural search feature.
model_serving_url
: (string, required) The URL of the model server that will be used to compute vector embeddings. For example "http://sentence-encoder-api.production.svc.cluster.local:8080/v0.6".
model_serving_url_pipeline
: (string, optional) The URL of the model server that will be used to compute vector embeddings in the pipeline (offline). For example "http://sentence-encoder-api.production.svc.cluster.local:8080/v0.6". If not passed, the model_serving_url will be used.
document_fields_configuration
: (array of objects, optional) Specifies the names of the fields that the tenant wants in the index, as well as how they behave during indexing and retrieval.
name
: (string, required) The name of the field that will be used to store values in the index, as well as when retrieving and filtering documents. This can be a nested field, for example `authors.first_name`.
type
: (enum, required) The type of field. Possible values are "document_id", "string", "date", "number", "geolocation", "bounding_box", "document_content". Note that any of these types can be multi-valued, meaning they will accept a list of values. Furthermore, when defining a nested field name like authors.first_name, the values will be stored in a flat structure. For example, if we define authors.first_name and authors.last_name and later ingest data like {"authors": [{"first_name": "John", "last_name": "Doe"}, {"first_name": "Jane", "last_name": "Doe"}]}, then the index will contain the following fields: authors.first_name: ["John", "Jane"] and authors.last_name: ["Doe", "Doe"]. This structure minimizes indexing time and storage as well as retrieval time. However, it will not be possible to restrict search results to the ones that contain at least one author with first_name=John and last_name=Doe. If this is a requirement, then the tenant should define the field as nested. The document_id and document_content types are only relevant when used inside a nested field.
alias
: (string, optional) An alternative field name that can be used to retrieve documents. Note that the alias must be unique.
search_options
: (object, optional) How the field behaves during indexing and retrieval. Note that it also determines how the field is configured under the hood.
is_sort_field
: (boolean, optional) Whether the field can be used for sorting documents at retrieval time.
is_facet_field
: (boolean, optional) Whether the field can be used for faceted search. In other words, the search API is able to return a list of existing values for this field along with the document counts.
is_filter_field
: (boolean, optional) Whether the field can be used for filtering documents at retrieval time.
is_returned_in_search_results
: (boolean, optional) Whether the field is part of the search API response payload.
is_used_in_search
: (boolean, optional) Whether the field is used in full text search.
supporting_subqueries
: (boolean, optional) Whether the field can be used in subqueries. Subqueries are used to filter nested objects as if they were root-level documents. Search results return the parent document with the nested object filtered.
analyzer_options
: (object, optional) How the field is analyzed during indexing and retrieval. This applies only to fields of type string that also have search_options.is_used_in_search set to true.
analyzer
: (string, optional) The name of the analyzer to use. Some default analyzers are provided, such as zav-en-nostem. Otherwise, the name must match a custom analyzer defined in the document_field_analysis section of the index payload. This analyzer is used for both indexing and retrieval. To use separate analyzers for indexing and retrieval, leave analyzer undefined and pass the analyzer names in the index_analyzer and search_analyzer fields instead.
search_analyzer
: (string, optional) The name of the analyzer to use at retrieval time.
index_analyzer
: (string, optional) The name of the analyzer to use at indexing time.
nested_fields
: (array of objects, optional) When type=nested, this contains the list of nested fields. The schema of each object in the array is the same as for the document_fields_configuration field. Note that the name field should not be prefixed with the name of the parent field. For example, if the parent field is authors, then the nested fields should be first_name and last_name, not authors.first_name and authors.last_name.
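The difference between flat and nested storage can be illustrated in plain Python. This is only a sketch of the matching semantics described for the type field, not the index backend's actual logic. The flatten helper is hypothetical, and the second author's last name is changed to "Smith" here so that the false positive becomes visible.

```python
from collections import defaultdict

def flatten(document, parent):
    """Collect each sub-field of a list of objects into one flat list."""
    flat = defaultdict(list)
    for item in document[parent]:
        for key, value in item.items():
            flat[f"{parent}.{key}"].append(value)
    return dict(flat)

doc = {"authors": [{"first_name": "John", "last_name": "Doe"},
                   {"first_name": "Jane", "last_name": "Smith"}]}

flat = flatten(doc, "authors")
# flat holds authors.first_name = ["John", "Jane"]
# and authors.last_name = ["Doe", "Smith"]

# Flat matching checks each condition independently, so a query for
# "Jane Doe" matches even though no single author has that name.
flat_match = ("Jane" in flat["authors.first_name"]
              and "Doe" in flat["authors.last_name"])

# Nested matching requires both conditions on the same author object.
nested_match = any(a["first_name"] == "Jane" and a["last_name"] == "Doe"
                   for a in doc["authors"])

print(flat_match, nested_match)  # True False
```

The flat layout loses the pairing between first and last names, which is why a field must be defined as nested when per-object conditions matter.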
document_field_analysis
: (object, optional) Defines custom analyzers that can be used in the document_fields_configuration.
filter
: (object, optional)
char_filter
: (object, optional)
normalizer
: (object, optional)
analyzer
: (object, optional)
client_settings
: (object, optional) Defines index-specific rendering information.
display_configuration
: (object, optional) Defines how the index content is rendered in the frontend. The field names refer to the names provided in the document_fields_configuration.
title_field
: (string, optional) The field used for rendering the title of the document card.
date_field
: (string, optional) The field used for rendering the date of the document card.
created_by_field
: (string, optional) The field used for rendering the authors of the document card.
description_field
: (string, optional) The field used for rendering the description of the document card.
url_field
: (string, optional) The field used for rendering the link to the source content.
source_field
: (string, optional) The field used for rendering the source of the document card.
bounding_boxes_field
: (string, optional) The field used for rendering the bounding boxes in the PDF viewer.
image_url_field
: (string, optional) The field used for rendering the image of the document card.
document_metadata_fields
: (array of objects, optional) Defines how other metadata is rendered in the card. This could be the number of references to this document (for example in scientific documents), a list of documents linked to the current one, etc.
type
: (enum, required) The type of metadata. Possible values are "github", "twitter", "counter".
field_name
: (string, optional) The field that contains the data to be rendered (used in the counter type).
url_field
: (string, optional) If the rendered element is clickable, this specifies the link URL.
list_field_name
: (string, optional) The field that contains the list of metadata to be rendered (used for the github and twitter types).
icon
: (enum, optional) The icon associated with the rendered element. Possible values are "github", "twitter", "reference", "citation".
label
: (string, optional) The label associated with the rendered element. This can be used in the tooltip, for example.
search_filters_configuration
: (array of objects, optional) When defined, the search filters will be limited to the ones defined in this list of filters.
field_name
: (string, required) The field name in the document_fields_configuration that this filter will filter by.
display_name
: (string, required) The name that will be displayed as the filter name in the frontend.
filter_type
: (string, optional) Identifier of the filter type. The frontend uses this string to choose the widget that displays this filter.
url_param
: (string, optional) The string that the frontend uses as the URL parameter for this filter.
filter_type_settings
: (object, optional) Filter-specific configuration. This could include default values and display names.
checkbox
: (object, optional) Display configuration for checkboxes.
values
: (array, required) List of values to be displayed and filtered by in the checkbox.
label
: (string, required) Display name of the value to filter by.
value
: (string, required) Value to filter by.
search_sorting_configuration
: (object, optional) Defines the ordering options for the frontend. The field names refer to the names provided in the document_fields_configuration.
field_name
: (string, required) The field to sort by.
display_name
: (string, required) The name that will be rendered in the frontend for this sorting option.
url_param
: (string, optional) The parameter that the frontend will use in the URL when this sorting option is selected.
retrieval_unit
: (string, optional) If specified, this option is only shown for the selected retrieval unit.
search_relevance_configuration
: (array of objects, optional) Defines the default search profile to use, per retrieval unit.
retrieval_unit
: (string, optional) If specified, this search profile will only be used as default when searching for the retrieval unit.
search_profile_name
: Search profile to use as default. This name refers to the name field of the profiles defined under the search_profiles_configuration.
default_filters_configuration
: (object, optional) Defines the default filters that are always applied to the search queries. The field names refer to the names provided in the document_fields_configuration.
and_operator
: (array of objects, optional) Defines the filters that are applied with the AND operator. Each object in the array has the same schema as the default_filters_configuration object.
or_operator
: (array of objects, optional) Defines the filters that are applied with the OR operator. Each object in the array has the same schema as the default_filters_configuration object.
not_operator
: (object, optional) Defines the filters that are applied with the NOT operator. The object has the same schema as the default_filters_configuration object.
nested_operator
: (object, optional) Defines the filters that are applied with the nested operator. This filter is used with nested fields.
field_path
: (string, required) The path to the nested field. For example, if the nested field is authors, then the nested filters may refer to authors.first_name.
nested_filter
: (object, required) The filter to apply to the nested fields. The object has the same schema as the default_filters_configuration object. The field names used inside this filter may or may not be relative to the nested field. For example, for authors, the field names could be first_name or authors.first_name.
exists
: (object, optional) Defines the filters that are applied with the exists operator.
field_path
: (string, required) The path to the field.
equals_to
: (object, optional) Defines the filters that are applied with the exact match operator.
field_path
: (string, required) The path to the field.
field_value
: (string, required) The value to compare to.
greater_than
: (object, optional) Defines the filters that are applied with the greater than operator.
field_path
: (string, required) The path to the field.
field_value
: (string, required) The value to compare to.
greater_than_or_equal_to
: (object, optional) Defines the filters that are applied with the greater than or equal to operator.
field_path
: (string, required) The path to the field.
field_value
: (string, required) The value to compare to.
less_than
: (object, optional) Defines the filters that are applied with the less than operator.
field_path
: (string, required) The path to the field.
field_value
: (string, required) The value to compare to.
less_than_or_equal_to
: (object, optional) Defines the filters that are applied with the less than or equal to operator.
field_path
: (string, required) The path to the field.
field_value
: (string, required) The value to compare to.
is_in
: (object, optional) Defines the filters that are applied with the is in operator.
field_path
: (string, required) The path to the field.
field_values
: (array of strings, required) The values to compare to.
geo_distance
: (object, optional) Defines the filters that are applied with the geo distance operator.
field_path
: (string, required) The path to the field.
point
: (object, required) The point to compare to.
lat
: (float, required) The latitude of the point to compare to.
lon
: (float, required) The longitude of the point to compare to.
distance
: (string, required) The distance to compare to.
geo_bounding_box
: (object, optional) Defines the filters that are applied with the geo bounding box operator.
field_path
: (string, required) The path to the field.
top_left_point
: (object, required) The top left point of the bounding box.
lat
: (float, required) The latitude of the point.
lon
: (float, required) The longitude of the point.
bottom_right_point
: (object, required) The bottom right point of the bounding box.
lat
: (float, required) The latitude of the point.
lon
: (float, required) The longitude of the point.
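Because default_filters_configuration is recursive, a small walker helps when checking that every referenced field exists in the document_fields_configuration. The sketch below assumes each filter object holds one or more operator keys as described above. The field_paths helper and the field names language, year and authors are hypothetical.

```python
# Operators that carry a single field_path and no child filters.
LEAF_OPERATORS = {
    "exists", "equals_to", "greater_than", "greater_than_or_equal_to",
    "less_than", "less_than_or_equal_to", "is_in",
    "geo_distance", "geo_bounding_box",
}

def field_paths(filter_obj):
    """Return every field_path referenced by a filter, depth first."""
    paths = []
    for operator, body in filter_obj.items():
        if operator in ("and_operator", "or_operator"):
            for child in body:  # arrays of filter objects
                paths.extend(field_paths(child))
        elif operator == "not_operator":
            paths.extend(field_paths(body))  # a single filter object
        elif operator == "nested_operator":
            paths.append(body["field_path"])
            paths.extend(field_paths(body["nested_filter"]))
        elif operator in LEAF_OPERATORS:
            paths.append(body["field_path"])
    return paths

default_filters_configuration = {
    "and_operator": [
        {"equals_to": {"field_path": "language", "field_value": "en"}},
        {"greater_than_or_equal_to": {"field_path": "year", "field_value": "2020"}},
        {"nested_operator": {
            "field_path": "authors",
            "nested_filter": {
                "equals_to": {"field_path": "first_name", "field_value": "John"},
            },
        }},
    ]
}

print(field_paths(default_filters_configuration))
# ['language', 'year', 'authors', 'first_name']
```

Note that field_value is passed as a string even for the numeric year comparison, following the (string, required) type given in the reference.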
search_profiles_configuration
: (object, optional) Defines the search pipelines that are used when searching.
profiles
: (array of objects, required) Holds all relevance profiles for this index.
name
: (string, optional) Name of the search profile. This name is used to select a default profile in the search_relevance_configuration or at query time using the search_profile_name field.
query_settings
: (object, optional) Defines the configuration for boosting document fields, used in keyword search.
field_search_configs
: (array of objects, required) Each element of this array represents a document field to be boosted.
field_path
: (string, required) The path of the field.
must_match_query
: (boolean, optional) If true, only documents containing the search query in the document field will be returned. Defaults to false.
boosting_score
: (object, optional) If the query is included in the document field, the document will be boosted.
weight
: (float, required) Multiplier to be used for boosting.
constant_score
: (object, optional) If defined and the query is included in the document, then the document relevance score will be equal to the weight.
weight
: (float, required) Relevance score of the document.
functions_boosting_settings
: (object, optional) Use this configuration to boost a document based on its field value.
score_aggregation_method
: (enum, optional) The method used to combine the scores given by each of the functions in function_configs. Possible values are "sum" (default), "multiply", "avg", "first", "max", "min".
function_configs
: (array of objects, required) List of functions to be applied.
field_path
: (string, required) The path of the field.
function_type
: (enum, required) The function to be applied. Currently the only supported function is "weighted_value", which takes the value of the document field and multiplies it by a weight.
function_config
: (object, required) Defines the function configuration.
weighted_value
: (object, required) Configuration for the weighted_value function.
field_value
: (numeric or string, required) The function will be applied when the document field is equal to this parameter.
weight
: (float, required) Score multiplier.
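A search profile and the default that points at it might look like the sketch below. The nesting is inferred from the field descriptions and is an assumption, as are the profile name title_boost, the field names title and source, the field value "arxiv" and the retrieval unit "document". The final check shows one way to confirm that every search_profile_name refers to a defined profile.

```python
search_profiles_configuration = {
    "profiles": [
        {
            "name": "title_boost",  # hypothetical profile name
            "query_settings": {
                "field_search_configs": [
                    {
                        "field_path": "title",
                        "must_match_query": False,
                        # Boost documents whose title contains the query.
                        "boosting_score": {"weight": 2.0},
                    },
                ],
            },
            "functions_boosting_settings": {
                "score_aggregation_method": "sum",
                "function_configs": [
                    {
                        "field_path": "source",
                        "function_type": "weighted_value",
                        "function_config": {
                            # Applied when source equals "arxiv".
                            "weighted_value": {"field_value": "arxiv", "weight": 1.5},
                        },
                    },
                ],
            },
        },
    ],
}

search_relevance_configuration = [
    {"retrieval_unit": "document", "search_profile_name": "title_boost"},
]

# Every default must name a profile that actually exists.
profile_names = {p["name"] for p in search_profiles_configuration["profiles"]}
unknown = [c["search_profile_name"] for c in search_relevance_configuration
           if c["search_profile_name"] not in profile_names]
print(unknown)  # []
```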
On a successful response, the pipeline service will create and configure a new index for the tenant. Take note of the id field in the response payload. Its value is the required index_id field of other endpoints.
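The request itself can be sketched with the Python standard library. The base URL and tenant name are hypothetical, authentication is omitted because it is not covered here, and the body is truncated to the name field for brevity; a real request also needs the required cluster_connection and storage_settings objects. The request is only built, not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://pipeline.example.com"  # hypothetical host

def build_create_index_request(tenant, body):
    """Build (but do not send) the POST request that creates an index."""
    # The tenant is passed as a query parameter, the configuration as JSON.
    url = f"{BASE_URL}/v1/indexes?{urlencode({'tenant': tenant})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_create_index_request("my-tenant", {"name": "Zeta Alpha"})
print(request.get_method(), request.full_url)

# Sending it and keeping the id for later calls could then look like:
#   with urllib.request.urlopen(request) as response:
#       index_id = json.load(response)["id"]
```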