Getting Started with the Ingestion API

This tutorial walks you through setting up and using the Ingestion API to add, update, retrieve, and delete documents. By the end, you will have a basic understanding of how to manage documents on the Zeta Alpha platform with the Ingestion API.

Prerequisites

Before you begin, make sure you have the following:

  • An active Zeta Alpha account with API access
  • An API key with the ingestion-manager role

Step 1: Adding a Document

Let's add a new document to the system using the Ingestion API. The document will be linked to a PDF file stored on Google Drive.

First, set up the following environment variables:

TENANT=<tenant>
ZA_API_KEY=<your-api-key>
DOCUMENT_ID=<document-id>
ACCESS_RIGHT=<access-right-name>

where:

  • tenant is your Zeta Alpha tenant name
  • your-api-key is your Zeta Alpha API key
  • document-id is the unique identifier for the document we are going to ingest
  • access-right-name is the name of the access right you want to assign to the document
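If you later script these calls instead of using cURL, the same four variables can be read from the environment. The following is a minimal Python sketch, not part of the Ingestion API: the `IngestionConfig` class and `load_config` function are hypothetical names chosen for illustration. Note that a Python process only sees shell variables that have been exported (for example with `export TENANT=...`).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestionConfig:
    """Hypothetical container for the four tutorial variables."""
    tenant: str
    api_key: str
    document_id: str
    access_right: str


def load_config(env=os.environ) -> IngestionConfig:
    """Read the tutorial variables, failing early if any is missing or empty."""
    names = ("TENANT", "ZA_API_KEY", "DOCUMENT_ID", "ACCESS_RIGHT")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return IngestionConfig(
        tenant=env["TENANT"],
        api_key=env["ZA_API_KEY"],
        document_id=env["DOCUMENT_ID"],
        access_right=env["ACCESS_RIGHT"],
    )
```

Failing at startup with the list of missing names is usually easier to debug than a request that is rejected later because a header or query parameter was empty.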

Next, use the following cURL command to upload the document:

curl --request POST \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents/document-batches?tenant=$TENANT" \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "documents": [
      {
        "uri": "https://zeta-alpha.com",
        "document_content": {
          "content_type": "application/pdf",
          "from_": {
            "url": "https://drive.google.com/uc?export=download&id=1g_zXjyQ0qqrKpVBPURp2xvylxeg6SCBa"
          }
        },
        "allow_access_rights": [
          {
            "name": "'"$ACCESS_RIGHT"'"
          }
        ],
        "document_id": "'"$DOCUMENT_ID"'",
        "custom_metadata": {
          "DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
          "DCMI.source": "zeta-alpha.com",
          "DCMI.creator": [
            {
              "full_name": "Zeta Alpha"
            }
          ],
          "DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
        }
      }
    ]
  }'
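For readers who prefer to build the request programmatically, here is a rough Python equivalent of the cURL command above, using only the standard library. It is a sketch under assumptions: `build_document` and `build_batch_request` are hypothetical helper names, and the field names and URL are copied from the cURL example rather than from a formal API reference.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the cURL examples in this tutorial.
BASE_URL = "https://api.zeta-alpha.com/v0/service/ingestion"


def build_document(document_id, uri, content_url, access_right, metadata):
    """Return one entry for the "documents" array, mirroring the cURL body."""
    return {
        "uri": uri,
        "document_content": {
            "content_type": "application/pdf",
            "from_": {"url": content_url},
        },
        "allow_access_rights": [{"name": access_right}],
        "document_id": document_id,
        "custom_metadata": metadata,
    }


def build_batch_request(tenant, api_key, documents):
    """Build (but do not send) the POST request for a document batch."""
    query = urllib.parse.urlencode({"tenant": tenant})
    url = f"{BASE_URL}/documents/document-batches?{query}"
    body = json.dumps({"documents": documents}).encode("utf-8")
    headers = {"Content-Type": "application/json", "X-Auth": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` sends the request. Serializing the body with `json.dumps` also avoids the shell-quoting gymnastics (`"'"$DOCUMENT_ID"'"`) that the cURL version needs to splice variables into a single-quoted string.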

The document has the status ingesting until the ingestion process is complete. To check the status, retrieve the document by its ID with include_document_status=true, as described in Step 3.
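Waiting for ingestion to finish amounts to polling the status until it changes. The sketch below shows one way to do that in Python. It is deliberately generic: `wait_while` is a hypothetical helper, and `fetch_status` is a function you supply that retrieves the document and returns its status string, since the exact location of the status in the retrieval response is not shown in this tutorial. The interval and timeout defaults are arbitrary.

```python
import time


def wait_while(fetch_status, pending="ingesting", interval=5.0, timeout=600.0,
               sleep=time.sleep):
    """Call fetch_status() until it returns something other than `pending`.

    Returns the first non-pending status, or raises TimeoutError if the
    status is still pending once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status != pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {pending!r} after {timeout} seconds")
        sleep(interval)
```

A fixed interval keeps the example short; for large batches, a longer or growing interval reduces the number of requests made while documents are still being processed.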

Note: You can use the same endpoint to create and update multiple documents in a single request. Just add more document objects to the documents array.

Example response:

{
  "index_id": "<your-index-id>",
  "tenant": "<tenant>",
  "documents": [
    {
      "document_id": "<document-id>",
      "version": "1",
      "content_ingestion_job_id": "<job-id>",
      "status": "ingesting"
    }
  ],
  "errors": []
}
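Because a batch can carry several documents, it helps to check the response per document rather than relying on the HTTP status alone. The Python sketch below parses a response shaped like the example above. `summarize_batch_response` is a hypothetical helper, and since the structure of the entries in `errors` is not shown in this tutorial, the list is returned untouched for the caller to inspect.

```python
import json


def summarize_batch_response(raw_body):
    """Return ({document_id: status}, errors) for a batch response body."""
    response = json.loads(raw_body)
    statuses = {
        doc["document_id"]: doc["status"]
        for doc in response.get("documents", [])
    }
    return statuses, response.get("errors", [])
```

A caller would typically log or raise when the returned `errors` list is not empty, then track the listed document IDs until they leave the ingesting status.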

Step 2: Updating Access Rights

Access rights can also be updated in batches, using a dedicated endpoint whose request body contains only document IDs and their access rights.

curl --request POST \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents/access-rights-batches?tenant=$TENANT" \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "documents": [
      {
        "document_id": "'"$DOCUMENT_ID"'",
        "allow_access_rights": [
          {
            "name": "'"$ACCESS_RIGHT"'"
          }
        ]
      }
    ]
  }'

An update sent through this endpoint reaches the index faster than the same change sent through the document update endpoint used in Step 1.

Example response:

{
  "index_id": "<your-index-id>",
  "tenant": "<tenant>",
  "documents": [
    {
      "document_id": "<document-id>",
      "version": "1",
      "content_ingestion_job_id": "<job-id>",
      "status": "updated"
    }
  ],
  "errors": []
}
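When many documents need new access rights, the request body can be generated from a simple mapping. This Python sketch assumes the body shape used in the cURL command of this step. `build_access_rights_payload` is a hypothetical helper, and whether the API accepts several access rights per document is an assumption based on `allow_access_rights` being an array.

```python
def build_access_rights_payload(rights_by_document):
    """Build the request body from a {document_id: [access right names]} mapping."""
    return {
        "documents": [
            {
                "document_id": document_id,
                "allow_access_rights": [{"name": name} for name in names],
            }
            for document_id, names in rights_by_document.items()
        ]
    }
```

For example, `build_access_rights_payload({"doc-1": ["public"]})` yields the same structure as the single-document body in the cURL command, ready to be serialized with `json.dumps`.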

Step 3: Retrieving a Document

Retrieve a document's metadata and status using the document ID.

curl --request GET \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents/$DOCUMENT_ID?tenant=$TENANT&include_document_status=true" \
  --header "X-Auth: $ZA_API_KEY"

The response will include the document's metadata and status, as well as a link to download the document content (if any).
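The retrieval URL combines a path segment (the document ID) with query parameters, and both need escaping if they contain reserved characters. The following Python sketch builds that URL and performs the request. `document_url` and `fetch_document` are hypothetical helpers, and percent-encoding the document ID is a general HTTP precaution rather than a documented requirement of this API.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Base path taken from the cURL examples in this tutorial.
BASE_URL = "https://api.zeta-alpha.com/v0/service/ingestion"


def document_url(document_id, tenant, include_status=True):
    """Build the retrieval URL, escaping the ID and the query parameters."""
    query = {"tenant": tenant}
    if include_status:
        query["include_document_status"] = "true"
    # safe="" also escapes "/" so the ID stays a single path segment.
    return f"{BASE_URL}/documents/{quote(document_id, safe='')}?{urlencode(query)}"


def fetch_document(document_id, tenant, api_key):
    """Send the GET request and return the decoded JSON body."""
    request = urllib.request.Request(
        document_url(document_id, tenant), headers={"X-Auth": api_key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With plain alphanumeric IDs the result is identical to the URL in the cURL command; the escaping only matters for IDs containing spaces, slashes, or similar characters.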

Example response:

{
  "uri": "https://zeta-alpha.com",
  "content_ingestion_job_id": "<job-id>",
  "document_content": {
    "content_type": "application/pdf",
    "from_": {},
    "download_url": "/ingestion/documents/<document-id>/content?tenant=<tenant>&index_id=<index-id>"
  },
  "allow_access_rights": [
    {
      "name": "<access-right-name>"
    }
  ],
  "document_id": "<document-id>",
  "is_indexable": true,
  "custom_metadata": {
    "DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
    "DCMI.source": "zeta-alpha.com",
    "DCMI.creator": [
      {
        "full_name": "Zeta Alpha"
      }
    ],
    "DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
  },
  "index_id": "<index-id>",
  "tenant": "<tenant>",
  "deleted": false,
  "request_created_at": "2024-08-27T10:08:59.921365+00:00",
  "request_last_updated_at": "2024-08-27T10:08:59.921378+00:00"
}
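The `download_url` in this response is a relative path, so it has to be combined with a host before it can be requested. The Python sketch below does that under one assumption: that the path is relative to `https://api.zeta-alpha.com/v0/service`, which is inferred from the content endpoint used elsewhere in this tutorial and not stated explicitly. `absolute_download_url` is a hypothetical helper.

```python
# Assumed root for relative download URLs, inferred from the tutorial's endpoints.
SERVICE_ROOT = "https://api.zeta-alpha.com/v0/service"


def absolute_download_url(document):
    """Return a full content URL for a retrieved document, or None if it has none."""
    relative = (document.get("document_content") or {}).get("download_url")
    if not relative:
        return None
    # Plain concatenation on purpose: urllib.parse.urljoin would treat the
    # leading "/" as an absolute path and drop the "/v0/service" prefix.
    return SERVICE_ROOT + relative
```

Returning `None` covers documents without stored content, matching the "(if any)" caveat above.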

Step 4: Retrieving Document Content

Fetch the content of a document by its ID and save it locally as a PDF file.

curl --request GET \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents/$DOCUMENT_ID/content?tenant=$TENANT" \
  --header "X-Auth: $ZA_API_KEY" \
  --output doc.pdf
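The same download can be scripted in Python. The sketch below streams the response to disk in chunks instead of loading the whole PDF into memory. `save_response` and `download_content` are hypothetical helpers, and the URL passed in is expected to be the content URL from the cURL command above.

```python
import shutil
import urllib.request


def save_response(response, path):
    """Copy a binary file-like response to `path`; return the bytes written."""
    with open(path, "wb") as out:
        shutil.copyfileobj(response, out)
        return out.tell()


def download_content(url, api_key, path):
    """GET the document content and store it at `path`."""
    request = urllib.request.Request(url, headers={"X-Auth": api_key})
    # urlopen raises HTTPError on 4xx/5xx responses, so the file is only
    # created when the request succeeded.
    with urllib.request.urlopen(request) as response:
        return save_response(response, path)
```

One practical difference from the cURL command: without `--fail`, cURL writes an error response body into doc.pdf, whereas this sketch raises before creating the file.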

Step 5: Filtering Documents

You can filter documents based on various criteria. The request below applies no filter: it lists the tenant's documents, 10 per page (page_size=10), together with their status.

curl --request GET \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents?tenant=$TENANT&page_size=10&include_document_status=true" \
  --header "X-Auth: $ZA_API_KEY"

This will return a paginated list of documents with their metadata and status.

Example response:

{
  "count": 1,
  "results": [
    {
      "uri": "https://zeta-alpha.com",
      "content_ingestion_job_id": "<job-id>",
      "document_content": {
        "content_type": "application/pdf",
        "from_": {},
        "download_url": "/ingestion/documents/<document-id>/content?tenant=<tenant>&index_id=<index-id>"
      },
      "allow_access_rights": [
        {
          "name": "<access-right-name>"
        }
      ],
      "document_id": "<document-id>",
      "is_indexable": true,
      "custom_metadata": {
        "DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
        "DCMI.source": "zeta-alpha.com",
        "DCMI.creator": [
          {
            "full_name": "Zeta Alpha"
          }
        ],
        "DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
      },
      "index_id": "<index-id>",
      "tenant": "<tenant>",
      "deleted": false,
      "request_created_at": "2024-08-27T10:08:59.921365+00:00",
      "request_last_updated_at": "2024-08-27T10:08:59.921378+00:00"
    }
  ],
  "page": 1,
  "page_size": 10
}
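Since the list is paginated, collecting every document means requesting page after page. The Python sketch below walks the pages using only the `results` and `page_size` fields shown in the response above. `iter_documents` is a hypothetical helper, and `fetch_page` is a function you supply that requests a given page number and returns the decoded JSON. How the page is selected in the request (for example a `page` query parameter) is an assumption, as the tutorial only shows `page` in the response.

```python
def iter_documents(fetch_page):
    """Yield every document by calling fetch_page(1), fetch_page(2), and so on.

    Stops at the first page that is empty or shorter than its page_size.
    """
    page = 1
    while True:
        body = fetch_page(page)
        results = body.get("results", [])
        yield from results
        if not results or len(results) < body.get("page_size", 0):
            return
        page += 1
```

The loop stops on a short page rather than comparing against `count`, because the tutorial does not say whether `count` is the total across all pages or the size of the current page. The cost is one extra request when the total is an exact multiple of the page size.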

Step 6: Deleting Documents

Delete documents in batches by specifying their IDs.

curl --request POST \
  --url "https://api.zeta-alpha.com/v0/service/ingestion/documents/delete-document-batches?tenant=$TENANT" \
  --header 'Content-Type: application/json' \
  --header "X-Auth: $ZA_API_KEY" \
  --data '{
    "document_ids": [
      "'"${DOCUMENT_ID}"'"
    ]
  }'

The document is marked for deletion, and a workflow is triggered to delete it along with its associated metadata and content. Once the deletion process is complete, the document is removed from the Ingestion API. Until then, retrieving the document returns "deleted": true.

Example response:

{
  "index_id": "<your-index-id>",
  "tenant": "<tenant>",
  "documents": [
    {
      "document_id": "<document-id>",
      "version": "1",
      "content_ingestion_job_id": "<job-id>",
      "status": "ingesting"
    }
  ],
  "errors": []
}
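When a long list of documents has to be removed, splitting the IDs across several requests keeps each body small. The Python sketch below prepares such request bodies in the shape used by the cURL command of this step. `delete_payloads` is a hypothetical helper, and the default `batch_size` of 100 is an arbitrary illustration, since this tutorial does not state a maximum batch size.

```python
def delete_payloads(document_ids, batch_size=100):
    """Split IDs into request bodies of at most batch_size IDs each.

    Repeated IDs are dropped while the original order is kept.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    unique_ids = list(dict.fromkeys(document_ids))
    return [
        {"document_ids": unique_ids[start:start + batch_size]}
        for start in range(0, len(unique_ids), batch_size)
    ]
```

Each returned dictionary can be serialized with `json.dumps` and posted to the delete-document-batches endpoint in turn.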

Conclusion

Congratulations! You've successfully interacted with the Ingestion API, adding, updating, retrieving, and deleting documents. You can now explore more advanced features and customization options.

Next Steps

If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.