Getting Started with the Ingestion API
This tutorial shows how to set up and use the Ingestion API to add, update, retrieve, and delete documents. By the end, you will have a basic understanding of how to manage documents on the Zeta Alpha platform with the Ingestion API.
Prerequisites
Before you begin, make sure you have the following:
- An active Zeta Alpha account with API access
- An API key with the ingestion-manager role
Step 1: Adding a Document
Let's add a new document to the system using the Ingestion API. The document will be linked to a PDF file stored on Google Drive.
First, set up the following environment variables:
TENANT=<tenant>
ZA_API_KEY=<your-api-key>
DOCUMENT_ID=<document-id>
ACCESS_RIGHT=<access-right-name>
where:
- <tenant> is your Zeta Alpha tenant name
- <your-api-key> is your Zeta Alpha API key
- <document-id> is the unique identifier of the document to ingest
- <access-right-name> is the name of the access right to assign to the document
Next, use the following cURL command to upload the document:
curl --request POST \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents/document-batches?tenant=$TENANT" \
--header 'Content-Type: application/json' \
--header "X-Auth: $ZA_API_KEY" \
--data '{
"documents": [
{
"uri": "https://zeta-alpha.com",
"document_content": {
"content_type": "application/pdf",
"from_": {
"url": "https://drive.google.com/uc?export=download&id=1g_zXjyQ0qqrKpVBPURp2xvylxeg6SCBa"
}
},
"allow_access_rights": [
{
"name": "'"$ACCESS_RIGHT"'"
}
],
"document_id": "'"$DOCUMENT_ID"'",
"custom_metadata": {
"DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
"DCMI.source": "zeta-alpha.com",
"DCMI.creator": [
{
"full_name": "Zeta Alpha"
}
],
"DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
}
}
]
}'
The document has the status "ingesting" until the ingestion process is complete. You can check the status of the document by retrieving it using the document ID.
You can use the same endpoint to create and update multiple documents in a single request. Just add more document objects to the "documents" array.
Example response:
{
"index_id": "<your-index-id>",
"tenant": "<tenant>",
"documents": [
{
"document_id": "<document-id>",
"version": "1",
"content_ingestion_job_id": "<job-id>",
"status": "ingesting"
}
],
"errors": []
}
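The multi-document request described in this step can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library. The helper names build_document and build_batch_request are hypothetical, and the sketch assumes that a document with only a DCMI.title in its custom metadata is accepted, which the tutorial does not confirm.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.zeta-alpha.com/v0/service/ingestion"


def build_document(document_id, uri, pdf_url, access_right, title):
    """Return one entry for the "documents" array, mirroring the cURL body above."""
    return {
        "uri": uri,
        "document_content": {
            "content_type": "application/pdf",
            "from_": {"url": pdf_url},
        },
        "allow_access_rights": [{"name": access_right}],
        "document_id": document_id,
        # Assumption: a title alone is enough metadata for this illustration.
        "custom_metadata": {"DCMI.title": title},
    }


def build_batch_request(tenant, api_key, documents):
    """Build (but do not send) the POST request for a document batch."""
    query = urllib.parse.urlencode({"tenant": tenant})
    url = f"{BASE_URL}/documents/document-batches?{query}"
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth": api_key},
    )
```

To send the request, pass the result to urllib.request.urlopen and parse the response body with json.load. Building the body with json.dumps avoids the shell quoting needed in the cURL version.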
Step 2: Updating Access Rights
Access rights can also be updated in batches through a dedicated endpoint. The request carries only the document IDs and their access rights:
curl --request POST \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents/access-rights-batches?tenant=$TENANT" \
--header 'Content-Type: application/json' \
--header "X-Auth: $ZA_API_KEY" \
--data '{
"documents": [
{
"document_id": "'"$DOCUMENT_ID"'",
"allow_access_rights": [
{
"name": "'"$ACCESS_RIGHT"'"
}
]
}
]
}'
This update will reach the index faster than using the document update endpoint.
Example response:
{
"index_id": "<your-index-id>",
"tenant": "<tenant>",
"documents": [
{
"document_id": "<document-id>",
"version": "1",
"content_ingestion_job_id": "<job-id>",
"status": "updated"
}
],
"errors": []
}
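When access rights for several documents change at once, the request body can be generated from a simple mapping. This is a sketch only: build_access_rights_batch is a hypothetical helper, and it assumes that each document may list more than one access right, which the array shape suggests but the tutorial does not state.

```python
import json


def build_access_rights_batch(rights_by_document):
    """Map {document_id: [access right names]} to the request body shape used above."""
    return {
        "documents": [
            {
                "document_id": document_id,
                "allow_access_rights": [{"name": name} for name in names],
            }
            for document_id, names in rights_by_document.items()
        ]
    }


if __name__ == "__main__":
    # Example IDs and access right names are placeholders, not real values.
    payload = build_access_rights_batch({"doc-1": ["public"], "doc-2": ["team-a", "team-b"]})
    print(json.dumps(payload, indent=2))
```

The printed JSON can be passed to the cURL command above with --data, or sent with any HTTP client.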
Step 3: Retrieving a Document
Retrieve a document's metadata and status using the document ID.
curl --request GET \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents/$DOCUMENT_ID?tenant=$TENANT&include_document_status=true" \
--header "X-Auth: $ZA_API_KEY"
The response will include the document's metadata and status, as well as a link to download the document content (if any).
Example response:
{
"uri": "https://zeta-alpha.com",
"content_ingestion_job_id": "<job-id>",
"document_content": {
"content_type": "application/pdf",
"from_": {},
"download_url": "/ingestion/documents/<document-id>/content?tenant=<tenant>&index_id=<index-id>"
},
"allow_access_rights": [
{
"name": "<access-right-name>"
}
],
"document_id": "<document-id>",
"is_indexable": true,
"custom_metadata": {
"DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
"DCMI.source": "zeta-alpha.com",
"DCMI.creator": [
{
"full_name": "Zeta Alpha"
}
],
"DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
},
"index_id": "<index-id>",
"tenant": "<tenant>",
"deleted": false,
"request_created_at": "2024-08-27T10:08:59.921365+00:00",
"request_last_updated_at": "2024-08-27T10:08:59.921378+00:00"
}
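Because a document stays in the "ingesting" status until processing finishes, a client usually needs to repeat this retrieval until the status changes. The sketch below shows one way to do that in Python. The example response does not show where the status appears, so the lookup is left to a caller-supplied fetch_status function (hypothetical) that performs the GET request and returns the status string. The attempt limit and interval are arbitrary defaults.

```python
import time


def wait_until_ingested(fetch_status, max_attempts=30, interval_seconds=2.0, sleep=time.sleep):
    """Call fetch_status() until it stops returning "ingesting" or attempts run out.

    Returns the last status seen, which is still "ingesting" if the limit was hit.
    """
    status = None
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "ingesting":
            return status
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return status
```

The sleep parameter is injectable so the loop can be exercised without real delays. A caller should treat a returned "ingesting" as a timeout rather than a failure.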
Step 4: Retrieving Document Content
Fetch the content of a document by its ID and save it locally as a PDF file.
curl --request GET \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents/$DOCUMENT_ID/content?tenant=$TENANT" \
--header "X-Auth: $ZA_API_KEY" \
--output doc.pdf
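The same download can be scripted. The following Python sketch builds the content URL from this step and streams the response to a file. The helper names are hypothetical, and percent-encoding the document ID is a defensive assumption for IDs that contain reserved characters; the tutorial does not say which characters an ID may contain.

```python
import shutil
import urllib.parse
import urllib.request

BASE_URL = "https://api.zeta-alpha.com/v0/service/ingestion"


def content_url(tenant, document_id):
    """Build the content URL, percent-encoding the document ID and the tenant."""
    encoded_id = urllib.parse.quote(document_id, safe="")
    query = urllib.parse.urlencode({"tenant": tenant})
    return f"{BASE_URL}/documents/{encoded_id}/content?{query}"


def download_content(tenant, api_key, document_id, target_path):
    """Stream the document content to target_path (performs a network call)."""
    request = urllib.request.Request(
        content_url(tenant, document_id), headers={"X-Auth": api_key}
    )
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        # Copy in chunks so large files are not held in memory.
        shutil.copyfileobj(response, out)
```

For example, download_content(tenant, api_key, document_id, "doc.pdf") writes the same file as the cURL command above.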
Step 5: Filtering Documents
You can filter documents based on various criteria. Here's how to retrieve a list of documents with their status.
curl --request GET \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents?tenant=$TENANT&page_size=10&include_document_status=true" \
--header "X-Auth: $ZA_API_KEY"
This will return a paginated list of documents with their metadata and status.
Example response:
{
"count": 1,
"results": [
{
"uri": "https://zeta-alpha.com",
"content_ingestion_job_id": "<job-id>",
"document_content": {
"content_type": "application/pdf",
"from_": {},
"download_url": "/ingestion/documents/<document-id>/content?tenant=<tenant>&index_id=<index-id>"
},
"allow_access_rights": [
{
"name": "<access-right-name>"
}
],
"document_id": "<document-id>",
"is_indexable": true,
"custom_metadata": {
"DCMI.title": "A New Generation of Discovery Engines Helps You Do More with Less",
"DCMI.source": "zeta-alpha.com",
"DCMI.creator": [
{
"full_name": "Zeta Alpha"
}
],
"DCMI.abstract": "The ability to discover institutional knowledge is critical to business success even in the best of times — but that is especially true when times are tough. From the “great resignation” to economy-driven layoffs, enterprise teams are having to find creative ways to do more with less. In the era of automation and AI, the answer to understaffed teams and increasing demands is often technology. That is as true in the realm of knowledge management as it is in marketing or operations. And the good thing is, we are witnessing a breakthrough in AI for search right now. It’s called Neural Search."
},
"index_id": "<index-id>",
"tenant": "<tenant>",
"deleted": false,
"request_created_at": "2024-08-27T10:08:59.921365+00:00",
"request_last_updated_at": "2024-08-27T10:08:59.921378+00:00"
}
],
"page": 1,
"page_size": 10
}
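Since the list is paginated, collecting every document means walking the pages. The sketch below illustrates the loop in Python under two assumptions the tutorial does not confirm: that "count" is the total number of matching documents across all pages, and that pages are 1-based and selected by a page parameter. The fetch_page function is a hypothetical caller-supplied callable that performs the GET request above and returns the parsed JSON.

```python
import math


def iter_documents(fetch_page, page_size=10):
    """Yield every document by walking pages until `count` results are covered.

    fetch_page(page, page_size) must return a dict with "count" and "results".
    """
    first = fetch_page(1, page_size)
    yield from first["results"]
    # Assumption: "count" is the total across all pages.
    total_pages = math.ceil(first["count"] / page_size)
    for page in range(2, total_pages + 1):
        yield from fetch_page(page, page_size)["results"]
```

Because the function is a generator, a caller can stop early without requesting the remaining pages.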
Step 6: Deleting Documents
Delete documents in batches by specifying their IDs.
curl --request POST \
--url "https://api.zeta-alpha.com/v0/service/ingestion/documents/delete-document-batches?tenant=$TENANT" \
--header 'Content-Type: application/json' \
--header "X-Auth: $ZA_API_KEY" \
--data '{
"document_ids": [
"'"${DOCUMENT_ID}"'"
]
}'
The document is marked for deletion, and a workflow is triggered to delete the document and its associated metadata and content. Once the deletion process is complete, the document is removed from the Ingestion API. While the deletion is in progress, retrieving the document returns "deleted": true.
Example response:
{
"index_id": "<your-index-id>",
"tenant": "<tenant>",
"documents": [
{
"document_id": "<document-id>",
"version": "1",
"content_ingestion_job_id": "<job-id>",
"status": "ingesting"
}
],
"errors": []
}
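When many documents need to be removed, the IDs can be split across several delete requests. The Python sketch below produces one request body per chunk. The helper name and the default batch size of 100 are illustrative assumptions; the tutorial does not state a limit on the number of IDs per request.

```python
def delete_batches(document_ids, batch_size=100):
    """Split document IDs into request bodies of at most batch_size IDs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(document_ids)
    return [
        {"document_ids": ids[start:start + batch_size]}
        for start in range(0, len(ids), batch_size)
    ]
```

Each returned dictionary has the same shape as the --data body in the cURL command above and can be serialized with json.dumps before sending.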
Conclusion
Congratulations! You've successfully interacted with the Ingestion API, adding, updating, retrieving, and deleting documents. You can now explore more advanced features and customization options.
Next Steps
If you have any questions or encounter any issues, please don't hesitate to reach out to our support team.