Create an OpenReview Connector
An OpenReview connector enables you to ingest academic papers from OpenReview conferences into the Zeta Alpha platform. This guide shows you how to create and configure an OpenReview connector for your data ingestion workflows.
Info: This guide presents an example configuration for an OpenReview connector. For a complete set of configuration options, see the OpenReview Connector Configuration Reference.
Prerequisites
Before you begin, ensure you have:
- Access to the Zeta Alpha Platform UI
- A tenant created
- An index created
Note: OpenReview does not require authentication credentials for public conference data.
Step 1: Create the OpenReview Basic Configuration
To create an OpenReview connector, define a configuration file with the following basic fields:
- is_document_owner: (boolean) Indicates whether this connector "owns" the crawled documents. When set to true, other connectors cannot crawl the same documents.
- base_url: (string) URL of the OpenReview API (e.g., "https://api.openreview.net" or "https://api2.openreview.net").
- conference: (object) Configuration for the conference to crawl:
  - name: (string) Name of the conference.
  - venues: (array of strings) List of venue identifiers to crawl within the conference.
  - id_type: (string) Type of identifier used for venues, typically "venueid" or "invitation".
- since: (datetime) Only crawl papers submitted or updated after this datetime (format: "YYYY-MM-DDTHH:MM:SSZ").
- crawl_no_decision: (boolean, optional, default: true) Whether to crawl papers that have no acceptance decision yet.
- limit: (integer, optional) Maximum number of papers to crawl.
- skips: (integer, optional) Number of papers to skip at the beginning.
- content_source_name: (string, optional) Custom name for the content source. Defaults to "OpenReview".
- logo_url: (string, optional) The URL of a logo to display on document cards.
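Before pasting a configuration into the UI, it can help to check it locally. The following is a minimal sketch in Python, not part of the Zeta Alpha platform: the function name check_connector_configuration is hypothetical, and treating every field not marked optional above as required is an assumption. The platform performs its own validation, which may differ.

```python
from datetime import datetime

# strptime pattern for the documented "YYYY-MM-DDTHH:MM:SSZ" format
SINCE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def check_connector_configuration(cfg: dict) -> list[str]:
    """Return a list of problems found in a connector_configuration dict.

    Hypothetical helper: it only mirrors the field types listed in this guide.
    """
    problems = []
    if not isinstance(cfg.get("is_document_owner"), bool):
        problems.append("is_document_owner must be a boolean")
    if not isinstance(cfg.get("base_url"), str):
        problems.append("base_url must be a string")

    conference = cfg.get("conference")
    if not isinstance(conference, dict):
        problems.append("conference must be an object")
    else:
        if not isinstance(conference.get("name"), str):
            problems.append("conference.name must be a string")
        venues = conference.get("venues")
        if not (isinstance(venues, list) and all(isinstance(v, str) for v in venues)):
            problems.append("conference.venues must be an array of strings")
        if not isinstance(conference.get("id_type"), str):
            problems.append("conference.id_type must be a string")

    try:
        datetime.strptime(cfg.get("since"), SINCE_FORMAT)
    except (TypeError, ValueError):
        problems.append("since must match YYYY-MM-DDTHH:MM:SSZ")

    # Optional integer fields; bool is excluded because it subclasses int.
    for key in ("limit", "skips"):
        value = cfg.get(key)
        if key in cfg and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{key} must be an integer")
    return problems


example = {
    "is_document_owner": True,
    "base_url": "https://api2.openreview.net",
    "conference": {
        "name": "NeurIPS 2024",
        "venues": ["NeurIPS.cc/2024/Conference"],
        "id_type": "venueid",
    },
    "since": "2024-01-01T00:00:00Z",
    "limit": 5000,
}
print(check_connector_configuration(example))
```

An empty list means the sketch found nothing to flag; it does not guarantee that the platform will accept the configuration.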
Example Configuration
{
  "name": "My OpenReview Connector",
  "description": "My OpenReview connector for NeurIPS 2024",
  "is_indexable": true,
  "connector": "openreview",
  "connector_configuration": {
    "is_document_owner": true,
    "base_url": "https://api2.openreview.net",
    "conference": {
      "name": "NeurIPS 2024",
      "venues": [
        "NeurIPS.cc/2024/Conference"
      ],
      "id_type": "venueid"
    },
    "since": "2024-01-01T00:00:00Z",
    "crawl_no_decision": true,
    "limit": 5000,
    "content_source_name": "OpenReview - NeurIPS 2024",
    "logo_url": "https://example.com/neurips-logo.png"
  }
}
Step 2: Add Field Mapping Configuration
When crawling OpenReview, the connector extracts document metadata and content as described in the OpenReview Connector Configuration Reference. You can map these OpenReview fields to your index fields using the field_mappings configuration.
Example Field Mappings
The following example shows field mappings for the default index fields:
{
  ...
  "connector_configuration": {
    ...
    "field_mappings": [
      {
        "content_source_field_name": "title",
        "index_field_name": "DCMI.title"
      },
      {
        "content_source_field_name": "abstract",
        "index_field_name": "DCMI.abstract"
      },
      {
        "content_source_field_name": "authors.full_name",
        "index_field_name": "DCMI.creator"
      },
      {
        "content_source_field_name": "created",
        "index_field_name": "DCMI.created"
      },
      {
        "content_source_field_name": "modified",
        "index_field_name": "DCMI.modified"
      },
      {
        "content_source_field_name": "subject",
        "index_field_name": "DCMI.subject"
      },
      {
        "content_source_field_name": "source",
        "index_field_name": "DCMI.source"
      },
      {
        "content_source_field_name": "uri",
        "index_field_name": "uri"
      },
      {
        "content_source_field_name": "document_content_type",
        "index_field_name": "document_content_type"
      },
      {
        "content_source_field_name": "document_content_path",
        "index_field_name": "document_content_path"
      }
    ],
    ...
  }
}
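Because every entry in field_mappings has the same two keys, the list can be generated from a simple source-to-index lookup instead of being written by hand. The sketch below assumes Python is available for preparing the configuration; build_field_mappings is a hypothetical helper name, and only three of the mappings from the example are shown.

```python
import json


def build_field_mappings(pairs: dict[str, str]) -> list[dict[str, str]]:
    """Turn {source_field: index_field} pairs into field_mappings entries."""
    return [
        {"content_source_field_name": source, "index_field_name": target}
        for source, target in pairs.items()
    ]


mappings = build_field_mappings(
    {
        "title": "DCMI.title",
        "abstract": "DCMI.abstract",
        "authors.full_name": "DCMI.creator",
    }
)
# Print the fragment in the shape expected inside connector_configuration.
print(json.dumps({"field_mappings": mappings}, indent=2))
```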
Step 3: Configure Access Rights
You can configure access rights to control who can view the ingested OpenReview papers:
- allow_access_rights: (array of objects, optional) Users with any of these access rights will be able to access the documents. If not passed, no user will be able to retrieve the documents.
- deny_access_rights: (array of objects, optional) Users with these access rights will not be able to access the documents.
Example Configuration with Access Rights
{
  ...
  "connector_configuration": {
    ...
    "allow_access_rights": [
      {
        "name": "public"
      }
    ],
    ...
  }
}
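The rules above can be illustrated with a small Python sketch. This is only a reading of the two field descriptions, not the platform's implementation: the function can_access is hypothetical, it assumes an access right is identified by its name key alone, and it assumes a deny match takes precedence over an allow match, which this guide does not state.

```python
def can_access(user_rights, allow_access_rights=None, deny_access_rights=None):
    """Illustrate the documented allow/deny semantics for one user.

    user_rights is a collection of access-right names held by the user.
    Assumption: deny wins when a user matches both lists.
    """
    user = set(user_rights)
    allowed = {right["name"] for right in (allow_access_rights or [])}
    denied = {right["name"] for right in (deny_access_rights or [])}
    if user & denied:
        return False
    # With no allow_access_rights, nobody can retrieve the documents.
    return bool(user & allowed)


print(can_access({"public"}, allow_access_rights=[{"name": "public"}]))
print(can_access({"public"}))
```

The first call returns True because the user holds an allowed right; the second returns False because no allow_access_rights were passed.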
Step 4: Create the OpenReview Content Source
To create your OpenReview connector in the Zeta Alpha Platform UI:
- Navigate to your tenant and click View next to your target index
- Click View under Content Sources for the index
- Click Create Content Source
- Paste your JSON configuration
- Click Submit
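The JSON pasted into the UI combines the fragments from Steps 1 to 3 into a single object. A minimal sketch of that assembly in Python follows; the assemble helper is hypothetical, and the values are shortened versions of the examples in this guide.

```python
import copy
import json


def assemble(content_source: dict, *fragments: dict) -> dict:
    """Merge connector_configuration fragments into a copy of the base config."""
    merged = copy.deepcopy(content_source)  # leave the caller's dict untouched
    for fragment in fragments:
        merged["connector_configuration"].update(fragment)
    return merged


basic = {
    "name": "My OpenReview Connector",
    "description": "My OpenReview connector for NeurIPS 2024",
    "is_indexable": True,
    "connector": "openreview",
    "connector_configuration": {
        "is_document_owner": True,
        "base_url": "https://api2.openreview.net",
        "conference": {
            "name": "NeurIPS 2024",
            "venues": ["NeurIPS.cc/2024/Conference"],
            "id_type": "venueid",
        },
        "since": "2024-01-01T00:00:00Z",
    },
}
field_mappings = {
    "field_mappings": [
        {"content_source_field_name": "title", "index_field_name": "DCMI.title"}
    ]
}
access_rights = {"allow_access_rights": [{"name": "public"}]}

full_config = assemble(basic, field_mappings, access_rights)
print(json.dumps(full_config, indent=2))  # paste this output into the UI
```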
Crawling Behavior
The connector crawls papers from OpenReview conferences, extracting:
- Paper title, abstract, and description
- Author information
- Subject areas and keywords
- Submission and modification timestamps
- Acceptance status
- Identifiers and citations
- PDF content for full-text indexing
The connector processes papers based on your time range, venue selection, and limit settings. Use the since parameter to incrementally crawl new submissions, and the crawl_no_decision parameter to control whether papers without acceptance decisions are included.
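For incremental crawls, the since value has to be written in the "YYYY-MM-DDTHH:MM:SSZ" format. The Python sketch below converts a timestamp, such as the time of your previous crawl, into that format. The to_since helper is hypothetical, and recording the previous crawl time yourself is an assumption; this guide does not say whether the connector tracks it.

```python
from datetime import datetime, timedelta, timezone


def to_since(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC "YYYY-MM-DDTHH:MM:SSZ" string."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A previous crawl recorded at 14:30 in a UTC+2 timezone.
last_crawl = datetime(2024, 6, 1, 14, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_since(last_crawl))  # 2024-06-01T12:30:00Z
```

Converting to UTC before formatting keeps the trailing Z accurate regardless of the timezone in which the timestamp was recorded.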
Common Conference Examples
Here are some common OpenReview conference configurations:
ICLR 2024:
{
  "base_url": "https://api2.openreview.net",
  "conference": {
    "name": "ICLR 2024",
    "venues": ["ICLR.cc/2024/Conference"],
    "id_type": "venueid"
  }
}
NeurIPS 2024:
{
  "base_url": "https://api2.openreview.net",
  "conference": {
    "name": "NeurIPS 2024",
    "venues": ["NeurIPS.cc/2024/Conference"],
    "id_type": "venueid"
  }
}
ICML 2024:
{
  "base_url": "https://api2.openreview.net",
  "conference": {
    "name": "ICML 2024",
    "venues": ["ICML.cc/2024/Conference"],
    "id_type": "venueid"
  }
}
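The three examples above share one shape, so a small Python helper can produce them. This sketch assumes the venue identifier follows the pattern seen in these examples (series name, ".cc", year, "Conference"); that pattern is inferred from the three configurations only, and other conferences or years may use different identifiers or the other API URL, so confirm the venue ID on OpenReview before crawling. The conference_block function is hypothetical.

```python
import json


def conference_block(series: str, year: int) -> dict:
    """Build the base_url and conference fields for one conference edition.

    Assumes the "<series>.cc/<year>/Conference" venue ID pattern.
    """
    return {
        "base_url": "https://api2.openreview.net",
        "conference": {
            "name": f"{series} {year}",
            "venues": [f"{series}.cc/{year}/Conference"],
            "id_type": "venueid",
        },
    }


print(json.dumps(conference_block("ICLR", 2024), indent=2))
```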