Extracting Metadata & Files Using Vault Loader

Vault Loader allows you to export fields, values, and files for documents, document relationships, object records, users, and groups in CSV format. Many users use these exports as templates for CSV input files when creating, updating, or “upserting” Vault data. You can also use the extracts for detailed data analysis and processing in another application, or for migrating data into another Vault.

How to Extract

To extract metadata and files from your Vault:

Using the left panel of the Loader tab, navigate to the Extract page.
Select an Entity Type.
Optional: If extracting document metadata, use the Document Type picklist to filter your export to a specific document type, subtype, or classification.
Optional: Select the Include Non-editable Fields checkbox.
Optional: Select the Only Extract Column Headers checkbox.
Optional: Select the Include Source Files checkbox. This option is only available when extracting documents and document versions.
Optional: Select the Include Renditions checkbox. This option is only available when extracting documents and document versions.
Optional: Select the Include Text checkbox. This option is only available when extracting documents and document versions.
Optional: Select the Override Default Column Selection checkbox.
Optional: In the Where Clause field, enter fields on which to filter and click the Validate button to check for syntax errors.
Click Extract.

For extracts where the Entity Type is Groups, Vault does not support filtering, and the Where Clause field is disabled. For extracts where the Entity Type is Users, Vault allows WHERE clause filtering but does not support validation.

After you click Extract, Vault begins processing the request. When processing finishes, you’ll receive a Vault notification and an email with the request details and CSV output files.

Extract Types & Limits

Extract Type	Output	Limit
Document/Document Version Metadata	CSV output	1,000,000
Document/Document Version Source Files	CSV output with links to file staging	10,000
Document/Document Version Renditions	CSV output with links to file staging	2,000
Document/Document Relationships	CSV output	1,000,000
Object Record Metadata	CSV output	1,000,000
Object Relationships	CSV output	10
Users	CSV output	1,000,000
Groups	CSV output	1,000,000

Vault Loader also enforces a maximum query size of 50,000 characters. If an extract fails with a query size error, update your extract settings to include fewer fields, for example, by not including non-editable fields or by choosing specific fields to include.
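As a rough pre-flight check, the limits above can be encoded in a small script and compared against what you plan to extract. The following Python sketch is illustrative only: the dictionary keys, the function name, and the way the query size is estimated (joined field names plus the WHERE clause) are assumptions made for this example, not part of Vault Loader. Vault builds the actual query itself, so treat the character estimate as a lower bound.

```python
from typing import Dict, List

# Limits copied from the Extract Types & Limits table.
# The keys are hypothetical labels chosen for this sketch.
EXTRACT_LIMITS: Dict[str, int] = {
    "document_metadata": 1_000_000,
    "document_source_files": 10_000,
    "document_renditions": 2_000,
    "document_relationships": 1_000_000,
    "object_record_metadata": 1_000_000,
    "object_relationships": 10,
    "users": 1_000_000,
    "groups": 1_000_000,
}

# Maximum query size stated in this article.
MAX_QUERY_CHARS = 50_000


def check_extract(
    extract_type: str,
    expected_count: int,
    selected_fields: List[str],
    where_clause: str = "",
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    limit = EXTRACT_LIMITS[extract_type]
    if expected_count > limit:
        problems.append(
            f"{extract_type}: {expected_count:,} exceeds the limit of {limit:,}"
        )

    # Assumption: field names plus the WHERE clause approximate the query size.
    estimated_chars = len(", ".join(selected_fields)) + len(where_clause)
    if estimated_chars > MAX_QUERY_CHARS:
        problems.append(
            f"estimated query size {estimated_chars:,} exceeds {MAX_QUERY_CHARS:,} characters"
        )
    return problems


if __name__ == "__main__":
    for problem in check_extract("document_renditions", 2_500, ["id", "name__v"]):
        print(problem)
```

If the check reports a count problem, narrowing the extract with a WHERE clause is one way to bring it under the limit; for a query size problem, select fewer columns.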

Include Non-editable Fields

Select this option to extract both editable and non-editable fields. Non-editable fields include data like creation dates and modification dates. Extracting all fields and values provides you with a more complete record of the data in your Vault, which is useful for data analysis in other applications. Note that these extracts only include non-editable fields that are required.

When extracting metadata for subsequent editing and reloading, do not include non-editable fields. With the exception of ID, non-editable fields are not allowed in the input.
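If you already have an extract that includes non-editable fields, you can strip them before reloading. The sketch below uses only Python's standard csv module. The function name is hypothetical, and the non-editable column names and date values in the sample are placeholders; the script cannot detect which fields are non-editable, so you supply that set yourself.

```python
import csv
import io


def keep_loadable_columns(extract_csv: str, non_editable: set) -> str:
    """Drop the named non-editable columns, always keeping the id column."""
    reader = csv.DictReader(io.StringIO(extract_csv))
    kept = [
        name for name in (reader.fieldnames or [])
        if name == "id" or name not in non_editable
    ]

    out = io.StringIO()
    # extrasaction="ignore" silently skips the columns we removed from `kept`.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Illustrative data only: these column names and dates are placeholders.
SAMPLE = (
    "id,name__v,created_date__v,modified_date__v\n"
    "OBE000000000303,Cap Your Cholesterol,2020-01-01,2020-02-01\n"
)

if __name__ == "__main__":
    print(keep_loadable_columns(SAMPLE, {"id", "created_date__v", "modified_date__v"}))
```

The id column is kept even when it appears in the non-editable set, because ID is the one non-editable field allowed in the input.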

Include Text

Select this option to generate a text (.txt) file containing the document’s text for each extracted document and document version. Vault exports the files to your Vault’s file staging. When an export includes multiple documents or document versions and you select the Include Text checkbox, the CSV output includes a separate row for each .txt file.

Only Extract Column Headers

Select this option to extract only field names. This kind of extract provides you with an input file template to which you can add values to create new records.

Note: When using this option, Vault excludes the ID field. If you are updating existing records, you will need to add the ID field to the CSV input file.
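When a header-only extract is used as an update template, the missing ID column can be added with a few lines of script. This is a minimal Python sketch, assuming the template is a one-line CSV header and that the column is named id as in the examples in this article; the function name is hypothetical.

```python
import csv
import io


def add_id_column(header_csv: str) -> str:
    """Return the header row with an id column prepended if it is missing."""
    rows = list(csv.reader(io.StringIO(header_csv)))
    header = rows[0] if rows else []

    if "id" not in header:
        header = ["id"] + header

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(header)
    return out.getvalue()


if __name__ == "__main__":
    # Field names taken from the Marketing Campaign example in this article.
    print(add_id_column("name__v,product__c\n"))
```

The function leaves the header unchanged when an id column is already present, so it is safe to run on any template.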

Override Default Column Selection

To choose specific columns to include in the extract, select the Override Default Column Selection checkbox. This opens a column selection pane. You can move fields between the Available Columns box and the Selected Columns box using the left and right arrow buttons, or by double-clicking the desired column. In the Selected Columns box, rearrange the column order using the up and down arrow buttons.

The Override Default Column Selection checkbox is not available for extracts where the Entity Type is Users.

You can apply the following filters to the available columns:

Show editable fields: Shows only fields that users can edit.
Show required fields: Shows only fields marked as required.
Show outbound relationships: Shows only fields that are either object or document references.

To reset all settings to default, click Restore defaults.

Column filtering is not available for extracts where the Entity Type is Users.

Include Source Files & Include Renditions

Select these options to extract the source files and/or renditions associated with a document or a document version. Vault exports the files to your Vault’s file staging. When an export includes documents or document versions with multiple rendition types and you select the Include Renditions checkbox, the CSV output includes a separate row for each rendition type.

Using the Where Clause Field

The Where Clause field allows you to filter the extract. For example, use object record IDs to extract metadata from documents where:

country__c = '00C000000000102' AND product__v = '00P000000000119'

WHERE clause filtering is not available for extracts where the Entity Type is Groups.

You can filter on any field configured on the selected object type. If filtering a document extract by document type, you’ll also need to ensure that your fields exist on that document type.

When extracting document relationships, you can filter by relationship type. For example, to find all document attachments:

relationship_type__v = 'attachments__v'

You can also filter using the name__v value. For single-select fields, use the relationship name in the WHERE clause. For multi-select fields, use a subselect statement. In this example, Country is a single-select field and Product is a multi-select field:

document_country__cr.name__v = 'United States' AND product__v IN (SELECT id FROM document_product__vr WHERE name__v = 'Cholecap')

You can click into the Where Clause field and then access the token {…} icon to search for fields. Click the Validate button to check for syntax errors. WHERE clause validation is not available for extracts where the Entity Type is Users.

The Where Clause field supports the same VQL operators and functions as the VQL WHERE clause. Of the other VQL clauses, it supports only MAXROWS and SKIP. Learn more about VQL, including subselect syntax and finding document relationships, in the Vault Developer Portal.
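If you build filters for many extracts, assembling the text programmatically can help avoid quoting mistakes. The Python sketch below reproduces the two example clauses from this section. The helper names are hypothetical, and the sketch rejects values containing quotes or backslashes rather than assuming an escaping rule; it only produces text to paste into the Where Clause field and does not contact Vault.

```python
def quote(value: str) -> str:
    """Wrap a value in single quotes, refusing characters that need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"value needs escaping, handle it manually: {value!r}")
    return f"'{value}'"


def equals(field: str, value: str) -> str:
    """Simple comparison, e.g. for IDs or single-select name__v lookups."""
    return f"{field} = {quote(value)}"


def in_subselect(field: str, relationship: str, name: str) -> str:
    """Subselect form used in this article for multi-select fields."""
    return (
        f"{field} IN (SELECT id FROM {relationship} "
        f"WHERE name__v = {quote(name)})"
    )


def all_of(*conditions: str) -> str:
    """Join conditions with AND."""
    return " AND ".join(conditions)


if __name__ == "__main__":
    # Filter by object record IDs.
    print(all_of(
        equals("country__c", "00C000000000102"),
        equals("product__v", "00P000000000119"),
    ))
    # Filter by name: single-select Country, multi-select Product.
    print(all_of(
        equals("document_country__cr.name__v", "United States"),
        in_subselect("product__v", "document_product__vr", "Cholecap"),
    ))
```

After pasting a generated clause into the Where Clause field, click Validate to check the syntax, where validation is available for the selected Entity Type.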

Reindexing

Vault indexes metadata to facilitate quick searching and to effectively apply access controls. Vault extracts depend on the indexed metadata. When values change or when users create new documents or object records, Vault must update the index. While reindexing is in progress, newly created documents or object records may not be included in your extract. In most cases, the reindexing takes less than one (1) minute.

ID Values

All extracted records include their system-managed ID values. When performing update or upsert actions, IDs are the primary method of identifying existing records in your input files. Although ID values are not editable, they are always included in the first column of the output.

Object Reference Fields

Document and object extracts identify object references using the external_id__v and name__v fields, in addition to the internal id field.

For example, a Marketing Campaign object record extract might look like this:

`id`	`name__v`	`product__c`	`product__c.name__v`	`product__c.external_id__v`
OBE000000000303	Cap Your Cholesterol	00P000000000202	CholeCap	CC
OBE000000000304	Ever Wonder	00P000000000101	WonderDrug	WD
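Because each object reference appears as an ID, a name, and an external ID, an extract can be turned into a lookup table for translating between the three. The following Python sketch parses the example above, written out as CSV, with the standard csv module. The function name is hypothetical, and the sketch assumes the reference columns follow the field, field.name__v, field.external_id__v pattern shown in the example.

```python
import csv
import io
from typing import Dict

# The Marketing Campaign example from this article, written as CSV.
SAMPLE_EXTRACT = (
    "id,name__v,product__c,product__c.name__v,product__c.external_id__v\n"
    "OBE000000000303,Cap Your Cholesterol,00P000000000202,CholeCap,CC\n"
    "OBE000000000304,Ever Wonder,00P000000000101,WonderDrug,WD\n"
)


def reference_lookup(extract_csv: str, ref_field: str) -> Dict[str, Dict[str, str]]:
    """Map each referenced record ID to its name and external ID."""
    lookup: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(extract_csv)):
        ref_id = row.get(ref_field, "")
        if not ref_id:
            continue  # skip rows where the reference field is empty
        lookup[ref_id] = {
            "name": row.get(f"{ref_field}.name__v", ""),
            "external_id": row.get(f"{ref_field}.external_id__v", ""),
        }
    return lookup


if __name__ == "__main__":
    for ref_id, details in reference_lookup(SAMPLE_EXTRACT, "product__c").items():
        print(ref_id, details["name"], details["external_id"])
```

A lookup like this is one way to reconcile references when moving data between Vaults, where internal IDs may differ but names or external IDs match.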