Vault eTMF users often upload large numbers of documents. The TMF Bot can automatically classify new documents and set their metadata, including the Study value, saving your organization time and effort. Auto-classification with the TMF Bot can help reduce the number of classification errors and surface potential issues sooner, thus increasing compliance.
When configured to use the TMF Bot for auto-classification, Vault analyzes documents added to the Document Inbox and populates their Document Type, Subtype, and Classification fields. The TMF Bot column lists the current status of each document. This column is empty when the TMF Bot is not enabled or when the TMF Bot's confidence level is lower than the PCT.
Once auto-classification is complete, you can review the classification before completing the document and removing it from the Document Inbox.
If configured to use TMF Bot for metadata extraction, Vault automatically populates the Study, Study Country, and Site fields on documents, including any documents created via email processors or API.
Note: If you are unable to locate the answers you need, please see our TMF Bot FAQ.
How to use Auto-Classification or Metadata Extraction
Once an Admin has deployed an Auto-Classification Trained Model and a Metadata Extraction model in your eTMF Vault, no additional action is needed on your part. The following methods of adding documents to the Document Inbox result in auto-classification and automatic population of the Study, Study Country, or Study Site fields by the TMF Bot when possible:
- Upload a document normally and select Classify documents later
- Drag and drop documents into the Document Inbox
- Upload via Vault API
- Upload via Vault Loader
- Upload via File Staging Server
- Upload via Vault Mobile
- Create documents via Email Ingestion
The TMF Bot then follows these steps to queue, auto-classify, and automatically populate the Study field on uploaded documents:
- Vault checks the origin of each file and assigns it to a classification queue:
  - Documents uploaded via API, Vault Loader, File Staging Server, or email are placed in a bulk processing queue, so that large imports do not slow down typical auto-classification processes.
  - All other documents, including those uploaded via Vault Mobile, are placed in an express processing queue.
- The TMF Bot automatically scans each added document. In the Document Inbox, you can see the progress for each document in the TMF Bot field. If you cannot see the TMF Bot field, you can add it as a column in your Document Inbox. Each document lists one of the following statuses:
  - Express Queued…: The TMF Bot is waiting to process the document from the express queue.
  - Bulk Queued…: The TMF Bot is waiting to process the document from the bulk queue.
  - Done: The file has finished processing.
- If the TMF Bot can auto-classify the document, the document has its Type, Subtype, Classification, or Study fields populated, and the Tags field includes the TMF Bot Auto-classified tag.
- If a Metadata Extraction model is deployed, the TMF Bot scans the file name and content to identify a Study > Study Country > Study Site hierarchy match in your Clinical Operations Vault. To populate Study, Study Country, and Study Site metadata, the TMF Bot verifies or applies the following:
  - The user must have permissions on the matched Studies.
  - The matched Studies are not in study migration mode.
  - There is no ambiguity with another matched hierarchy.
  - If parent matches are found, only their children are considered as possible matches.
  - If there are multiple parent matches, their children matches can be used to determine a single parent.
  - If parents are not found, children must be unique to retrieve the hierarchy.
If the TMF Bot does not identify a hierarchy match, or if any of the rules above is violated, the TMF Bot does not populate the Study, Study Country, or Study Site fields for that document.
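The parent and child matching rules above can be illustrated with a minimal sketch. This is not Vault's actual implementation: it simplifies the hierarchy to two levels (Study and Site, omitting Study Country), skips the permission and migration-mode checks, and uses a hypothetical function name, a hypothetical site index, and a hypothetical second study ("AVEG 031").

```python
from typing import Dict, Optional, Set


def resolve_study(
    matched_studies: Set[str],
    matched_sites: Set[str],
    site_index: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the single Study implied by the matches, or None if ambiguous.

    site_index maps a site name to every Study that contains a site
    with that name (hypothetical structure, for illustration only).
    """
    if matched_studies:
        # A single parent match needs no further disambiguation.
        if len(matched_studies) == 1:
            return next(iter(matched_studies))
        # Several parents matched: consider only children that belong to
        # one of the matched parents, and see if they single out one parent.
        parents_of_children: Set[str] = set()
        for site in matched_sites:
            parents_of_children |= site_index.get(site, set()) & matched_studies
        if len(parents_of_children) == 1:
            return next(iter(parents_of_children))
        return None  # still ambiguous, so nothing is populated

    # No parent matched: the children must point at exactly one Study.
    parents: Set[str] = set()
    for site in matched_sites:
        parents |= site_index.get(site, set())
    return next(iter(parents)) if len(parents) == 1 else None


site_index = {
    "US245": {"AVEG 027"},
    "DE001": {"AVEG 027", "AVEG 031"},  # same site name in two studies
}
print(resolve_study({"AVEG 027", "AVEG 031"}, {"US245"}, site_index))
print(resolve_study(set(), {"DE001"}, site_index))
```

In this sketch, a site name that exists under two studies cannot identify a hierarchy by itself, which mirrors the rule that children must be unique when no parent is found.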
Note: Although the TMF Bot searches for Study, Study Country, and Site matches, it sets none of those fields if the user specified the Study when initially uploading the document to the Inbox. If you want the TMF Bot to try to populate these fields, do not specify a Study when you load the document to the Inbox.
Note: To populate the Site field, the Site name must be at least three characters long. If the Site name is shorter than five characters, the TMF Bot populates the field only when it also identifies a matching Study.
For example, imagine a study, AVEG 027, that has the following sites in it: F1, US4, and US245.
- Matches on the name “F1” will be completely ignored.
- Matches on “US4” will only be considered if the Bot also finds a match for “AVEG 027”.
- Matches on “US245” can be actioned, even if a match for “AVEG 027” is not found.
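The Site name length rule in this example can be expressed as a small check. The sketch below is only an illustration of the rule as described here; the function name and parameters are hypothetical and do not correspond to any Vault API.

```python
def site_match_is_usable(site_name: str, study_matched: bool) -> bool:
    """Decide whether a Site name match may be used to populate the Site field."""
    length = len(site_name)
    if length < 3:
        # Names shorter than three characters are always ignored.
        return False
    if length < 5:
        # Names of three or four characters need a matching Study as well.
        return study_matched
    # Names of five or more characters can be used without a Study match.
    return True


for name in ("F1", "US4", "US245"):
    print(name, site_match_is_usable(name, study_matched=False),
          site_match_is_usable(name, study_matched=True))
```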
Note: If the Tags field is not selectable as a column within the Document Inbox, an Admin may need to update the default security of the Tags document field in Admin > Configuration > Document Fields > Base Document > Tags > Security Overrides. Default security should be set to Read Only.
While the time to process each document can vary, Vault aims to have each file processed in five (5) seconds.
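The queue assignment described in the steps above amounts to a lookup on the upload origin. The following sketch assumes hypothetical origin labels ("api", "vault_loader", and so on); Vault does not necessarily expose such values, and the function is illustrative only.

```python
# Hypothetical labels for the upload origins that go to the bulk queue.
BULK_ORIGINS = {"api", "vault_loader", "file_staging", "email"}


def assign_queue(origin: str) -> str:
    """Return "bulk" for large-import origins and "express" for everything else."""
    return "bulk" if origin in BULK_ORIGINS else "express"


print(assign_queue("vault_loader"))  # bulk
print(assign_queue("vault_mobile"))  # express
print(assign_queue("drag_and_drop"))  # express
```

Keeping bulk imports in their own queue means a large Vault Loader job does not delay documents that users upload interactively.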
Accepting Auto-Classifications and Metadata Extractions
Once documents have a value of Done in the TMF Bot field, use the checkboxes to select documents, then click Complete to enter any necessary document fields. Note that you can only complete documents with the same classification in bulk.
Once completed, the uploaded documents are available for additional processing. Document tags will indicate whether the document was Auto-classified or had any metadata set by the TMF Bot.
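Because only documents with the same classification can be completed together, it can help to think of the Inbox as grouped by classification. The sketch below shows that grouping on plain dictionaries; the field names ("name", "classification", "tmf_bot_status") are assumptions made for illustration and are not Vault field names.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_for_bulk_completion(documents: List[dict]) -> Dict[Optional[str], List[str]]:
    """Group the names of processed documents by their classification."""
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for doc in documents:
        # Only documents the TMF Bot has finished processing are candidates.
        if doc["tmf_bot_status"] == "Done":
            groups[doc["classification"]].append(doc["name"])
    return dict(groups)


inbox = [
    {"name": "cv_smith.pdf", "classification": "Curriculum Vitae", "tmf_bot_status": "Done"},
    {"name": "cv_jones.pdf", "classification": "Curriculum Vitae", "tmf_bot_status": "Done"},
    {"name": "protocol_v2.pdf", "classification": "Protocol", "tmf_bot_status": "Done"},
    {"name": "scan_004.pdf", "classification": None, "tmf_bot_status": "Bulk Queued"},
]
print(group_for_bulk_completion(inbox))
```

Each resulting group corresponds to one set of documents that could be selected and completed in a single bulk action.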
Rejecting an Auto-Classification or Metadata Extraction
If you find that TMF Bot applied an incorrect classification, you can navigate to the document and select Reclassify as normal. When you manually reclassify a document, Vault tags the document as TMF Bot Misclassified.
If you find that TMF Bot populated an incorrect Study, Study Country, or Study Site for a document, you can navigate to the document and manually update those fields.
Note: The TMF Bot Misclassified tag only applies to documents loaded into the Inbox, not documents reclassified through the TMF Bot QC step.
Auto-Classification Limitations
- Some document classifications may not be available to the TMF Bot. This is often because there were not enough documents to train the TMF Bot on that classification or because the classifications were deliberately excluded from training.
- The TMF Bot only auto-classifies documents if it is confident in its selection. Documents typically have low confidence when the document could easily be classified as two or more different document types.
- Some categories of documents cannot be auto-classified. These include:
  - Audio or video files
  - Non-text files, such as ZIP files, statistical files, or database files
  - Non-English files
  - Files where Vault cannot extract text, for example, if the text is too blurry or if the file is password-protected or encrypted
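The confidence limitation above can be pictured as a simple threshold check: a classification is applied only when the top prediction is confident enough. This is a sketch under stated assumptions, not the TMF Bot's actual model logic; the scores, the threshold value, and the function name are all hypothetical.

```python
from typing import Dict, Optional


def pick_classification(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the top-scoring classification, or None if it is below the threshold."""
    if not scores:
        return None
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] >= threshold else None


# A clear-cut document: one classification dominates.
print(pick_classification({"Protocol": 0.95, "Curriculum Vitae": 0.03}, threshold=0.9))

# A document that resembles two types: neither score clears the threshold,
# so no classification is applied.
print(pick_classification({"Protocol": 0.50, "Protocol Amendment": 0.48}, threshold=0.9))
```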