Training machine learning models
Can multiple Trained Models be deployed at the same time?
You can deploy multiple Trained Models if they have different Trained Model Types. Because there is currently only one Trained Model Type, only one Trained Model can be deployed at a time. Deploying a new Trained Model replaces the existing one.
Can the TMF Bot be retrained?
Yes, by creating a new Trained Model. You can create as many Trained Models as you like, using different sets of documents, different Prediction Confidence Thresholds, or different Minimum Documents per Document Type values. You can review and evaluate all of them within the system before deploying, which lets you select the best one to deploy.
If we train and deploy a model but then withdraw it, will the nightly job detect that a model is not deployed and attempt to train and deploy a model automatically?
The Auto-on TMF Bot feature will only create, train, and deploy one Trained Model per major release. If you withdraw that model, it will not be re-deployed. In the next major release, a new model will be created, trained, and deployed if no other model is deployed, or if the previous release’s Auto-Trained Model is deployed.
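The rule above can be sketched as a small decision function. This is only an illustration of the behavior described in this answer, not Vault's implementation; the function name and both parameters are hypothetical.

```python
from typing import Optional


def should_auto_train(trained_this_release: bool,
                      deployed_is_auto_trained: Optional[bool]) -> bool:
    """Decide whether the Auto-on feature would create, train, and deploy a model.

    trained_this_release: an Auto-Trained Model was already created in the
        current major release (hypothetical flag).
    deployed_is_auto_trained: None if no model is deployed, True if the
        deployed model is a previous release's Auto-Trained Model, False if
        it is a manually created model (hypothetical encoding).
    """
    # Only one Auto-Trained Model per major release, even if it was withdrawn.
    if trained_this_release:
        return False
    # No model deployed: a new one is created, trained, and deployed.
    if deployed_is_auto_trained is None:
        return True
    # A previous Auto-Trained Model is replaced; a manual model is left alone.
    return deployed_is_auto_trained
```

For example, withdrawing this release's Auto-Trained Model corresponds to `should_auto_train(True, None)`, which returns `False`.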
Can the TMF Bot be trained on similar documents that file to different Document Types in different scenarios?
We have seen some success with filing similar documents into different classifications, though these documents often do not have a high enough confidence for TMF Bot to auto-classify them. Setting different classifications for similar documents with no data-driven reason (information in the file content or file name that specifies where it should go) does not perform well in TMF Bot.
Can we modify or add to the parameters used to train the machine learning models?
You can control two parameters: the Prediction Confidence Threshold (the confidence the model must have in a prediction before it auto-classifies the document) and the Minimum Documents per Document Type (the number of documents a Document Type must contain before the model predicts that Document Type). No other parameters can be added or modified.
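To make the two parameters concrete, here is a minimal sketch of how they could gate auto-classification. The function names are hypothetical, and the use of "greater than or equal to" for both comparisons is an assumption; this is not how Vault is known to implement it.

```python
from collections import Counter


def eligible_types(training_labels, min_docs_per_type):
    """Return the Document Types with enough training documents to be predicted."""
    counts = Counter(training_labels)
    # Assumption: a type qualifies when its count meets or exceeds the minimum.
    return {doc_type for doc_type, n in counts.items() if n >= min_docs_per_type}


def auto_classify(predicted_type, confidence, threshold, allowed_types):
    """Return the predicted type if it may be applied automatically, else None."""
    # Assumption: the prediction is applied when confidence meets the threshold.
    if predicted_type in allowed_types and confidence >= threshold:
        return predicted_type
    return None
```

With a threshold of 0.9, a prediction at 0.95 confidence for an eligible Document Type would be applied, while the same prediction at 0.5 would leave the document unclassified for a user to Complete.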
How many documents do I need to train the TMF Bot?
We recommend at least 1,000 Steady State (Approved or Final) documents. In general, the more documents you have, the more accurate the model will be.
Is there a limit or an optimal number of documents that I can use to train the Trained Model?
You can use up to 200,000 documents to train a Trained Model, and 200,000 is also the optimal number. If you have fewer than 200,000 documents, use as many as you can. Ideally, every document sent to the Trained Model is correctly classified.
Will the Trained Model be more accurate if I give it more documents?
In general, yes, the more documents you provide the Trained Model, the more accurate the model will be. However, this depends on the Prediction Confidence Threshold and the accuracy of the classifications in your system.
Will training a Trained Model cause slow performance in my Vault?
No, it will not. The Trained Model uses queues and threads that are separate from those Vault uses on a daily basis.
Does Vault automatically split the document set into a training set and testing set while training Trained Models?
Vault will automatically split the information pulled for training a Trained Model into 80% for the training set and 20% for the testing set. We stratify on Document Type to ensure we have the 80/20 split for each Document Type.
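As an illustration of what stratifying on Document Type means, the sketch below splits each Document Type 80/20 independently. It uses only the Python standard library; the function name, the input shape, and the rounding rule are assumptions, and Vault's actual procedure may differ.

```python
import random
from collections import defaultdict


def stratified_split(documents, test_fraction=0.2, seed=0):
    """Split (doc_id, doc_type) pairs into train and test sets per Document Type."""
    by_type = defaultdict(list)
    for doc_id, doc_type in documents:
        by_type[doc_type].append(doc_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for doc_type, ids in by_type.items():
        rng.shuffle(ids)
        # Assumption: the test share of each type is rounded to a whole document.
        n_test = round(len(ids) * test_fraction)
        test.extend((i, doc_type) for i in ids[:n_test])
        train.extend((i, doc_type) for i in ids[n_test:])
    return train, test
```

Splitting within each Document Type keeps rare types represented in both sets, which a purely random split over the whole document set would not guarantee.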
Evaluation and User Acceptance Testing
How do I evaluate the TMF Bot?
There are two main approaches to evaluating the TMF Bot:
- Create and train Trained Models in your Production environment and evaluate the results provided after training completes. If the results meet your expectations, deploy the Trained Model.
- Create, train, and deploy a Trained Model in your QA or Sandbox environment using the Train Model From Production Data action. Verify that the auto-classification works properly. Then create and train Trained Models in your Production environment and evaluate the provided results after training completes. If the results meet your expectations, deploy the Trained Model.
Can I pilot the TMF Bot within a Sandbox environment?
Yes, by using the Train Model From Production Data action to train a Trained Model. You’ll still need to train a Trained Model in your production Vault, however, as the Trained Model cannot be moved from Sandbox or Pre-Release to production.
How do I disable TMF Bot if I do not wish to use it?
The TMF Bot is auto-on for all eTMF customers with more than 1,500 Steady State documents. To disable an automatically deployed model, withdraw it using the Withdraw Model user action.
Uploaded Documents
Can I still choose to classify documents immediately?
Yes, you can. The TMF Bot will work on auto-classifying documents within the Document Inbox. However, if you choose to use the “Classify Now” option on the upload screen, you can still classify the document yourself.
Can we have the TMF Bot only auto-classify certain Document Types or documents for certain Studies?
The TMF Bot will only auto-classify documents to Document Types on which you’ve trained it. However, we do not recommend training on only a small number of Document Types, as this will result in many misclassified documents. Once a Trained Model is deployed, the TMF Bot processes every document that enters the Document Inbox. There is no concept of pilot studies; however, you could start by having only a few studies use the Document Inbox exclusively.
Do I need to name the file in a specific way for the TMF Bot to auto-classify the document correctly?
No, you do not. We’ve seen success with the TMF Bot where there are no defined naming conventions. However, having a good naming convention is one of the factors that can help the TMF Bot be more accurate, even though it’s not a requirement.
Does the TMF Bot include scanned documents or any documents that need Optical Character Recognition (OCR)?
Yes, it does. The TMF Bot has an express pipeline where it performs Optical Character Recognition (OCR) on the uploaded document if necessary.
Does the TMF Bot use the text within a document when it auto-classifies?
Yes, it does. The TMF Bot uses the following information to auto-classify a document:
- Number of pages
- Number of characters
- File type
- File size
- File name
- Extracted text from the document
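The inputs listed above can be pictured as a single record per document. The sketch below is purely illustrative: the class, the field names, and the way each value is derived (for example, counting characters in the extracted text) are assumptions, not a description of the TMF Bot's internals.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass
class DocumentFeatures:
    page_count: int
    char_count: int
    file_type: str
    file_size: int
    file_name: str
    text: str


def build_features(file_name, content, text, page_count):
    """Assemble the listed inputs for one document (hypothetical helper).

    content is the raw file as bytes; text and page_count are assumed to
    come from an earlier text-extraction or OCR step.
    """
    return DocumentFeatures(
        page_count=page_count,
        char_count=len(text),  # assumption: characters of the extracted text
        file_type=PurePath(file_name).suffix.lstrip(".").lower(),
        file_size=len(content),  # size in bytes
        file_name=file_name,
        text=text,
    )
```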
If the document has hand-written text, is that ok?
Yes, it is. Our OCR generally ignores hand-written text, and TMF Bot uses all other text for auto-classification.
Can the TMF Bot classify emails?
Yes, it can. We have seen some great success in the TMF Bot auto-classifying emails into the appropriate Relevant Communication Document Type. As with all auto-classification scenarios, there are cases where the confidence isn’t high enough for TMF Bot to auto-classify the emails.
What happens after auto-classification
After the TMF Bot auto-classifies a document, who updates the Name?
If you are using Document Type auto-naming, the name will be automatically updated. If the document’s name is manually controlled, you need to modify the name while you Complete the document from the Document Inbox.
How does TMF Bot consider Security Profiles?
A user must have the Create Document permission on the Document Type the TMF Bot is trying to use for auto-classification. A user must also have general classify permissions within their permission set.
If the TMF Bot cannot auto-classify a document, how are the owners notified?
Because we have a goal to auto-classify documents within 5 seconds, we will not notify the Owner. Instead, you can use the TMF Bot field to understand if the document has finished processing with the TMF Bot or not. If it has finished and there is no auto-classification, then the user can Complete the document, selecting the appropriate Document Type.
Is auto-classification captured in the audit trail?
Yes, it is. The audit trail will show that the system has updated the Type, Subtype, and Classification of the document.
Who is the Owner of a document that TMF Bot has classified?
The user who uploaded the document is still the document Owner. The TMF Bot simply updates the document after uploading.
Will sharing settings be applied to documents in the Document Inbox picked up by the TMF Bot?
Yes, they will. Sharing Settings are not affected by the TMF Bot.
Will the TMF Bot also promote my document to the Final or Approved state?
No, it will not. The TMF Bot auto-classifies documents, but those documents remain in the Document Inbox until completed by a user. Their state does not change.
How do documents leave the Document Inbox?
You must still Complete the document to move it out of your Document Inbox.
Do I have a chance to check the Document Type the TMF Bot provided?
You can check the document’s classification within the Document Inbox before Completing it.
Misclassifications
Can I reclassify a document if the TMF Bot selected the wrong classification?
Yes, you can. The Reclassify option is available within the Document Inbox.
Can I report on documents that TMF Bot misclassified (documents a user reclassified after the TMF Bot auto-classified them)?
The Prediction Metrics object tracks information about a model’s performance and is available as a section on the Trained Model page layout. You can use this object to create Prediction Metrics reports at the Classification-specific and Global Weighted Average levels.
We also capture this information in the Predictions object in Business Admin. This object is not reportable within the system as Vault stores this data in JSON strings. However, this data can be exported to and manipulated within Excel.
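If you export prediction data for analysis outside Vault, the two reporting levels mentioned above can be approximated as follows. This is a sketch under stated assumptions: the input is a hypothetical list of (predicted, final) Document Type pairs that you have already extracted from the export, and the choice of precision and recall weighted by the number of documents per type is a common convention, not necessarily the formula Vault uses.

```python
from collections import Counter


def per_class_metrics(pairs):
    """Return {doc_type: (precision, recall, support)} from (predicted, final) pairs."""
    pairs = list(pairs)
    correct = Counter(p for p, a in pairs if p == a)
    predicted = Counter(p for p, _ in pairs)
    actual = Counter(a for _, a in pairs)
    metrics = {}
    for label, support in actual.items():
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / support
        metrics[label] = (precision, recall, support)
    return metrics


def weighted_average(metrics):
    """Average precision and recall across types, weighted by support."""
    total = sum(s for _, _, s in metrics.values())
    precision = sum(p * s for p, _, s in metrics.values()) / total
    recall = sum(r * s for _, r, s in metrics.values()) / total
    return precision, recall
```

A pair where the predicted and final types differ corresponds to a document that a user reclassified after the TMF Bot auto-classified it.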
When a user reclassifies a document auto-classified by the TMF Bot, does the machine learning model learn from this?
The machine learning model does not learn from reclassifications due to the large amount of time required to train it. However, we will track this feedback so that we can recommend retraining your machine learning models in a future release.
Future functionality
Does Veeva plan to have the TMF Bot used as part of quality control?
Yes, we still plan to allow the TMF Bot to be used as part of the QC process where the TMF Bot can check a document before it goes to the QC Reviewer and, if it finds issues, send those issues back to the Owner to resolve before the QC Reviewer proceeds. This feature is targeted for 2022.
Can I use the TMF Bot to check the classifications of my older documents?
This enhancement is on the roadmap. We plan to allow the TMF Bot to be run against an entire Study and provide details about which documents it believes are misclassified.
Does the TMF Bot support multiple languages?
In the initial release, only documents with English text can be used for training and auto-classification. TMF Bot may support non-English documents in a future release.
When will the TMF Bot be able to auto-classify non-English documents?
There is no specific timeline, though this is on the roadmap. Note that there are cases where a multilingual document in both English and another language may still be auto-classified by the TMF Bot.
Will document auto-classification be available in the Vault Platform to be leveraged by other Vault Applications?
As of right now, the document auto-classification capabilities are specific to Clinical Operations. We may see more capabilities in other Vault Applications in the future.
Can the TMF Bot determine if a document is a new version of an existing document?
No, it cannot. We have considered this for a future release, but it is not planned for the near term.
Will the TMF Bot populate Study, Study Country, and Site?
This enhancement is on the roadmap, but the first iteration of the TMF Bot (21R2) only includes auto-classification.