**Source URL:** https://clinical.veevavault.help/en/lr/73297/index.md

# TMF Bot FAQ

# Training machine learning models

### What data is used to train the TMF Bot?

The TMF Bot has two options for sourcing data to train its model. For Production Vaults, TMF Bot uses document content from the same Vault. In non-Production Vaults, Admins can choose to use content from the Production Vault or the same non-Production Vault. In all cases, the training data and AI model are fully encapsulated within your environment, just as documents and data are completely separated among customer environments.

### Can multiple _Trained Models_ be deployed at the same time?

You can deploy multiple _Trained Models_ if they have a different _Trained Model Type_. Currently there are two model types: Auto-classification models and Metadata Extraction models. You can deploy one of each of these model types, but you cannot have two Auto-classification models or two Metadata Extraction models deployed at the same time. Deploying a new Auto-classification model or a new Metadata Extraction model replaces the existing one.
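The replacement rule above can be pictured as one deployment slot per model type. The following is a minimal sketch only: `ModelType` and `DeployedModels` are hypothetical names used for illustration, not Vault objects or API calls.

```python
from enum import Enum


class ModelType(Enum):
    AUTO_CLASSIFICATION = "Auto-classification"
    METADATA_EXTRACTION = "Metadata Extraction"


class DeployedModels:
    """Hypothetical registry holding at most one deployed model per type."""

    def __init__(self):
        self._by_type = {}

    def deploy(self, model_type, model_name):
        # Deploying a model replaces any deployed model of the same type.
        # Returns the name of the replaced model, or None if the slot was empty.
        replaced = self._by_type.get(model_type)
        self._by_type[model_type] = model_name
        return replaced

    def deployed(self):
        # Return a copy so callers cannot modify the registry directly.
        return dict(self._by_type)
```

In this sketch, deploying a second Auto-classification model returns the name of the first one and leaves the Metadata Extraction slot untouched.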

### Can we retrain the TMF Bot?

You can create as many _Trained Models_ as you like, using different sets of documents, different _Prediction Confidence Thresholds_, or different _Minimum Documents per Document Type_ values. These can all be reviewed and evaluated before deploying within the system, allowing you to select the best one to deploy.

### If we train and deploy a model but then withdraw it, will the nightly job detect that a model is not deployed and attempt to train and deploy a model automatically?

The Auto-on TMF Bot feature will only create, train, and deploy one Auto-Classification _Trained Model_ per major release. If you withdraw that model, it will not be re-deployed. In the next major release, a new model will be created, trained, and deployed if no other model is deployed, or if the previous release's Auto-Trained Model is deployed.
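As a reading aid, the decision described above can be written as a small function. This is a sketch of the stated rule only, not the nightly job's actual logic; the function name, its parameters, and the `"auto"` / `"manual"` labels are assumptions made for illustration.

```python
def should_auto_train(auto_trained_this_release, deployed_model):
    """Decide whether the Auto-on feature would create a new model.

    auto_trained_this_release: True if an Auto-Classification model was
        already created by the feature in the current major release.
    deployed_model: None if nothing is deployed, "auto" if the previous
        release's Auto-Trained Model is deployed, "manual" otherwise.
        (These labels are hypothetical.)
    """
    if auto_trained_this_release:
        # Only one model per major release, so a withdrawn model
        # is not re-created or re-deployed.
        return False
    return deployed_model is None or deployed_model == "auto"
```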

### Can the TMF Bot be trained on similar documents that file to different document types in different scenarios?

We have seen some success with filing similar documents into different classifications, though these documents often do not have a high enough confidence for TMF Bot to auto-classify them. Setting different classifications for similar documents with no data-driven reason (information in the file content or file name that specifies where it should go) does not perform well in TMF Bot.

### Can we modify or add to the parameters used to train the machine learning models?

Currently, you can control two parameters: the _Prediction Confidence Threshold_ (what confidence does the model need to have in its prediction before it will auto-classify the document) and the _Minimum Documents per Document Type_ (how many documents need to be in a document type before the model predicts that document type). However, you cannot add or modify any other parameters.

Note that the Prediction Confidence Threshold (PCT) is not the same as accuracy. In our analysis of TMF documents, we have found that a PCT of 0.85 can result in predictions that are 96-99% accurate, depending on the customer. At this time, then, we recommend setting the PCT to 0.85.
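To illustrate why the threshold and accuracy are different numbers, the sketch below applies a threshold to a list of predictions and then measures how many of the auto-classified ones were correct. It assumes the comparison is inclusive (`>=`), which the text does not state, and the function names are hypothetical.

```python
def should_auto_classify(confidence, threshold=0.85):
    # Assumption: a prediction exactly at the threshold is auto-classified.
    return confidence >= threshold


def accuracy_and_coverage(predictions, threshold=0.85):
    """predictions: list of (confidence, was_correct) pairs.

    Returns (accuracy among auto-classified predictions, share of all
    predictions that were auto-classified). Accuracy is None when nothing
    clears the threshold.
    """
    selected = [ok for conf, ok in predictions
                if should_auto_classify(conf, threshold)]
    if not predictions or not selected:
        return None, 0.0
    accuracy = sum(selected) / len(selected)
    coverage = len(selected) / len(predictions)
    return accuracy, coverage
```

Raising the threshold typically trades coverage (fewer documents auto-classified) for accuracy among the documents that are auto-classified.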

It is also important to note that the Bot does not need many examples of a classification to train on it and predict with accuracy. Setting _Minimum Documents per Document Type_ to a higher value will result in the Bot training on fewer classifications. Because the Bot can only predict a classification on which it has trained, if you do not train on a classification and a document of that type is loaded to the Inbox, the Bot can either fail to classify it or misclassify it. For that reason, we recommend you set the _Minimum Documents per Document Type_ to 10 to ensure the Bot trains on more classifications.

In the future, we may remove the option to adjust these parameters, and instead set them to values that ensure optimal performance.

### How many documents do I need to train the TMF Bot?

We recommend at least 1,000 steady state (_Approved_ or _Final_) documents. In general, the more documents you have, the more accurate the model will be.

At minimum, to train or test a _Trained Model_, your document set must have at least 3 document types with more documents than the _Minimum Documents per Document Type_ selected on the _Trained Model_.
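The minimum requirement above can be checked with a simple count per document type. This is a sketch under the wording used here ("more documents than" is read as strictly greater); the function names are hypothetical and nothing in it calls Vault.

```python
from collections import Counter


def eligible_document_types(doc_types, min_docs_per_type):
    """doc_types: one document type label per document in the set."""
    counts = Counter(doc_types)
    # Strictly more documents than the configured minimum.
    return {t for t, n in counts.items() if n > min_docs_per_type}


def can_train(doc_types, min_docs_per_type, min_types=3):
    # At least three document types must clear the minimum.
    return len(eligible_document_types(doc_types, min_docs_per_type)) >= min_types
```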

### Is there a limit or an optimal number of documents that I can use to train the Auto-classification _Trained Model_?

_Trained Models_ have a limit of 200,000 documents that you can use for training. The optimal number of documents is 200,000. If you have fewer than 200,000 documents, use as many as you can. Optimally, all of the documents sent to the _Trained Model_ would be 100% correct in terms of classification and metadata assignment.

### Will the _Trained Model_ be more accurate if I give it more documents?

In general, yes, the more documents you provide the Auto-classification _Trained Model_, the more accurate the model will be. However, this depends on the _Prediction Confidence Threshold_ and the accuracy of the classifications in your system.

Testing more documents on a Metadata Extraction _Trained Model_ will not improve its accuracy, but it will provide a better representation, via _Trained Model_ performance metrics, of the potential performance of metadata extraction in your Vault.

### How can I make a Metadata Extraction _Trained Model_ more accurate?

Metadata Extraction _Trained Models_ search the document filename and content for strings that match the names of _Study_, _Study Country_, and _Study Site_ records in your Vault. They do not interpret other values such as Country Code or Principal Investigator name.

To optimize TMF Bot Metadata Extraction, we recommend using site names that are at least six (6) characters long and assigning a unique site name across studies. You can also increase metadata extraction coverage by updating your business process to include these values in the filename as a best practice (note that you need a separator between the values).
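The need for a separator can be shown with a simple token match. The sketch below is not the TMF Bot's matching algorithm, which is not described here; it assumes filenames are split on underscores, hyphens, spaces, and dots, and the function name and sample values are made up for illustration.

```python
import re


def match_names_in_filename(filename, known_names):
    """Return the known record names that appear as whole tokens in filename."""
    # Assumed separators: underscore, hyphen, whitespace, dot.
    tokens = {t.lower() for t in re.split(r"[_\-\s.]+", filename) if t}
    return [name for name in known_names if name.lower() in tokens]
```

With this approach, `STUDY001_Germany_SITE1001_protocol.pdf` yields matches for `STUDY001` and `SITE1001`, while `STUDY001SITE1001.pdf` yields none because the values run together. Names that themselves contain a separator would need a different matching strategy.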

### Will training a _Trained Model_ cause slow performance in my Vault?

No, it will not. The _Trained Model_ uses queues and threads that are separate from those Vault uses on a daily basis.

### Does Vault automatically split the document set into a training set and testing set while training _Trained Models_?

With Auto-classification _Trained Models_, Vault will automatically split the information pulled for training a _Trained Model_ into 80% for the training set and 20% for the testing set. We stratify on document type to ensure we have the 80/20 split for each document type.
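A stratified 80/20 split of this kind can be sketched as follows. This illustrates the general technique only and is not Vault's implementation; the function name, the rounding choice, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(documents, train_fraction=0.8, seed=0):
    """documents: list of (doc_id, doc_type) pairs.

    Splits each document type separately so that every type keeps
    roughly the same train/test ratio.
    """
    by_type = defaultdict(list)
    for doc_id, doc_type in documents:
        by_type[doc_type].append(doc_id)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for doc_type, ids in by_type.items():
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train += [(i, doc_type) for i in ids[:cut]]
        test += [(i, doc_type) for i in ids[cut:]]
    return train, test
```

Splitting per document type, rather than across the whole set at once, keeps rare document types represented in both the training set and the testing set.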

With Metadata Extraction _Trained Models_, the entire document set will be used for testing, up to a maximum of 40,000 documents.

### Does the TMF Bot populate _Study Country_ and _Site_?

The TMF Bot can currently populate the _Study_, _Study Country_, and _Site_ values on documents if you have a Study Metadata Extraction model deployed. The TMF Bot only populates _Site_ names that are at least three characters long. The TMF Bot will only populate _Site_ names with fewer than five characters if there is a matching _Study_.
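The two length rules for _Site_ names combine as shown in the sketch below. The function name and parameters are hypothetical; the thresholds are the ones stated above.

```python
def can_populate_site(site_name, has_matching_study):
    """Apply the Site name length rules described above."""
    length = len(site_name)
    if length < 3:
        # Names shorter than three characters are never populated.
        return False
    if length < 5:
        # Three- and four-character names require a matching Study.
        return has_matching_study
    return True
```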

For documents under _Approved_ legal hold, the TMF Bot will not populate a _Study_ value if the value to set matches the _Study_ value on a legal hold record in an _Approved_ state. 

# Evaluation and User Acceptance Testing

### How do I evaluate the TMF Bot?

There are two main approaches to evaluating the TMF Bot:

1. [Create and train Trained Models](/en/lr/72747/#training-the-trained-model) in your Production environment and [evaluate](/en/lr/72739/) the results provided after training completes. If the results meet your expectations, deploy the _Trained Model_.
2. Create, train, and deploy an Auto-classification _Trained Model_ in your QA or Sandbox environment using the [**Train Model From Production Data**](/en/lr/72747/#training-a-trained-model-in-pre-release-or-sandbox-environments-with-production-data) action. Verify that the auto-classification works properly. Then create and train _Trained Models_ in your Production environment and [evaluate](/en/lr/72739/) the provided results after training completes. If the results meet your expectations, deploy the _Trained Model_.

### Can I pilot the TMF Bot within a Sandbox environment?

Yes, by using the **Train Model From Production Data** action (for Auto-classification models) or the **Test Model from Production Data** action (for Metadata Extraction models) to train or test a _Trained Model_. You'll still need to train or test a _Trained Model_ in your Production Vault, however, because a _Trained Model_ cannot be moved from a Sandbox or Pre-Release Vault to Production.

### How do I disable TMF Bot if I do not wish to use it?

The TMF Bot is auto-on for all eTMF customers with more than 1,500 steady state documents. To disable an automatically deployed model, withdraw it using the **Withdraw Model** user action.

# Uploaded Documents

### Can I still choose to classify documents immediately?

Yes, you can. The TMF Bot will work on auto-classifying documents within the Document Inbox. However, if you choose to use the **Classify Now** option on the upload screen, you can still classify the document yourself.

### Can we have the TMF Bot only auto-classify certain document types or documents for certain Studies?

The TMF Bot will only auto-classify documents to document types on which you've trained it. However, we DO NOT recommend only training on a small number of document types as this will result in many misclassified documents. All documents that enter the Document Inbox when you deploy a _Trained Model_ will be auto-classified. There is no concept of pilot studies; however, you could work with a few studies at first to exclusively use the Document Inbox.

### Do I need to name the file in a specific way for the TMF Bot to auto-classify the document correctly?

No, you do not. We've seen success with the TMF Bot where there are no defined naming conventions. However, having a good naming convention is one of the factors that can help the TMF Bot be more accurate, even though it's not a requirement.

### Does the TMF Bot include scanned documents or any documents that need Optical Character Recognition (OCR)?

Yes, it does. The TMF Bot has an express pipeline where it performs Optical Character Recognition (OCR) on the uploaded document if necessary.

### Does the TMF Bot use the text within a document when it auto-classifies?

Yes, it does. The TMF Bot uses the following information to auto-classify a document:

* Number of pages
* Number of characters
* File type
* File size
* File name
* Extracted text from the document

### If the document has hand-written text, is that ok?

Yes, it is. Our OCR generally ignores hand-written text, and TMF Bot uses all other text for auto-classification.

### Can the TMF Bot classify emails?

Yes, it can. We have seen some great success in the TMF Bot auto-classifying emails into the appropriate Relevant Communication document type. As with all auto-classification scenarios, there are cases where the confidence isn't high enough for TMF Bot to auto-classify the emails.

# What happens after auto-classification?

### After the TMF Bot auto-classifies a document, who updates the Name?

If you are using document type auto-naming, the name will be automatically updated. If the document's name is manually controlled, you need to modify the name while you **Complete** the document from the Document Inbox.

### How does TMF Bot consider Security Profiles?

A user must have the Create Document permission on the document type the TMF Bot is trying to use for auto-classification. A user must also have general classify permissions within their permission set.

### If the TMF Bot cannot auto-classify a document, how are the owners notified?

Because we have a goal to auto-classify documents within 5 seconds, we will not notify the Owner. Instead, you can use the _TMF Bot_ field to understand if the document has finished processing with the TMF Bot or not. If it has finished and there is no auto-classification, then the user can **Complete** the document, selecting the appropriate document type.

### Is auto-classification captured in the audit trail?

Yes, it is. The audit trail will show that the system has updated the Type, Subtype, and Classification of the document.

### Who is the Owner of a document that TMF Bot has classified?

The user who uploaded the document is still the document Owner. The TMF Bot simply updates the document after uploading.

### Will sharing settings be applied to documents in the Document Inbox picked up by the TMF Bot?

Yes, they will. Sharing Settings are not affected by the TMF Bot.

### Will the TMF Bot also promote my document to the Final or Approved state?

No, it will not. The TMF Bot auto-classifies documents, but those documents remain in the Document Inbox until completed by a user. Their state does not change.

### How do documents leave the Document Inbox?

You must still **Complete** the document to move it out of your Document Inbox.

### Do I have a chance to check the document type the TMF Bot provided?

You can check the document's classification within the Document Inbox before **Completing** it.

# Misclassifications

### Can I reclassify a document if the TMF Bot selected the wrong classification?

Yes, you can. The **Reclassify** option is available within the Document Inbox. When you manually reclassify a document, Vault tags the document as **TMF Bot Misclassified**.

### Can I report on documents that TMF Bot misclassified (documents a user reclassified after the TMF Bot auto-classified them)?

The [_Prediction Metrics_](/en/lr/72747/#about-the-prediction-metrics-object) object tracks information about a model's performance and is available as a section on the _Trained Model_ page layout. You can use this object to create Prediction Metrics reports at the Classification-specific and Global Weighted Average levels.

We also capture this information in the [_Predictions_](/en/lr/72747/#about-the-prediction-object) object in Business Admin. This object is not reportable within the system as Vault stores this data in JSON strings. However, this data can be exported to and manipulated within Excel.

You can also use the **TMF Bot Misclassified** tag to filter on documents that users have manually reclassified so that you can identify and analyze misclassified documents. The **TMF Bot Misclassified** tag only applies to documents loaded into the Inbox.

### When a user reclassifies a document auto-classified by the TMF Bot, does the machine learning model learn from this?

The machine learning model does not learn from reclassifications due to the large amount of time required to train it. However, we will track this feedback so that we can recommend retraining your machine learning models in a future release.

# What happens after Study Metadata Extraction?

### How does the TMF Bot consider Security Profiles?

A user must have _View_ permission on the _Study_ that the TMF Bot is trying to set on the document. If the user cannot View that _Study_, the TMF Bot will not set that _Study_ on the document.

### If the TMF Bot cannot set the Metadata values on a document, how are the owners notified?

Because we have a goal to set metadata quickly, we will not notify the Owner. Instead, you can use the _TMF Bot_ field to understand if the document has finished processing with the TMF Bot or not. If it has finished and there is no _Study_, then the user can **Complete** the document, selecting the appropriate _Study_.

### Is Metadata Extraction captured in the audit trail?

Yes, it is. The audit trail will show that the system has updated the _Study_ field on the document.

### Will sharing settings be applied to documents in the Document Inbox picked up by the TMF Bot?

Yes, they will. Sharing Settings are not affected by the TMF Bot unless the TMF Bot updates metadata that is used to determine Sharing Settings on a document while it is in the _Unclassified_ lifecycle.

### Will the TMF Bot also promote my document to the Final or Approved state?

No, it will not. The TMF Bot acts on documents, but those documents remain in the Document Inbox until completed by a user. Their state does not change.

### How do documents leave the Document Inbox?

You must still **Complete** the document to move it out of your Document Inbox.

### Do I have a chance to check the Study the TMF Bot provided?

You can check the document's _Study_ within the Document Inbox before **Completing** it.

# Future functionality

### Can I use the TMF Bot to check the classifications of my older documents?

Yes. You can configure a workflow to include a step where the TMF Bot will check the classification and metadata of an existing document as it goes through the workflow. 

### Will document auto-classification be available in the Vault Platform to be leveraged by other Vault Applications?

As of right now, the document auto-classification capabilities are specific to Clinical Operations. We may see more capabilities in other Vault Applications in the future.

### Can the TMF Bot determine if a document is a new version of an existing document?

It cannot. We have considered this capability for the future, but it is not planned for the near term.
