Step 2: Collections and documents

To give our assistant knowledge, we need to load the documentation it will learn from. Documents are organized into collections, so this step covers creating a collection and loading documents into it.

For our example, we'll load documents containing museum information. The first step is to create a collection for these documents.

Create collection

Navigate to the Collections section from the sidebar menu and, once there, click the button to create a new collection.

[Image: New collection]

A window will open for us to fill in the information for our collection. We'll enter the following information:

| Field | Description | Value |
| --- | --- | --- |
| Title | Name of our collection. | Museums |
| Description | Brief description of the collection. | Documents containing information about museums |
| Tags | List of tags related to the collection. | |
| Access mode | Access mode for the collection's documents: public access or members only. | Public access |
| Members | List of members who will have access to the documents, and their associated privileges. | |

Once all the necessary information is provided, click on the Create button.

[Image: Create museums collection]

We now have an empty collection, ready for our museum documents.
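If you keep the settings for your collections in a script or checklist before entering them in the form, the fields can be mirrored as a small record. The following is only an illustrative sketch: the `CollectionSettings` and `AccessMode` names are hypothetical and not part of any Ainhoa API, and the two validation rules are assumptions rather than documented behavior.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessMode(Enum):
    # The two access modes named in the collection form.
    PUBLIC = "Public access"
    MEMBERS_ONLY = "Members only"


@dataclass
class CollectionSettings:
    """Hypothetical record mirroring the fields of the collection form."""
    title: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    access_mode: AccessMode = AccessMode.PUBLIC
    # Maps a member name to a privilege label.
    members: dict[str, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means nothing was flagged."""
        issues = []
        # Assumption: a collection needs a non-empty title.
        if not self.title.strip():
            issues.append("Title is required.")
        # Assumption: a members-only collection should list at least one member.
        if self.access_mode is AccessMode.MEMBERS_ONLY and not self.members:
            issues.append("Members-only collections need at least one member.")
        return issues


# Values used in this tutorial's example.
museums = CollectionSettings(
    title="Museums",
    description="Documents containing information about museums",
)
print(museums.problems())  # → []
```

The record only helps you prepare and review values; the collection itself is still created through the form.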

Upload documents

In our new museums collection, click on the button to add our documents containing all the information we want our future assistant to learn.

Tip: In this example, we use documents about Spanish museums, which you can find at the following link: museum documents.

A window opens where we can provide all the information about the document we are uploading. You can upload an external file (PDF, Word document, presentation, or spreadsheet) or write the text in the Markdown editor.

Let's now explore all the information we can provide about our document and the values we'll set for our example:

[Image: Create museum doc]

| Field | Description | Value |
| --- | --- | --- |
| Collection | Associated collection for the document. | Museums |
| Title | Document name. | Guggenheim |
| Description | Brief document description. | Information about the Guggenheim museum |
| Document Source | The document's source: Markdown editor or existing file. | Existing file |
| Audiences | Target audiences for the document. | Global |
| Language | Document language. | Spanish |
| Do Not Advertise | Indicates whether the document should be advertised. | Disabled |
| Tags | List of document-related tags. | |

We'll follow this procedure for all the documents we want to upload into our museum collection. Once all the documents are uploaded in our collection, we'll be able to access them in both the collections section and the documents section of the sidebar menu.
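When there are many documents to upload, it can help to prepare the form values for each file in advance. The sketch below builds one checklist entry per file name, using the same values as the example above. It is not an Ainhoa API: `DocumentEntry`, `entries_for`, and the file names are all hypothetical, and the entries are meant to be typed into the upload window by hand.

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class DocumentEntry:
    """Hypothetical checklist row mirroring the fields of the upload form."""
    collection: str
    title: str
    description: str
    source: str = "Existing file"
    audiences: str = "Global"
    language: str = "Spanish"
    do_not_advertise: bool = False


def entries_for(filenames: list[str], collection: str = "Museums") -> list[DocumentEntry]:
    """Build one entry per file, deriving the title from the file name."""
    entries = []
    for name in filenames:
        # Drop the extension, turn separators into spaces, and capitalize each word.
        title = PurePath(name).stem.replace("_", " ").replace("-", " ").title()
        entries.append(
            DocumentEntry(
                collection=collection,
                title=title,
                description=f"Information about the {title} museum",
            )
        )
    return entries


# Hypothetical file names, used only for illustration.
for entry in entries_for(["guggenheim.pdf", "museo_del_prado.pdf"]):
    print(entry.title, "|", entry.description)
```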

[Images: Museums collection, Museums documents]

Split method

Once the document is uploaded, its detail page shows all its related information. There you can also edit fields such as the title, description, and audiences.

An important field to consider for your documents is the split method. This field determines how Ainhoa will extract information from the document based on its structure.

We have the following options available:

[Image: Split method]

| Option | Description |
| --- | --- |
| Auto | Ainhoa selects the most appropriate method based on the document. |
| None | No splitting is done in the document text. Useful when the document is simple and doesn't require separation into defined sections or structures. |
| Paragraph | Used when the document is organized into clear sections with distinctive titles. Ainhoa extracts information in the form of paragraphs, which aids comprehension and gives a more coherent reading of the information. |
| Block | Useful when a document lacks a clear structure or is a scanned document. Ainhoa extracts the text in blocks. |

Choosing the right split method matters: it lets Ainhoa process the document according to its specific layout and format, which leads to more accurate and coherent results.

In our example, we'll choose the Auto option for all documents.
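To get a feel for how the None, Paragraph, and Block options differ, the sketch below applies three simple strategies to a made-up piece of plain text. This is not Ainhoa's implementation, which is not described here. It assumes that paragraphs are separated by blank lines and that a "block" is a fixed-size window of words, and all function names are hypothetical.

```python
import re


def split_none(text: str) -> list[str]:
    """'None': keep the whole text as a single passage."""
    stripped = text.strip()
    return [stripped] if stripped else []


def split_paragraph(text: str) -> list[str]:
    """'Paragraph': one passage per paragraph (assumed to be separated by blank lines)."""
    parts = re.split(r"\n\s*\n", text)
    return [part.strip() for part in parts if part.strip()]


def split_block(text: str, max_words: int = 50) -> list[str]:
    """'Block': fixed-size word windows that ignore the layout (an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Made-up sample text, used only for illustration.
sample = "Opening hours\n\nThe museum opens every day.\n\nTickets are sold at the entrance."

print(len(split_none(sample)))         # → 1
print(len(split_paragraph(sample)))    # → 3
print(split_block(sample, max_words=4))
```

In this sketch, paragraph splitting keeps each sentence intact, while block splitting cuts across sentence boundaries. That trade-off is one reason a structure-aware method suits well-organized documents better.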

Train collection

After uploading our documents, we need to train the collection. Training is the step in which Ainhoa learns the content of the documents: she processes the data and identifies patterns, concepts, and relationships in the information. This allows her to respond accurately and in context to user queries or requests based on the information in those documents.

We can train documents one by one or, as in our case, train all the documents in the collection at once.

On the museums collection page, we click the Train button, and shortly afterward, all the collection's documents will be ready for use.

[Image: Museums trained]

Tip: We can open the detail page of one of the documents and check in the Passages tab how Ainhoa has extracted the information from the document.

[Image: Document passages]

Great, now our museum information is ready. Next, let's explore how to perform searches on this information.