Step 2: Collections and documents

Our assistant's knowledge comes from the documentation we give it. In this step we load the documents it will learn from and organize them into collections.

For our example, we'll load documents containing museum information. The first step is to create a collection for these documents.

Create collection

Navigate to the Collections section from the sidebar menu and, once there, click the button to create a new collection.

New collection

A window will open for us to fill in the information for our collection. We'll enter the following information:

Field	Description	Value
Title	Name of our collection	Museums
Description	Brief description of the collection.	Documents containing information about museums
Tags	List of tags related to the collection.
Access mode	Access mode for the collection's documents: public access or members only.	Public access
Members	List of members who will have access to the documents and their associated privilege.

Once all the necessary information is provided, click on the Create button.

Create museums collection

We now have an empty collection, ready for our documents about museums.

Uploading Documents

In our new museums collection, click on the button to add our documents containing all the information we want our future assistant to learn.

tip

In this example, we've used documents about Spanish museums that you can find at the following link: museum documents.

A window opens where we can provide all the information about the document we are uploading. You can upload an external file (PDF, Word, presentation, spreadsheet) or write the text directly in the Markdown editor.

Let's now explore all the information we can provide about our document and the values we'll set for our example:

Create museum doc

Field	Description	Value
Collection	Associated collection for the document.	Museums
Title	Document name.	Guggenheim
Description	Brief document description.	Information about the Guggenheim museum
Document Source	Source of the document: Markdown editor or existing file.	Existing file
Audiences	Target audiences for the document.	Global
Language	Document language.	Spanish
Do Not Advertise	Indicates if the document should be advertised or not.	Disabled
Tags	List of document-related tags.
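To keep track of the values used in this example, the two forms can be written down as plain records. This is only a sketch for note-taking: the `Collection` and `Document` classes and their field names are hypothetical, mirror the form fields described in the tables, and are not part of any Ainhoa API. The tutorial itself is done entirely through the user interface.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical record mirroring the fields of the collection form."""
    title: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    access_mode: str = "Public access"  # the other option is members only
    members: list[str] = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical record mirroring the fields of the document form."""
    collection: str
    title: str
    description: str = ""
    source: str = "Existing file"  # the other option is the Markdown editor
    audiences: list[str] = field(default_factory=lambda: ["Global"])
    language: str = "Spanish"
    do_not_advertise: bool = False
    tags: list[str] = field(default_factory=list)


# Values taken from the example in this tutorial.
museums = Collection(
    title="Museums",
    description="Documents containing information about museums",
)
guggenheim = Document(
    collection=museums.title,
    title="Guggenheim",
    description="Information about the Guggenheim museum",
)

print(museums)
print(guggenheim)
```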

We'll follow this procedure for every document we want to upload to our museum collection. Once all the documents are uploaded, we can access them from both the Collections section and the Documents section of the sidebar menu.


Museum collection	Museum documents

Split method

Once the document is uploaded, its detail page shows all its related information. From there you can also edit fields such as the title, description, and audiences.

An important field to consider for your documents is the split method. This field determines how Ainhoa will extract information from the document based on its structure.

We have the following options available:

Split method

Option	Description
Auto	Ainhoa selects the most appropriate method based on the document.
None	No splitting is done in the document text. Useful when the document is simple and doesn't require separation into defined sections or structures.
Paragraph	This method is used when the document is organized into clear sections with distinctive titles. Ainhoa will extract information in the form of paragraphs, aiding comprehension and allowing for a more coherent reading of information.
Block	When a document lacks a clear structure or is a scanned document, this method is useful. Ainhoa will extract the text in blocks.

Choosing a split method that matches the document's layout and format helps Ainhoa process its information well and return accurate, coherent results.
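To get a feel for how the options differ, here is a minimal sketch of three splitting strategies applied to a short text. It is an illustration only: the functions `split_none`, `split_paragraph` and `split_block`, the blank-line rule and the `max_words` parameter are assumptions made for this example and do not describe how Ainhoa actually splits documents.

```python
import re


def split_none(text: str) -> list[str]:
    """Keep the whole document as a single passage."""
    stripped = text.strip()
    return [stripped] if stripped else []


def split_paragraph(text: str) -> list[str]:
    """Split on blank lines, so each paragraph becomes one passage."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def split_block(text: str, max_words: int = 50) -> list[str]:
    """Ignore the structure and cut the text into blocks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Short sample text with a title and two paragraphs.
sample = (
    "Guggenheim Bilbao\n\n"
    "The museum opened in 1997.\n\n"
    "It was designed by Frank Gehry."
)

print(split_none(sample))                # one passage holding the whole text
print(split_paragraph(sample))           # one passage per paragraph
print(split_block(sample, max_words=5))  # fixed-size blocks, structure ignored
```

In this sketch the paragraph strategy keeps each sentence intact because the text has clear breaks, while the block strategy cuts across them, which is the trade-off to expect with documents that have no usable structure.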

In our example, we'll choose the Auto option for all documents.

Train collection

After uploading our documents, we need to train them. Training is what lets Ainhoa learn the content of the documents: she processes the text and identifies the patterns, concepts, and relationships in it, so that she can later answer user queries accurately and in context, based on the information those documents contain.

We can carry out the training process document by document or, as in our case, train all the documents in the collection.

On the museums collection page, we click the Train button, and shortly afterward, all the collection's documents will be ready for use.

Museums trained

tip

We can open the detail page of one of the documents and check, in the Passages tab, how Ainhoa has extracted the information from it.

Document passages

Great, now our museum information is ready. Next, let's explore how to perform searches on this information.
