Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
Latam-GPT is a new large language model developed in and for Latin America. The project, led by the non-profit Chilean National Center for Artificial Intelligence (Cenia), aims to help the region achieve technological independence by developing an open-source model trained on Latin American languages and contexts.
“This work cannot be done by just one group or one country in Latin America: it is a challenge that requires everyone's participation,” says Álvaro Soto, director of Cenia, in an interview with WIRED en Español. “Latam-GPT is a project that aims to be open, free and, above all, collaborative. We have been working on it for two years, with a very bottom-up process that brings together citizens from different countries who want to cooperate. More initiatives have recently taken an interest and begun to participate in the project.”
The project stands out for its collaborative spirit. “We are not looking to compete with OpenAI, DeepSeek or Google. We want a model specific to Latin America and the Caribbean, aware of the cultural requirements and challenges that this implies, such as the region's history and unique cultural aspects,” explains Soto.
Thanks to 33 strategic partnerships with institutions in Latin America and the Caribbean, the project has gathered a data corpus of more than eight terabytes of text, equivalent to a million books. This database has enabled the development of a language model with 50 billion parameters, a scale that makes it comparable to GPT-3.5 and gives it a medium-to-high capacity for complex tasks such as reasoning, translation and association.
Latam-GPT is trained on a regional database that compiles information from 20 Latin American countries and Spain, for a total of 2,645,500 documents. The distribution shows a significant concentration in the region's largest countries: Brazil leads with 685,000 documents, followed by Mexico with 385,000, Colombia with 220,000 and Argentina with 210,000. These numbers reflect the size of these markets, their digital development and the availability of structured content.
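To put those figures in proportion, the short Python sketch below computes each named country's share of the corpus. It uses only the numbers quoted above; the variable and function names are illustrative, and the per-country counts for the remaining countries are not given in the article, so they appear only as a remainder.

```python
# Document counts as quoted in the article; names here are illustrative.
TOTAL_DOCUMENTS = 2_645_500
documents_by_country = {
    "Brazil": 685_000,
    "Mexico": 385_000,
    "Colombia": 220_000,
    "Argentina": 210_000,
}


def corpus_shares(counts, total):
    """Return each country's share of the corpus as a percentage."""
    return {country: 100 * n / total for country, n in counts.items()}


shares = corpus_shares(documents_by_country, TOTAL_DOCUMENTS)
top_four_total = sum(documents_by_country.values())
remaining = TOTAL_DOCUMENTS - top_four_total  # all other countries combined

for country, pct in shares.items():
    print(f"{country}: {pct:.1f}%")
print(f"Top four combined: {100 * top_four_total / TOTAL_DOCUMENTS:.1f}%")
print(f"Other countries: {remaining:,} documents")
```

On these figures, the four largest contributors account for 1,500,000 documents, a little over half of the corpus.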
“Initially, we will launch a language model. We expect it to perform comparably on general tasks, but with superior performance on topics specific to Latin America. If we ask it about topics relevant to our region, its knowledge will be much deeper.”
The first model is the starting point for developing a family of more advanced technologies in the future, including models that work with images and video, and for scaling up to larger models. “As this is an open project, a group in Colombia could adapt it to the school education system, or one in Brazil could adapt it to the health sector. The idea is to open the door to different areas,” says the Cenia director.