Generative Reo Māori AI – The Good, the Bad and the Ugly

On April 10, 2024, AI researchers gathered at the 2024 Artificial Intelligence Researchers Association Conference for a panel discussion on Generative Reo Māori AI. The panel was chaired by Associate Professor Te Taka Keegan (Waikato-Maniapoto, Ngāti Porou, Ngāti Whakaue) from the University of Waikato. The panellists were:

  • Lynell Tuffery Huria (Ngāti Ruanui, Ngāruahine Rangi, Ngā Rauru Kītahi) Tumu Whakahaere | Managing Partner, Kahui Legal

  • Te Mihinga Kōmene (Waikato-Maniapoto, Ngāpuhi, Ngāti Tamaterā, Ngāti Whakaue, Ngāti Pikiao, Ngāti Porou) Māori Language Education Specialist, PhD Ākonga & Contractor

  • Tūreiti Keith (Ngāi Te Rangi, Ngāti Ranginui) Senior Data Scientist, Te Hiku Media

  • Ria Tomoana (Ngāti Kahungunu, Ngāti Pāhauwera, Te Atiawa ki Te Upoko o Te Ika) Research Manager, Te Mātāwai

  • Basil Keane (Ngāti Kahungunu, Rangitāne, Ngāpuhi) Licensed Māori Language Translator and Interpreter, Te Mātāwai

The panellists discussed generative reo Māori AI from their own particular backgrounds; they shared some illuminating examples and thought-provoking perspectives. The discussions took place under three headings: the Good, the Bad and the Ugly. Some of the main topics discussed are summarised below.

The Good

Current technologies for chatbot AI work very well for conversational te reo Māori. They are available 24 hours a day, 7 days a week; they are always in a cheerful mood; they do not mind you asking the same question over and over; and they provide safe, non-judgemental spaces where learners of te reo Māori can find the confidence to converse in te reo Māori.

There are many on-ramps to learning te reo Māori. Chatbot AI is another tool that can clearly encourage and enhance engagement in learning te reo Māori.

One of the big issues when learning a new language is the transition from learning the grammatical rules from a book to being able to operationalise them in a conversation with a speaker of that language. A real difficulty for te reo Māori is that there are few speakers and a very small number of communities where the language is spoken by 50% or more of the people. Consequently, there is a big cluster of learners of te reo Māori who get stuck at a pre-conversational level, where they have enough language for a basic understanding of conversations but not enough to participate in a good conversation, and they have difficulty finding opportunities to practise te reo. A reo Māori AI tool provides an immersive conversational experience that helps these learners to continue and become confident speakers.

In general, the AI can come up with some useful responses while taking care to be responsible. If you ask it for the Lotto numbers, it will respond with advice about gambling. If you ask it about legal matters, it will give some responses but will also recommend getting legal advice. When asked to translate a mōteatea, it refused, saying that this is specific to an iwi. These checks and balances are very important going forward.

An important factor is that generative AI provides interest in, and accessibility to, te reo Māori in a digital environment. Digital spaces are not inherently Māori spaces. There is a lack of resources for teachers and students, and these tools provide one avenue to address the huge digital inequity in the language resources available for teachers and learners.

The Bad

While automated translation services are getting better, they are not yet good enough. Fixing the translations takes so long that, at the moment, it is better to translate without the automated services.

Open AI models share their information without hesitation or reservation. There are times when stringent safeguards need to be placed on Māori language data.

There is bias in the data that is used to create the models and, consequently, bias in the output. For example, in Western culture there is typically ‘one true history’. But in Māori culture, especially in pre-colonial culture, there are typically multiple histories, with multiple versions. Every event usually has several different kōrero associated with it, and this understanding is actually embedded in the language. But if you ask an AI tool about a historic event, you get a single sanitised version of the event.

The question arises: will our languages be further colonised in these open platforms?

Incorrect language use could be detrimental to those learning the language and, in the long term, to the language itself. While the syntax is invariably grammatically correct, the output often does not sound correct to a Māori ear.

Open models trained only on open data are terrible at te reo Māori. Closed models trained on unnamed data sources are very good at te reo Māori. The question must be asked: where has the data come from to train these models? A lot of people are placing Māori language data on the web, unknowingly making it available to be accessed. But what rights do others have to simply take that data? Māori language data is considered to be kōrero from tūpuna. It is a taonga; it is sacred. Māori have a relationship with it, and Māori must have a right to protect it. From a Māori perspective, you would have the decency to speak first to the people whose data you wanted to access, before you accessed it. It does not feel like that has happened in this case.

Data and information placed in the public domain is free to access and use; that is a standard concept that underpins the intellectual property system. Legally, this enables data and information to be scraped from the Internet. As a consequence, te reo Māori is being scraped from the Internet for these tools. There is no free, prior and informed consent. There are no access and benefit-sharing arrangements being reached. Those tools are then selling that resource back to Māori, and Māori are paying for it. This should not be the case, especially for the descendants of the language.

In reo Māori communities, there is a fear of using AI. There is a fear of creating new knowledge derived from traditional Māori knowledge, which raises the question of who then owns that knowledge and that language. There is a fear that AI tools will ultimately re-colonise Māori knowledge and Māori language.

The Ugly

There are a lot of reo Māori nuances that generative Māori AI tools do not take into consideration. For example, ChatGPT will write a karakia in the form of a traditional incantation and reference Tāne Mahuta, but it will also finish off with Āmene, a Christian way of closing a karakia. This shows that it does not recognise that it is mixing customary concepts with more contemporary aspects.

There is a Māori language newspaper from the 1800s called Te Hoa Māori. The reo Māori in it is actually pretty good. It consists of translations of proselytising Christian material from around the world; it talks, for instance, about an orphan boy in Dublin and his issues. It is a really interesting example because it is from a time when the reo is very nice, but it does highlight what te reo Māori can look like with all of the cultural aspects stripped out.

Over the years there have been environmental impacts that have significantly changed te reo Māori. For example, European arrival led to the creation of an orthography, and as soon as you codify a language it starts to change. The Rangatahi series written by Hoani Waititi became a mainstay of te reo teaching throughout New Zealand, and thus the Whānau Apanui reo became a significant base for Māori language learners. In the same way, AI reo will become a predominant and authoritative version of te reo Māori, which will potentially be learnt by speakers not just in Aotearoa but around the world.

Stuff News, in collaboration with Microsoft and Straker Translations, has made available ‘He Pūrongo Reo Māori’, using their AI technology to automatically translate reports and articles into te reo Māori. However, concern has been expressed at errors in the articles. This goes against a translation ethos that states ‘ko te reo kia tika’. Too many errors are appearing, and the fear is that these errors will become accepted and subsequently pass into common usage. For the future of the younger generation, we must strive to have the best reo Māori available for them. So in order for them to have the best te reo Māori, to thrive, and to have their mana motuhake with their language, we need to ensure that the reo is tika in all forms, and particularly in digital spaces, because this is where young people spend so much time.

Conclusions & Moving Forward

You only use tools when you need to use tools. So, we need to be conscious of which communities these tools are created to serve. And then, when developing these tools, we need to ensure that the voice of those communities is in every part of the development, from concept design to concept implementation.

The protections and stop gates in place must ensure that the language is being respected at all times. That can only be developed with Māori being there every step of the way.

Big tech takes Māori data and uses it to build models that potentially damage Māori language, and then sells it back to Māori. What is a way out of this? If we are expecting too much of a foreign company to be interested in engaging, then perhaps it is for our language protection institutions to be stepping up? Te Taura Whiri and Te Mātāwai are two reo Māori institutions that understand the language, that understand the culture of the language and that understand what good te reo is. Perhaps they should be certifying these AI outputs?

A more absolute AI would include cultural aspects and cultural awareness. It is the cultural nuance, the language nuance, that needs to be incorporated. The AI could ask where you are from and, when you responded, it could begin to respond with the appropriate world view, the appropriate data and the appropriate knowledge system that matches your identity. By querying to determine your identity, it could make a better connection, a practice that is often undertaken when Māori give a pepeha.

There is a traditional Māori saying, ‘te reo me ōna tikanga’. There are two interpretations of what ‘ōna tikanga’ means, but in this instance it refers to the culture that is embedded in the language, the subtleties and nuances that express a richness of thought, a deeper knowledge and meaning above the simple syntax that is written. When the two are combined, ‘te reo’ the language and ‘ōna tikanga’ the embedded culture, the beauty of the Māori language is realised. At the moment, generative reo Māori AI is only utilising the syntax aspect of te reo Māori.

To the data scientists out there: if you are serious about doing good for te reo Māori and te ao Māori, then engaging with those whose data you want to use can only benefit the outcomes of your research. Forming relationships with Māori and Māori organisations is a great way to better understand the problems that you hope to solve and to build trust with the community that will ultimately benefit from the work you do.