How machine learning corrects grammatical errors in Google Docs
Spelling or grammatical errors can be distracting and make a proposal look unprofessional, something we all want to avoid. That is why last year Google introduced new grammar correction tools in Google Docs to help people write more quickly and accurately. With the help of machine learning, more than 100 million grammar suggestions are already flagged each week.
To date, Google’s grammar correction system has used machine translation technology. Essentially, each suggestion is treated as a translation task: in this case, translating from the language of ‘incorrect grammar’ to the language of ‘correct grammar.’ At a basic level, machine translation substitutes and reorders words to get from a source language to a target language, for example, replacing a “source” word in English (“hello”) with a “target” word in Spanish (“hola”).
With the latest advancements from Google’s research team in the area of language understanding, made possible by neural machine translation, Google is making a significant improvement to how it corrects language errors by using Neural Grammar Correction in Docs.
How it works
Since Grammatical Error Correction (GEC) can be viewed as “translation” from ungrammatical to grammatical sentences, sequence-to-sequence models developed for neural machine translation can be applied to this task. To train high-quality models, Google generally wants millions or billions of examples of parallel data, where each training example consists of a sentence in the source language paired with its translation in the target language. Unlike several other machine translation tasks (such as translating from English to French), there is very little parallel data for GEC.
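To make the idea of parallel data concrete, here is a minimal Python sketch of what a single training example might look like. The `ParallelExample` class, the `to_training_record` helper and the sample sentences are all made up for illustration; they are not Google's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelExample:
    """One training pair: a source sentence and its target 'translation'."""
    source: str
    target: str


# An ordinary translation pair (English to Spanish).
translation_pair = ParallelExample(source="hello", target="hola")

# A GEC pair has the same shape: ungrammatical source, grammatical target.
# The sentences here are invented for illustration.
gec_pair = ParallelExample(
    source="She go to school yesterday.",
    target="She went to school yesterday.",
)


def to_training_record(example: ParallelExample) -> dict:
    """Split both sides into whitespace tokens, the way a toy
    sequence-to-sequence pipeline might before feeding a model."""
    return {
        "inputs": example.source.split(),
        "targets": example.target.split(),
    }


record = to_training_record(gec_pair)
print(record["inputs"])
print(record["targets"])
```

Because both kinds of pair share the same structure, a model built for translation can in principle be trained on GEC pairs without changing its architecture. The hard part is collecting enough of them.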
To overcome this challenge, Google developed two contrasting methods to generate large quantities of parallel data for GEC. The first method takes good sentences and makes them worse by automatically translating them to some other language and then back to English. The second method extracts source-target pairs from Wikipedia edit histories with a minimal amount of filtration.
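Both data-generation ideas can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: `toy_to_pivot` and `toy_from_pivot` are hypothetical stand-ins for real translation models (here the "pivot language" simply has no articles), and `pairs_from_revisions` assumes each revision has already been split into sentences. None of these names come from Google's pipeline.

```python
import difflib

ARTICLES = {"a", "an", "the"}


def toy_to_pivot(sentence: str) -> str:
    """Hypothetical stand-in for 'translate English to a pivot language'.
    It drops articles, mimicking a language that does not use them."""
    return " ".join(w for w in sentence.split() if w.lower() not in ARTICLES)


def toy_from_pivot(sentence: str) -> str:
    """Hypothetical stand-in for 'translate back to English'.
    The lost articles are not restored, so the result is degraded."""
    return sentence


def make_round_trip_pair(clean, to_pivot, from_pivot):
    """Method 1: corrupt a good sentence by round-trip translation.
    Returns (noisy source, clean target)."""
    return from_pivot(to_pivot(clean)), clean


def pairs_from_revisions(old_sentences, new_sentences):
    """Method 2: align two revisions of a page sentence by sentence and
    keep sentences that were rewritten in place as (before, after) pairs.
    The only filtering is that insertions and deletions are skipped."""
    matcher = difflib.SequenceMatcher(
        a=old_sentences, b=new_sentences, autojunk=False
    )
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only one-to-one replacements so each source has one target.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.extend(zip(old_sentences[i1:i2], new_sentences[j1:j2]))
    return pairs


round_trip_pair = make_round_trip_pair(
    "I bought a book at the store.", toy_to_pivot, toy_from_pivot
)
revision_pairs = pairs_from_revisions(
    ["He have two cats.", "They are friendly."],
    ["He has two cats.", "They are friendly."],
)
print(round_trip_pair)
print(revision_pairs)
```

The two methods complement each other: round-trip translation can produce unlimited pairs but the errors are artificial, while edit histories contain real human corrections mixed in with edits that have nothing to do with grammar.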
To ensure that the models were feasible to deploy on Google Docs without using an unreasonable amount of computing resources, Google used Tensor Processing Units (TPUs). TPUs have provided substantial performance increases for many other Google products, including Smart Compose in Gmail. In addition, Google used the open-source Lingvo TensorFlow library which enabled them to easily experiment with modelling changes and also allowed them to carefully optimise how the TPU cores generate suggestions.
So what does it all mean for you? Well, by applying neural machine translation models to grammar correction, Google is able to correct many more of the grammar mistakes you may make while writing. To launch these improvements, Google did a lot of testing to ensure that the changes actually are more helpful. Here are some of the examples from Google’s evaluation process that demonstrate neural grammar correction’s capabilities:
Changing to the neural machine translation method gives a marked increase in the recall of grammar correction suggestions in Docs. So, the next time you write in Google Docs, remember that there’s an AI working in the background to make sure your grammar is error-free.
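For readers unfamiliar with the term, recall is the share of real errors that the system manages to flag. The short Python sketch below shows how it could be computed, alongside precision (the share of suggestions that are correct). The error positions, corrections and both "systems" are invented for illustration and are not Google's evaluation data.

```python
def recall(suggested: set, expected: set) -> float:
    """Fraction of the known errors that the system flagged."""
    if not expected:
        return 0.0
    return len(suggested & expected) / len(expected)


def precision(suggested: set, expected: set) -> float:
    """Fraction of the system's suggestions that were correct."""
    if not suggested:
        return 0.0
    return len(suggested & expected) / len(suggested)


# Each item is (word position, corrected word). All values are made up.
expected_fixes = {(0, "has"), (5, "went"), (9, "their"), (12, "an")}

# A hypothetical older system that catches one of the four errors.
old_suggestions = {(0, "has")}

# A hypothetical newer system that catches three, plus one wrong suggestion.
new_suggestions = {(0, "has"), (5, "went"), (9, "their"), (14, "is")}

old_recall = recall(old_suggestions, expected_fixes)
new_recall = recall(new_suggestions, expected_fixes)
new_precision = precision(new_suggestions, expected_fixes)
print(old_recall, new_recall, new_precision)
```

Higher recall means fewer missed mistakes, but it is usually read together with precision, since a system that flags everything would have perfect recall and many unhelpful suggestions.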