Enhancing efficiency through machine learning and digitisation
Why it's material
According to McKinsey, companies that are digital leaders in their sectors have faster revenue growth and higher productivity than their less digitised peers. They improve profit margins three times more rapidly than average and, more often than not, have been the fastest innovators and the disruptors and transformers of their sectors.1

How this issue links to other aspects of our business
Our global priority SDGs
Our emerging risks
Industry 4.0
Our strategic fundamentals
- Grow our business
- Sustain our financial health
- Drive operational excellence
The global forces shaping our Thrive25 strategy
- Globalisation and high levels of connectivity
- The rapid pace of technological innovation, including AI
1 The McKinsey Global Institute: Digitization, AI, and the future of work: imperatives for Europe
Our approach
Machine learning (ML) is no longer the preserve of artificial-intelligence researchers and born-digital companies like Google or Netflix. Recognising this and in line with our agile operating approach, we work to integrate ML into our business processes.
Key developments in 2020
We continued to build our data-driven culture to drive productivity and profitability. This involves standardising and consolidating regional data science platforms and digital transformation strategies to support our global Industry 4.0 initiatives, including machine learning and advanced analytics technologies.
One workstream involves the extensive use of digital twins to promote discovery, interpretation, and communication of meaningful patterns in data. In the Sappi context, a digital twin is a virtual model of a process, or of a semi-finished or finished product. By pairing the virtual and physical worlds, we can analyse data and monitor systems, thereby anticipating and avoiding problems before they occur, preventing downtime, developing new opportunities and planning for the future through the use of simulations. For example, we have created a digital twin for every dissolving pulp (DP) batch produced at Saiccor Mill. Each batch contains information relating to all the upstream processes that contributed to that batch, including timber, liquor and digester cook, washing and bleaching. This digital twin data ensures that process engineers have all the necessary data available in context to analyse issues in the plant.
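As a rough illustration of the per-batch digital twin described above, the record could be modelled as a batch ID plus the measurements captured at each upstream stage. This is a minimal sketch, not Sappi's actual data model: the class names `ProcessStage` and `BatchTwin`, the measurement tags and all values are hypothetical, and only the stage names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessStage:
    """Measurements recorded for one upstream stage of a batch."""
    name: str  # stage name, e.g. "digester cook" or "bleaching"
    measurements: Dict[str, float] = field(default_factory=dict)


@dataclass
class BatchTwin:
    """Hypothetical digital twin of a single dissolving pulp batch."""
    batch_id: str
    stages: List[ProcessStage] = field(default_factory=list)

    def add_stage(self, name, **measurements):
        """Attach one upstream stage and its measurements to the batch."""
        self.stages.append(ProcessStage(name, dict(measurements)))

    def context(self):
        """Flatten all stages into a single 'stage.tag' -> value mapping."""
        return {
            f"{stage.name}.{tag}": value
            for stage in self.stages
            for tag, value in stage.measurements.items()
        }


# Illustrative values only; not real plant data.
twin = BatchTwin(batch_id="DP-0001")
twin.add_stage("digester cook", temperature_c=145.0, duration_min=210.0)
twin.add_stage("bleaching", brightness_pct=92.5)
flat = twin.context()
```

Flattening every stage into one view is what would give an engineer the data "in context": a quality issue on a batch can be examined alongside every upstream measurement that contributed to it.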
In parallel with the development and continuous improvement of digital twins across all regions, we have invested significant effort in enabling our domain experts (such as engineers and research scientists) to apply data science using a tool called RapidMiner. This RapidMiner enablement programme aims to democratise data science within our organisation with a view to expediting problem solving, encouraging innovation and empowering the broader Sappi community by equipping them with data, skills and tools. Through the programme, our people are encouraged to bring their ideas and business problems to the data science team.
Enabling our data-driven culture
Our aim is to provide an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. To achieve this, we use the Braincube data science software platform in addition to RapidMiner. We also provide our people with online and practical training, and hold regular workshops to help develop ideas into proofs of concept.
All ideas pass through a series of funnels and gates, to ensure that the best ideas become operational and demonstrate value in terms of one or more of the following:
- Process optimisation
- Quality improvements
- Reduced costs
- Improved profitability
- Time saving
WHAT IS MACHINE LEARNING?
McKinsey & Company says that machine learning (ML) is "based on algorithms that can learn from data without relying on rules-based programming".1 Stanford University suggests that ML is "the science of getting computers to act without being explicitly programmed".2 Carnegie Mellon University's definition states that the field of ML seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?"
Regardless of the definition, at its most basic level, the goal of machine learning is to adapt to new data independently and make decisions and recommendations based on thousands of calculations and analyses. This is done by having artificial intelligence systems or deep learning business applications learn from the data they are fed. The systems learn, identify patterns, and make decisions with minimal intervention from humans. Ideally, machines increase accuracy and efficiency and remove (or greatly reduce) the possibility of human error.4
Innovative solutions for a thriving
Promising initiatives currently underway include the following:
- Pulp yield is a significant economic factor, as woodfibre dominates the total pulp production cost. One way to reduce the woodfibre cost is to increase the pulp yield, thereby increasing operational competitiveness. For example, the annual wood cost saving from a 1% pulp yield increase at a DP mill producing 1,000 metric tons per day is about US$1.8 million. Some of the additional indirect effects of higher pulp yield are:
– Reduced recovery boiler loading
– Energy savings, for example, reduction in steam consumption
– Better utilisation of invested capital.

A real-time prediction model was developed in RapidMiner to predict yield using only the intrinsic viscosity and the cellulose content of the pulp. This RapidMiner artificial intelligence (AI) evaluation allows us to identify the most important parameters influencing the pulp yield and to adjust the operating conditions to maximise the pulp yield without compromising the pulp quality.
- Initial analysis with RapidMiner confirmed that the drum level control at one of our mills was the biggest single cause of recovery boiler steam flow variation. The project is now focusing on understanding the impacts of steam flow, feedwater temperature and liquor cycle properties on the steam flow.
- At Gratkorn Mill, the data science team is developing an automated process that will highlight problematic areas in the profile data for the paper, coater and calender machines. Preliminary results show that the algorithm can identify peaks in the profile data that can then be related to a specific problem like stretching. The team is currently working on transferring the algorithm to Braincube so that the analysis can be carried out in real time. In addition, the team is testing several anomaly detection algorithms in RapidMiner that consider all the measured profile variables simultaneously to reduce the time that it takes to identify a problem. Based on the learning from Gratkorn Mill, the data science team is currently developing predictive models to notify operations of paper breaks before they occur, based on current plant operating conditions. This application is currently being refined for Gratkorn Mill, with a view to implementing similar solutions at Tugela and Lanaken Mills in the near future.
- In conjunction with the University of Pretoria, we conducted a pilot study to determine whether NIRA could be used to classify eucalypt hybrids by their susceptibility to Chrysoporthe austroafricana, a fungal pathogen that causes stem cankers to develop on susceptible trees. The next step will be to verify the model independently and then deploy it operationally. It will be a useful tool for identifying hybrids rapidly and cost-effectively, helping to maintain high purity in nurseries.
- One workstream is testing the suitability of microdots for tracking and tracing timber from felling to the chip pile at the mill.
- Utilising Braincube, we have developed a customisable suite of root cause analysis (RCA) tools that can be utilised across all regions to expedite troubleshooting. These tools are currently being refined and expanded to:
– Ensure intuitive ease of use
– Tackle specific business cases like paper break analysis that are frequently encountered in mills
– Enable RCA on offline data where digital twins are not available
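To make the Gratkorn profile analysis mentioned above more concrete, one simple way to highlight peaks in a profile is to flag local maxima that sit well above the profile mean. The sketch below is an assumption for illustration only and is not the algorithm used at the mill: the function name `find_profile_peaks`, the z-score threshold and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def find_profile_peaks(profile, threshold=2.0):
    """Return indices of local maxima lying more than `threshold`
    standard deviations above the profile mean."""
    mu = mean(profile)
    sigma = pstdev(profile)
    if sigma == 0:
        return []  # a perfectly flat profile has no peaks
    peaks = []
    # The first and last points are skipped: they have only one neighbour.
    for i in range(1, len(profile) - 1):
        is_local_max = profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]
        if is_local_max and (profile[i] - mu) / sigma > threshold:
            peaks.append(i)
    return peaks


# Synthetic cross-machine profile with one obvious spike at index 4.
profile = [80.0, 80.2, 79.9, 80.1, 86.0, 80.0, 79.8, 80.1, 80.0, 80.2]
peaks = find_profile_peaks(profile)
```

A single-variable check like this looks at one profile at a time. The anomaly detection approaches described for Gratkorn consider all measured profile variables simultaneously, which a sketch of this size does not attempt.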