EXPERTS RALLY BEHIND DOH DATA INTEGRITY; AIM, Ateneo data scientists affirm validity of findings

Press Release / 14 May 2020

With the easing of quarantine measures across the nation opening a new chapter in the Philippines’ fight against COVID-19, local experts have rallied behind the Department of Health (DOH), affirming that the government’s decision-making rests on sound reporting of data.

According to data scientists Dr. Reena Estuar of the Ateneo de Manila University and Dr. Maya Herrera of the Asian Institute of Management (AIM), because the country is battling a new virus, data cannot always be guaranteed to be 100% accurate. In a media forum this morning, both described the rigorous process the DOH follows to ensure the accuracy of reported COVID-19 cases.

“We validate the data before we use it, and the data that does not pass, we give them back to the data source,” said Herrera, an actuary, who clarified that data integrity is managed at the point of entry. “But this is only the beginning of an exhaustive process. Data is validated again before it is used for analysis,” she said.

Herrera cautioned that errors can occur at many points, from capture to transmission. Even the sophisticated data collection systems of large private organizations require data validation. Such errors, however, do not compromise the overall interpretation of the data. She said, “When you find a conflict or any questionable information, you bring it back (to the data owners), so you can correct the data. But you can use the remaining validated data. So, if there’s a 1% error, then you have 99% of the data that’s correct. Then you run tests to check whether the remaining data is acceptable for your analysis.”

Herrera is a professor of strategy and finance at AIM and a fully qualified actuary with decades of experience in statistics and data analytics, particularly with health data. She heads a team of volunteers from AIM and the University of the Philippines College of Medicine that, among other activities, helps the FASSSTER team forecast health logistics requirements for the COVID-19 response.

Meanwhile, Estuar said that data collection is crucial but has become a challenge in reporting cases. According to her, one challenge is that encoding slows down as the number of cases rises. To address this, the data collection system has been automated to eliminate data-entry errors before records are transmitted to the national agency. Local Government Units are also being encouraged to invest in data encoders who can assist with encoding case reports.

Estuar is the project leader of FASSSTER, or the Feasibility Analysis of Syndromic Surveillance using Spatio-Temporal Epidemiological Modeler for Early Detection of Diseases, a web and mobile application for disease modeling and surveillance developed by the Ateneo de Manila University. Originally designed for creating predictive models and visualizing possible outbreak scenarios for dengue, typhoid fever, and measles, the technology has been adapted to analyze COVID-19 data.

Certain trade-offs in data collection must also be considered. Estuar explained that the challenge with real-time data is releasing it to the public in a timely manner while constantly validating it at the point of entry. “Do we relax the data entry, or do we restrict? It’s a balance that needs to be considered,” she said.

Estuar also stressed that strict measures are taken to ensure the integrity of the data. “We make sure we show the accuracy of numbers every day,” she said. To do this, the team immediately assesses which parts of the data need to be reviewed at the local level, that is, in cities and municipalities. This level of granularity allows the models to be updated every day. A validity and reliability scoring process is also carried out to increase confidence in the data used for decision-making.

Herrera echoed Estuar, noting that data science is an iterative process. “When we get more data, we get more discoveries for science. We are learning, we discover new things,” she said.

In a recent Beat COVID-19 Virtual Presser, DOH Usec. Maria Rosario Vergeire expressed confidence in the data, noting that the science behind it is a collaboration among credible institutions, including Thinking Machines Data Science, Inc.

Both Herrera and Estuar encourage the public to consult the publicly available DOH tracker. “I recommend the public look at the DOH tracker because it provides good information on the everyday output of the activities in relation to COVID. Also look at global reports because we also reference and benchmark. There are several sources of data that are used to make informed decisions, and we try to provide our planners with a platform that will allow them to have a holistic approach to this pandemic,” Estuar concluded.