Prioritising indicators from items in big data: An algorithm for an automated, visual approach
Identifying representative indicators requires distinguishing the driving forces and directions of relationships in innovation, economic or health data.
The innovation is an algorithm, a sequence of coded instructions, automated to derive visual tools directly from big data.
The algorithm is adaptable to various fields of study for rapid data visualisation and enables transparent, evidence-based indicator prioritisation.
Developing effective innovation, economic or health policies, and identifying their impacts on national or regional performance, necessarily requires sourcing information from large, administrative datasets. Measurement frameworks require representative indicators to assess impacts on economic growth and population well-being. Methods to rapidly prioritise informative indicators from big data are lacking.
The design and implementation of statistical solutions to data problems may be reapplied under different scenarios. Well-designed algorithms can specify computing processes while computer program automation exploits pre-defined, standardised data structures to reduce the time and cost of producing, replicating or extending information. Proven, automated, big data algorithms can produce informative outputs rapidly, facilitating indicator prioritisation and human decision-making.
The innovation is an algorithm specifying data-processing steps, automated for use on big data. An automated algorithm schematic is shown on page 3 of the provided attachment OpenGovernmentCaseStudy_ROyomopitoPhD.pdf. The innovation generates two visualisations, Graphics 1 and 2, with tabular references derived directly from data. Automated algorithms can process big data in a timely fashion and are conducive to generating publication-ready materials in a reporting pipeline (see attachment page 4). Graphic 1 and Graphic 2 visualisations are shown on pages 5 and 6 of the attachment, respectively.
Graphic 1 shows a priori matched policy (x-axis) and performance (y-axis) item pairs sourced from innovation, economic, employment, education and health data. The datapoints represent normalised policy and performance relationships for one country, extracted from multi-country distributions. The format allows simultaneous visualisation of policy/performance items for prioritisation as indicators.
Referring to Graphic 1, the expectation is that high (low) scoring country policies yield high (low) performance. The quadrants of interest for predictive indicators are therefore ++ and --. Horizontal and vertical dashed lines delineate meaningful statistical cut-offs. Points in discordant quadrants, outside reasonable standard deviation limits, show anomalies to be investigated.
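The quadrant logic described for Graphic 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the submitted algorithm: normalisation is assumed to be a z-score across countries, the cut-off is assumed to be zero, and the standard deviation limit for flagging anomalies is assumed to be 1.0. The function names (z_scores, quadrant, classify_pairs) and the input layout are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise one item across countries: (x - mean) / sd (assumed normalisation)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("no dispersion across countries; item cannot be normalised")
    return [(v - mu) / sd for v in values]


def quadrant(policy_z, performance_z, cutoff=0.0):
    """Label a point by the side of the cut-off each score falls on, e.g. '++' or '+-'."""
    policy_sign = "+" if policy_z >= cutoff else "-"
    performance_sign = "+" if performance_z >= cutoff else "-"
    return policy_sign + performance_sign


def classify_pairs(pairs, country, limit=1.0):
    """Place one country's matched policy/performance pairs into quadrants.

    pairs: {pair_name: {"policy": {country: value},
                        "performance": {country: value}}}
    Both items of a pair are assumed to cover the same countries, with no
    missing values. `limit` is a hypothetical standard deviation limit.
    """
    results = {}
    for name, item in pairs.items():
        countries = sorted(item["policy"])
        idx = countries.index(country)
        policy_z = z_scores([item["policy"][c] for c in countries])[idx]
        performance_z = z_scores([item["performance"][c] for c in countries])[idx]
        label = quadrant(policy_z, performance_z)
        # Discordant quadrants are '+-' and '-+'; flag those beyond the limit.
        discordant = label in ("+-", "-+")
        anomaly = discordant and (abs(policy_z) > limit or abs(performance_z) > limit)
        results[name] = {
            "policy_z": policy_z,
            "performance_z": performance_z,
            "quadrant": label,
            "anomaly": anomaly,
        }
    return results
```

Pairs that land in ++ or -- would be candidates for prioritisation as indicators, while pairs flagged as anomalies would be set aside for investigation.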
Preliminary iterations of Graphic 1 are exploratory. Points showing desired relationships, i.e. good (poor) policy = good (poor) performance, suggest that the policy/performance pair is informative as an indicator, provided that scale dispersion requirements are fulfilled. A final iteration of Graphic 1 would show selected, representative indicators for monitoring and evaluation or in-depth analysis.
Graphic 2 represents grouped country identifiers for one indicator over time, categorised by a pre-defined response threshold. The categories show indicator response and non-response, and highlight the problems that missing data pose for interpreting outcomes.
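The grouping behind Graphic 2 can be illustrated with a short Python sketch. It assumes that a value at or above the threshold counts as a response, that missing observations are recorded as None, and that the data are held as a nested dictionary keyed by time point and entity identifier. The function names and this data layout are hypothetical, chosen only for illustration.

```python
def categorise(value, threshold):
    """Assign one observation to a category (assumed rule: value >= threshold is a response)."""
    if value is None:
        return "missing"
    return "response" if value >= threshold else "non-response"


def group_by_response(observations, threshold):
    """Group entity identifiers by category at each time point.

    observations: {time_point: {entity_id: value or None}}
    Returns {time_point: {"response": [...], "non-response": [...], "missing": [...]}}
    with time points and identifiers in sorted order.
    """
    grouped = {}
    for time_point in sorted(observations):
        groups = {"response": [], "non-response": [], "missing": []}
        for entity in sorted(observations[time_point]):
            category = categorise(observations[time_point][entity], threshold)
            groups[category].append(entity)
        grouped[time_point] = groups
    return grouped
```

Keeping "missing" as its own category, rather than dropping those entities, is what lets the display show how much of an apparent change over time could be due to gaps in the data.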
Graphic 2 methods were originally designed on data for immune system reconstitution in patients from US Congress-funded clinical trials. The method is appropriate for rapid, longitudinal representation of any informative indicator, such as those identified by Graphic 1, after meaningful thresholds have been determined.
Graphics 1 and 2 were designed individually and proposed for use in combination in 2018.
The objective of Graphic 1 is to visually discriminate representative, informative indicators from large administrative datasets. The objective of Graphic 2 is to allow simple, visual, longitudinal assessment for a selected indicator having a large number of entities. Both graphs, derived directly from the data, take seconds to run and provide invaluable guidance for interpreting important patterns in big data.
The OECD Economics Department benefited from Graphic 1, when it was used for exploratory analysis to identify where specific policy and performance indicators were lower than the OECD average for the publication “Stocktaking: Going for Growth” (2006). After identifying representative indicators, next steps would be analysis, as published by the OECD, shown on attachment page 7, from “Annex A1 Factor Analysis to Identify Inter-related EIS Innovation Indicators”. Original methods were derived from an International Biometric Society prize-winning analysis on patient indicators.
US Congress-funded researchers benefited from Graphic 2, which informed the selection of statistical analyses for the publication: “Antimicrobial-specific cell-mediated immune reconstitution in children with advanced HIV infection receiving HAART”, A Weinberg, S Pahwa, R Oyomopito et al., Clinical Infectious Diseases (2004), Vol.39, No.1, pp.107-14.
The algorithm will be presented in April 2019 at the Australian Department of Health.
The OECD will be approached with a suggested application of deriving digital transformation indicators.
What Makes Your Project Innovative?
The novel, automated algorithm is flexible, in that it may be applied to innovation, economic, employment, health, geospatial and digital transformation data. The algorithm outputs are most useful for big data sources or administrative datasets where there are a large number of entities, such as countries, regions, businesses or a health system, and where a large number of items are available for use as indicators.
The visualisations enable human reviewers to discern important patterns in large numbers of items for selection as monitoring and evaluation indicators - for individual investigation or multivariate analysis. The visualisations, and associated tabular references, are generated directly from the data allowing rapid, seamless inclusion in a report production pipeline.
Collaborations & Partnerships
Stakeholders providing research questions were essential to the design of the two innovative visualisations.
The origin of Graphic 1 was “Stocktaking: Going for Growth exploratory data analysis” 2006, based on a request to identify where policy and performance indicators were lower than the OECD average.
The origin of Graphic 2 was a US Congress-funded national clinical trial. Immunologists asked “Can HIV medications reconstitute immune system response, as measured by stimulation indicators?”
Users, Stakeholders & Beneficiaries
The OECD Economics Department was the target in the development of Graphic 1. The visualisation was used for exploratory analysis to identify where specific policy and performance indicators were lower than the OECD average for the publication “Stocktaking: Going for Growth”.
US Congress-funded researchers were the target for Graphic 2 development. Graphic 2 informed their choice of statistical analyses in A Weinberg, S Pahwa, R Oyomopito et al., CID (2004).
Results, Outcomes & Impacts
OECD Economists found Graphic 1 and associated outputs informative and the work was presented internally in 2006 to inform works for the publication “Stocktaking: Going for Growth”.
US HIV/AIDS Clinical Trials Immunologists benefited from novel visualisation of longitudinal data (Graphic 2) and statistical analyses were published in 2004.
The algorithm can be easily adapted for new enquiries on current data. Automation produces algorithm outputs in a timely fashion.
A presentation of works on the automated algorithm and its outputs is scheduled for April 2019 with the Australian Department of Health. The works are applicable to the development of geospatial region indicators for health system entities such as primary healthcare networks.
Uptake of the algorithm by the OECD for “Measuring the digital transformation” would be an exciting opportunity. Contact with OECD Representatives will be investigated during 2019.
Challenges and Failures
The challenge, in design of the data visualisations, was how to use available resources, technological know-how and statistically robust methods to communicate important data patterns to Stakeholders, succinctly, to facilitate evidence-based decision-making.
Another challenge was that, in each case, the time available to accomplish the outcome was limited.
No structural failures or significant setbacks were encountered.
Conditions for Success
Conditions for successful design and implementation of the novel visualisations were:
• a problem to solve;
• personal commitment and imagination;
• scientific knowledge;
• mathematical and statistical methods;
• high-level programming expertise;
• access to computing resources;
• encouragement from a mentor;
• direct access to the Stakeholders who were the end-users of the work.
In each case, the work was valued and used immediately giving a sense of job satisfaction.
The algorithm is designed for replication. Outputs are most useful for big data sources or administrative datasets where there are a large number of entities, such as countries, regions, businesses or a health system, and where a large number of items are available for use as indicators.
The algorithm is flexible, in that it may be applied to innovation, economic, employment, health, geospatial and digital transformation data.
Approaches to organisations, such as the Australian Government, the OECD and large enterprises, which can profit from the innovation are ongoing in 2019. A presentation and meeting at the Australian Department of Health is scheduled for April 2019.
The automated algorithm presented in this submission draws upon methods designed in economic and health contexts. Statistical models are not subject-specific; therefore, methods developed in one area may be translatable to other fields of study.
Another example of published across-subject methods is shown in “Annex A1 Factor Analysis to Identify Inter-related EIS Innovation Indicators”, OECD Economics Department Working Papers, No. 479 (2006), where methods were adapted from an International Biometric Society prize-winning analysis designed on patient indicators in 1998.
Also recycled were automated multivariate regression programs, which generated results for R Oyomopito et al. (2010), “Measures of site resourcing predict virologic suppression, immunologic response and HIV disease progression following HAART” (reviewed by UNAIDS Science Now and NAM AIDSMap), and were reused for “Assessing the Program Impact on Medicare Items for Acute Low Back Pain” (2012), a report to the Australian Department of Health.