Prioritising indicators from items in big data: An algorithm for an automated, visual approach

Identifying representative indicators requires distinguishing the driving forces and directions of relationships in innovation, economic or health data.

The innovation is an algorithm, a sequence of coded instructions, automated to derive visual tools directly from big data.

The algorithm is adaptable to various fields of study for rapid data visualisation and enables transparent, evidence-based indicator prioritisation.

Innovation Summary

Innovation Overview

The opportunity:
Developing effective innovation, economic or health policies, and identifying their impacts on national or regional performance, necessarily requires sourcing information from large, administrative datasets. Measurement frameworks require representative indicators to assess impacts on economic growth and population well-being. Methods to rapidly prioritise informative indicators from big data are lacking.

The design and implementation of statistical solutions to data problems may be reapplied under different scenarios. Well-designed algorithms can specify computing processes while computer program automation exploits pre-defined, standardised data structures to reduce the time and cost of producing, replicating or extending information. Proven, automated, big data algorithms can produce informative outputs rapidly, facilitating indicator prioritisation and human decision-making.

The innovation:
The innovation is an algorithm specifying data-processing steps, automated for use on big data. An automated algorithm schematic is shown on page 3 of the provided attachment OpenGovernmentCaseStudy_ROyomopitoPhD.pdf. The innovation generates two visualisations, Graphics 1 and 2, with tabular references derived directly from data. Automated algorithms can process big data in a timely fashion and are conducive to generating publication-ready materials in a reporting pipeline (see attachment page 4). Graphic 1 and Graphic 2 visualisations are shown on pages 5 and 6 of the attachment, respectively.

Graphic 1 shows a priori matched policy (x-axis) and performance (y-axis) item pairs sourced from innovation, economic, employment, education and health data. The datapoints represent normalised policy and performance relationships for one country, extracted from multi-country distributions. The format allows simultaneous visualisation of policy/performance items for prioritisation as indicators.

In Graphic 1, the expectation is that high (low) scoring country policies yield high (low) performance. The quadrants of interest for predictive indicators are therefore ++ and --. Horizontal and vertical dashed lines delineate meaningful statistical cut-offs. Points in discordant quadrants, outside reasonable standard deviation limits, show anomalies to be investigated.
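The quadrant logic behind Graphic 1 can be sketched in a few lines of Python. This is a minimal illustration, not the author's automated algorithm: it assumes that "normalised" means z-scores computed across the multi-country distribution, that the cut-offs sit at zero, and that the standard deviation limit is a user-chosen value. The function names (`zscores`, `quadrant`, `classify_pair`) and the `limit` parameter are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise scores to mean 0 and standard deviation 1 (assumed normalisation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("item has no dispersion across countries")
    return [(v - mu) / sigma for v in values]


def quadrant(policy_z, performance_z):
    """Return the quadrant label, e.g. '++' for high policy and high performance."""
    def sign(z):
        return "+" if z >= 0 else "-"
    return sign(policy_z) + sign(performance_z)


def classify_pair(policy, performance, country, limit=2.0):
    """Locate one country's policy/performance pair within the multi-country distribution.

    policy, performance: dicts mapping country code -> raw score (same keys).
    limit: hypothetical standard deviation limit used to flag anomalies.
    """
    countries = sorted(policy)
    policy_z = dict(zip(countries, zscores([policy[c] for c in countries])))
    perf_z = dict(zip(countries, zscores([performance[c] for c in countries])))
    x, y = policy_z[country], perf_z[country]
    label = quadrant(x, y)
    concordant = label in ("++", "--")
    # Anomaly: a discordant point that also lies beyond the chosen limit on either axis.
    anomaly = (not concordant) and (abs(x) > limit or abs(y) > limit)
    return {"x": x, "y": y, "quadrant": label,
            "concordant": concordant, "anomaly": anomaly}
```

Running `classify_pair` for every a priori matched pair gives one point per pair for the chosen country; concordant points are candidates for indicators, and flagged anomalies are the ones to investigate.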

Preliminary iterations of Graphic 1 are exploratory. Points showing desired relationships, i.e. good (poor) policy = good (poor) performance, suggest that the policy/performance pair is informative as an indicator, provided scale dispersion requirements are fulfilled. A final iteration of Graphic 1 would show selected, representative indicators for monitoring and evaluation or in-depth analysis.

Graphic 2 represents grouped country identifiers for one indicator over time, categorised by a pre-defined response threshold. The categories show indicator response and non-response, and highlight the problems that missing data cause when interpreting outcomes.
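The grouping that underlies Graphic 2 can be illustrated with a short Python sketch. It is a simplified stand-in for the author's method and rests on stated assumptions: a value at or above the threshold counts as a response, a missing observation is recorded as `None`, and each entity has one value per time point. The function names (`categorise`, `group_by_category`) are hypothetical.

```python
def categorise(value, threshold):
    """Assign one observation to a category (assumes value >= threshold is a response)."""
    if value is None:
        return "missing"
    return "response" if value >= threshold else "non-response"


def group_by_category(series, threshold):
    """Group entity identifiers by category at each time point.

    series: dict mapping identifier -> list of values over time (None = missing).
    Returns one dict per time point: category -> sorted list of identifiers.
    """
    n_times = max(len(values) for values in series.values())
    grouped = []
    for t in range(n_times):
        groups = {"response": [], "non-response": [], "missing": []}
        for entity in sorted(series):
            values = series[entity]
            # A series shorter than the longest one is treated as missing at time t.
            value = values[t] if t < len(values) else None
            groups[categorise(value, threshold)].append(entity)
        grouped.append(groups)
    return grouped
```

Keeping "missing" as its own category, rather than dropping those entities, is what lets the graphic show how much of an apparent change in response is driven by gaps in the data.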

Graphic 2 methods were originally designed on data for immune system reconstitution in patients from US Congress-funded clinical trials. The method is appropriate for rapid, longitudinal representation of any informative indicator, such as those identified by Graphic 1, after meaningful thresholds have been determined.

Graphics 1 and 2 were designed individually and proposed for use in combination in 2018.

The objective of Graphic 1 is to visually discriminate representative, informative indicators from large administrative datasets. The objective of Graphic 2 is to allow simple, visual, longitudinal assessment of a selected indicator across a large number of entities. Both graphics, derived directly from the data, take seconds to run and provide guidance for interpreting important patterns in big data.

The OECD Economics Department benefited from Graphic 1 when it was used for exploratory analysis to identify where specific policy and performance indicators were lower than the OECD average for the publication “Stocktaking: Going for Growth” (2006). After representative indicators have been identified, the next step would be analysis of the kind published by the OECD in “Annex A1 Factor Analysis to Identify Inter-related EIS Innovation Indicators”, shown on attachment page 7. Original methods were derived from an International Biometric Society prize-winning analysis on patient indicators.

US Congress-funded researchers benefited from Graphic 2, which informed the selection of statistical analyses for the publication: “Antimicrobial-specific cell-mediated immune reconstitution in children with advanced HIV infection receiving HAART”, A Weinberg, S Pahwa, R Oyomopito et al., Clinical Infectious Diseases (2004), Vol.39, No.1, pp.107-14.

Future directions:
The algorithm will be presented in April 2019 at the Australian Department of Health.

The OECD will be approached with a suggested application of deriving digital transformation indicators.

