Case Study: Automated Solution for Procurement Notices

Background

An industry leader in procurement, specializing in centralizing tender data and using digitalization and automation to modernize the end-to-end tendering process, approached us with a complex problem.

Their objective was to enhance the efficiency, transparency, and accessibility of their procurement offering for governments, businesses, and individuals alike. However, they faced the following challenges:

  • The customer required procurement notices data to be scraped, normalized, parsed, and delivered in a custom format four times daily.
  • The data, spanning 1980 to the present day, was not only diverse and high in volume (2,500+ notices per day) but also unstructured.
  • Each notice had 175-300 data points, and the data architecture of these notices changed over time.
  • The key challenge was to ensure consistent, accurate, and normalized data, despite the continuously evolving structure of the data source.

This would ensure more efficient, transparent, and accessible data delivery.

1.4M+ Data Points Daily

2,500+ Notices Daily

Historical Data From 1980

Impact

Our solution had a significant impact on our client’s operation. By providing a reliable, automated, and normalized data stream, we enabled the client to effectively streamline their data handling process. This not only improved their operational efficiency by 45% but also enhanced the transparency and accessibility of their services.

The historical data handling, in particular, allowed our client to provide a comprehensive, near real-time overview of procurement trends, which is an invaluable tool for governments and businesses. With our solution, they were able to cut costs by up to 30%, thanks to the automation of manual processes and significantly reduced errors in data extraction.

Furthermore, the speed of delivering procurement notices data to their customers improved by 56%, as the automated pipeline efficiently processed and delivered the data on a daily basis. Our client was able to offer a more complete and accurate service, ultimately contributing to their goal of modernizing and improving the procurement process for the benefit of all stakeholders.

Web Scraping Pipeline

Challenges & Solutions

Data Accessibility

The data was protected by an advanced anti-bot detection system, which made it difficult to access using standard data extraction methods.

Advanced Anti-Bot System Bypass

We developed a mechanism to navigate through the advanced anti-bot detection system while adhering to the platform’s data privacy guidelines.

Data Inconsistency

The procurement notices followed different standard formats, which posed a challenge to extracting and structuring the data uniformly. This necessitated the development of an advanced extraction tool to ensure the accuracy of the captured information.

Dynamic Parsing Mechanism

Given the change in notice structures over time, we created a dynamic parsing algorithm that adapted to different formats. This ensured that all data points of interest were captured accurately and consistently.
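A dynamic parser of this kind can be sketched as a registry of format-specific handlers, with a dispatcher that fingerprints each raw notice and routes it to the matching handler. The layout names, field names, and detection rule below are hypothetical illustrations, not the client's actual schema:

```python
# Sketch of a dynamic parsing mechanism: each notice layout gets its own
# handler, and the dispatcher picks one by fingerprinting the raw record.
# Layout names, field names, and the detection rule are assumptions.
from typing import Callable, Dict

PARSERS: Dict[str, Callable[[dict], dict]] = {}

def register(layout: str):
    """Decorator that registers a parser for one notice layout."""
    def wrap(fn):
        PARSERS[layout] = fn
        return fn
    return wrap

def detect_layout(raw: dict) -> str:
    # Hypothetical fingerprint: newer notices carry a 'schemaVersion' key.
    return "v2" if "schemaVersion" in raw else "v1"

@register("v1")
def parse_v1(raw: dict) -> dict:
    return {"title": raw["name"], "buyer": raw["authority"]}

@register("v2")
def parse_v2(raw: dict) -> dict:
    return {"title": raw["title"], "buyer": raw["buyer"]["name"]}

def parse_notice(raw: dict) -> dict:
    """Route a raw notice to the parser matching its detected layout."""
    return PARSERS[detect_layout(raw)](raw)
```

The registry pattern keeps each format's logic isolated, so supporting a newly observed layout means adding one handler rather than touching the dispatcher.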

Data Complexity and Volume Variability

The volume of procurement notices varied daily, and their structures differed, with up to 300 data points per notice. This demanded an adaptive extraction method to ensure all relevant information was accurately captured.

Data Extraction and Cleaning

We built an extraction system capable of handling the variable data volumes, ensuring that every procurement notice was captured. The system included comprehensive data cleaning and random sample inspection to maintain consistency.
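A cleaning pass with random-sample inspection can be sketched as follows; the specific cleaning rules and sampling rate are illustrative assumptions, not the production configuration:

```python
# Sketch of a cleaning pass plus reproducible random sampling for QA.
# The cleaning rules and 1% sampling rate are illustrative assumptions.
import random

def clean_notice(raw: dict) -> dict:
    """Trim string fields and normalize empty strings to None."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip() or None
        cleaned[key] = value
    return cleaned

def sample_for_inspection(notices: list, rate: float = 0.01, seed: int = 0) -> list:
    """Draw a reproducible random sample for manual QA review."""
    rng = random.Random(seed)  # fixed seed makes the sample auditable
    k = max(1, int(len(notices) * rate))
    return rng.sample(notices, k)
```

Seeding the sampler makes each day's inspection set reproducible, so reviewers and the pipeline agree on exactly which notices were checked.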

Data Format and Delivery

The client preferred the Open Contracting Data Standard (OCDS). Requirements included deliveries in JSON format, uploaded daily to an AWS S3 bucket. This demanded a precise process to ensure data integrity while adhering to the preferred format and delivery mechanism.

Data Normalization and Delivery

We developed a data normalization process to unify the different standard formats into the preferred OCDS, providing the client with a single, standardized data format. We also implemented an automated delivery system to convert and upload the data to the client's AWS S3 bucket in the preferred format.
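The normalize-and-deliver step can be sketched as mapping an internal record onto a minimal OCDS-style release and uploading the resulting JSON to S3. The `ocid` prefix and field names are hypothetical, and a real OCDS release carries many more required fields than shown here:

```python
# Sketch: wrap a normalized notice in a minimal OCDS-style release and
# upload the JSON to S3. Field names and the ocid prefix are illustrative;
# a real OCDS release has many more required fields.
import json

def to_ocds_release(notice: dict) -> dict:
    """Map an internal notice record onto a minimal OCDS release shape."""
    return {
        "ocid": f"ocds-example-{notice['id']}",
        "date": notice["published"],
        "tag": ["tender"],
        "tender": {"title": notice["title"]},
        "buyer": {"name": notice["buyer"]},
    }

def upload_release(release: dict, bucket: str, key: str) -> None:
    """Upload one release as JSON to the client's S3 bucket (needs boto3)."""
    import boto3  # imported lazily so normalization runs without boto3 installed
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(release).encode("utf-8"),
        ContentType="application/json",
    )
```

Separating normalization from delivery lets the conversion be validated independently before any upload takes place.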

Historical Data

The client required historical data from 1980 to the present, which presented a further obstacle due to the changing data structure over such a long period. Addressing this required a sophisticated approach to harmonize the diverse sets of historical data.

Historical Data Handling

We implemented a dedicated module to extract, clean, and normalize historical data from 1980 to the present day. This allowed us to deliver complete and uniform historical data to the client.
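One way to harmonize decades of changing schemas is a table of era-specific field mappings applied before normalization. The era boundary and field names below are invented for illustration only:

```python
# Sketch of harmonizing era-specific field names into one unified schema.
# The era boundary (year 2000) and field mappings are illustrative assumptions.
ERA_FIELD_MAPS = {
    "pre_2000":  {"tender_name": "title", "issuer": "buyer"},
    "post_2000": {"noticeTitle": "title", "buyerName": "buyer"},
}

def era_for(year: int) -> str:
    """Pick the mapping era a notice belongs to."""
    return "pre_2000" if year < 2000 else "post_2000"

def harmonize(record: dict, year: int) -> dict:
    """Rename era-specific fields to the unified schema, keeping the year."""
    mapping = ERA_FIELD_MAPS[era_for(year)]
    out = {mapping.get(key, key): value for key, value in record.items()}
    out["year"] = year
    return out
```

Keeping the mappings as data rather than code means a newly discovered legacy layout only requires a new table entry.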

Key Takeaways

Custom Solutions for Complex Issues:

This case study emphasizes the value of creating personalized solutions to tackle unique and complicated data problems. We demonstrated our ability to design customized solutions, such as a flexible parsing mechanism, an advanced anti-bot system bypass, and automated data delivery at scale.

Adapting to Data Variability

The project required a versatile and adaptable system due to changes in data structure and volume. Our approach successfully managed daily changes in data volume and structure, ensuring that the delivered data remained consistent and accurate.

The Importance of Standardizing Data

When working with a variety of data sources, it’s crucial to organize the data into a single, standard format to simplify analysis and usage. By converting the data into a widely-used format, we provided the client with easily accessible information.

Expertise in Handling Historical Data

Our ability to collect and organize historical data, particularly from 1980 to the present day, showcases our skill in managing intricate and long-term data extraction tasks.

Conclusion

The success of this project is significant: it led to a 30% reduction in costs and a 56% improvement in delivery speed. These achievements underscore the importance of proficiently handling complex data extraction, normalization, and delivery tasks, especially when dealing with fluctuating and continuously evolving data structures.

This future-proof solution not only addressed the client’s immediate requirements but also adapted to potential changes, as the field continues to evolve. Through our automation efforts, we delivered an impressive 1.4 million data points daily, a feat that significantly cut manual processes and error rates.

This achievement underscores the importance of custom data solutions in supporting clients to achieve their goals, improve operational efficiency, and enhance service transparency.

By integrating historical data handling, we provided an invaluable tool for understanding procurement trends, thereby contributing to a deeper understanding of data management and promoting continuous advancement within this domain.
