Building a Serverless Data Extraction API using Kumologica

Kumologica
Sep 20, 2020 · 4 min read

Data extraction is a common task in the IT world: retrieving data from a data source and preparing it for further processing or storage. In this article I build a simple API that demonstrates extracting data from an external API and preparing it for processing. This use case can help data engineers build a similar data extraction capability with very minimal effort.

We will use the COVID19 API, a free API that provides current COVID-19 statistics for each country. To prepare the data returned by the COVID19 API for processing, it will be converted to CSV format containing only a selected set of fields. The API flow will be developed using Kumologica.

Kumologica is a free low-code development tool for building serverless integrations. You can learn more about Kumologica in this Medium article or subscribe to our YouTube channel for the latest videos.

Use case

The API flow will accept a country name as a path parameter. Using the country name, the flow will invoke the COVID19 API to get the current statistics for that particular country. The JSON response from the COVID19 API will be filtered down to a specific data structure, which will then be converted to CSV format. The CSV content will be placed in an Amazon S3 bucket for further processing.

Flow Logic

Prerequisite

  1. Kumologica Designer installed on your machine: https://kumologica.com/download.html
  2. An AWS S3 bucket named covidinfostore (it can be created with the AWS CLI, as shown below).
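A minimal example of creating the bucket from the command line, assuming the ap-southeast-2 region implied by the endpoint used later in this article: aws s3 mb s3://covidinfostore --region ap-southeast-2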

Implementation

The diagram below shows the different systems that our flow will be responsible for orchestrating. Given that most of our dependencies are in AWS, we are going to target AWS Lambda as the deployment target for running our flow.

Solution Landscape

Steps:

  1. Open Kumologica Designer, click the Home button, and choose Create New Kumologica Project.
  2. Enter a name (for example, DataExtractionFlow), select a directory for the project, and switch Source to From Existing Flow …
  3. Copy and paste the following flow.
  4. Press the Create button.

You should see the flow shown below on the designer canvas.

DataExtractFlow implementation

Understanding the flow

  1. GET /covid/stats/:country is the EventListener node, configured with Amazon API Gateway as its EventSource. The node has the following configuration.
verb : GET
path : /covid/stats/:country

2. SetCountry is the set-property node that stores the country path parameter in a flow variable named country, using the following JSONata expression (see the illustrative event sketch after the expression).

$msg.header.event.Records[0].pathParameters.country
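For reference, the path expression above assumes the API Gateway event arrives wrapped inside the Kumologica message ($msg) roughly as in the trimmed, illustrative sketch below; the exact envelope can vary by Kumologica version.

{
  "header": {
    "event": {
      "Records": [
        {
          "httpMethod": "GET",
          "resource": "/covid/stats/{country}",
          "pathParameters": { "country": "australia" }
        }
      ]
    }
  }
}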

3. SetUrl is the set-property node that builds the request URL and stores it in a variable named url. Select JSONata in the expression drop-down and enter:

'https://api.covid19api.com/total/dayone/country/'&$flowContext("country","vars")&'/status/confirmed'
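For example, if the flow is invoked as GET /covid/stats/australia, the expression above evaluates to https://api.covid19api.com/total/dayone/country/australia/status/confirmed.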

4. InvokeCOVID19API is the HTTP request node with the following configuration.

Method : GET
Url : vars.url
Return: a parsed JSON object
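The COVID19 API returns a JSON array of daily records for the requested country. The sample below is trimmed and illustrative, showing only the fields used later in the flow; the live response contains additional fields (country codes, coordinates, and so on) and real case counts.

[
  { "Country": "Australia", "Cases": 26, "Status": "confirmed", "Date": "2020-01-26T00:00:00Z" },
  { "Country": "Australia", "Cases": 27, "Status": "confirmed", "Date": "2020-01-27T00:00:00Z" }
]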

5. ExtractValues is the Datamapper node with the following JSONata expression to extract the required fields from each record.

$map(msg, function($v, $i) {
  {
    "Country" : $v.Country,
    "Cases" : $v.Cases,
    "Date" : $v.Date
  }
})
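Applied to the illustrative sample response above, this mapping produces an array of flat records like the following.

[
  { "Country": "Australia", "Cases": 26, "Date": "2020-01-26T00:00:00Z" },
  { "Country": "Australia", "Cases": 27, "Date": "2020-01-27T00:00:00Z" }
]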

6. CSV is the CSV node that transforms the JSON array into CSV format.
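For the illustrative records above, the resulting CSV would look roughly as follows; the exact header row and quoting depend on how the CSV node is configured.

Country,Cases,Date
Australia,26,2020-01-26T00:00:00Z
Australia,27,2020-01-27T00:00:00Z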

7. S3 is the Amazon S3 node that puts the CSV object into the covidinfostore bucket.

Deployment

  1. Select the CLOUD tab on the right panel of Kumologica Designer and select your AWS profile.
  2. Go to the “Trigger” section under the CLOUD tab and select the Amazon API Gateway trigger.
API Gateway trigger setting

3. Press the Deploy button.

Try it

  1. Invoke the following endpoint using any REST client of your choice.
https://<<gateway instance id>>.execute-api.ap-southeast-2.amazonaws.com/test/covid/stats/<<country>>
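For example, with curl (replace the placeholder with your API Gateway instance id): curl https://<<gateway instance id>>.execute-api.ap-southeast-2.amazonaws.com/test/covid/stats/australia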

You should see a CSV file, named after the country, in the S3 bucket.

CSV file uploaded in the S3 bucket

Conclusion

This article has shown how easy it is to develop a serverless data extraction API using Kumologica Designer.

Remember, Kumologica is totally free to download and use. Go ahead and give it a try; we would love to hear your feedback.


Kumologica

Kumologica is the first low-code development solution that makes your integration services run on serverless compute regardless of the cloud provider.