TC6.9 - ETL processes for multi-protocol and multi-format data ingestion; see GitHub for the library

Test Case Title

TC6.9 - ETL processes for multi-protocol and multi-format data ingestion; see GitHub for the library

Goal

Access a large library of ETL processes for smart city on GitHub; connection with the dashboards via HTTPS/HTTP and LDAP.

Verify that the ETL can manage multiple protocols and multiple formats.

Prerequisites

A PC or mobile device with a web browser.

Expected successful result

A list of the ETLs available on the GitHub portal.

Steps

 

See Supported Protocols for the full list of supported protocols.

Please note that to correctly perform this Test Case you need access to the ETL Virtual Machine, as described below. To obtain access to a Virtual Machine for performing the ETLs, please contact snap4city@disit.org.

 

Users roles for which this test case is available: public user (all people)

Example 1: Access to a large library of ETL processes for smart city from GitHub


Fig. – Disit GitHub home page.

 

Fig. – Disit GitHub search for ‘snap4city’.
 


Example 2: Verify that the ETL can manage multiple protocols and multiple formats

 
To verify that the ETLs can manage multiple protocols and multiple formats, it is useful to see the table in TC6.5-ETL-heterogeneous, which gives a detailed list of the Snap4City ETLs with the different formats and protocols each one manages. Moreover, it is possible to open some ETLs (hereafter only some of them, but the same check can be done on all the available ETLs) in order to verify this feature.
 
ETL1: Helsinki_youth_subsidies (Protocol: HTTP, input file format: XLS, output file format: CSV)

  • A set of sample ETLs is already present in the VM; for example, we can choose ‘Helsinki_youth_subsidies’, which can: i) download a dataset; ii) transform it; iii) save the new data in a file (in the VM file system). To verify the ETL functionalities, follow these steps:
    • Click on the menu, select ‘File > Open’ and open the file ‘Ubuntu/Desktop/snap4cityETL/Helsinki_youth_subsidies_XLS/Ingestion/Maink.kjb’
    • The main job appears in the Spoon interface.


Fig.: Spoon Interface: open the ‘Helsinki_youth_subsidies_XLS’ ETL.


Fig.: Spoon Interface:  the ‘Helsinki_youth_subsidies_XLS’ ETL opened.

  • To run the ETL process, click on the main ‘Run’ button; a popup opens with a set of default values (including the URL from which to download the dataset). Then click the second ‘Run’ button.


Fig.: Spoon Interface: launch an ETL (‘Run button’).

 

  • Once the process is launched, a set of windows opens. A relevant one is the ‘Logging’ window, from which it is possible to read the logs. When the logging window reports the sentence ‘Spoon – job has ended’, the job has ended with no errors.


Fig.:  Spoon Interface: ETL executes, logging details.

  • Open the ‘HTTP’ step to see how the connection with the web server is realized:
    • Right-click the ‘HTTP’ icon and select ‘edit’.

Fig.:  ETL execution: edit the HTTP step.

  • The URL field contains the direct URL used to download the dataset (or a parameter that can come from the job settings, from the database, from a configuration file, etc.)
  • The Target file field contains the file path to which the data coming from the web server are written


Fig.:  ETL execution: HTTP step details.
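The behaviour of the HTTP step (a URL in, a target file out) can be sketched in plain Python. This is an illustrative stand-in for the Kettle step, not its actual implementation; the `url` and `target_file` arguments correspond to the two fields shown in the step dialog.

```python
# Illustrative stand-in for the Spoon 'HTTP' step: download a URL and
# write the response body to a target file. Not the real Kettle code;
# url and target_file mirror the two fields in the step dialog.
import urllib.request

def http_step(url: str, target_file: str) -> None:
    """Fetch `url` and save the raw bytes to `target_file`."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    with open(target_file, "wb") as f:
        f.write(data)
```

As in the real step, the URL could instead come from a job parameter; here it is simply a function argument.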

  • What actions does this process perform, and how can they be verified? Some blocks can be expanded with a double left click or a single right click of the mouse.
    • Activities:
      • block ‘Start’: starts the process (see its details with: ‘right click > edit’),
      • block ‘Create original data folder’:
        • if it does not exist, creates a new folder for the downloaded file: ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Original_data’
      • block ‘HTTP’: makes the HTTP REST request (to ‘http://www.nk.hel.fi/avoindata/avoin_data-avustukset.xls’) and saves the XLS file (named ‘avoin_data-avustukset.xls’) in the folder created in the previous block
      • block ‘File Exists’: checks whether the downloaded file is present
      • block ‘Create Elaborated data folder’: creates ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data’
      • block ‘DUMMY’: does nothing
      • block ‘process data’: reads the downloaded file, performs the data analysis, and saves the result in a new file: ‘/home/ubuntu/Desktop/ETL_Output/Helsinki_youth_subsidies_XLS/Elaborated_data/Result_data.csv’
      • block ‘Success’: ends the process.
    • How to verify:
      • Browse to the folders in the file system and verify the presence of the files; you can also delete them and launch the ETL process again.


Fig.: How to verify the Helsinki_youth_subsidies_XLS Results.
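The sequence of blocks listed above (create folders, download, check, process, save) can be mirrored by a short Python sketch. The folder paths below are shortened placeholders (the real job writes under ‘/home/ubuntu/Desktop/ETL_Output/…’ in the VM), and the ‘process data’ step is reduced to a dummy transformation, since the real job does its analysis with Kettle steps.

```python
# Sketch of the job's control flow: Start -> create folders -> HTTP ->
# File Exists -> process data -> Success. Paths are placeholders, not
# the VM's real ETL_Output folders.
import os
import urllib.request

ORIGINAL = "/tmp/ETL_Output/Original_data"      # 'Create original data folder'
ELABORATED = "/tmp/ETL_Output/Elaborated_data"  # 'Create Elaborated data folder'

def run_job(url: str) -> str:
    os.makedirs(ORIGINAL, exist_ok=True)
    raw = os.path.join(ORIGINAL, os.path.basename(url))
    urllib.request.urlretrieve(url, raw)        # 'HTTP' block
    if not os.path.exists(raw):                 # 'File Exists' block
        raise RuntimeError("download failed")
    os.makedirs(ELABORATED, exist_ok=True)
    out = os.path.join(ELABORATED, "Result_data.csv")
    # dummy 'process data': the real job parses the XLS and analyses it
    with open(raw, "rb") as src, open(out, "w") as dst:
        dst.write("bytes,%d\n" % len(src.read()))
    return out                                  # 'Success'
```

Deleting the output files and calling `run_job` again reproduces the verification suggested above.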

 

ETL2: Florence_School_Canteen_FTP (Protocol: FTP, input file format: shapefile)

  • Follow the same steps as in the previous ‘Helsinki_youth_subsidies_XLS’ case
  • Open the main job:
    • An FTP step is present; in this case the data come from the Disit FTP server and are downloaded to the virtual machine



Fig.: Details on the FTP step.
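By analogy with the HTTP case, the FTP step can be sketched with Python's standard `ftplib`. The host, credentials and paths are placeholders, not the real Disit FTP details; they stand for the values shown in the step dialog.

```python
# Hedged sketch of the 'FTP' step: log in to an FTP server and download
# one remote file to the local (VM) file system. All connection details
# are placeholders to be replaced with the values in the step dialog.
from ftplib import FTP

def ftp_step(host: str, user: str, password: str,
             remote_path: str, local_path: str) -> None:
    """Retrieve `remote_path` from the server into `local_path`."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "wb") as f:
            ftp.retrbinary("RETR " + remote_path, f.write)
```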

ETL3: Florence_firstAid_accesses_HTML (Protocol: HTTP, input file format: HTML, output file format: CSV; another format used in the ETL to transform data: XML)

  • Follow the same steps as in the previous ‘Helsinki_youth_subsidies_XLS’ case and open the main job (‘Florence_firstAid_accesses_HTML/Ingestion/main.kjb’)
  • In the ‘Download_data’ transformation there are:
    • an HTTP step, downloading the data from a web server


Fig.: Details on HTTP step.
 

  • a set of transformations on the strings
  • a block saving the data in an XML file


Fig.: data saved in an xml file.

 

  • In the ‘process data’ transformation there are:
    • a set of blocks working on the strings, starting from the XML file saved in the previous transformation
    • a final step saving the transformed data in a CSV output file
       

      Fig.: transformations on data saved in a CSV output file.
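The two-stage flow just described (strings staged as an XML file, then rewritten as CSV) can be illustrated with the standard library. The tag and column names below are invented for the example; they are not the real ETL's field names.

```python
# Illustrative two-stage flow: stage extracted (label, value) string
# pairs in an XML file, then read the XML back and emit CSV rows.
# Tag/column names are invented for the example.
import csv
import xml.etree.ElementTree as ET

def html_rows_to_xml(rows, xml_path):
    """Stage a list of (label, value) pairs as a simple XML file."""
    root = ET.Element("rows")
    for label, value in rows:
        row = ET.SubElement(root, "row")
        row.set("label", label)
        row.set("value", value)
    ET.ElementTree(root).write(xml_path)

def xml_to_csv(xml_path, csv_path):
    """Read the staged XML back and emit one CSV line per row."""
    tree = ET.parse(xml_path)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "value"])
        for row in tree.getroot():
            writer.writerow([row.get("label"), row.get("value")])
```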

ETL4: via_francigena_farmhouse_GeoJson (Protocol: HTTP, input file format: JSON, output file format: CSV)
Follow the same steps as in the previous ‘Helsinki_youth_subsidies_XLS’ case and open the main job (‘via_francigena_farmhouse_GeoJson/Ingestion/main.kjb’)

  • In the ‘Download_data’ transformation there is an HTTP step, downloading the data from a web server

 

  • In the ‘process data’ transformation there are:
    • a set of blocks working on the strings, starting from the downloaded JSON file
    • a final step saving the transformed data in a CSV output file


Fig.: transformations on data saved in a CSV output file.
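The GeoJSON-to-CSV conversion can likewise be sketched as flattening each feature into one CSV row. The `name` property is an assumed field used for illustration, not necessarily a field of the real dataset.

```python
# Hypothetical sketch of the 'process data' idea for a GeoJSON input:
# each feature becomes one CSV row with its point coordinates and a
# 'name' property (an assumed field, for illustration only).
import csv
import json

def geojson_to_csv(geojson_path: str, csv_path: str) -> None:
    with open(geojson_path) as f:
        data = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "longitude", "latitude"])
        for feature in data["features"]:
            lon, lat = feature["geometry"]["coordinates"]
            writer.writerow([feature["properties"].get("name", ""), lon, lat])
```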