Data Analytics on Snap4City, Machine Learning Operation, MLOps on Snap4City via ClearML

The design and development of Data Analytics (DA) Processes (DAPs) is mainly performed in Python or RStudio. By DAP we mean the development of algorithms for some computation (KPIs, predictions, optimization, simulation, etc.) exploiting ML (machine learning), AI (artificial intelligence), XAI (explainable AI), operations research, statistics, etc.

     The DAPs can be devoted to performing tasks of model training, model execution, computation, simulation, etc., in batch or stream mode. The design of a DAP implies deciding its aims, for example, implementing specific algorithms, or making predictions, anomaly detection, suggestions, statistical analysis, clustering, recognition, counting, classification, object detection, KPI estimation, optimization, conversion, etc. Most of these aims can be addressed by using techniques such as ML, AI, XAI, NLP, operations research, statistics, etc. To this end, the DAPs would need to exploit a set of Python or RStudio libraries to produce a model (in a training phase), which in turn has to be saved to be later exploited in execution/inference. Python and RStudio platforms may exploit any kind of library, such as Keras and Pandas, hardware accelerators such as NVIDIA GPUs for TensorFlow, and clusters of CPUs/GPUs via ClearML, MLOps, etc.

     Moreover, in order to get data, a DAP in Snap4City can access any kind of storage from external services, as well as the Snap4City KB (knowledge base, ServiceMap) and Big Data store. In that case, the access to Snap4City data is GDPR compliant, and thus respects privacy and data licensing, by using authenticated Smart City APIs via an Access Token, as explained in the Development Life Cycle Manual mentioned on the cover. The platform allows access to historical and real-time data, and permits saving the resulting data produced by the algorithms (for example, heatmap-related predictions, assessments of data quality, traffic flow data, ODMs, labels of detected anomalies, etc.), also using specific APIs.

Details of the analysis are reported in the Development Life Cycle Manual. For a DAP, one should identify:

  • What process must be implemented by the DAP? 
  • Which data models would be produced?
  • Which data are needed?
  • Is the DAP to be implemented for training or for production?
  • How many users are going to exploit the DAP at the same time? How many executions per minute or per day?
  • How many processes for production am I going to have at the same time?
  • From where is the DAP expected to be called: from a Dashboard/view, or simply from a back-office process as a MicroService?
  • What is the expected execution time?
  • What is the expected precision, and what is the state of the art?
  • Do I need to execute the DAP exploiting special hardware such as NVIDIA GPUs, since I am going to use CUDA, TensorFlow, …?

How should one proceed to design each single DAP according to its nature?

In the following, the most relevant tasks are summarized, just to recall the main aspects to be addressed:

  • Problem analysis, business requirements.
  • Data discovery, data ingestion/acquisition (which, as presented above, can be taken for granted), data access from the Snap4City platform or from other sources.
  • Data set preparation: transformation, identification of features, normalization, scaling, imputation, feature engineering, etc., and eventual data ingestion into the Snap4City platform by using Proc.Logic or Python and then storing the data in the storage. The process of feature engineering may be performed by means of PCA, or by directly performing a first training and assessing the relevance of the features, possibly discarding those less relevant.
  • Target Assessment Model Definition (mandatory to assess the precision of the results, the quality of the solution produced)
    • Identification of metrics for the assessment, KPI.
    • Typically: R2, MAE, MAPE, RMSE, MSE, MASE, …
  • Screening of Models/Techniques; for each Model/Technique, or for the selected Model/Technique, perform:
    • Model/Technique Development/testing
    • Some hyper-parametrization for each of them
  • Best Model/solution selection among those tested
    • If needed, reiterate with different parameters, features, etc.
    • Comparison with state-of-the-art results.
    • Assessment of the need for Explainable AI solutions: global and local.
  • Deployment of the best Model/solution in production, and monitoring in production. In this phase, the following assume particular relevance:
    • Security of data and solution
    • Scalability of the solution, in terms of multiple users requesting the same computation, and multiple requests of the same computation working on different spatial areas, such as different cities, KBs, maps, graphs, time series, etc.
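As an illustration of the assessment metrics listed above, the following sketch computes R2, MAE, RMSE, and MAPE in plain Python; the function name and the example values are only illustrative, not part of the Snap4City libraries:

```python
import math

def assessment_metrics(y_true, y_pred):
    """Compute some of the assessment metrics mentioned above."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # sum of squared errors
    sst = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return {
        "R2": 1.0 - sse / sst,
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "RMSE": math.sqrt(sse / n),
        "MAPE": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }

metrics = assessment_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print(metrics["MAE"])  # 0.5
```

In practice, libraries such as scikit-learn provide these metrics ready to use; the sketch only makes explicit what each acronym in the list stands for.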

In conclusion, the main activities are those of Development and Execution.

Data Analytic Processes Possibilities

According to the kind of DAP support provided by the Snap4City platform you are using, the development and execution of DAP solutions can be performed and enforced in different manners, but it is in any case possible to put your DAP in execution on Snap4City. The Snap4City DAP support is provided by means of a few different solutions, which can be classified according to the components installed; these may impact the activities of Development and Execution in different manners.

The main components are:

  • Local development environment on your premises for Python and/or RStudio.
  • Jupyter HUB Server: a server providing a development environment for Python with a web interface.
  • RStudio Server: a server providing a development environment for RStudio with a web interface.
  • DAP Container Manager: a solution for creating containers including DAPs and putting them in execution on the cloud. It can be based on Marathon/Mesos as well as Kubernetes or others.
  • MLOps Support: a solution and tools to support developers in creating their DAPs, making experiments, optimising, testing and validating them, keeping track of the performed experiments, etc., and also putting them in execution on some container, also exploiting clusters of CPUs/GPUs.
  • IoT App/Proc.Logic processes: a Node-RED + Snap4City Libraries process which can be installed on premises or on the cloud, and which can exploit the Snap4City facilities: authentication and authorisation, data ingestion, data transformation, management of DAPs, calling of DAPs, interacting with dashboards (server-side business logic), and interoperating with any kind of protocols and formats.
  • A&A: the Authentication and Authorization mechanism of Snap4City, or that of other interoperable platforms.
  • Advanced Smart City API, ASCAPI: a set of APIs to access/provide data from/to the Snap4City platform, as REST calls, microservices.

The main cases can be the following (from the least comprehensive to the most):

  1. Snap4City platform having: no DAP Container Manager, no Jupyter HUB Server, no RStudio Server, no MLOps Support. In this case, the developers can develop their DAPs in the language they prefer, on some server or on their laptops.
    1. Once the DAP is developed, it can be exploited by the Snap4City platform by making the DAP accessible via some API, or by using some data exchange via a database or other means which can be controlled and exploited by some IoT App/Proc.Logic or Dashboard/View. If the DAP exposes some APIs, we suggest using Flask for Python and Plumber for RStudio. In this case, the IoT App/Proc.Logic/Dashboard has to call the DAP as an external service.
      1. The external DAP providing the API may be protected by some external A&A mechanism. The IoT App/Proc.Logic can connect by using it.
      2. Please note that, in the case of using external APIs from dashboards/views in JavaScript from the client side, you may need to expose the credentials on the web page. Therefore, we suggest calling the external service APIs only from the IoT App/Proc.Logic.
    2. DAPs can exploit the Advanced Smart City API, ASCAPI, of Snap4City according to the Development Life Cycle. DAPs can access protected data according to A&A based on OAuth Access Tokens and GDPR, and can send data for ingestion and save it into the platform, etc. The usage of the APIs is described in the Development Manual.
      1. The A&A for data access/save from an IoT App/Proc.Logic is automated by the Snap4City Libraries; it can be performed on the Edge and is totally transparent for the IoT App/Proc.Logic on the cloud of Snap4City platforms, from MicroX to large solutions.
      2. The A&A for Snap4City Dashboards/views is also automated and may have JavaScript developed as Client-Side Business Logic; see the reference manual mentioned on the cover of this document.
      3. The A&A for third-party applications can be developed according to the Development Life Cycle Manual.
  2. Snap4City platform with DAP Container Manager, no MLOps Support. In this case, the DAP Container Manager is integrated into the Snap4City platform (typically based on Marathon/Mesos, and more recently also on Kubernetes). For the final user and for the developer, the usage of one kind of DAP Container Manager or another should not be of great relevance and/or impact.
    1. DAP Container Manager based on Marathon/Mesos: provided on Snap4City.org. The developers have to code DAPs as API-based processes, which expose their APIs via Flask for Python and Plumber for RStudio according to the Snap4City directives; see the Development Life Cycle Manual and the examples on the web portal and training course.
      1. The developers may have access to one or more Jupyter HUB Servers and RStudio Servers for DAP development, or can develop the DAPs on their laptops/desktops. This means that the activities of tuning, hyper-parametrization, validation, etc., are all performed by coding.
      2. Once a DAP is developed according to the Snap4City directives for DAP development, it can be put in execution. To this end:
        1. (B) in the following figure: the DAP is put in execution on some server to expose the APIs, which can be used by any IoT App/Proc.Logic as in case A.1 above. In this case, the DAP can exploit the direct resources of the server, even NVIDIA boards, HPC, etc., if provided. In the case of Snap4City.org, these kinds of DAPs can (i) exploit a large number of NVIDIA servers with a huge number of GPUs, (ii) use external third-party APIs, (iii) exploit the Smart City APIs.
          1. The API-based DAPs could be made accessible for Dashboards by exposing the API on the Internet. On the other hand, this may create a door for eventual attacks and unauthorized access to the DAPs.
          2. The API-based DAPs should be protected in some manner, for example by working only if the DAP receives a valid Access Token (taken from the session), by which it can access protected content via the ASCAPI.
        2. (A) in the following figure: the DAP code is loaded on the DAP Container Manager Marathon/Mesos via special Snap4City DA nodes for IoT App/Proc.Logic (available on IoT App Advanced, for developers, typically accessible to AreaManager users). Those nodes request the DAP Container Manager to automatically create a container and allocate it statically on the cloud. Please note that each new DAP has a counterpart Node-RED node in the IoT App/Proc.Logic flow which created it, and is realized as a new container. The container of the DAP is only accessible/visible to the user who created it (who can list it in the IoT App/Proc.Logic list, where it can also be deleted/managed).
          1. This approach is suggested only for realising prototypes and not for realising stable production DAPs, due to its limited scalability and high consumption of resources. Moreover, the DAP container in this case is usable only by the IoT App of the user who created it.
          2. A DAP may use the NVIDIA support only if provided at cloud level on the DAP container. Images of DAP containers need to be customized for adding specific libraries and for the exploitation of NVIDIA boards. Please note that, on Snap4City.org, this approach does not allow the DAP to exploit the NVIDIA cluster facilities of Snap4City.

    2. DAP Container Manager based on Kubernetes:
      1. ……….description will come…
  3. (C) in the above figure: Snap4City platform with MLOps Support and its integrated DAP Container Manager. This case represents the most advanced solution for DAP development and execution, as described in detail in https://www.snap4city.org/download/video/Snap4City-MLOps-Manual.pdf

Snap4City with DAP Container Manager, No MLOps Support

In case B), depicted in the above figure and described above, the data scientists may develop their DAPs on the provided Jupyter HUB, as a Python development environment, as well as on RStudio Servers.

In this kind of Snap4City platform, the development of DAPs can be performed on the Jupyter HUB in Python, as well as on RStudio Servers, by using the ASCAPI:

  • provided by Snap4City; in this case the Jupyter HUB can be on a CPU server or on a CPU/GPU server
  • not provided by Snap4City, thus not accessing the CPU/GPU resources of Snap4City

In Snap4City, access to the Jupyter HUBs for Python and/or the RStudio Servers for the development of DAPs is provided by the RootAdmin. The role of the Snap4City user has to be AreaManager or higher.

Please note that, in this case, the activities of training, optimisation, hyper-parametrization, experiment tracking, assessment and validation, comparison, tuning, etc., are all in the hands of the developers.

On the other hand, the activity of putting a DAP in production is simplified, in the sense that the DAP can be taken in charge by the DAP Container Manager for execution.

The DAPs can be executed on:

  • Docker containers, accessed and controlled via some API; these can be automatically produced and managed by the platform.
    • In this case, the management is typically performed by some Proc.Logic (IoT App).
    • The containers are automatically allocated on the cluster and kept alive, to be used by Marathon or Kubernetes, and may exploit the GPU/CPU according to the configuration. They are usually allocated dynamically, and they may be moved from one VM to another by the DAP Container Manager.
  • Dedicated servers for developers, leaving them to access the storage for using the data and providing results via the Snap4City API, in an authenticated and authorized manner.

 

Figure – Schema of DAP/Data Analytics (ML, AI) development
to be used as permanent Containers (exploiting CPU on cloud)

 

Figure – DAP development in RStudio, similar to Python, which is on the Jupyter HUB

In the Python and/or RStudio cases, the script code has to include a library for creating a REST endpoint, namely Plumber for RStudio and Flask for Python. In this manner, each process presents a specific API, which is accessible from an IoT App/Proc.Logic as a MicroService, that is, as a node of the above-mentioned Node-RED visual programming tool for data flows. Data scientists can develop and debug/test the data analytic processes on the Snap4City cloud environment, since it is the best way to access the Smart City API with the needed permissions. The source code can be shared among developers with the "Resource Manager" tool, which also allows the developers to perform queries and retrieve source code made available by other developers.
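For the Python case, a minimal sketch of a DAP exposed as a REST API via Flask could look as follows; the endpoint name, the payload fields, and the embedded "model" are illustrative placeholders, not an official Snap4City interface:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    """Hypothetical DAP endpoint: receives a JSON time series, returns a value."""
    payload = request.get_json(force=True)
    values = payload.get("values", [])
    window = values[-3:] if values else [0]
    prediction = sum(window) / len(window)  # placeholder moving-average "model"
    return jsonify({"prediction": prediction})

# To serve the API so that an IoT App/Proc.Logic can call it as a MicroService:
# app.run(host="0.0.0.0", port=5000)
```

The Plumber library plays the equivalent role for RStudio, annotating R functions so that they are exposed as REST endpoints.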

RStudio and Python data analytic processes may conceptually include any kind of library for ML, AI, operations research, simulation, etc. On the other hand, when the process is adopted to produce a container, as in the next figure, the container has to include the libraries used in the code. The development environment may be configured to allow the single operators to load their own preferred libraries, or the requested libraries may be added to the containers by the RootAdmin. This can be performed by requesting a specific image from the platform manager and indicating the libraries you would like to have on container executions.

 

Figure – Case B) DAP development flow in Python, from a Jupyter HUB as well as from a PC with the Anaconda development environment installed.

This description of the flow refers to the case in which the Python or RStudio processes are created to be used as MicroServices from a Proc.Logic/IoT App. An alternative is to develop the DAPs to be used as standalone services, working on APIs, or providing some REST endpoint, and thus usable from a Proc.Logic/IoT App according to the API or by collecting results in a database. These aspects are described in the training course.

Figure– Data Analytics development flow in Python and integration into Proc.Logic / IoT App.

In Snap4City, there is a specific tutorial for the Data Analytic development with several examples:

https://www.snap4city.org/download/video/course/p4/

Read the mentioned slide course and/or the platform overview to get a list of the Data Analytics in place:

https://www.snap4city.org/download/video/Snap4City-PlatformOverview.pdf  

We also suggest reading the Snap4City booklet on Data Analytic solutions.

https://www.snap4city.org/download/video/DPL_SNAP4SOLU.pdf 

Read more on: https://www.snap4city.org/download/video/course/p4/

If you are interested in developing ML/AI processes with or without MLOps support, there is a Python library which can be obtained only via subscription; please contact snap4city@disit.org

Snap4City with MLOps Support & DAP Container Manager: ML/AI

In case C) of the above list, the solution provided by Snap4City includes the support for MLOps, Machine Learning Operations. In Snap4City, MLOps is provided by using a custom version of the ClearML tool and by using Jupyter HUBs for Python to develop DAPs. Access to this facility can be provided by the RootAdmin to users with the AreaManager role or higher.

Snap4City with the MLOps facility fully supports the phases of Development and Execution as described in the document https://www.snap4city.org/download/video/Snap4City-MLOps-Manual.pdf

Development, with the activities of:

  • Training with different parameters and models to be trained, hyper-parametrization, tuning, etc.

  • Validation and test in batch to find the best results with respect to the metrics, tracking and comparing the experiments, etc.

  • Managing high computational costs, managing time consumption, sending DAPs automatically to the free GPUs/CPUs of the clusters, etc.
  • And many other functionalities as described in https://www.snap4city.org/download/video/Snap4City-MLOps-Manual.pdf
  • In this phase, Snap4City.org provides access to a Jupyter HUB from which it is possible to develop the Python-coded DAPs, exploiting the ASCAPI, and send them to the MLOps Support, performed in ClearML, to exploit a number of CPU/GPU clusters with many kinds of NVIDIA boards: H100, V100, RTX 4090, RTX 3090, Titan XP, etc.

MLOps is realized by using ClearML, whose main features are:

  • Experiment Tracking: Provides advanced features for experiment tracking, including automatic logging of metrics, output, source code, and the execution environment. This ensures that each experiment is reproducible, and its results are easily shareable and comparable.
  • Data and Model Management: Provides tools for efficient management of datasets and models, allowing for easy versioning, archiving, and sharing. Users can track model versions and easily associate them with corresponding experiments.
  • Integration and Compatibility: ClearML is designed to integrate with existing development environments and tools, such as Jupyter Notebooks, TensorFlow, PyTorch, and many others, thus supporting a wide variety of workflows and technology stacks.
  • User Interface and MLOps Dashboard: offers an intuitive dashboard that allows users to monitor the status of their experiments in real time, view metrics and outputs, and manage resources and execution queues, all from a single interface. The root user of ClearML has the possibility of observing the activities of all the users/developers.
  • Automation and Orchestration: it allows the remote execution of experiments on any machine and distributes the tasks to be executed according to a system of queues and priorities, also automating hyper-parametrization via Optuna.

Please note that the development is performed on the Jupyter Hub in the personal space of the developer, by enforcing a specific connection with the ClearML server (with the specific account of the developer in the ClearML environment), using specific credentials and code calls for data and processes, as described in https://www.snap4city.org/download/video/Snap4City-MLOps-Manual.pdf . In the Snap4City.org version, for security reasons, only specific Jupyter Hubs can exploit the connection with ClearML; they are typically under progressive backup on the cloud, while versioning is provided with SVN support. In principle, any Python development environment could exploit such a connection, but opening it to all would not be safe enough.

These aspects are described in the rest of this document.

Execution in production (for ML/AI also called the Inference phase)

The Execution in production has to guarantee support for:

  • Security of data and DAP solution access, permitted only to A&A users. Also in this case, the developer, working on the Jupyter Hub, can send code to the MLOps only by using their specific credentials and IDs.

  • Scalability of the DAP solution, in terms of multiple users requesting the same computation at the same time, and multiple requests of the same type working on different spatial areas, such as different cities, KBs, maps, graphs, time series, etc.
  • Monitoring of the resource consumption in terms of memory, storage, and CPU/GPU clocks/percentage, with eventual early warnings and alarms sent to the administrator, and possibly the accounting of the resources consumed.
  • Eventual blocking and removal of strange/undesired processes.

The Execution in production is enabled by creating DAPs (with a modality described in the rest of this document) which can be called via some APIs (provided, made accessible) according to TWO modalities:

  • Enqueue: calling the API of a DAP which is created as a task and executed at every API call by the MLOps according to the list of requests. The DAP is allocated automatically on some server, as a temporary container and process (NVIDIA/GPU clusters or classic CPU clusters), by the MLOps manager, just for the single execution.
    • This means that each DAP execution includes the loading time, that the DAP does not remain in memory, and that the memory of the servers (CPU/GPU) is not permanently booked for that DAP.
    • This approach is suitable for DAPs which are executed sporadically and/or periodically, for which the overhead time to put them in execution is acceptable with respect to the time for computing and delivering the response, and to the period of execution.
  • OnDemand: calling the API of a DAP which is created as a task into a container and loaded statically on the server (NVIDIA/GPU clusters or classic CPU clusters).
    • This means that at the first execution the loading time will be evident and may be relevant.
    • This means that, once loaded, the DAP is ready to respond to the API call, since it is statically (permanently) allocated on the execution server, occupying memory (CPU memory, GPU video memory) but not the actual CPU/GPU until it is called (woken up) via the API.
    • This modality is particularly suitable for DAPs which need a relevant time to be loaded and put in execution, thus making the usage of the Enqueue modality not viable. For example, an LLM needing 24 GB would require a long loading time with respect to its single execution time, which with the OnDemand modality takes only a few seconds; so, in this case, the Enqueue solution is not suggested.
  • With both modalities, Snap4City.org provides access to a number of clusters of services and single servers in CPU/GPU with many kinds of NVIDIA boards: H100, V100, RTX 4090, RTX 3090, Titan XP, etc.
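The choice between the two modalities can be reasoned about with a simple back-of-the-envelope rule; the decision function and its threshold below are purely illustrative, under the simplifying assumption that Enqueue pays the full loading time at every call:

```python
def suggest_modality(load_time_s, exec_time_s, calls_per_hour):
    """Toy rule: if re-loading at every call would dominate the busy time,
    prefer OnDemand (DAP kept in memory); otherwise Enqueue is acceptable.
    The 50% threshold is an arbitrary illustrative choice."""
    enqueue_busy = (load_time_s + exec_time_s) * calls_per_hour
    ondemand_busy = exec_time_s * calls_per_hour  # loading is paid only once
    overhead_ratio = (enqueue_busy - ondemand_busy) / enqueue_busy
    return "OnDemand" if overhead_ratio > 0.5 else "Enqueue"

# A large model (e.g., an LLM) with long loading and frequent calls:
print(suggest_modality(load_time_s=300, exec_time_s=5, calls_per_hour=60))  # OnDemand
# A light computation executed once per hour:
print(suggest_modality(load_time_s=10, exec_time_s=120, calls_per_hour=1))  # Enqueue
```

In a real deployment, the memory permanently booked by an OnDemand container also has to be weighed against the loading overhead saved.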

The Snap4City platforms with MLOps support may expose the APIs of the DAPs in the two modalities of Enqueue and OnDemand, which can be called in an authenticated manner via API as well as via IoT App/Proc.Logic nodes, as reported on the right side.

The two nodes are accessible as a separate Node-RED library of Snap4City microservices, https://flows.nodered.org/node/node-red-contrib-snap4city-clearml , which can be installed on any IoT App/Proc.Logic on the cloud and on the Edge.

Please note that the DAPs accessible via the Enqueue or OnDemand modalities can be called from external services as well. For security reasons, they can be called only by using the current Access Token of the session for the user. This allows access to the DAPs from CSBL (Client-Side Business Logic) and any Web Application developed according to the Snap4City Development model and the CSBL approach on Dashboards and views. This approach allows the implementation of much smarter and more dynamic business intelligence tools and smart applications.
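As a sketch of such an external call, the following plain-Python snippet builds an authenticated request carrying the session Access Token; the endpoint URL, the token placeholder, and the payload are hypothetical and must be replaced with the values provided by the platform:

```python
import json
import urllib.request

def build_dap_request(url, access_token, payload):
    """Build an authenticated POST request towards a DAP API."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_dap_request(
    "https://example.org/dap/enqueue",    # placeholder URL, not a real endpoint
    "<current-session-access-token>",     # obtained via the Snap4City A&A
    {"values": [10, 12, 11]},
)
# urllib.request.urlopen(req) would send the call and return the DAP response.
```

The same pattern applies from any language able to issue an HTTPS request with a Bearer token, which is what makes the DAPs usable from CSBL and external Web Applications.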

Read more on: https://www.snap4city.org/download/video/course/p4/

If you are interested in developing ML/AI processes with or without MLOps support, there is a Python library which can be obtained only via subscription; please contact snap4city@disit.org