Amazon Web Services already holds the lion’s share of the cloud infrastructure services market. In a recent blog post, AWS announced the general availability of Amazon Managed Workflows for Apache Airflow (MWAA). As the volume of data organizations process grows, MWAA lets enterprises decompose data pipelines into smaller tasks and coordinate their execution as a workflow.
Apache Airflow is an established standalone open-source platform used widely by developers and data engineers to author, schedule, and monitor sequences of tasks called “workflows”. MWAA is an orchestration service that lets customers manage their workflows through the same familiar Apache Airflow user interface, with improved scalability and security, without having to build and operate the underlying infrastructure themselves.
Managed Airflow also lets customers extend Airflow’s functionality with powerful plugins, and deploy Airflow quickly and easily through the AWS Management Console without provisioning resources or managing infrastructure. Workflows can ingest data from multiple sources, including Amazon storage services, and use the fetched data to train machine learning models.
MWAA comes with security built in by default, as workloads run only within the enterprise’s own virtual private cloud. AWS Identity and Access Management (IAM) controls access to Airflow’s UI, giving users Single Sign-On (SSO) access. As a standalone system, Apache Airflow has to run continuously; with MWAA’s integration into the AWS Management Console, capacity can be scaled to match orchestration and monitoring needs, keeping operational costs in check.
Amazon MWAA also automatically sends system metrics and logs to CloudWatch in a single location, making it easy to pinpoint delays or errors and removing the need for third-party tools.
Danilo Poccia, Chief Evangelist (EMEA) at AWS, wrote about MWAA: “As the volume and complexity of your data processing pipelines increase, you can simplify the overall process by decomposing it into a series of smaller tasks and coordinate the execution of these tasks as part of a workflow. To do so, many developers and data engineers use Apache Airflow, a platform created by the community to programmatically author, schedule, and monitor workflows. With Airflow you can manage workflows as scripts, monitor them via the user interface (UI), and extend their functionality through a set of powerful plugins.”
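The decomposition Poccia describes, breaking a pipeline into small tasks and coordinating their execution as a workflow, can be illustrated with a minimal sketch in plain Python. This is not Airflow’s actual API; the task names and dependencies here are hypothetical, and a real Airflow workflow would express the same structure as a DAG of operators.

```python
# Minimal sketch of workflow-style orchestration: each task declares its
# upstream dependencies, and a runner executes tasks in dependency order
# (a simplified stand-in for how Airflow schedules a DAG of operators).

from graphlib import TopologicalSorter

# Hypothetical data-pipeline tasks; in Airflow these would be operators.
def extract():
    return [3, 1, 2]

def transform(data):
    return sorted(data)

def load(data):
    print(f"loaded {data}")
    return data

# Task name -> (callable, names of upstream tasks whose results it consumes)
tasks = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}

def run_workflow(tasks):
    """Execute tasks in topological (dependency) order, passing each task
    the results of its upstream dependencies."""
    order = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    results = {}
    for name in order.static_order():
        func, deps = tasks[name]
        results[name] = func(*[results[d] for d in deps])
    return results

results = run_workflow(tasks)
```

Because the runner only cares about the dependency graph, adding or reordering tasks means editing the `tasks` mapping rather than rewriting control flow, which is the same property that makes Airflow DAGs easy to evolve as pipelines grow.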