Amazon created a lot of hype around the release of AWS Athena, but is it worth that hype?
To start with, Athena is a query service that uses ANSI-standard SQL to work with "big data" stored in Amazon S3. The combination of AWS Athena and Amazon S3 can deliver results quickly by tapping the power of advanced data warehousing techniques. If you usually prefer relational databases like MySQL or traditional data warehouses, Athena could be an option for the analytics workloads that you would like to take on.
Athena itself is a managed AWS service, but it is built on the open-source Presto (PrestoDB) query engine that originated at Facebook. In light of this lineage, Athena offers teams a serverless SQL query front end to an S3 data lake, which can serve as part of an ETL or ELT process.
Since the architecture is serverless, there is no infrastructure to manage. It also means that you pay only for the queries that you run, which can mean lower costs.
Is that it?… Well, no.
What else should you know about AWS Athena?
Before taking the leap, here are eight things to keep in mind.
1. Schema and Table Definitions
To query data with Athena, the data must first be stored in S3. You then create a database and tables that point to that S3 data. When creating schemas for S3 data, positional order is vital: for delimited formats such as CSV, the columns in the table definition must appear in the same order as the fields in the files.
If you don't account for the position, values end up in the wrong columns and the query results will not match your expectations.
Once you complete the process, the database and table definitions are stored within the system automatically and are accessible through JDBC and ODBC.
Athena also supports creating views.
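To make the point about positional order concrete, here is a minimal Python sketch that builds a Hive-style CREATE EXTERNAL TABLE statement for CSV files. The helper name, the database, table, and column names, and the bucket path are all hypothetical; the function only produces the DDL text, which you would then run in the Athena query editor or through a driver.

```python
from typing import List, Tuple


def build_create_table_ddl(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    s3_location: str,
) -> str:
    """Return Hive-style DDL for an external table over CSV files in S3.

    Columns are emitted in the order given, which must match the order
    of the fields in the CSV files.
    """
    column_lines = ",\n".join(
        f"  `{name}` {col_type}" for name, col_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example: each CSV row is request_time,status,bytes_sent
ddl = build_create_table_ddl(
    database="weblogs",
    table="requests",
    columns=[
        ("request_time", "string"),
        ("status", "int"),
        ("bytes_sent", "bigint"),
    ],
    s3_location="s3://example-bucket/logs/",
)
print(ddl)
```

Swapping two entries in the columns list would still produce valid DDL, but the values read from each CSV row would land in the wrong columns, which is the mismatch described above.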
2. Data Formats
The Athena service works with many different data formats, including ORC, JSON, CSV, and Parquet. Amazon recommends converting data to a columnar storage format such as Apache Parquet.
Make sure your team is aware of this optimization. Using compressed, columnar formats can reduce both query and storage costs while also improving performance.
Amazon also recommends partitioning data to reduce the amount of data that the query needs to scan. It will improve efficiency and reduce the costs of queries.
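One common way to lay out partitioned data in S3 is the Hive-style key=value prefix convention, which Athena can map to partition columns. The sketch below builds such a prefix from a date. The function name, the bucket, and the year/month/day partition keys are assumptions for illustration, not a required scheme.

```python
from datetime import date


def partition_prefix(base_prefix: str, day: date) -> str:
    """Build a Hive-style partitioned S3 prefix for the given day."""
    base = base_prefix.rstrip("/")
    return (
        f"{base}/year={day.year:04d}"
        f"/month={day.month:02d}"
        f"/day={day.day:02d}/"
    )


# Hypothetical bucket and date
print(partition_prefix("s3://example-bucket/logs", date(2024, 1, 15)))
# prints s3://example-bucket/logs/year=2024/month=01/day=15/
```

With data stored this way and the table partitioned on the same keys, a query that filters on year, month, and day only has to read the objects under the matching prefixes.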
3. Speed and Performance
AWS Athena lets you run queries on S3 data quickly and easily, without having to set up servers, configure clusters, or do any of the housekeeping that other database systems need.
- Athena uses compute resources in multiple, separate Availability Zones.
- Amazon S3 stores data redundantly, so the service provides speed, reliability, and availability.
4. Supported Functions
Athena uses the open-source software PrestoDB as its SQL query engine. Users can write ANSI-standard SQL in this tool and work directly with data in Amazon S3. It supports standard SQL statements like SELECT and relational operations like JOIN.
At this time, Athena supports only Hive DDL to create, alter, and drop tables and partitions.
The full list of available functions is in the Presto documentation.
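As an illustration of the Hive DDL mentioned above, this small Python sketch generates ALTER TABLE statements that add and drop a partition. The helper names, table name, partition keys, and S3 location are hypothetical; the statement shapes follow standard Hive syntax.

```python
from typing import Dict


def _partition_spec(partition: Dict[str, str]) -> str:
    """Render a partition spec such as: year = '2024', month = '01'."""
    return ", ".join(f"{key} = '{value}'" for key, value in partition.items())


def add_partition_ddl(
    table: str, partition: Dict[str, str], s3_location: str
) -> str:
    """Hive DDL that registers one partition and its S3 location."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({_partition_spec(partition)}) "
        f"LOCATION '{s3_location}'"
    )


def drop_partition_ddl(table: str, partition: Dict[str, str]) -> str:
    """Hive DDL that removes one partition from the table definition."""
    return (
        f"ALTER TABLE {table} DROP IF EXISTS "
        f"PARTITION ({_partition_spec(partition)})"
    )


# Hypothetical table and partition values
spec = {"year": "2024", "month": "01"}
print(add_partition_ddl(
    "requests", spec, "s3://example-bucket/logs/year=2024/month=01/"
))
print(drop_partition_ddl("requests", spec))
```

Dropping a partition this way changes only the table metadata; the objects in S3 stay where they are.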
5. Integration With Leading BI Tools
Amazon promotes Athena as a way to produce result sets with SQL queries. However, you can also use the data with business intelligence tools such as Amazon QuickSight for reporting and analysis. The service has a JDBC driver, which can be used to interface with other business intelligence software. You can even use Microsoft BI tools with AWS Athena.
6. Athena’s Security
Amazon provides three methods to control access to AWS Athena:
- AWS Identity and Access Management policies
- Access Control Lists
- Amazon S3 bucket policies
Whoever controls access to the data in S3 controls what can be queried. You can fine-tune security by letting different people see different data sets and by granting access to data owned by other users. You can restrict access further when the data is consumed through tools like Tableau and Power BI.
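As a rough sketch of the first method, the Python snippet below assembles an IAM policy document that lets a user run Athena queries and read one data bucket while writing results to another. The function and bucket names are hypothetical, and the action list is a deliberately small assumption rather than a complete policy; a real setup usually needs further permissions, for example on the data catalog.

```python
import json


def athena_read_only_policy(data_bucket: str, results_bucket: str) -> dict:
    """Build a minimal, illustrative IAM policy document as a dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow starting queries and fetching their results.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {
                # Read-only access to the source data.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{data_bucket}/*",
                ],
            },
            {
                # Query results are written to a separate bucket.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:PutObject",
                ],
                "Resource": [
                    f"arn:aws:s3:::{results_bucket}",
                    f"arn:aws:s3:::{results_bucket}/*",
                ],
            },
        ],
    }


# Hypothetical bucket names
policy = athena_read_only_policy("example-data", "example-athena-results")
print(json.dumps(policy, indent=2))
```

Keeping the data bucket read-only and the results bucket separate is one way to give different people access to different data sets without letting them modify the source files.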
7. Amazon Athena Price And Cost Considerations
Athena's pricing differs somewhat from that of other services such as Amazon Redshift. Users pay only for the amount of data scanned by the queries they run. The results stored in S3 may also incur storage charges. In brief, the pricing works as follows:
- Pricing for AWS Athena is set at $5 for every TB of data scanned.
- The data scanned by each query is rounded up to the nearest MB, with a minimum of 10 MB per query.
- Users need to pay for storing data in S3 at its regular rates.
To keep charges low, Amazon advises users to use compressed data files, have data in columnar formats, and routinely delete old result sets.
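The pricing rules above can be turned into a quick estimate. This sketch assumes the $5 per TB rate quoted here, that scanned bytes are rounded up to the next megabyte, and that 1 TB means 1024^4 bytes; the function name is hypothetical and actual rates vary by region and over time.

```python
import math

PRICE_PER_TB_USD = 5.0       # rate quoted in this article
BYTES_PER_MB = 1024 ** 2     # assumption: binary megabytes
BYTES_PER_TB = 1024 ** 4     # assumption: binary terabytes
MIN_BILLED_MB = 10           # minimum billed per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical scan sizes: a full 1 TB scan versus a 50 GB scan
# of the same data after compression and partitioning.
full_scan = estimate_query_cost(1024 ** 4)
pruned_scan = estimate_query_cost(50 * 1024 ** 3)
print(f"full scan:   ${full_scan:.2f}")
print(f"pruned scan: ${pruned_scan:.4f}")
```

Because the charge depends only on bytes scanned, anything that shrinks the scan, such as compression, columnar formats, or partitioning, lowers the bill proportionally.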
8. User Interface
Athena has a simple, easy-to-use interface. The menu makes it easy to navigate through the four primary tabs: Query Editor, Saved Queries, History, and Catalog Manager. If you have any experience running SQL queries, you will not need any specific training to use this tool.
Should you take Amazon Athena into account?
The pay-for-usage pricing model might attract analysts who believed that a querying system this powerful was out of their budget or would need complex systems and DevOps support. The user interface is easy to use and should be intuitive for anybody with a basic knowledge of SQL.
Companies that depend on S3 and require a quick but reliable query service might find Athena an ideal solution. It is primarily for companies that prefer not to set up their own infrastructure or that want the simplicity of using Athena for spot or ad-hoc analysis.
AWS Athena continues to add integrations with sophisticated BI tools that can produce reports and visualizations. If you are looking for an alternative to AWS Athena, you can use Presto itself, the open-source engine that Athena is built on.