Amazon created a lot of hype around the release of AWS Athena. Is it worth that hype?
Well, it's a query service that uses ANSI-standard SQL to work with "big data" stored in Amazon S3. The combination of AWS Athena and Amazon S3 can deliver results quickly, with power comparable to advanced data warehousing systems. If you usually prefer relational databases like MySQL or traditional data warehouses, Athena could be an option for the analytics workloads you would like to take on.
Athena is built on Presto (PrestoDB), the open-source SQL query engine originally developed at Facebook. Given this lineage, Athena gives teams a serverless SQL query front end to an S3 data lake, which can serve as part of an ETL or ELT process.
Since the architecture is serverless, there is no infrastructure to manage. It also means that you pay only for the queries you run, which lowers costs.
Is that it? Well, no.
What else should you know about AWS Athena?
Before taking the plunge, here are eight things to consider or be aware of.
1. Schema and Table Definitions
To query data with Athena, the data must live in S3, and you need to create a database and tables that point to it. When creating schemas for S3 data, positional order is essential.
If the column order in the table definition doesn't match the order of the fields in the files, which matters most for delimited formats such as CSV, values land in the wrong columns and the results won't match expectations.
Once you complete the process, all databases and tables are automatically stored in Athena's catalog and are accessible through JDBC and ODBC.
Athena also supports creating views.
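To make the point about positional order concrete, here is a minimal sketch that builds a Hive-style CREATE EXTERNAL TABLE statement from an ordered list of columns. The helper name `build_create_table_ddl`, the database, table, and bucket names are all hypothetical, and the sketch assumes comma-delimited text files; it only produces the statement as a string and does not talk to AWS.

```python
from typing import List, Tuple


def build_create_table_ddl(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    s3_location: str,
    delimiter: str = ",",
) -> str:
    """Return a Hive-style CREATE EXTERNAL TABLE statement for delimited text files.

    Columns are emitted in list order, so the list should mirror the
    field order inside the files stored under s3_location.
    """
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical example: the column order matches the order of fields in each CSV row.
ddl = build_create_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "string"), ("amount", "double"), ("order_date", "string")],
    s3_location="s3://example-bucket/orders/",
)
print(ddl)
```

Swapping two entries in the `columns` list would still produce a valid statement, but queries would then read each field under the wrong column name, which is exactly the mismatch described above.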
2. Data Formats
The Athena service works with many different data formats, including ORC, JSON, CSV, and Parquet. Amazon recommends converting data to a columnar storage format such as Apache Parquet.
Make sure your team is aware of this optimization. Compressed, columnar formats reduce both query and storage costs while also improving performance.
Amazon also recommends partitioning data to reduce the amount of data that each query needs to scan. Partitioning improves efficiency and reduces query costs.
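As a rough illustration of partitioning, the sketch below lays out data under Hive-style `key=value` prefixes and builds the matching ALTER TABLE statement. The function names, the table name, the bucket, and the choice of `year`/`month`/`day` string partition columns are assumptions made for this example, not a layout prescribed by Amazon.

```python
from datetime import date


def partition_prefix(base_location: str, day: date) -> str:
    """Return a Hive-style S3 prefix such as .../year=2024/month=01/day=05/."""
    base = base_location.rstrip("/")
    return f"{base}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def add_partition_ddl(table: str, base_location: str, day: date) -> str:
    """Return an ALTER TABLE statement that registers one daily partition.

    Assumes the table was created with string partition columns
    named year, month, and day (hypothetical names).
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{day.year:04d}', month='{day.month:02d}', day='{day.day:02d}') "
        f"LOCATION '{partition_prefix(base_location, day)}'"
    )


prefix = partition_prefix("s3://example-bucket/logs/", date(2024, 1, 5))
statement = add_partition_ddl("logs", "s3://example-bucket/logs/", date(2024, 1, 5))
print(prefix)
print(statement)
```

A query that filters on `year`, `month`, and `day` then only needs to read the files under the matching prefixes rather than the whole table.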
3. Speed and Performance
AWS Athena lets you run queries on S3 data quickly and easily without having to set up servers, configure clusters, or do any of the housekeeping that other database systems need.
- Athena uses compute resources in multiple, separate Availability Zones.
- Amazon S3 also stores data redundantly, so the service provides speed, reliability, and availability.
4. Supported Functions
Athena uses the open-source software PrestoDB as its SQL query engine. Users can write ANSI-standard SQL in this tool and query Amazon S3 data directly. It supports standard SQL queries like SELECT and relational operators like JOIN.
At this time, Athena uses only Hive DDL to create, alter, and drop tables and partitions.
The full list of supported functions is available in the Presto documentation.
5. Integration With Leading BI Tools
Amazon promotes Athena as a way to produce result sets with SQL queries. However, you can also feed those results into business intelligence tools, such as Amazon QuickSight, for reporting and analysis. The service provides a JDBC driver that other business intelligence software can use to connect, and you can now even use Microsoft Power BI with AWS Athena.
6. Athena’s Security
Amazon provides three methods to control access to AWS Athena:
- AWS Identity and Access Management policies
- Access Control Lists
- Amazon S3 bucket policies
Users can only query the S3 data they are permitted to access. It's also possible to fine-tune security by letting different people see different data sets and by granting access to data owned by other users. You can further tighten security by restricting data access within tools like Tableau and Power BI.
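To show what an IAM-based restriction might look like, here is a hedged sketch that builds a policy document allowing a user to run Athena queries while reading only one S3 prefix. The function name, bucket, and prefix are hypothetical. The policy is deliberately incomplete: a working setup typically also needs permissions for the data catalog and for writing query results, which are left out here.

```python
import json
from typing import Any, Dict


def athena_read_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Build an illustrative IAM policy limited to one S3 prefix.

    Not a complete policy: catalog access and the query-results
    location are intentionally omitted from this sketch.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunAthenaQueries",
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",
            },
            {
                "Sid": "ListBucketPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix, used only for illustration.
policy = athena_read_policy("example-bucket", "marketing/")
print(json.dumps(policy, indent=2))
```

Giving a second team a policy built with a different prefix is one way different people could be limited to different data sets in the same bucket.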
7. Amazon Athena Price And Cost Considerations
Athena's pricing is a little different from that of services such as Amazon Redshift. Users pay only for the amount of data scanned by the queries they run. The results that get stored in S3 may also incur some storage charges. In brief, the pricing works like this:
- Pricing for AWS Athena is set at $5 for every TB of data scanned.
- The data scanned by each query is rounded up to the nearest MB, with a minimum of 10 MB per query.
- Users need to pay for storing data in S3 at its regular rates.
To keep charges low, Amazon advises users to use compressed data files, have data in columnar formats, and routinely delete old result sets.
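The pricing rules above can be turned into a quick back-of-the-envelope estimate. This sketch assumes the $5 per TB rate quoted in this article, that scanned data is rounded up to a whole MB with a 10 MB minimum, and binary units (1 TB = 1024 × 1024 MB); the function names are made up for this example, and S3 storage charges are not included.

```python
PRICE_PER_TB_USD = 5.0        # rate quoted in this article
MIN_BILLED_MB = 10            # minimum billed size per query
BYTES_PER_MB = 1024 ** 2      # assumption: binary megabytes
MB_PER_TB = 1024 ** 2         # assumption: binary terabytes


def billed_megabytes(bytes_scanned: int) -> int:
    """Round the scanned size up to whole MB and apply the 10 MB minimum."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    rounded_up_mb = (bytes_scanned + BYTES_PER_MB - 1) // BYTES_PER_MB
    return max(MIN_BILLED_MB, rounded_up_mb)


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the scan charge for one query, excluding S3 storage costs."""
    return billed_megabytes(bytes_scanned) / MB_PER_TB * PRICE_PER_TB_USD


# A tiny query is still billed as 10 MB; a full 1 TB scan costs the headline rate.
print(billed_megabytes(1))
print(query_cost_usd(1024 ** 4))
```

Running the same estimate on the compressed or Parquet size of a data set, rather than its raw CSV size, gives a feel for how much Amazon's advice on formats can save.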
8. User Interface
Athena has a simple, easy-to-use interface. The menu makes it easy to navigate through the four primary tabs: Query Editor, Saved Queries, History, and Catalog Manager. If you have any experience running SQL queries, you will not need any specific training to use this tool.
Should you consider Amazon Athena?
The pay-per-use pricing model might attract analysts who believed that a query system this powerful was beyond their budget, or that it required complex systems and DevOps support. The user interface is easy to use and should be intuitive for anybody with a basic knowledge of SQL.
Companies that depend on S3 and require a quick but reliable query service might find Athena an ideal solution. It is primarily for companies that prefer not to set up their own infrastructure or that want the simplicity of using Athena for spot or ad-hoc analysis.
AWS Athena continues to add integrations with sophisticated BI tools that can produce reports and visualizations. If you are looking for an alternative to AWS Athena, you can use Presto, the engine on which Athena is based.