Parquet Viewer


View, filter and sort Parquet data in seconds.

View and filter Parquet data instantly
Create visualizations with AI
Handle large datasets with ease

Trusted by over 30,000 users every month

Amazon · Snowflake · Bytedance · Paytm · Salesforce

View and filter Parquet files online

This Parquet viewer enables you to upload and view Parquet files online. It is optimized for big data handling and provides a simple interface for browsing large-scale datasets.

This tool supports massive Parquet files and ensures smooth performance without the need for complex software installations.

Upload your Parquet file and access your data instantly.

Easily view large-scale Parquet files online in seconds.

Optimized for big data handling with fast processing.

No need for complex software installations.

Simple and intuitive interface for data browsing.

Supports massive datasets with smooth performance.

Perfect for professionals working with Parquet files.

Parquet format

Apache Parquet (.parquet) is a columnar format designed for storing tabular data on disk. Its design is based on the storage format described in Google's Dremel paper (Dremel later grew into BigQuery).

Parquet files store data in a binary format, which means that they can be efficiently read by computers but are difficult for people to read.

Parquet files have a schema, which means that every value in a column must have the same type. The schema makes Parquet files easier to analyse than CSV files and also enables better compression, so they are smaller on disk.

How to view and filter Parquet files online

  1. Upload your Parquet file
  2. Your file will be loaded and then you can view your Parquet data
  3. Sort data by clicking a column name
  4. Filter a column by clicking the three dots
  5. Export your Parquet file in CSV or Excel format by clicking the export button
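Step 5's CSV/Excel export can also be done in code. A minimal sketch with pandas, using a made-up dataframe standing in for your uploaded file (Excel export additionally needs openpyxl installed):

```python
import pandas as pd

# Stand-in for the data you would upload to the viewer.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp": [3.5, 21.0]})

# CSV export.
df.to_csv("export.csv", index=False)

# Excel export (uncomment if openpyxl is installed):
# df.to_excel("export.xlsx", index=False)
```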

How to view and filter Parquet files in Python with Pandas

First, we need to install pandas along with a Parquet engine such as pyarrow.

pip install pandas pyarrow

Then we can load the Parquet file into a dataframe.

import pandas as pd

df = pd.read_parquet('path/to/file.parquet')

We can view the first few rows of the dataframe using the head method.

print(df.head(n=5))

The n parameter controls how many rows are returned. Increase it to show more rows.

We can view the last few rows of the dataframe using the tail method.

print(df.tail(n=5))

We can sort the dataframe using the sort_values method.

df = df.sort_values('column_name', ascending=True)

Just replace 'column_name' with the name of the column you want to sort by. The ascending parameter controls the sort direction: True sorts in ascending order (the default), False sorts in descending order.

We can filter the dataframe using comparison operators. The following statement will filter a dataframe to rows where the value of the 'column_name' column is greater than 5.

df = df[df['column_name'] > 5]
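The sorting and filtering steps above compose naturally. A short sketch with a made-up dataframe, where 'value' stands in for your own column name:

```python
import pandas as pd

# Illustrative data; replace with your own Parquet-backed dataframe.
df = pd.DataFrame({"value": [8, 2, 10, 4]})

# Keep rows where value > 5, then sort the survivors largest-first.
filtered = df[df["value"] > 5]
result = filtered.sort_values("value", ascending=False)
print(result)
```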

How to view and filter Parquet files in Python with DuckDB

First, we need to install duckdb for Python

pip install duckdb

The following DuckDB query reads the input Parquet file and returns a relation we can inspect. Note that the file path goes inside single quotes.

import duckdb

duckdb.sql("""SELECT * FROM 'path/to/file.parquet'""")

Sometimes we have a large file and it's impractical to read the whole thing. We can read just the first 5 rows using a LIMIT clause.

duckdb.sql("""SELECT * FROM 'path/to/file.parquet' LIMIT 5""")

We can sort rows using the ORDER BY clause. Column names are written bare (or in double quotes), not in single quotes, which SQL treats as string literals.

duckdb.sql("""SELECT * FROM 'path/to/file.parquet' ORDER BY column_name ASC LIMIT 5""")

Just change column_name to the column you want to sort by. Use ASC to sort ascending or DESC to sort descending.

We can also filter using SQL comparison operators and the WHERE clause.

duckdb.sql("""SELECT * FROM 'path/to/file.parquet' WHERE column_name > 5 LIMIT 5""")

You can change column_name to filter by a different column. The operator (>) and value (5) control how the filter is applied to that column.

Sample Parquet datasets

MT cars: Motor Trend Car Road Tests dataset. Rows: 32
Flights 1m: 1 million flights including arrival and departure delays. Rows: 1,000,000
Iris: Iris plant species dataset. Rows: 50
House price: Housing price dataset. Rows: 545
Weather: Weather dataset with temperature, rainfall, sunshine and wind measurements. Rows: 366