Ride Sharing Dataset Overview
The public Ride Sharing dataset lets you get hands-on experience using Varada to accelerate your data analytics workloads.
The dataset is inspired by a popular ride-sharing app. It includes many trip attributes, such as the trip duration, distance, and fare.
Dataset Tables
The Ride Sharing dataset includes two tables:
- trips_data: A fact table of 4.4B rows, in which each row represents a key location on a trip route, so one trip can span multiple rows. The table includes trip details such as the route, duration, driver ID, driver details, and fare. It is partitioned by the d_date column.
- riders_campaign: A dimension table of 1501 rows containing marketing campaign details, such as the segment of the ride-sharing application's user base targeted by each campaign.
The data for both tables is stored in an S3 bucket as Parquet files with Snappy compression.
Trips Data Table Schema
The trips_data table schema is:
Column Name | Column Type | Description | Sample Value |
---|---|---|---|
tripid | Integer | The unique identifier for the trip. | 33820 |
t_hour | Integer | The start hour of the trip. | 23 |
t_min | Integer | The start minute of the trip. | 38 |
d_weekday | Integer | The day of the week of the trip. | 3 |
t_start_ts | Timestamp | The start timestamp of the trip. | 2018-01-04 23:59:38 |
fare | Double | The fare for the trip. | 16.3217 |
src_zone | Integer | The source zone for the trip. | 3431 |
dst_zone | Integer | The destination zone for the trip. | 4304 |
duration | Double | The duration of the trip. | 1117.33 |
distance | Double | The distance of the trip. | 5.61613655 |
rider_id | Integer | The identifier of the rider. | 1043298 |
rider_age | Integer | The age of the rider. | 24 |
rider_first | String | The first name of the rider. | Fabian |
rider_last | String | The last name of the rider. | Pearce |
rider_gender | String | The gender of the rider. | M |
driver_id | Integer | The identifier of the driver. | 31100 |
driver_age | Integer | The age of the driver. | 60 |
driver_first | String | The first name of the driver. | Jeremy |
driver_last | String | The last name of the driver. | COX |
driver_gender | String | The gender of the driver. | M |
ts | Timestamp | The timestamp of the point in the trip course. | 2018-01-05 0:17:36 |
lon | Double | The longitude of the point in the trip course. | -122.4427719 |
lat | Double | The latitude of the point in the trip course. | 37.74106216 |
point_id | Integer | The identifier for the point in the trip course. | 265 |
last_point | Integer | An indicator of the last point in the trip course. | 0 |
d_date | Date | The date of the trip. This is the partition column. | 2018-01-04 |
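
Because each row of trips_data is a point on a route, counting rows is not the same as counting trips. The sketch below illustrates one way to get trip-level figures for a single day. It uses Python's built-in sqlite3 module as a stand-in for the query engine, a small subset of the columns, and made-up rows. It also assumes that last_point = 1 marks the final point of a trip and that trip-level values such as fare repeat on every row of the trip; neither is stated in the schema above.

```python
import sqlite3

# Hypothetical rows for illustration only; they are not taken from the dataset.
# Columns (a subset of trips_data): tripid, point_id, last_point, fare, d_date
rows = [
    (1, 0, 0, 12.5, "2018-01-04"),
    (1, 1, 0, 12.5, "2018-01-04"),
    (1, 2, 1, 12.5, "2018-01-04"),
    (2, 0, 0, 8.0, "2018-01-04"),
    (2, 1, 1, 8.0, "2018-01-04"),
    (3, 0, 1, 20.0, "2018-01-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips_data ("
    "tripid INTEGER, point_id INTEGER, last_point INTEGER, fare REAL, d_date TEXT)"
)
conn.executemany("INSERT INTO trips_data VALUES (?, ?, ?, ?, ?)", rows)

# Counting rows counts route points, not trips.
point_rows = conn.execute(
    "SELECT COUNT(*) FROM trips_data WHERE d_date = '2018-01-04'"
).fetchone()[0]

# Filter on d_date (the partition column) and keep one row per trip.
# Assumption: last_point = 1 flags the final point of each trip.
daily = conn.execute(
    """
    SELECT d_date, COUNT(*) AS trips, SUM(fare) AS total_fare
    FROM trips_data
    WHERE d_date = '2018-01-04' AND last_point = 1
    GROUP BY d_date
    """
).fetchall()

print(point_rows)  # 5 route points on that date
print(daily)       # [('2018-01-04', 2, 20.5)]
```

Filtering on d_date first keeps the scan within a single partition, which is typically the cheapest way to restrict a query against a partitioned table.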
Riders Campaign Table Schema
The riders_campaign table schema is:
Column Name | Column Type | Description | Sample Value |
---|---|---|---|
rider_id | Integer | The identifier of the rider. | 4923013 |
num_trips | Integer | The number of trips the rider took as part of the campaign. | 8 |
segment | String | The name of the campaign. | Churning_Riders_Last3Months |
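
Both tables have a rider_id column, so a natural use of riders_campaign is to group trips by campaign segment. The following is a minimal sketch of such a join, again using Python's built-in sqlite3 module with made-up rows rather than the real data. It assumes that rider_id is the intended join key, that each rider appears in at most one riders_campaign row (otherwise the join would count a trip once per matching campaign), and that last_point = 1 marks the final point of a trip. The segment name Example_Segment_B is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips_data (tripid INTEGER, rider_id INTEGER, last_point INTEGER, fare REAL);
    CREATE TABLE riders_campaign (rider_id INTEGER, num_trips INTEGER, segment TEXT);
    """
)

# Hypothetical rows for illustration only; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO trips_data VALUES (?, ?, ?, ?)",
    [
        (1, 100, 0, 10.0),
        (1, 100, 1, 10.0),
        (2, 100, 1, 6.0),
        (3, 200, 1, 9.0),
        (4, 300, 1, 7.5),  # rider 300 is not in any campaign
    ],
)
conn.executemany(
    "INSERT INTO riders_campaign VALUES (?, ?, ?)",
    [
        (100, 2, "Churning_Riders_Last3Months"),
        (200, 1, "Example_Segment_B"),
    ],
)

# One row per trip (last_point = 1), joined to the rider's campaign segment.
by_segment = conn.execute(
    """
    SELECT c.segment, COUNT(*) AS trips, SUM(t.fare) AS total_fare
    FROM trips_data AS t
    JOIN riders_campaign AS c ON c.rider_id = t.rider_id
    WHERE t.last_point = 1
    GROUP BY c.segment
    ORDER BY c.segment
    """
).fetchall()

print(by_segment)
# [('Churning_Riders_Last3Months', 2, 16.0), ('Example_Segment_B', 1, 9.0)]
```

The inner join drops trips by riders who are not in any campaign, such as rider 300 in this example. Use a LEFT JOIN from trips_data instead if those trips should be kept.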