Ride Sharing Dataset Overview

The public Ride Sharing dataset is a great way for you to get hands-on experience working with Varada to accelerate your data analytics workloads.

The dataset is inspired by a popular ride-sharing app. It includes many trip attributes, such as the trip duration, distance, and fare.

Dataset Tables

The Ride Sharing dataset includes two tables:

  • trips_data: A fact table of 4.4 billion rows, in which each row represents a key location (point) on a trip route, so one trip can span multiple rows. This table includes the trip details, such as the trip route, trip duration, driver ID, driver details, and fare. The table is partitioned by the d_date column.

  • riders_campaign: A dimension table containing 1,501 rows of marketing campaign details, such as the segment of the ride-sharing application's user base that each marketing campaign targets.

The data for both tables is stored in an S3 bucket as Parquet files with Snappy compression.

Trips Data Table Schema

The trips_data table schema is:

| Column Name | Column Type | Description | Sample Value |
| --- | --- | --- | --- |
| tripid | Integer | The unique identifier for the trip. | 33820 |
| t_hour | Integer | The start hour of the trip. | 23 |
| t_min | Integer | The start minute of the trip. | 38 |
| d_weekday | Integer | The day of the week of the trip. | 3 |
| t_start_ts | Timestamp | The start timestamp of the trip. | 2018-01-04 23:59:38 |
| fare | Double | The fare for the trip. | 16.3217 |
| src_zone | Integer | The source zone for the trip. | 3431 |
| dst_zone | Integer | The destination zone for the trip. | 4304 |
| duration | Double | The duration of the trip. | 1117.33 |
| distance | Double | The distance of the trip. | 5.61613655 |
| rider_id | Integer | The identifier of the rider. | 1043298 |
| rider_age | Integer | The age of the rider. | 24 |
| rider_first | String | The first name of the rider. | Fabian |
| rider_last | String | The last name of the rider. | Pearce |
| rider_gender | String | The gender of the rider. | M |
| driver_id | Integer | The identifier of the driver. | 31100 |
| driver_age | Integer | The age of the driver. | 60 |
| driver_first | String | The first name of the driver. | Jeremy |
| driver_last | String | The last name of the driver. | COX |
| driver_gender | String | The gender of the driver. | M |
| ts | Timestamp | The timestamp of the point in the trip course. | 2018-01-05 0:17:36 |
| lon | Double | The longitude of the point in the trip course. | -122.4427719 |
| lat | Double | The latitude of the point in the trip course. | 37.74106216 |
| point_id | Integer | The identifier for the point in the trip course. | 265 |
| last_point | Integer | An indicator of the last point in the trip course. | 0 |
| d_date | Date | The date of the trip. Partition column. | 2018-01-04 |
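
Because a single trip spans several point rows, trip-level values such as fare need to be reduced to one row per trip before they are summed or averaged. The following is a minimal sketch of that idea using Python's built-in sqlite3 module and a handful of toy rows. It assumes that trip-level columns (here, fare) repeat on every point row and that last_point = 1 marks the final point of a trip; neither assumption is confirmed by the schema above. The helper name trip_level_fares is hypothetical, and the SQL is written for SQLite, so it may need adjusting for Varada.

```python
import sqlite3

# Toy rows for illustration only: (tripid, point_id, last_point, fare, d_date).
# Assumption: fare repeats on every point row of a trip,
# and last_point = 1 marks the final point of the trip.
toy_rows = [
    (33820, 265, 0, 16.3217, "2018-01-04"),
    (33820, 266, 0, 16.3217, "2018-01-04"),
    (33820, 267, 1, 16.3217, "2018-01-04"),
    (33821, 1, 0, 8.5, "2018-01-04"),
    (33821, 2, 1, 8.5, "2018-01-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips_data ("
    "tripid INTEGER, point_id INTEGER, last_point INTEGER, "
    "fare REAL, d_date TEXT)"
)
conn.executemany("INSERT INTO trips_data VALUES (?, ?, ?, ?, ?)", toy_rows)


def trip_level_fares(connection, d_date):
    """Return one (tripid, fare) pair per trip for a single d_date value.

    Filtering on d_date mirrors the partition column of the real table;
    keeping only last_point = 1 collapses the point rows to one row per trip.
    """
    query = """
        SELECT tripid, fare
        FROM trips_data
        WHERE d_date = ? AND last_point = 1
        ORDER BY tripid
    """
    return connection.execute(query, (d_date,)).fetchall()


trip_fares = trip_level_fares(conn, "2018-01-04")

# Summing fare over every point row counts each trip once per point.
naive_total = conn.execute("SELECT SUM(fare) FROM trips_data").fetchone()[0]
per_trip_total = sum(fare for _, fare in trip_fares)
```

In this toy data, naive_total is larger than per_trip_total because each fare is counted once per point row rather than once per trip.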

Riders Campaign Table Schema

The riders_campaign table schema is:

| Column Name | Column Type | Description | Sample Value |
| --- | --- | --- | --- |
| rider_id | Integer | The identifier of the rider. | 4923013 |
| num_trips | Integer | The number of trips the rider took as part of the campaign. | 8 |
| segment | String | The name of the campaign. | Churning_Riders_Last3Months |
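
Both tables carry a rider_id column, so a natural way to combine them is to join on it. The sketch below assumes rider_id is the intended join key, which the schemas suggest but do not state. It uses Python's built-in sqlite3 module with toy rows, and the helper name trips_per_segment is hypothetical. Trips are counted with COUNT(DISTINCT tripid) because each trip occupies several rows in trips_data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips_data (tripid INTEGER, rider_id INTEGER, point_id INTEGER);
    CREATE TABLE riders_campaign (rider_id INTEGER, num_trips INTEGER, segment TEXT);
    """
)

# Toy rows for illustration only. Trip 1 has two point rows.
conn.executemany(
    "INSERT INTO trips_data VALUES (?, ?, ?)",
    [
        (1, 4923013, 1),
        (1, 4923013, 2),
        (2, 4923013, 1),
        (3, 1043298, 1),  # this rider is not part of any campaign row
    ],
)
conn.executemany(
    "INSERT INTO riders_campaign VALUES (?, ?, ?)",
    [(4923013, 8, "Churning_Riders_Last3Months")],
)


def trips_per_segment(connection):
    """Count distinct trips per campaign segment.

    Assumption: rider_id is the join key between the two tables.
    The inner join drops trips whose rider has no campaign row.
    """
    query = """
        SELECT c.segment, COUNT(DISTINCT t.tripid) AS trips
        FROM trips_data AS t
        JOIN riders_campaign AS c ON c.rider_id = t.rider_id
        GROUP BY c.segment
        ORDER BY c.segment
    """
    return connection.execute(query).fetchall()


segment_counts = trips_per_segment(conn)
```

Using COUNT(*) here instead would count point rows, not trips, and would overstate the number of trips in each segment.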

What’s Next