2018-Airplane-Flights

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public
Uploaded 24-03-2022 by Mark Murphy


Dataset Description

Story

View the ReadMe file in my GitHub repo for this project. Check out all the info on my portfolio's webpage for this project.

As I write this, I'm a Data Science student. To add to my portfolio, I wanted to build a web app to predict airline ticket prices: the user would be able to select an origin and a destination. I found a database from the Bureau of Transportation Statistics and downloaded their data for Q1, Q2, Q3, and Q4 of 2018, a total of 27M+ rows and 42 columns. For my price-prediction purposes, I eliminated unnecessary columns, renamed some columns, and refined the data for consistency, bringing it to a new total of 9M+ rows and 13 columns.

Have fun and share your kernels, please!

Column Descriptions

1. Unnamed: drop this column (it's a duplicate index column)
2-3. ItinID, MktID: roughly indicate the order in which tickets were ordered (lower IDs were ordered first)
4. MktCoupons: the number of coupons in the market for that flight
5. Quarter: 1, 2, 3, or 4, all of which are in 2018
6. Origin: the city in which the flight begins
7. OriginWac: USA State/Territory World Area Code of the origin
8. Dest: the city in which the flight ends
9. DestWac: USA State/Territory World Area Code of the destination
10. Miles: the number of miles traveled
11. ContiguousUSA: binary column; (2) means the flight is within the contiguous (48) USA states, and (1) means it is not (e.g., Hawaii, Alaska, offshore territories)
12. NumTicketsOrdered: the number of tickets purchased by the user
13. AirlineCompany: the two-letter code of the airline the user flew from start to finish (codes below)
14. PricePerTicket: the target prediction column

Airline Company Codes (in order of frequency for this dataset)

WN -- Southwest Airlines Co.
DL -- Delta Air Lines Inc.
AA -- American Airlines Inc.
UA -- United Air Lines Inc.
B6 -- JetBlue Airways
AS -- Alaska Airlines Inc.
NK -- Spirit Air Lines
G4 -- Allegiant Air
F9 -- Frontier Airlines Inc.
HA -- Hawaiian Airlines Inc.
SY -- Sun Country Airlines d/b/a MN Airlines
VX -- Virgin America

USA State/Territory World Area Codes

1 Alaska, 2 Hawaii, 3 Puerto Rico, 4 U.S. Virgin Islands, 5 U.S. Pacific Trust Territories and Possessions
11 Connecticut, 12 Maine, 13 Massachusetts, 14 New Hampshire, 15 Rhode Island, 16 Vermont
21 New Jersey, 22 New York, 23 Pennsylvania
31 Delaware, 32 District of Columbia, 33 Florida, 34 Georgia, 35 Maryland, 36 North Carolina, 37 South Carolina, 38 Virginia, 39 West Virginia
41 Illinois, 42 Indiana, 43 Michigan, 44 Ohio, 45 Wisconsin
51 Alabama, 52 Kentucky, 53 Mississippi, 54 Tennessee
61 Iowa, 62 Kansas, 63 Minnesota, 64 Missouri, 65 Nebraska, 66 North Dakota, 67 South Dakota
71 Arkansas, 72 Louisiana, 73 Oklahoma, 74 Texas
81 Arizona, 82 Colorado, 83 Idaho, 84 Montana, 85 Nevada, 86 New Mexico, 87 Utah, 88 Wyoming
91 California, 92 Oregon, 93 Washington
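The code tables and column notes above are enough to decode a raw record by hand. The following is a minimal sketch, not part of the dataset or the author's project: `AIRLINES`, `WAC`, `INDEX_COLUMNS`, and `decode_record` are hypothetical names, the tables are transcribed from the description, and the record is assumed to be a dict keyed by the column names from the feature list (for example, one row from `csv.DictReader`).

```python
# Hypothetical helper, not part of the dataset: lookup tables transcribed
# from the description above, used to decode one raw record.
AIRLINES = {
    "WN": "Southwest Airlines Co.", "DL": "Delta Air Lines Inc.",
    "AA": "American Airlines Inc.", "UA": "United Air Lines Inc.",
    "B6": "JetBlue Airways", "AS": "Alaska Airlines Inc.",
    "NK": "Spirit Air Lines", "G4": "Allegiant Air",
    "F9": "Frontier Airlines Inc.", "HA": "Hawaiian Airlines Inc.",
    "SY": "Sun Country Airlines d/b/a MN Airlines", "VX": "Virgin America",
}

# USA State/Territory World Area Codes, as listed in the description.
WAC = {
    1: "Alaska", 2: "Hawaii", 3: "Puerto Rico", 4: "U.S. Virgin Islands",
    5: "U.S. Pacific Trust Territories and Possessions",
    11: "Connecticut", 12: "Maine", 13: "Massachusetts", 14: "New Hampshire",
    15: "Rhode Island", 16: "Vermont",
    21: "New Jersey", 22: "New York", 23: "Pennsylvania",
    31: "Delaware", 32: "District of Columbia", 33: "Florida", 34: "Georgia",
    35: "Maryland", 36: "North Carolina", 37: "South Carolina", 38: "Virginia",
    39: "West Virginia",
    41: "Illinois", 42: "Indiana", 43: "Michigan", 44: "Ohio", 45: "Wisconsin",
    51: "Alabama", 52: "Kentucky", 53: "Mississippi", 54: "Tennessee",
    61: "Iowa", 62: "Kansas", 63: "Minnesota", 64: "Missouri",
    65: "Nebraska", 66: "North Dakota", 67: "South Dakota",
    71: "Arkansas", 72: "Louisiana", 73: "Oklahoma", 74: "Texas",
    81: "Arizona", 82: "Colorado", 83: "Idaho", 84: "Montana",
    85: "Nevada", 86: "New Mexico", 87: "Utah", 88: "Wyoming",
    91: "California", 92: "Oregon", 93: "Washington",
}

# The duplicate index column is "Unnamed:_0" in the feature list; pandas
# usually names an unnamed CSV index column "Unnamed: 0". Drop either one.
INDEX_COLUMNS = ("Unnamed:_0", "Unnamed: 0")


def decode_record(row: dict) -> dict:
    """Drop the duplicate index column and add human-readable decodings."""
    out = {k: v for k, v in row.items() if k not in INDEX_COLUMNS}
    out["AirlineName"] = AIRLINES.get(row["AirlineCompany"], "unknown")
    out["OriginState"] = WAC.get(int(row["OriginWac"]), "unknown")
    out["DestState"] = WAC.get(int(row["DestWac"]), "unknown")
    # ContiguousUSA: 2 = within the contiguous 48 states, 1 = not.
    out["IsContiguous"] = int(row["ContiguousUSA"]) == 2
    return out
```

Unrecognized codes map to `"unknown"` instead of raising, so a single unexpected value does not halt a pass over millions of rows.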

14 features

Unnamed:_0 (numeric): 9534417 unique values, 0 missing
ItinID (numeric): 6201347 unique values, 0 missing
MktID (numeric): 9534417 unique values, 0 missing
MktCoupons (numeric): 3 unique values, 0 missing
Quarter (numeric): 4 unique values, 0 missing
Origin (string): 263 unique values, 0 missing
OriginWac (numeric): 52 unique values, 0 missing
Dest (string): 260 unique values, 0 missing
DestWac (numeric): 52 unique values, 0 missing
Miles (numeric): 2117 unique values, 0 missing
ContiguousUSA (numeric): 2 unique values, 0 missing
NumTicketsOrdered (numeric): 20 unique values, 0 missing
AirlineCompany (string): 12 unique values, 0 missing
PricePerTicket (numeric): 71834 unique values, 0 missing
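The feature list above works as a schema: 11 numeric columns, 3 string columns, and no missing values. As a sketch, assuming the data has been exported to CSV with a header row using exactly these column names (the OpenML copy itself is ARFF), the hypothetical `read_records` below uses only the standard library to parse each row into typed values and reject empty cells.

```python
import csv
import io
from typing import Iterator, TextIO

# Column name -> type, in the order of the feature list above.
SCHEMA = {
    "Unnamed:_0": "numeric", "ItinID": "numeric", "MktID": "numeric",
    "MktCoupons": "numeric", "Quarter": "numeric", "Origin": "string",
    "OriginWac": "numeric", "Dest": "string", "DestWac": "numeric",
    "Miles": "numeric", "ContiguousUSA": "numeric",
    "NumTicketsOrdered": "numeric", "AirlineCompany": "string",
    "PricePerTicket": "numeric",
}


def _to_number(text: str):
    """Parse as int when possible, otherwise as float (e.g. prices)."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_records(stream: TextIO) -> Iterator[dict]:
    """Yield typed records from a CSV stream, validating against SCHEMA."""
    reader = csv.DictReader(stream)
    absent = set(SCHEMA) - set(reader.fieldnames or [])
    if absent:
        raise ValueError(f"missing columns: {sorted(absent)}")
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        record = {}
        for name, kind in SCHEMA.items():
            value = row[name]
            # The feature list reports 0 missing values; treat any as an error.
            if value is None or value == "":
                raise ValueError(f"line {line_no}: empty value in {name}")
            record[name] = _to_number(value) if kind == "numeric" else value
        yield record
```

Because `read_records` is a generator, it streams the 9.5M rows one at a time without loading the whole file into memory. Wrap an in-memory string in `io.StringIO` to try it on a few lines.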

19 properties

Number of instances (rows) of the dataset: 9534417
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 11
Number of nominal attributes: 0
Percentage of nominal attributes: 0
Average class difference between consecutive instances: not reported
Percentage of numeric attributes: 78.57
Percentage of missing values: 0
Percentage of instances having missing values: 0
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the most frequent class: not reported
Number of attributes divided by the number of instances: 0
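Several of these properties are simple ratios of the raw counts. The sketch below recomputes them, using a hypothetical `dataset_qualities` function rather than OpenML's own code. The counts show 11 numeric and 0 nominal attributes out of 14, so the three string attributes (Origin, Dest, AirlineCompany) appear to be counted in neither category.

```python
def dataset_qualities(n_instances: int, n_attributes: int,
                      n_numeric: int, n_nominal: int) -> dict:
    """Recompute the ratio-style properties listed above from raw counts."""
    return {
        # 11 / 14 * 100, rounded to two decimals as on the page.
        "pct_numeric": round(100 * n_numeric / n_attributes, 2),
        "pct_nominal": round(100 * n_nominal / n_attributes, 2),
        # Attributes divided by instances; tiny for a 9.5M-row table.
        "dimensionality": n_attributes / n_instances,
        # Attributes counted as neither numeric nor nominal (here: strings).
        "other": n_attributes - n_numeric - n_nominal,
    }


print(dataset_qualities(9534417, 14, 11, 0))
```

With the counts above, `pct_numeric` comes out to 78.57. The dimensionality is about 1.47e-06, which explains why the page shows 0 for "Number of attributes divided by the number of instances".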

0 tasks
