us_crime

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 18-11-2020 by Arnold.

The communityname attribute is ignored.

Author:
Source: Unknown - 2009
Please cite:

Title: Communities and Crime

Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR.

Data Set Characteristics: Multivariate
Attribute Characteristics: Real
Associated Tasks: Regression
Number of Instances: 1994
Number of Attributes: 128
Missing Values? Yes
Area: Social
Date Donated: 2009-07-13

Source:
Creator: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
-- culled from 1990 US Census, 1995 US FBI Uniform Crime Report, 1990 US Law Enforcement Management and Administrative Statistics Survey, available from ICPSR at U of Michigan.
-- Donor: Michael Redmond (redmond 'at' lasalle.edu); Computer Science; La Salle University; Philadelphia, PA, 19141, USA
-- Date: July 2009

Data Set Information:

Many variables are included so that algorithms that select or learn weights for attributes could be tested. However, clearly unrelated attributes were not included; attributes were picked if there was any plausible connection to crime (N=122), plus the attribute to be predicted (Per Capita Violent Crimes). The variables describe the community, such as the percent of the population considered urban and the median family income, and its law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units.

The per capita violent crimes variable was calculated using population and the sum of the crime variables considered violent crimes in the United States: murder, rape, robbery, and assault. There was apparently some controversy in some states concerning the counting of rapes. This resulted in missing values for rape, which in turn resulted in incorrect values for per capita violent crime. These cities are not included in the dataset. Many of the omitted communities were from the midwestern USA.

Data is described below based on original values. All numeric data was normalized into the decimal range 0.00-1.00 using an unsupervised, equal-interval binning method. Attributes retain their distribution and skew (hence, for example, the population attribute has a mean value of 0.06 because most communities are small). For example, an attribute described as 'mean people per household' is actually the normalized (0-1) version of that value.

The normalization preserves rough ratios of values WITHIN an attribute (e.g. double the value for double the population, within the available precision), except for extreme values: all values more than 3 SD above the mean are normalized to 1.00, and all values more than 3 SD below the mean are normalized to 0.00. However, the normalization does not preserve relationships between values BETWEEN attributes (e.g. it would not be meaningful to compare the value for whitePerCap with the value for blackPerCap for a community).

A limitation was that the LEMAS survey covered only police departments with at least 100 officers, plus a random sample of smaller departments. For our purposes, communities not found in both census and crime datasets were omitted. Many communities are missing LEMAS data.
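The normalization described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the donor's actual procedure: the helper name `normalize_attribute`, the use of the population standard deviation, the choice of scaling bounds, and the rounding to two decimals are all assumptions. The description itself only states that values land in 0.00-1.00 and that values beyond 3 SD from the mean become 1.00 or 0.00.

```python
import statistics


def normalize_attribute(values, n_sd=3.0):
    """Map raw values into 0.00-1.00 (hypothetical helper, see assumptions).

    Values more than n_sd standard deviations above the mean end up at 1.00,
    values more than n_sd standard deviations below the mean at 0.00.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD

    # Assumption: the scaling range is the observed range, cut off at
    # mean +/- n_sd * sd so that extreme values saturate at 0.00 / 1.00.
    lower = max(min(values), mean - n_sd * sd)
    upper = min(max(values), mean + n_sd * sd)
    if upper == lower:
        # Constant attribute: nothing to scale.
        return [0.0 for _ in values]

    normalized = []
    for v in values:
        clipped = min(max(v, lower), upper)
        normalized.append(round((clipped - lower) / (upper - lower), 2))
    return normalized


# Evenly spaced toy values (made up for illustration) scale linearly.
print(normalize_attribute([0, 10, 20, 30, 40]))
```

With a heavily skewed toy attribute such as nineteen values of 1 and a single value of 1000, the outlier lies more than 3 SD above the mean and is therefore mapped to 1.00 while the small values stay at 0.00, which mirrors the skew the description mentions for the population attribute.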

127 features

ViolentCrimesPerPop (target): numeric, 98 unique values, 0 missing
state: numeric, 46 unique values, 0 missing
county: numeric, 108 unique values, 1174 missing
community: numeric, 799 unique values, 1177 missing
communityname (ignore): string, 1828 unique values, 0 missing
fold: numeric, 10 unique values, 0 missing
population: numeric, 66 unique values, 0 missing
householdsize: numeric, 93 unique values, 0 missing
racepctblack: numeric, 100 unique values, 0 missing
racePctWhite: numeric, 99 unique values, 0 missing
racePctAsian: numeric, 91 unique values, 0 missing
racePctHisp: numeric, 91 unique values, 0 missing
agePct12t21: numeric, 93 unique values, 0 missing
agePct12t29: numeric, 89 unique values, 0 missing
agePct16t24: numeric, 94 unique values, 0 missing
agePct65up: numeric, 98 unique values, 0 missing
numbUrban: numeric, 67 unique values, 0 missing
pctUrban: numeric, 64 unique values, 0 missing
medIncome: numeric, 99 unique values, 0 missing
pctWWage: numeric, 96 unique values, 0 missing
pctWFarmSelf: numeric, 99 unique values, 0 missing
pctWInvInc: numeric, 96 unique values, 0 missing
pctWSocSec: numeric, 96 unique values, 0 missing
pctWPubAsst: numeric, 101 unique values, 0 missing
pctWRetire: numeric, 93 unique values, 0 missing
medFamInc: numeric, 98 unique values, 0 missing
perCapInc: numeric, 98 unique values, 0 missing
whitePerCap: numeric, 101 unique values, 0 missing
blackPerCap: numeric, 91 unique values, 0 missing
indianPerCap: numeric, 86 unique values, 0 missing
AsianPerCap: numeric, 98 unique values, 0 missing
OtherPerCap: numeric, 97 unique values, 1 missing
HispPerCap: numeric, 94 unique values, 0 missing
NumUnderPov: numeric, 66 unique values, 0 missing
PctPopUnderPov: numeric, 100 unique values, 0 missing
PctLess9thGrade: numeric, 97 unique values, 0 missing
PctNotHSGrad: numeric, 99 unique values, 0 missing
PctBSorMore: numeric, 96 unique values, 0 missing
PctUnemployed: numeric, 98 unique values, 0 missing
PctEmploy: numeric, 96 unique values, 0 missing
PctEmplManu: numeric, 100 unique values, 0 missing
PctEmplProfServ: numeric, 96 unique values, 0 missing
PctOccupManu: numeric, 98 unique values, 0 missing
PctOccupMgmtProf: numeric, 99 unique values, 0 missing
MalePctDivorce: numeric, 98 unique values, 0 missing
MalePctNevMarr: numeric, 96 unique values, 0 missing
FemalePctDiv: numeric, 91 unique values, 0 missing
TotalPctDiv: numeric, 94 unique values, 0 missing
PersPerFam: numeric, 92 unique values, 0 missing
PctFam2Par: numeric, 101 unique values, 0 missing
PctKids2Par: numeric, 97 unique values, 0 missing
PctYoungKids2Par: numeric, 99 unique values, 0 missing
PctTeen2Par: numeric, 96 unique values, 0 missing
PctWorkMomYoungKids: numeric, 95 unique values, 0 missing
PctWorkMom: numeric, 98 unique values, 0 missing
NumIlleg: numeric, 55 unique values, 0 missing
PctIlleg: numeric, 97 unique values, 0 missing
NumImmig: numeric, 47 unique values, 0 missing
PctImmigRecent: numeric, 99 unique values, 0 missing
PctImmigRec5: numeric, 100 unique values, 0 missing
PctImmigRec8: numeric, 97 unique values, 0 missing
PctImmigRec10: numeric, 97 unique values, 0 missing
PctRecentImmig: numeric, 95 unique values, 0 missing
PctRecImmig5: numeric, 97 unique values, 0 missing
PctRecImmig8: numeric, 98 unique values, 0 missing
PctRecImmig10: numeric, 100 unique values, 0 missing
PctSpeakEnglOnly: numeric, 98 unique values, 0 missing
PctNotSpeakEnglWell: numeric, 94 unique values, 0 missing
PctLargHouseFam: numeric, 99 unique values, 0 missing
PctLargHouseOccup: numeric, 96 unique values, 0 missing
PersPerOccupHous: numeric, 96 unique values, 0 missing
PersPerOwnOccHous: numeric, 94 unique values, 0 missing
PersPerRentOccHous: numeric, 98 unique values, 0 missing
PctPersOwnOccup: numeric, 100 unique values, 0 missing
PctPersDenseHous: numeric, 94 unique values, 0 missing
PctHousLess3BR: numeric, 100 unique values, 0 missing
MedNumBR: numeric, 3 unique values, 0 missing
HousVacant: numeric, 70 unique values, 0 missing
PctHousOccup: numeric, 92 unique values, 0 missing
PctHousOwnOcc: numeric, 99 unique values, 0 missing
PctVacantBoarded: numeric, 97 unique values, 0 missing
PctVacMore6Mos: numeric, 98 unique values, 0 missing
MedYrHousBuilt: numeric, 49 unique values, 0 missing
PctHousNoPhone: numeric, 99 unique values, 0 missing
PctWOFullPlumb: numeric, 91 unique values, 0 missing
OwnOccLowQuart: numeric, 99 unique values, 0 missing
OwnOccMedVal: numeric, 100 unique values, 0 missing
OwnOccHiQuart: numeric, 98 unique values, 0 missing
RentLowQ: numeric, 101 unique values, 0 missing
RentMedian: numeric, 99 unique values, 0 missing
RentHighQ: numeric, 99 unique values, 0 missing
MedRent: numeric, 100 unique values, 0 missing
MedRentPctHousInc: numeric, 95 unique values, 0 missing
MedOwnCostPctInc: numeric, 97 unique values, 0 missing
MedOwnCostPctIncNoMtg: numeric, 70 unique values, 0 missing
NumInShelters: numeric, 54 unique values, 0 missing
NumStreet: numeric, 53 unique values, 0 missing
PctForeignBorn: numeric, 96 unique values, 0 missing
PctBornSameState: numeric, 99 unique values, 0 missing
PctSameHouse85: numeric, 99 unique values, 0 missing
PctSameCity85: numeric, 100 unique values, 0 missing
PctSameState85: numeric, 97 unique values, 0 missing
LemasSwornFT: numeric, 38 unique values, 1675 missing
LemasSwFTPerPop: numeric, 52 unique values, 1675 missing
LemasSwFTFieldOps: numeric, 34 unique values, 1675 missing
LemasSwFTFieldPerPop: numeric, 55 unique values, 1675 missing
LemasTotalReq: numeric, 44 unique values, 1675 missing
LemasTotReqPerPop: numeric, 59 unique values, 1675 missing
PolicReqPerOffic: numeric, 75 unique values, 1675 missing
PolicPerPop: numeric, 52 unique values, 1675 missing
RacialMatchCommPol: numeric, 76 unique values, 1675 missing
PctPolicWhite: numeric, 74 unique values, 1675 missing
PctPolicBlack: numeric, 73 unique values, 1675 missing
PctPolicHisp: numeric, 54 unique values, 1675 missing
PctPolicAsian: numeric, 50 unique values, 1675 missing
PctPolicMinor: numeric, 72 unique values, 1675 missing
OfficAssgnDrugUnits: numeric, 30 unique values, 1675 missing
NumKindsDrugsSeiz: numeric, 15 unique values, 1675 missing
PolicAveOTWorked: numeric, 77 unique values, 1675 missing
LandArea: numeric, 61 unique values, 0 missing
PopDens: numeric, 96 unique values, 0 missing
PctUsePubTrans: numeric, 98 unique values, 0 missing
PolicCars: numeric, 63 unique values, 1675 missing
PolicOperBudg: numeric, 38 unique values, 1675 missing
LemasPctPolicOnPatr: numeric, 72 unique values, 1675 missing
LemasGangUnitDeploy: numeric, 3 unique values, 1675 missing
LemasPctOfficDrugUn: numeric, 80 unique values, 0 missing
PolicBudgPerPop: numeric, 51 unique values, 1675 missing

19 properties

Number of instances (rows) of the dataset: 1994
Number of attributes (columns) of the dataset: 127
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 39202
Number of instances with at least one value missing: 1871
Number of numeric attributes: 127
Number of nominal attributes: 0
Percentage of nominal attributes: 0
Average class difference between consecutive instances: 0.76
Percentage of numeric attributes: 100
Percentage of missing values: 15.48
Percentage of instances having missing values: 93.83
Percentage of binary attributes: 0
Number of binary attributes: 0
Number of instances belonging to the least frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the most frequent class: (no value listed)
Number of attributes divided by the number of instances: 0.06
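As a quick consistency check, the derived properties can be recomputed from the raw counts listed above. The sketch below uses only figures that appear on this page; the variable names are illustrative.

```python
# Raw counts copied from the property list.
n_instances = 1994
n_attributes = 127
n_missing_values = 39202
n_instances_with_missing = 1871

# Per-feature missing counts from the feature list: county, community and
# OtherPerCap, plus the 22 law-enforcement features with 1675 missing each.
missing_from_features = 1174 + 1177 + 1 + 22 * 1675

pct_missing_values = 100 * n_missing_values / (n_instances * n_attributes)
pct_instances_with_missing = 100 * n_instances_with_missing / n_instances
attributes_per_instance = n_attributes / n_instances

print(missing_from_features)                  # 39202
print(round(pct_missing_values, 2))           # 15.48
print(round(pct_instances_with_missing, 2))   # 93.83
print(round(attributes_per_instance, 2))      # 0.06
```

The per-feature counts add up to the reported total of missing values, and the three ratios match the listed percentages after rounding to two decimals.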

1 tasks

3 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: ViolentCrimesPerPop
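To make the task definition concrete, here is a minimal sketch of a cross-validated root mean squared error, using a mean-predictor baseline. Everything in it is an assumption for illustration: the helper names `rmse` and `cross_validated_rmse` are hypothetical, the toy targets and fold labels are made up, and whether OpenML's 10-fold split coincides with the dataset's `fold` column is not stated on this page.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def cross_validated_rmse(targets, folds):
    """Per-fold RMSE of a baseline that predicts the training-set mean.

    `folds` assigns each instance to a fold; each fold is held out once
    while the remaining folds form the training set.
    """
    by_fold = defaultdict(list)
    for y, fold in zip(targets, folds):
        by_fold[fold].append(y)

    scores = {}
    for held_out, test_y in by_fold.items():
        train_y = [y for fold, ys in by_fold.items() if fold != held_out for y in ys]
        prediction = sum(train_y) / len(train_y)
        scores[held_out] = rmse(test_y, [prediction] * len(test_y))
    return scores


# Toy values (not taken from the dataset): four targets in two folds.
scores = cross_validated_rmse([0.1, 0.3, 0.2, 0.4], [1, 1, 2, 2])
print(scores)
print(sum(scores.values()) / len(scores))  # mean RMSE across folds
```

For the real task, the targets would be the ViolentCrimesPerPop column and there would be ten folds; a regression model would replace the mean predictor.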