Wednesday, April 23, 2014

Steps to load New York City Property Tax Data into BigQuery

These are the steps to load New York City Property Tax Data into BigQuery


Update2: here's a direct link to the New York City Department of Finance FY 2015 Tentative Assessment Roll (FY15 Database Files by Tax Class (revised 1/15/14)) 
https://bigquery.cloud.google.com/table/personal-real-estate:acris_nyc.nyc_dof_tc_Tentative_Assessment_Roll
You will need a google account and a project with BigQuery enabled. See https://developers.google.com/bigquery/sign-up
The cost is very low, so far with millions of records loaded and 100's and 100's of queries the cost has been 1 cent.  Another very cool thing is you can make your data public to all authenticated users and the users accessing your data do so at no cost to you.

Update: Many Thanks to +Felipe Hoffa for pointing out that I can make the New York City Department of Finance FY 2015 Tentative Assessment Roll (FY15 Database Files by Tax Class (revised 1/15/14)) public.  So that is what I did.  You can sign in to BigQuery and run queries such as:
(Query to show for market value greater than or equal to 10 Million what is the sum of assessment values / sum of market values:)
    • SELECT '>=10000000',count(*),sum(fn_avt)/sum(new_fv_t) FROM [personal-real-estate:acris_nyc.nyc_dof_tc_Tentative_Assessment_Roll] where txcl='1' and new_fv_t >= 10000000
    • Answer at this point in time is 3%
    • While for < 10000000 it is 4.6%
    • A very regressive tax.  The lower market home pay a much larger tax as a percentage of market value.






This example focuses on Class 1 property but applies to all classes.

  • Download Data: 
    • wget http://www.nyc.gov/html/dof/downloads/tar/tc1_15.zip

  • This contains a mdb (microsoft access file)  (not exactly an open standard)
  • If you're using unix, you can install mdb-tables and mdb-export
  • Extract Data to CSV format:
    • mdb-tables TC1.mdb
    • mdb-export TC1.mdb tc1 >tc1.csv

  • Load into BigQuery:
    • bq load --project YOUR_PROJECT_NAME YOUR_DATASET.nyc_dof_tc_Tentative_Assessment_Roll tc1.csv EX_COUNT:float,CUREXT_A:float,TN_EXL:float,DELCHG:string,BORO:integer,EXT:string,TN_EXT:float,NOAV:integer,FN_EXT_A:float,FN_EXL_A:float,APPLIC2:string,CUREXL_A:float,CONDO_S3:string,CONDO_S2:string,CONDO_S1:string,YRB_FLAG:string,PROTEST:string,TN_AVL_A:float,AP_TIME:float,COMINT_B:string,ZIP:float,NEW_FV_L:float,NEWLOT:integer,BLDGS:string,COMINT_L:string,NEW_FV_T:float,O_AT_GRP:float,NODESC:integer,YRA2_RNG:float,COOP_NUM:float,BLD_VAR:string,REUC:string,YRA2:float,YRA1:float,DROPLOT:integer,LIMIT:string,CP_DIST:string,CUR_FV_L:float,TXCL:string,CURAVL:float,L_ACRE:string,O_LIMIT:string,CUR_FV_T:float,BLOCK:integer,CURAVT:float,DCHGDT:timestamp,TN_EXT_A:float,AP_BLOCK:float,APPLIC:string,CORCHG:integer,HNUM_LO:string,AT_GRP:float,CHGDT:timestamp,TN_EXL_A:float,APTNO:string,EX_CHGDT:timestamp,GEO_RC:integer,FCHGDT:timestamp,STATUS1:integer,DISTRICT:string,STATUS2:integer,TN_AVT_A:float,FN_AVT:float,VALREF:string,CONDO_NM:float,GR_SQFT:float,CUREXT:float,O_TXCL:string,CORNER:string,RES_UNIT:float,CUREXL:float,EASE:string,CP_BORO:string,YEAR4:float,FN_AVL_A:float,CONDO_A:string,BLDGCL:string,MBLDG:string,STR_NAME:string,YRB:float,FV_CHGDT:timestamp,FN_EXT:float,AP_LOT:float,BFRT_DEC:float,AT_GRP2:float,FN_EXL:float,IRREG:string,TN_AVT:float,AP_BORO:string,FN_AVL:float,LND_AREA:float,O_PROTST:string,OWNER:string,YRB_RNG:float,SM_CHGDT:timestamp,BBLE:string,SECVOL:float,BDEP_DEC:float,TN_AVL:float,O_APPLIC:string,LOT:integer,CURAVL_A:float,LDEP_DEC:float,YRA1_RNG:float,CURAVT_A:float,PROTEST2:string,FN_AVT_A:float,STORY:float,ZONING:string,HNUM_HI:string,EX_INDS:string,LFRT_DEC:float,CBN_TXCL:string,AP_DATE:timestamp,TOT_UNIT:float,EXMTCL:string,AP_EASE:string

  • A sample query would be:
    • for market value greater than or equal to 10 Million what is the sum of assessment values / sum of market values?
    • SELECT '>=10000000',count(*),sum(fn_avt)/sum(new_fv_t) FROM [YOUR_DATASET.nyc_dof_tc_Tentative_Assessment_Roll] where txcl='1' and new_fv_t >= 10000000