Blog

Title: GIS Data Services Assistant (2 positions available)
Office: Graduate Services Division, Newman Library, Baruch College CUNY
Hours: 14 per week
Compensation: $21.00 per hour

Position duration: Sept 2018 to May / June 2019
Application deadline: Sept 4 2018, with interviews to occur shortly thereafter

 



Position Description and Duties:

The GIS Data Services Assistant reports to the Geospatial Data Librarian (GDL) and works in the GIS Lab in the Library and Technology Building at Baruch College. The lab is a space where students, faculty, and staff can meet with the GDL to discuss projects and get assistance, and can work independently on GIS or data-related projects with help nearby. The lab also serves as the work area for the GDL's team. For details about our mission and services, visit: https://www.baruch.cuny.edu/confluence/display/geoportal/About+Us

The GIS Data Services Assistant has three primary sets of responsibilities:

  1. Assists the GDL with introductory GIS workshops that are part of the GIS Practicum program (offered two or three times each semester on Fridays); serves as a teaching assistant for each workshop, helping participants in the classroom with exercises. Assists the GDL with administrative and clerical aspects of running the program.

  2. Assists the GDL with helping patrons in the GIS lab, and may provide assistance with administrative aspects of the Graduate Services Division's programs as the need arises.

  3. Under the direction of the GDL, obtains and processes geospatial data and creates metadata and documentation for our GIS data repository, the Baruch Geoportal.

Qualification Requirements:

The successful candidate will be a self-directed, well-organized, and detail-oriented person who is comfortable working with technology and data, and who demonstrates a willingness to take the initiative to solve problems and learn skills that are required for specific projects. The candidate must meet the following requirements:

  • Currently enrolled as a CUNY graduate student, preferably in the social sciences, public policy, computer science, digital humanities, or library science
  • Able to commit to working for the entire 2018-19 academic year from Sept to May (with an option to extend through June)
  • Available to work during normal weekday office hours (between 8am and 6pm). The successful candidate will be able to choose hours that fit with their course schedule, but will be required to work 2 or 3 Fridays each term when the GIS Practicum is offered
  • Has at least six months of full-time experience or one year of part-time experience working in a position that required regular use of technology
  • Possesses a high degree of computer literacy that includes file management, word processing, spreadsheets, and web research
  • Has some experience working with datasets of interest to the social sciences or public policy
  • Possesses good written and oral communication skills

The candidate must also have experience in at least one of the following two broad areas:

  1. GIS Experience (geospatial skills)
    1. Required: experience with geographic information systems (GIS) such as ArcGIS, QGIS, or other open source tools
    2. Required: experience with downloading geospatial data, organizing data, geoprocessing, projections and coordinate systems, working with attributes
    3. Preferred: familiarity with spatial and relational databases (SQLite, PostgreSQL), scripting (Python), and spatial metadata (ISO / FGDC)

  2. Metadata experience (library and information skills)
    1. Required: experience with metadata concepts and standards like Dublin Core
    2. Required: metadata implementation and manipulation in XML, HTML, or RDF using XSLT or Python
    3. Required: familiarity with geographic information systems and GIS data
    4. Preferred: familiarity with spatial metadata standards ISO 19115, 19139, and FGDC

Candidates with experience in either category will be considered. The successful candidate will undergo training (both directed and self-directed) in the area that is not within their expertise at the beginning of their employment.


To Apply:

Email your resume and cover letter to Frank Donnelly, Geospatial Data Librarian, at francis.donnelly@baruch.cuny.edu.  Please include "Application for Graduate Position" in the subject line of the email, and in your message indicate where you learned about the position.

 

If you are using the NYC Geodatabase with QGIS 3.2 (and possibly 3.0) and are not able to view certain layers (i.e. you drag them into the map view and nothing appears), this is due to a bug in how QGIS reads Spatialite layers that have spatial indexes. The following layers are affected: a_pumas2010, a_tracts, and a_zctas. To get them to display, you can disable the spatial index. You can do this in the Spatialite GUI or the QGIS DB Manager by running this command in the SQL window:

SELECT DisableSpatialIndex('LAYER', 'geometry');

Where LAYER is the name of the layer in quotes, e.g. 'a_tracts'. Run this command on each of the three layers. Then refresh the database or remove the connection and re-establish it, and try adding the layers to the view. You should be back in business.
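
If you would rather generate the three commands than type them by hand, here is a minimal Python sketch. It only builds the SQL statements for the affected layers; you would still paste them into the Spatialite GUI or the QGIS DB Manager. The function name and the default geometry column are illustrative assumptions, not part of the geodatabase documentation.

```python
# Layers named above as affected by the display problem.
AFFECTED_LAYERS = ["a_pumas2010", "a_tracts", "a_zctas"]


def disable_index_statements(layers, geom_column="geometry"):
    """Return one DisableSpatialIndex statement per layer name."""
    statements = []
    for layer in layers:
        # The layer names are fixed identifiers, not user input, so plain
        # string formatting is acceptable for this sketch.
        statements.append(
            "SELECT DisableSpatialIndex('{}', '{}');".format(layer, geom_column)
        )
    return statements


if __name__ == "__main__":
    # Print the statements so they can be copied into the SQL window.
    for sql in disable_index_statements(AFFECTED_LAYERS):
        print(sql)
```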

Alternatively, you could go back to using QGIS 2.18, which is still the long term release and is inherently more stable and hassle free.

Today is Janine's last day working for the GIS Lab - she will be sorely missed! While she and Anastasia are off to new adventures and my academic leave continues during the summer, the lab will be closed in July and August. I'll be back the first week of fall and will be recruiting for a new lab assistant position (possibly two), so look for an announcement here sometime in August.

Janine was pretty busy before she left - here are some updates:

  1. The national IRS Migration Database has been updated with a new year of data. For state to state and county to county flows the newest year is 2015-16.
  2. The NYC Mass Transit Spatial Layers have been updated with new stop and line features for all the buses and the subway. The subway update reflects the re-opening of the South Ferry station and the shutdown of the older South Ferry Loop at the southern end of the 1 Line. The source data for the regional trains hasn't changed, so we skipped updates for those.
  3. We have a new IRS Tax Exempt Organizations file for NYC for June 2018, listing all the non-profits in the city.

Some of the other datasets that we usually release around this time of year will be delayed until I return in the fall. This includes: 2017 ridership data for the subway and PATH, NYC real estate sales for 2017, and an updated version of the NYC Geodatabase.

As for the GIS Practicum, I won't be updating the manual this year as QGIS 2.18 will continue to be the long term release (LTR) until the end of October 2018. I plan on running a couple of workshops in the fall based on it, and you can sign up to be notified by email when registration opens. As the final version of the 2.x series, 2.18 will continue to be supported for a few more years. I will eventually update the manual to the new LTR 3.4 in 2019, but it probably won't happen until mid to late spring.

We've recently updated several of our datasets:

 

  • NYC Mass Transit Spatial Layers: Janine created updated files for the buses last semester; since there were no changes to the subway and train files we let those go.
  • NYC Geodatabase: I created version jan2018 with updates to the census American Community Survey data tables for PUMAs, ZCTAs, and census tracts. The new tables are from the 2012-2016 ACS.

 

The GIS Lab is now open for business for the spring semester. I'm still away on leave until the end of August, but Janine continues to captain the ship and is in on Thursdays and Fridays. GIS Lab hours for the spring are posted.

So this summer we're taking our show on the road! The Free and Open Source Software for Geospatial (FOSS4G) conference is "the" international conference for all things related to geospatial technology and open source software. FOSS4G 2017 is in Boston August 14-18 and the Baruch GIS team will be there. We'll be running our full-day introductory GIS Practicum workshop (re-dubbed "Introduction to GIS Using QGIS" for the conference) at the Harvard Center for Geographic Analysis in Cambridge on Tuesday Aug 15th. There are a slew of great workshops being offered that Monday and Tuesday, covering all technologies and user levels. The main conference runs from Wednesday to Friday.

In preparation for the conference and the upcoming academic year, I will be updating the GIS Practicum manual pretty soon. While QGIS 2.18 Las Palmas is currently the latest release, it is scheduled to become the new Long Term Release once version 3.0 comes out later this year. I'm going to make the switch from 2.14 to 2.18 in the next workbook, since this change is on the horizon.

We've just released the latest (our 10th version!) of the NYC Geodatabase, our foundational resource for mapping and analyzing NYC city-level features and data in GIS. We're continuing to offer two formats with identical content - a Spatialite version for QGIS and Spatialite users, and an MS Access personal geodatabase for ArcGIS folks.

There are a number of noteworthy updates in this new version (July 2017):

  • All of the facility point features (colleges, hospitals, libraries, private and public schools) have been updated with new data from May 2017, replacing the data from 2015. The City Dept of Planning has upgraded and reconfigured their Facilities database (FacDB) so there are differences in how the attribute tables are structured.

  • New ZIP Code Business Patterns (ZBP) data has been added for 2015, replacing the data from 2014. As before, we've aggregated the ZIP Code data to the ZCTA level, but this time we've used an updated version of the ZIP to ZCTA crosswalk. This crosswalk contains updates and a number of error corrections in ZIP assignments. Most of the fixes are minor (in NYC there were 19 reassignments; the most significant ones were the reassignment of ZIPs 11249 and 10118 to different ZCTAs), but for the sake of consistency I've gone back and updated all of our old ZBP data (2010 to 2014) and published these as text files in an addendum in the NYC Geodatabase Archive, in case anyone needs to go back in time. The updated crosswalk is stored in the database as the b_zips_to_zcta table.

  • We have new NYC subway ridership data for 2016 for the subway_complexes table, along with an updated notes table with information on station closures. The W train (relaunched in 2016) has been added to the list of attributes, but the new 2nd Ave subway stations (and the rerouting of the Q train) have not. These stations opened on Jan 1, 2017, so we have no ridership data for 2016. These stations are included in the subway_stations table, which we have not updated since January as there were no salient changes.

  • Lastly, we have 2016 ridership data for the PATH train stations in NYC, in the path_stations file.

Related to this work, I've created / updated a couple of resources outside of the NYC Geodatabase:

  • I often get requests for ridership data from folks who are not database or GIS users, so I've provided it in a spreadsheet format. The first workbook is for the NYC Subway and contains both the ridership data from the geodatabase and the service notes. The second workbook is for the PATH Train; unlike the geodatabase it contains ridership for ALL of the stations (not just the ones in NYC). Both workbooks contain a metadata sheet that covers the sources and content. Both spreadsheets are available on the NYC Mass Transit Spatial Layers page.

  • Under the NYC Geographies resources I've updated the NYC ZIP to ZCTA crosswalk spreadsheet. It contains a metadata worksheet that explains how the crosswalk was generated (using a national crosswalk file from Nov 2015 created by the UDS Mapper and distributed by the MCDC, and the MCDC's Geocorr 2014 engine for relating ZCTAs to counties). So if you have ZIP Code-level data that you want to aggregate by ZCTA (so you can map the data by ZCTA or associate it with ZCTA-level census data) you can use this spreadsheet or the b_zips_to_zcta table in the geodatabase.
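
To make the aggregation step concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column names (zipcode, zcta, establishments), the zip_counts table, and all of the codes and counts are made up for illustration; check the geodatabase metadata for the actual structure of b_zips_to_zcta before adapting it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up crosswalk and ZIP data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for the crosswalk table; column names are assumptions.
        CREATE TABLE b_zips_to_zcta (zipcode TEXT PRIMARY KEY, zcta TEXT);
        -- Hypothetical ZIP-level counts that we want to roll up.
        CREATE TABLE zip_counts (zipcode TEXT, establishments INTEGER);
        """
    )
    # Two fictional ZIPs share one ZCTA; a third maps to its own ZCTA.
    conn.executemany(
        "INSERT INTO b_zips_to_zcta VALUES (?, ?)",
        [("00001", "00001"), ("00002", "00001"), ("00003", "00003")],
    )
    conn.executemany(
        "INSERT INTO zip_counts VALUES (?, ?)",
        [("00001", 10), ("00002", 5), ("00003", 7)],
    )
    return conn


def aggregate_by_zcta(conn):
    """Sum the ZIP-level counts for each ZCTA via the crosswalk."""
    query = """
        SELECT c.zcta, SUM(d.establishments)
        FROM zip_counts AS d
        JOIN b_zips_to_zcta AS c ON d.zipcode = c.zipcode
        GROUP BY c.zcta
        ORDER BY c.zcta
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for zcta, total in aggregate_by_zcta(build_demo_db()):
        print(zcta, total)
```

The same join and GROUP BY pattern should work in the Spatialite GUI or the QGIS DB Manager once the table and column names are swapped for the real ones.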

Janine has cranked out a new version of our recently released IRS Migration Database, which contains state to state and county to county flows that represent where tax filers have moved from year to year. Here are the salient changes:

  1. We have added the two latest years of data: 2013-14 and 2014-15.

  2. Beginning with 2013-14 the IRS added a new category to the state migration files to count internal state migrants. In the past, any filer who remained in the state was counted as a non-migrant, but now non-migrants are counted as filers whose address did not change. If they moved within the same state they are counted in a new, separate category as internal migrants. Since the non-migrants and internal migrant categories are mutually exclusive, both are stored in the regular inflow and outflow tables. The county migration tables do not have a comparable category: any person who remained in the same county was counted as a non-migrant, even if their address within the county changed.

  3. Beginning with 2013-14 the IRS increased the thresholds for disclosure in reporting migration flows. Individual state to state flows were suppressed if there were fewer than 10 migrant filers (previously the threshold was 3). Individual county to county flows were suppressed if there were fewer than 20 migrant filers (previously the threshold was 10).

  4. For this iteration of our database, we modified the county tables by moving records for Other Flows by US region (Northeast, Midwest, South, and West) from the regular inflow and outflow tables to the totals tables from 1995-1996 forward. These categories represent subdivisions of the Other Flows - Different State category. For these filers, their specific county of origin or destination is not tabulated because the total number of migrants was too small and fell under the disclosure thresholds. So they were aggregated into categories for Other Flows - Same State and Other Flows - Different State. For the latter, subcategories were provided that indicated the number of migrants from the other states by region; since these values are not mutually exclusive (they represent portions of Other Flows - Different State) they were moved out of the inflow and outflow tables and into the totals tables to avoid double counting. The state tables were unaffected, as their flows are not categorized in this manner.

  5. The way the Other Flows - Different State category was tabulated for counties prior to 1995-1996 was quite different; in these years, the categories represent different levels of specificity that were allowed by the disclosure rules. If the number of migrants could be reliably reported by region, then other flows categories for regions were reported. If this was not possible, then the region data was collapsed into one Other Flows - Different State category. For these earlier years these categories were mutually exclusive, so this data remains in the regular inflow and outflow tables.
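
The double-counting problem described in item 4 can be shown with a little arithmetic. In this Python sketch the numbers are invented for illustration and do not come from the IRS data; the point is only that, from 1995-1996 forward, the regional subcategories are portions of Other Flows - Different State, so adding both overstates the total.

```python
# Invented figures for one county; not taken from the IRS files.
OTHER_FLOWS_DIFFERENT_STATE = 120
REGIONAL_SUBCATEGORIES = {
    "Northeast": 40,
    "Midwest": 25,
    "South": 35,
    "West": 20,
}


def naive_total(parent, parts):
    """Add the parent category and its subcategories (double counts)."""
    return parent + sum(parts.values())


def correct_total(parent, parts):
    """Count the parent category only; the parts are already inside it."""
    return parent


if __name__ == "__main__":
    print("Sum of regions:", sum(REGIONAL_SUBCATEGORIES.values()))
    print("Naive total:", naive_total(OTHER_FLOWS_DIFFERENT_STATE, REGIONAL_SUBCATEGORIES))
    print("Correct total:", correct_total(OTHER_FLOWS_DIFFERENT_STATE, REGIONAL_SUBCATEGORIES))
```

Keeping the regional records in the totals tables means a simple sum over the inflow or outflow tables behaves like the second function rather than the first.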

We're happy to announce the release of two new datasets!

The IRS Migration Database is a SQLite / Spatialite database that has annual county to county and state to state migration data from 1990 to 2013 (for counties) and 1988 to 2013 (for states). Janine has spent over a year cleaning, collating, and organizing this data into one cohesive dataset. This data is generated by the IRS Statistics of Income Division by calculating the number of tax filers who have changed their address between tax years; if the address changed that means the filer moved. Records are summarized by county and state and show the number of filers, exemptions, and aggregate gross income. While the dataset cannot represent complete migration (not everyone files taxes), its time span and detail in showing point to point flows make it a valuable resource for studying internal migration trends within the United States.

You can access the database using any number of free tools, like the SQLite Manager plugin for Firefox. Although the tables do not have spatial geometry, GIS users can view them and add them to projects in QGIS or ArcGIS as the database is saved in the Spatialite format. As the data represents many to many relationships, GIS users will need to write some queries to pivot the data to make it mappable; for example you can visualize all the inflow or outflow to one particular county or state, or to a group of places. Janine has included some sample views in the database to help get you started.
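
As a rough idea of what such a query looks like, here is a minimal sketch using Python's built-in sqlite3 module. The flows table, its column names, and the place codes and counts are all hypothetical; they are not the structure of the actual database, and the sample views included with it are the better starting point. The query selects every origin that sends filers to one destination, which gives one row per place that can then be joined to a boundary layer.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of made-up origin/destination flows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical structure: one row per origin-destination pair.
    conn.execute(
        "CREATE TABLE flows (origin TEXT, destination TEXT, filers INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flows VALUES (?, ?, ?)",
        [
            ("C001", "C003", 50),
            ("C002", "C003", 30),
            ("C003", "C001", 20),
            ("C001", "C002", 10),
        ],
    )
    return conn


def inflow_to(conn, destination):
    """Return (origin, filers) rows for all flows into one destination."""
    query = """
        SELECT origin, filers
        FROM flows
        WHERE destination = ?
        ORDER BY filers DESC
    """
    return conn.execute(query, (destination,)).fetchall()


if __name__ == "__main__":
    for origin, filers in inflow_to(build_demo_db(), "C003"):
        print(origin, filers)
```

Swapping the roles of origin and destination in the WHERE clause gives the outflow from a place instead.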

On the other side of the globe, we have recently acquired boundaries for City Municipal Wards for India with 2011 census data for twelve cities. This data is produced by ML Infomap and was procured by the CUNY Institute for Demographic Research. Thanks to them, we are able to provide this data by request to members of the Baruch community and to CIDR affiliates (since it's a proprietary and licensed product, we can't provide public access to it - sorry).

As we approach the summer we'll have additional updates to share, including an additional year of NYC real estate data and updates to the NYC Geodatabase that will include: new ZIP Code Business Pattern data, updated features for NYC facilities, and a new year of subway ridership data. We're also hoping to add the two latest years of data (2013-14 and 2014-15) to the IRS database. Stay tuned for details.

 

We've just released the latest iteration of our NYC Mass Transit Spatial Layers series (Jan 2017), which contains a number of important updates.

  1. We've completely re-organized the files for MTA bus stops and routes. Instead of having separate borough files for routes and stops (including the confusing mixture of bus company and local Queens bus routes), we have aggregated all of the local buses into one city-wide file, and have separated the express bus services into their own dedicated file.

  2. The new MTA subway stops file includes the three new stations on the 2nd Ave subway. There's also a new station on the SIR that replaces two older stations that have been demolished. The new routes file includes the W train and the extension of the Q train up 2nd Ave. In some cases we inserted the new routes data ourselves, as the MTA static feeds that were released in January still lacked this information. The train attribute of the stops file has been updated to indicate which trains stop there. There have been no updates to the subway entrances file, so we're keeping the previous one from spring 2016.

  3. We've added stops and routes files for the PATH train (Port Authority Trans-Hudson) which is the rapid transit system that connects Newark, Jersey City, and Hoboken to Manhattan. The files are a subset and derivative of data produced by the NJ GIS Office, which in turn were derived from GTFS schedule data from NJ Transit. We pulled out just the PATH train and have re-projected it to NY State Plane Long Island so that it fits seamlessly with the MTA mass transit layers in the series.

As usual we've moved our previous series (May 2016) into the NYC Mass Transit Spatial Layers Archive. Over the summer we hope to harmonize the subway ridership data that we include in the NYC Geodatabase with these layers.

 

 

Here are the latest updates in the new version (Jan 2017):

  • All of the American Community Survey data tables for PUMAs, ZCTAs, and census tracts have been updated with new data from the 5-year 2011-2015 ACS (the previous version of the database had tables from 2010-2014). The Census Bureau has re-coded all of their variables for rent asked, so that the bottom and top values have increased to reflect an increase in prices. The new variables do a better job of representing the distribution / spread of rent values.

  • We updated the subway stations layer from the latest MTA data feeds. The new stations layer includes: the three new stations on the 2nd Ave subway line, the new station on the Staten Island Railway (and the removal of two stations that the new one replaced), and updated train attributes to account for the re-routing of the Q train and the re-introduction of the W train. The related subway complexes layer was not updated, as new ridership data isn't available for these stations yet (the complexes layer represents the state of subway service and ridership for 2015). That update will come this summer.

All metadata and documentation is up to date. If you still need data from the previous version (Aug 2015) it's been moved to the NYC Geodatabase Archive section.

Registration is now open for the spring semester’s GIS (geographic information systems) Practicum, Introduction to GIS Using Open Source Software (featuring QGIS). There will only be one session this term, held in the GIS Lab at Baruch College:

  • Friday Mar 10th

The day-long workshop runs from 9am to 4:30pm. Current CUNY graduate students, faculty, and staff, and full-time Baruch undergrads are eligible to register. Advance registration is required; the fee is $30 and includes a detailed tutorial manual and a light breakfast. Participants must bring their own laptop with QGIS pre-installed in order to take the class. Visit the GIS Practicum page to learn more and to register: http://guides.newman.baruch.cuny.edu/gis/gisprac.

Registration is now open for the fall semester’s GIS (geographic information systems) Practicum, Introduction to GIS Using Open Source Software (featuring QGIS). The sessions will be held in the GIS Lab at Baruch College:

  • Friday Sept 30th
  • Friday Oct 28th
  • Friday Nov 18th

The day-long workshop runs from 9am to 4:30pm. Current CUNY graduate students, faculty, and staff, and full-time Baruch undergrads are eligible to register. Advance registration is required; the fee is $30 and includes a detailed tutorial manual and a light breakfast. Participants must bring their own laptop with QGIS pre-installed in order to take the class. Visit the GIS Practicum page to learn more and to register: http://guides.newman.baruch.cuny.edu/gis/gisprac.

It may be summer, but the GIS Lab is still churning away! I have three updates to share.

First, the latest version of the GIS Practicum manual is now available. The workbook has been updated using the latest long term release of QGIS, 2.14 Essen. While most of the revisions are cosmetic tweaks to reflect changes in the interface, the sections on raster data and web mapping services did get some notable additions. We'll be running workshops using the new material this coming fall. CUNY affiliates who are interested in being notified once registration opens can sign up here.

Second, we've released the latest version of the NYC Geodatabase. There are two big updates here. First, the latest ZIP Code Business Patterns data has been added, with 2014 data replacing 2013. Second, all the subway data has been updated, with 2015 ridership data for the MTA and the PATH stations that are in NYC. As always, you can choose between a SQLite version that's optimal for QGIS or the Spatialite GUI or CLI tools, and an MS Access personal geodatabase that's suited for ArcGIS. 

Lastly, I've been updating many of the library's research guides and some of my tutorials. In the GIS Guide I've revamped the web mapping section, tossing the old Google map stuff and replacing it with videos and links about CARTO. In conjunction with these changes, I've updated several Tutorials including the one on bringing data into CARTO, as they've recently changed their name (from CartoDB) and interface. The NYC data guide has been cleaned up by adding several new resources and tossing some old ones. I also made some minor edits to the guides on US Census Data and Demography; for the latter, check out all the new books we've purchased under New Titles.

This is an important update regarding several of the bus stop and route features in our NYC Mass Transit Spatial Layers series. We've discovered some anomalies between our descriptions and the file contents, and have updated our metadata accordingly:

  • The bus route and stop files initially designated for Queens actually just contain features in eastern Queens, primarily east of Flushing Meadows and I-678. We've changed the titles of these files to Eastern Queens and updated the descriptions accordingly.

  • The local bus routes and stops for western Queens are lumped into the Bus Company (i.e. express bus) files for the entire city. We've changed the titles of these files to Western Queens and Bus Company and updated the descriptions accordingly.

This isn't an ideal arrangement, as the local buses in Queens are fundamentally different from the express buses. You can distinguish between the local and express services by looking at the route number in the attribute table: if the route is designated with a 'Q' immediately followed by a number, it's a local Queens route. Otherwise, it's an express bus route. It's harder to tell with the stops; you can compare them against the routes to see which are local and which aren't. Most of the stops located in western Queens will be local bus stops.
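
The route-number rule above is easy to apply in a script. This is a minimal Python sketch of that rule only; the function name is made up, and the sample route identifiers are there just to show the pattern, so check the attribute table for the values that actually appear in the file.

```python
import re

# A 'Q' immediately followed by a digit marks a local Queens route.
LOCAL_QUEENS_PATTERN = re.compile(r"^Q\d")


def is_local_queens_route(route_id):
    """Return True if the route identifier looks like a local Queens route."""
    return bool(LOCAL_QUEENS_PATTERN.match(route_id))


if __name__ == "__main__":
    # Sample identifiers for illustration only.
    for route_id in ["Q10", "QM5", "BM1", "X63"]:
        label = "local Queens" if is_local_queens_route(route_id) else "express"
        print(route_id, label)
```

Note that a route such as QM5 starts with a 'Q' but is followed by a letter, so under this rule it is treated as an express route.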

Why are the files structured this way? It has to do with the internal organization of the MTA and how the buses are managed. The bus lines are divided into different groups based on how they are administered, and the data is structured to reflect this. When we download the raw GTFS bus files to process them, we download one set designated for each borough and one set designated for the bus company. Whatever is in those original sets gets carried over into the new files we create. Our files simply follow the provenance of the MTA files.

In some cases it's normal for bus routes or stops to appear in a neighboring borough, and we've noted this in our metadata all along. For example, the M60 is a Manhattan local that runs through Harlem to LaGuardia Airport in Queens. It appears in the Manhattan files and not the Queens files since it's designated as a Manhattan bus and stops primarily in Manhattan. Other cases are odd exceptions; the M100 runs in northern Manhattan from Harlem to Inwood, but is included in the Bronx files and not the Manhattan ones. In that case, it's simply because that route is managed / administered alongside the Bronx routes.

But the split within Queens and the combination with the express buses is counter-intuitive, and we're going to consider re-organizing this data when we assemble our next version at the end of this year. For now, we've updated the metadata for the May 2016 version so that users are aware of this issue. If you want all bus stops or routes for Queens, this data is split into two files: a file with local buses for eastern Queens, and another with local buses for western Queens that also includes express buses city-wide.

 

 

It's been a busy semester, and now that it's ending we've wrapped up many of our projects and are ready to share the results. As always, we've meticulously documented all of our datasets so you know exactly what you're getting.

  • NYC Mass Transit Spatial Layers - Janine has created the latest version of our NYC bus, train, and subway features that we process and assemble from the MTA's raw data feed. If you want to make some nice subway maps or need to figure out where all the buses are going, look no further!

  • NYC Geocoded Real Estate Sales - Anastasia has completed the gargantuan task of creating this new dataset, where she's aggregated all of the city's real estate sales from 2003 to 2015 AND geocoded them using the city's geocoding API. We even went the extra mile and manually identified all unmatched records so that we have a complete dataset. We're making the layers available as shapefiles for each year, and as one big collection in a Spatialite database.

  • US Census Geocoding Script - I've written a Python 3.x script that uses the Census Bureau's Geocoding API and the external censusgeocoder module to batch process delimited text files of parsed and unparsed addresses. Check out the documentation that's included with the script for details.

Even though the summer is here, we'll keep plugging away - check the GIS Lab for our availability. Later this summer a new version of the NYC Geodatabase will be rolled out, and the latest datasets I've mentioned here will also be ported over to NYU's spatial repository as part of our new and exciting collaboration. We have a few other new datasets in the works too, so stay tuned.