Part 4 - Thematic Mapping

The goal of part 4 is to introduce you to map layout and design, as well as to some additional data processing techniques. You will also grapple with coordinate systems and map projections, which are central components underlying GIS. You'll learn about cartographic representation and design and the practical implications of choosing how to classify and represent your data.

The goal of this particular exercise is to create a stand-alone thematic map to show the distribution of employment in the health care sector by state in the United States.

I. Transforming Map Projections
II. More Geoprocessing
III. Creating Calculated Fields
IV. Classifying and Symbolizing Data
V. Designing Maps
VI. Adding Labels

Section I: Transforming Map Projections

This section will show you how to transform a file from one map projection to another. Choosing a coordinate system and map projection for your layers is of critical importance; all layers in a project need to share the same system in order to work together, and the choice of a system is influenced by the type of analysis you're doing and what your final map will depict.

Steps

  1. Create a new project. Open QGIS to an empty, blank project. Hit the Save As button. Browse to your data folder for part 4 and save the project as part4.qgs. We'll be working with this project throughout this part of the tutorial.
  2. Define the project window. On the menu bar go to Settings > Project Properties, scroll through the coordinate system list and select NAD 83 as the coordinate reference system (CRS). Click OK.
  3. Check the shapefile's CRS. Minimize QGIS, and use your file browser to browse through the data folder for part 4. You'll see there's a shapefile in the folder called st99_d00.shp, which represents the states of the United States. It has a .dbf, .shx, and .prj associated with it. Open the .prj file in a text editor (if using Windows, select the file, right click, choose the option to select a program from the list, select Notepad and click OK). You will see the projection information stored in the file:
    GEOGCS["NAD83",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS 1980",6378137,298.257222101,
                AUTHORITY["EPSG","7019"]],
            AUTHORITY["EPSG","6269"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.01745329251994328,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4269"]]
    This file tells us that the shapefile's CRS is NAD 83, a geographic coordinate system. Close the file when you're finished.
  4. Add the states shapefile. Maximize QGIS. Hit the Add Vector data button. Browse to the part4 data folder and add the st99_d00 shapefile. Use the Zoom In button, draw a box around the US and zoom in.
    Add US States Shapefile
  5. Define a custom projection. Now we need to transform our layer to a different CRS that's more suitable for a thematic map of the US. Instead of using NAD83, which is a basic geographic coordinate system (GCS), we are going to use a projected coordinate system (PCS). While QGIS has access to a large number of GCS's in its library, it doesn't have many PCS's that are suitable for continental or global projections. We need to add a custom projection to the QGIS library. In the data folder for part 4, open the file called lcc_na_proj4.txt. Copy the contents of the file:

    +proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs

    Maximize QGIS. On the menu bar go to Settings > Custom CRS. Paste the projection information from the file into the Parameters box. In the Name box, type North America Lambert Conformal Conic. Hit the Save Button, then hit OK to close the window.
    Define a custom CRS
  6. Transform the projection (new in 1.7). Select the st99_d00 layer in the ML. Right click and hit Save As. Browse and save the file in your part 4 folder as states_reproject.shp. For the CRS, hit Browse and scroll down to the bottom of the CRS window and open the menu that says User Defined Coordinate Systems. Select the North America Lambert Conformal Conic projection that you defined and hit OK. Hit OK to create the file. Close the export window. Hit the add Vector data button to add the new states_reproject to your map window.
    Transform projection
    NA LCC in the CRS List
  7. Change the definition for the window. At this point you'll notice something strange; your layers will not be able to draw properly because you now have two layers that don't share the same CRS. Select the old st99_d00 layer in the map legend (ML), right click and remove it. Select the states_reproject layer in the ML and hit the Zoom to Layer button. You should see your newly projected layer. With your new states_reproject layer selected in the ML, right click, and choose Set Project CRS From Layer. This will give your project and map window the same projection as your file (alternatively, on the menu bar you could go to Settings > Project Properties, scroll through the coordinate system list and select the North America Lambert Conformal Conic CRS in the Custom projection section). Hit the Save button.
    Layer reprojected in NA Lambert Conformal Conic
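
To get a feel for what the transformation in step 6 does numerically, here is a minimal Python sketch of the forward Lambert Conformal Conic equations (two standard parallels) on the GRS80 ellipsoid, using the parameters from lcc_na_proj4.txt. It is an illustration only, not how QGIS does it internally (QGIS relies on its projection library), and the function names parse_proj4 and lcc_forward are made up for this example.

```python
import math

PROJ4 = ("+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 "
         "+x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs")

# GRS80 ellipsoid constants (semi-major axis in meters, inverse flattening),
# the same values that appear in the .prj file for NAD 83.
A = 6378137.0
INV_F = 298.257222101


def parse_proj4(text):
    """Split a proj4 string into a dict, e.g. {'proj': 'lcc', 'lat_1': '20'}."""
    params = {}
    for token in text.split():
        key, _, value = token.lstrip("+").partition("=")
        params[key] = value
    return params


def lcc_forward(lon, lat, p):
    """Project longitude/latitude (decimal degrees) to x/y in meters."""
    f = 1.0 / INV_F
    e = math.sqrt(2 * f - f * f)  # first eccentricity of the ellipsoid

    def m(phi):
        return math.cos(phi) / math.sqrt(1 - (e * math.sin(phi)) ** 2)

    def t(phi):
        es = e * math.sin(phi)
        return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (e / 2)

    phi1, phi2, phi0 = (math.radians(float(p[k])) for k in ("lat_1", "lat_2", "lat_0"))
    lam0 = math.radians(float(p["lon_0"]))
    # Cone constant and scaling factor derived from the two standard parallels.
    n = (math.log(m(phi1)) - math.log(m(phi2))) / (math.log(t(phi1)) - math.log(t(phi2)))
    big_f = m(phi1) / (n * t(phi1) ** n)
    rho = A * big_f * t(math.radians(lat)) ** n   # radius for this latitude
    rho0 = A * big_f * t(phi0) ** n               # radius for the origin latitude
    theta = n * (math.radians(lon) - lam0)
    x = float(p["x_0"]) + rho * math.sin(theta)
    y = float(p["y_0"]) + rho0 - rho * math.cos(theta)
    return x, y


params = parse_proj4(PROJ4)
origin = lcc_forward(-96.0, 40.0, params)                  # the projection origin
philadelphia = lcc_forward(-75.163789, 39.952335, params)  # east of the central meridian
print(origin)
print(philadelphia)
```

The origin of the projection (40 degrees north, 96 degrees west) lands at x = 0, y = 0, and places east of the central meridian get positive x values in meters. This is why the reprojected layer no longer lines up with a window that is still defined in degrees.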

Commentary

Understanding Coordinate Reference Systems

All GIS layers are created using a specific coordinate reference system (CRS). The reason that we can take data from different sources and overlay them in GIS is because they share the same system; likewise, we can plot coordinate data and create layers because there's a coordinate system under the hood of our map window. In order for everything to work, your layers must share the same system and the map window must be defined to use that system. GIS software can be used to transform layers from one system to another. Each CRS is composed of at least three or four parts:

Spheroid or Ellipsoid: We typically imagine the earth as a perfectly round sphere, but in reality the earth is rather lumpy and uneven, with protrusions in some areas and indentations in others. The shape of the earth is approximated using spheroids, round three dimensional models of the earth, and ellipsoids, which represent the earth as being more oval than sphere-like in nature.

Coordinate System: This is the reference grid used for locating places on the earth and measuring distances. Latitude and longitude is the most common system, but there are other systems with different grid cells and units of measure; for example, the Universal Transverse Mercator (UTM) system uses a unique grid.

Datum: When you apply a coordinate system like latitude and longitude to different spheroids or ellipsoids, there needs to be a method for creating the grid and attaching it to the earth's surface. Mathematically, where does one draw the prime meridian and equator on a particular spheroid in order to accurately represent their location? The frame of reference for drawing these lines and measuring locations on the surface of the earth is called a datum.

Collectively, when you have these three elements: a spheroid or ellipsoid, a datum, and a coordinate system, you have something called a Geographic Coordinate System (GCS). The terminology is confusing, as a coordinate system is one part of a geographic coordinate system, and some systems are named based on the datum they use. For example, WGS84 (World Geodetic System of 1984) is the most common GCS and uses the WGS84 spheroid, WGS84 as a datum, and latitude and longitude as a coordinate system. WGS84 is used by the Global Positioning System of satellites and thus by individual GPS units as a default, and is commonly used by online mapping applications. It is so common that it is often referred to as THE Geographic Coordinate System. There are other systems; in North America NAD83 (North American Datum of 1983) is widely used, particularly by government agencies. It uses GRS 1980 as a spheroid, NAD83 as the datum, and lat and long as the coordinate system.

If you add a map projection as the fourth element to the spheroid/ellipsoid, datum, coordinate system trio, you have a projected coordinate system (PCS):

Projection: Map Projections are mathematical systems for taking the three dimensional earth and transforming it to a flat two dimensional surface. There is no way to take a 3D shape and accurately represent it on a 2D surface, so map projections are designed to preserve one quality of the earth - area, shape, or distance/direction, or are created as a compromise to make the earth appear the way we expect it to appear on a flat surface.

It's important to understand the distinction between a GCS and a PCS, because when you go to transform a layer or define a projection these two systems will be stored or organized in the software separately, under different menus or tabs. You should use a GCS when you're doing analysis, measuring distances, or working in a relatively small geographic area. You should use a PCS when you're creating a thematic map or need to have a certain quality of the earth (area or shape) preserved.

Latitude and Longitude

The most common coordinate system is latitude and longitude, a grid system that covers the earth and uses a unit of measurement called a degree. Lines of latitude, called parallels, run east-west. The origin of latitude is the equator, which is zero degrees latitude. The equator bisects the earth and along this line there are twelve hours of daylight and twelve hours of darkness each day, throughout the year. Lines of latitude run 90 degrees to the north pole and 90 degrees to the south pole. One degree of latitude is equal to approximately sixty-nine miles, and since they are parallel lines they never converge.

Lines of longitude, called meridians, run north-south. Unlike the equator, which is the de facto origin for latitude based on natural phenomena, the selection of an origin for longitude is arbitrary. The Prime Meridian, zero degrees longitude, was designated as the origin meridian in the 19th century. It runs through the center of the astronomical observatory in Greenwich, UK. There are 180 degrees of longitude to the east and to the west of the prime meridian. The meridian that is opposite the prime meridian on the far side of the globe, 180 degrees longitude, roughly marks the International Date Line. Unlike lines of latitude, lines of longitude converge to a single point at each pole. Since lines of longitude converge there isn't a uniform distance between them - the distance decreases as you move away from the equator. At the equator one degree of longitude is approximately 69 miles across, while at the poles it is zero miles.

Latitude and longitude at 15 degree intervals
Equator 0 deg Lat and Prime Meridian 0 deg Long
Equator 0 deg Lat and International Date Line 180 deg Long
Longitude converges at the poles
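
The shrinking distance between meridians can be approximated with a little trigonometry. The sketch below assumes a perfectly spherical earth and the rounded 69 mile figure from the text, so treat the results as rough estimates; the function name is made up for this example.

```python
import math

MILES_PER_DEGREE_AT_EQUATOR = 69.0  # approximate figure used in the text


def longitude_degree_miles(latitude_deg):
    """Approximate east-west length of one degree of longitude, in miles."""
    # On a sphere the length shrinks with the cosine of the latitude.
    return MILES_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


for lat in (0, 40, 60, 90):
    print(lat, round(longitude_degree_miles(lat), 1))
```

At 60 degrees north or south a degree of longitude is about half as wide as it is at the equator, and it shrinks to nothing at the poles.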

There are two conventions for recording coordinates: in degrees, minutes, and seconds (DMS) or as decimal degrees (DEC). Take a look at the following coordinates for Philadelphia, PA from the USGS GNIS gazetteer:

39 deg 57' 08'' N 75 deg 9' 50'' W (DMS)

39.952335, -75.163789 (DEC)

The DMS notation is similar to the notation for telling time - there are 60 minutes in one degree and 60 seconds in one minute. DEC notation is preferable for computer processing; if you're plotting coordinates in GIS they should be in DEC. In DEC, latitudes south of the equator and longitude west from the prime meridian to the international date line are recorded as negative numbers. It is crucial that DEC coordinates indicate direction, otherwise you'll be confusing your point with a different place:

39.952335, -75.163789 is Philadelphia, PA USA

39.952335, 75.163789 is a remote area in western China near the Kyrgyzstan border
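
If your source data is in DMS, the conversion to DEC is simple arithmetic. Here is a small Python sketch (the function name is made up for this example) that converts the Philadelphia coordinates above and applies the negative sign for south and west:

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N, S, E, or W to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if hemisphere.upper() in ("S", "W") else value


lat = dms_to_decimal(39, 57, 8, "N")
lon = dms_to_decimal(75, 9, 50, "W")
print(round(lat, 4), round(lon, 4))
```

The result differs from the gazetteer's DEC value in the fourth decimal place, because the DMS coordinates were rounded to whole seconds.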

Map Projections

Most people today would agree that the earth is round. Most maps, whether they're on paper or a computer screen, are flat. When you take a three dimensional sphere and flatten it to two dimensions, you get a fair amount of distortion. Imagine removing the peel from an orange and laying it out flat - you can't do it without tearing the peel. A map projection is a method for taking the three dimensional earth and transforming it to a flat surface.

For a nice overview, visit Radical Cartography's projection page and note the common projections (marked in pink). Projections can be classified based on how the grid is applied to the earth's surface - a grid laid flat on top (azimuthal), wrapped as a cone on the top half of the earth (conical), wrapped around the earth as a cylinder (cylindrical), etc. They can also be organized based on which property they preserve:

Other projections:

You can compare maps that use different projections to get a sense for how they distort different areas (in particular, observe Greenland):

Mollweide
Robinson
GCS (aka Plate Carrée)
Mercator

Common map projections for the world for general reference or thematic use include Robinson, Mollweide, Goode Homolosine, and Winkel Tripel (the first two have proj4 definitions that can be custom defined in QGIS). In general, projections that appear oval-like, showing the curvature of the earth at the edges, are best for general or thematic use.

Every continent and country has a preferred map projection or set of projections that is appropriate for each area based on its size and shape. Look at atlases or pre-existing maps to get an idea of what these are. Albers Equal Area, Lambert Equal Area, and Lambert Conformal are common and are adjusted to focus on specific continents or countries. Orthographic projections are used to map polar areas.

GCS Definitions

Several formats have been created for recording the definition of projections. There's the Open Geospatial Consortium's Well-Known Text Format (OGC WKT) as seen in the example we worked through, the Proj4 format, which we used to define a custom CRS in QGIS, and the .prj file format created by ESRI. To look up CRS information, you can use the Spatial Reference website at http://spatialreference.org/. Use that site to get the proj4 format for creating custom projections in QGIS. When you open a .prj file and look at the definition, you'll see the elements that make up the CRS (projection, datum, spheroid) as well as units of measurement and origin information:

PROJCS["North_America_Lambert_Conformal_Conic",
    GEOGCS["GCS_North_American_1983",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["False_Easting",0],
    PARAMETER["False_Northing",0],
    PARAMETER["Central_Meridian",-96],
    PARAMETER["Standard_Parallel_1",20],
    PARAMETER["Standard_Parallel_2",60],
    PARAMETER["Latitude_Of_Origin",40],
    UNIT["Meter",1],
    AUTHORITY["EPSG","102009"]]
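
Because these definitions are plain text, you can pull the main elements out of them with a few lines of code. The Python sketch below uses simple pattern matching on the definition above; it is not a full WKT parser, and the function name describe_wkt is made up for this example.

```python
import re

WKT = '''PROJCS["North_America_Lambert_Conformal_Conic",
    GEOGCS["GCS_North_American_1983",
        DATUM["North_American_Datum_1983",
            SPHEROID["GRS_1980",6378137,298.257222101]],
        PRIMEM["Greenwich",0],
        UNIT["Degree",0.017453292519943295]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["False_Easting",0],
    PARAMETER["False_Northing",0],
    PARAMETER["Central_Meridian",-96],
    PARAMETER["Standard_Parallel_1",20],
    PARAMETER["Standard_Parallel_2",60],
    PARAMETER["Latitude_Of_Origin",40],
    UNIT["Meter",1],
    AUTHORITY["EPSG","102009"]]'''


def describe_wkt(wkt):
    """Return the CRS kind, datum, spheroid, and projection parameters."""
    # A definition that starts with PROJCS is a PCS; GEOGCS alone is a GCS.
    kind = "projected" if wkt.lstrip().startswith("PROJCS") else "geographic"
    datum = re.search(r'DATUM\["([^"]+)"', wkt).group(1)
    spheroid = re.search(r'SPHEROID\["([^"]+)"', wkt).group(1)
    parameters = {
        name: float(value)
        for name, value in re.findall(r'PARAMETER\["([^"]+)",\s*([-\d.]+)\]', wkt)
    }
    return {"kind": kind, "datum": datum, "spheroid": spheroid, "parameters": parameters}


info = describe_wkt(WKT)
print(info["kind"], info["datum"], info["spheroid"])
print(info["parameters"])
```

The parameters map directly to the proj4 definition we used for the custom CRS: the central meridian is +lon_0, the standard parallels are +lat_1 and +lat_2, and the latitude of origin is +lat_0.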

Geographic reference systems have also been classified with codes, which makes them easier to identify and retrieve. The QGIS CRS draws its systems from a library created by the European Petroleum Survey Group (EPSG). This library contains most of the primary GCS systems, such as WGS84 and NAD83, and local PCS systems like State Plane. For example, EPSG 4269 is the code for NAD 83, and EPSG 4326 is the code for WGS 84. The advantage of the codes is clearer when you're working with longer names: NAD 83 NY State Plane Long Island is abbreviated to EPSG 32118. The EPSG library lacks most of the PCS systems for continental and global map projections, which is why these are not available in QGIS; search Spatial Reference to find the proj4 definitions for these projections in order to custom define them. A brief list of common projections and definitions is included in the appendix of this tutorial.

Defining Undefined Projections

All shapefiles have a CRS and were created based on a particular one, but in some cases you may download or come across a file where the projection information for the shapefile, the .prj, is missing. In order to use the shapefile you will have to define the projection and create a .prj for it, so that the software will know how to render and layer it properly. To do this you'll have to go back to the website or source and look for some metadata that will tell you what CRS the file is in. The metadata could be listed on the download website, in a README or narrative file that accompanies the shapefile, or in an XML file accompanying the shapefile that was written based on metadata standards.

Once you know what the projection is, you can go to Vector > Data Management Tools > Define current projection. You can assign the projection from the QGIS databases of projections or you can import it from an existing shapefile that has the proper projection.

Note that defining a projection is DIFFERENT from transforming one. You DEFINE projections for shapefiles that are undefined, in order to tell the software what projection it is in. Use the Define current projection tool for that purpose. You TRANSFORM projections for shapefiles that are defined and have a projection, in order to convert them from one projection to another for a specific purpose. Select the shapefile in the Map Legend and do a Save As to convert the shapefile from one projection to another.
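
One way to keep the two straight: defining a projection only attaches a label to the file, while transforming recalculates every coordinate. The Python sketch below illustrates the "define" side conceptually by writing a .prj file next to a placeholder shapefile. The file name and function are made up for this example, the WKT is abbreviated from the st99_d00.prj example earlier in this part, and this is not a substitute for the QGIS tool.

```python
import tempfile
from pathlib import Path

# WKT for NAD 83, abbreviated from the .prj example (AUTHORITY entries omitted).
NAD83_WKT = ('GEOGCS["NAD83",DATUM["North_American_Datum_1983",'
             'SPHEROID["GRS 1980",6378137,298.257222101]],'
             'PRIMEM["Greenwich",0],UNIT["degree",0.01745329251994328]]')


def define_projection(shapefile_path, wkt):
    """Write a .prj file beside a shapefile; the geometry itself is not touched."""
    prj_path = Path(shapefile_path).with_suffix(".prj")
    prj_path.write_text(wkt)
    return prj_path


# Demonstrate with a hypothetical file name in a temporary folder.
folder = Path(tempfile.mkdtemp())
shp = folder / "undefined_layer.shp"
shp.write_bytes(b"")  # empty placeholder standing in for a real shapefile
prj = define_projection(shp, NAD83_WKT)
print(prj.name)
print(prj.read_text()[:6])
```

Notice that nothing in the .shp file changes. If you label a file with the wrong CRS the coordinates stay exactly as they were, and the layer will simply draw in the wrong place.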

QGIS Projection Handling

In this tutorial, and in general, I suggest that you: know what CRS your layers are in, make sure all of the files you're using share the same CRS, and define the project window to match your layer's CRS. I believe that this cuts down on confusion and helps avoid errors caused by mis-aligning data layers and using systems of measurement that don't match. By default, you have to explicitly define the QGIS project window to match your layers.

However - you have options. Go to Settings > Options > CRS tab to see what they are. The default projection when you start a new project is WGS 84; if you know that you'll usually be working in another projection, you can set that here and it will save you the step of constantly having to define the window for each new project.

Below this is a checkbox for enabling on the fly projection. If you enable this, QGIS will attempt to redraw layers if they don't match the projection of the window or other layers. This makes the software easier to use, but as I've suggested it could lead to problems later.

CRS tab under Options menu

You also have the option of setting the default projection for new layers that are created or are added to the window without a projection. The default setting is one projection that you select, and it's WGS 84. Once again, if you know you'll be working with a particular projection constantly you can select it here and save some time. Throughout this tutorial we've constantly been selecting NAD 83 as the default CRS every time we create a new file. But if we change this setting to NAD83, we could have skipped that step, as NAD 83 would be assigned by default each time. Alternatively, you can choose one of the other radio button options - using the project window's CRS is a safe bet, if you're following the practice of keeping your window and all of your files in the same CRS.

Lastly, if you know that your layers are defined properly and share the same projection, there's a shortcut that lets you assign the same projection to the map window. You can select one of the layers in the ML, right click, and choose the option to Set Project CRS From Layer. This saves you the step of going to Settings > Project Properties > CRS tab and setting the project CRS by scrolling through the CRS list.


Section II: More Geoprocessing

This section will demonstrate a few more geoprocessing techniques that you're likely to need. You'll learn how to convert a single part layer to a multi-part layer and will do another table join.

Steps

  1. Count features for your layer (new in 1.7). Select the states_reproject layer in the ML, right click and check the Show features count box. It tells us there are 273 features. If there are 50 states, plus two (DC and Puerto Rico), how could there be 273 features?
  2. Open the table and examine a selection. Select the states_reproject layer in the ML, right click and open the attribute table. Notice that there are several records for Alaska. Click on the first record for Alaska in the table to select it. Close the attribute table. Pan your map view to see Alaska. Notice that one large portion of the state is selected, but none of the islands that are separate from the mainland are. This shapefile is a single-part file, meaning that each individual polygon is an independent feature with its own record in the attribute table. You can select other islands in Alaska with the Select feature tool to test this. When you're finished, clear all selected features.
    Alaska single part feature
  3. Convert to multi-part feature. Before we join an attribute table to our shapefile, we need to convert the layer to a multi-part feature - a layer where a single feature (state) can be made up of multiple polygons. This will allow us to do a one to one join between the state layer and the attribute table. On the menu bar go to Vector > Geometry Tools > Singleparts to multipart. The input file will be states_reproject. The Unique ID field that will be used to convert the file (associating individual polygons with a feature) is the STATE field. Browse to the data folder for part4 and save the file as states_multi.shp. Click OK. Click Yes to add the new layer to the project. Close the single to multi menu.
    Singlepart to Multipart
  4. Do a feature count. Select the states_multi layer in the ML, right click and check the Show features count box. It tells us we have 52 features, which is what we're expecting. Select the old states_reproject layer in the ML, right click and remove it. Hit the Save button.
  5. Examine the employment data table. Minimize QGIS. Use your file browser to go to the part 4 data folder. Find the file called hc_emp_bls_2009.csv. Right click on the file and open it with a text editor (Windows users should use Notepad). This data is stored in a plain text, comma delimited format. Each column or field is separated by a comma, and each record (one for each state) is stored on a separate line. The data is from the US Bureau of Labor Statistics (BLS) and there are 51 records, one for each state and DC. The CODE field is a state FIPS code we can use for joining (a list of these is included in the appendix), EMP62 is the number of people who are employed in the Health Care and Social Assistance sector, as defined by the North American Industry Classification System (NAICS), and TOTAL_EMP is the total number of people in the labor force in 2009. Close the file. Now look for the file hc_emp_bls_2009.csvt and open it. This file contains a single line and has one entry for each column in our csv file. It specifies the type of data stored in each column - text (string) or number (integer in this case). QGIS will reference this file to store and display our csv data correctly when we open it in QGIS. Close the file.
    Employment data in CSV
    CSVT file
  6. Add the CSV table to your project. Maximize QGIS. Hit the Add Vector data button. Browse to your part4 data folder and select hc_emp_bls_2009.csv. If you don't see it make sure the Files of Type dropdown is set to All Files. Add it to your project. Select it in the ML, right click, open the attribute table. If the file imported correctly the text-based columns should be left-aligned and numeric columns should be right-aligned (if this isn't the case then there's a problem with your csvt file). Notice the CODE field - this is the two digit FIPS code that identifies each state. We'll be able to match this to the FIPS code stored in our shapefile in the STATE field. Close the table.
  7. Join the data to the shapefile. Select states_multi in the ML, double click, and open the Joins tab in the properties menu. Hit the green plus sign to add a new join. hc_emp_bls_2009 is the join layer, CODE is the join field in that layer, and STATE is the target field in our shapefile. Click OK. Close the properties menu. Select states_multi, right click and open the attribute table. You'll see that all of the data has been added. However, if you look at the record for Puerto Rico, you'll see that the values from the data table are NULL; this is because we had a feature for Puerto Rico in our shapefile, but no record in the data table. Close the table. Save your project.
    Join Tab
  8. Work around for 1.7. Because of a bug in version 1.7 we'll have to permanently fuse our data table to our shapefile in order to categorize and symbolize our data properly. Select states_multi, right click, and select Save as. Save the new layer as states_data in your part4 data folder. Make sure to Browse for a CRS, to assign our custom CRS, NA Lambert Conformal Conic, to the new data layer. Custom projections are always located at the bottom of the CRS selector window. Hit OK to save the new layer. You'll get an error message saying that not all of the features could be drawn - that's OK. Puerto Rico won't be drawn as we have no matching data for it, and since we have no data for it we won't be mapping it. Add the new states_data layer to the project and remove the old states_multi layer. If you open the attribute table for states_data all of the data for the 50 states and DC should be there. Save your project.
    Save As New Layer
  9. Inspect the layer. Zoom in to the northeastern US, to the area around New York City. You'll notice that, unlike the previous census file we worked with from TIGER, this file has already been modified to remove bodies of water from state boundaries. But if you look at the NYC area, you'll see that Manhattan and Long Island appear joined to the mainland. This shapefile is from the Census Generalized Cartographic Boundary Files; they are TIGER files that have had their boundaries simplified so they appear less jagged at small scales (viewing the US as a whole) but are not appropriate for large scale maps (viewing a small area like the NYC metro).
    Generalized boundaries
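
The join in step 7 matches each feature to a table record using the shared FIPS code. Here is a minimal Python sketch of that logic using a few made-up rows; the numbers are invented for illustration, and the column names follow the description in step 5. It also shows why Puerto Rico ends up with NULL values.

```python
import csv
import io

# Made-up sample rows; the real values live in hc_emp_bls_2009.csv.
SAMPLE_CSV = """CODE,EMP62,TOTAL_EMP
01,100,1000
02,30,200
"""

# A few attribute records standing in for the states_multi layer.
features = [
    {"STATE": "01", "NAME": "Alabama"},
    {"STATE": "02", "NAME": "Alaska"},
    {"STATE": "72", "NAME": "Puerto Rico"},
]

# Index the table by its join field, keeping codes as text so "01" stays "01".
table = {row["CODE"]: row for row in csv.DictReader(io.StringIO(SAMPLE_CSV))}

joined = []
for feature in features:
    match = table.get(feature["STATE"])  # None when the table has no record
    joined.append({
        **feature,
        "EMP62": int(match["EMP62"]) if match else None,
        "TOTAL_EMP": int(match["TOTAL_EMP"]) if match else None,
    })

for row in joined:
    print(row)
```

Every feature in the layer is kept, whether or not the table has a matching record. Features without a match, like Puerto Rico here, simply get empty values.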

Commentary

Singlepart and Multipart Features

Polygon features in shapefiles or other vector formats may consist of multi-part or single part features. With single-part features, each individual polygon has a record in the attribute table. With multi-part features, each feature has a record in the attribute table regardless of how many polygons make up the feature. For example, in a single-part shapefile of Hawaii each island has its own record in the attribute table (1st image below), whereas in a multi-part shapefile all of the islands are combined into a single feature, the State of Hawaii, for which there is one record in the attribute table (2nd image below). Most GIS systems have tools for converting one format to another - this is important, because if you want to join a data table to a shapefile, you'll usually want it to be a multi-part shapefile. For example, if you are joining a state-based data table to a shapefile, it wouldn't make sense to assign the entire population of Hawaii to each individual island - it would lead to errors when classifying data and calculating statistics.

Single-part features Hawaii
Multi-part features Hawaii
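
Conceptually, converting from single-part to multi-part is a grouping operation on the unique ID field. The Python sketch below uses made-up polygon labels in place of real geometry to show how the feature count drops once polygons are grouped by state code:

```python
from collections import defaultdict

# Hypothetical single-part records: one (STATE code, polygon label) per polygon.
single_parts = [
    ("02", "Alaska mainland"),
    ("02", "Alaska island A"),
    ("02", "Alaska island B"),
    ("15", "Hawaii island A"),
    ("15", "Hawaii island B"),
    ("42", "Pennsylvania"),
]

# Group polygons by the unique ID field so each state becomes one feature.
multi_parts = defaultdict(list)
for state_code, polygon in single_parts:
    multi_parts[state_code].append(polygon)

print(len(single_parts), "single-part features")
print(len(multi_parts), "multi-part features")
```

Six polygons become three features, one per state, which is the same thing that happened when our 273 polygons became 52 features.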

Generalization and Scale

The Census Generalized Cartographic Boundary Files http://www.census.gov/geo/www/cob/bdy_files.html that we are using in this part of the tutorial were designed for creating maps of the US at a national or regional scale. According to the Census Bureau, "The cartographic boundary files are primarily designed for small scale, thematic mapping applications at a target scale range of 1:500,000 to 1:5,000,000." Boundaries have been generalized to depict land areas, to smooth coastlines and boundaries, and to remove small islands. This makes the boundaries appear smoother and cleaner at these smaller scales, while sacrificing accuracy that wouldn't be visible.

When choosing vector files for thematic mapping you will need to make sure that the generalization for the file is appropriate for the scale you're working at. If you were creating a map of the NYC metro area, you would not want to use these boundary files as the generalizations become apparent at this larger scale and will make your maps appear inaccurate. You can identify whether a layer is appropriate by looking at the metadata and seeing if an optimal scale is indicated. Scale is a proportion of units of measurement on the map versus the actual distance in reality. A scale of 1:5,000,000 indicates that one measurement unit on the map represents 5,000,000 units in reality. Small scale maps cover large areas while large scale maps cover small areas; this may seem counter-intuitive, but remember that scales represent fractions: 1/5,000 is a larger number (and thus larger scale) than 1/5,000,000. Most GIS software packages have tools for generalizing boundaries if you need them to be more simplified.
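
The arithmetic behind scale is easy to check. This short Python sketch (the function name is made up for this example) converts one centimeter on the map to a ground distance at the two ends of the Census Bureau's target range, and confirms the fraction comparison above:

```python
def ground_distance_km(map_distance_cm, scale_denominator):
    """Ground distance in kilometers for a map distance in centimeters."""
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / 100_000  # there are 100,000 centimeters in a kilometer


print(ground_distance_km(1, 5_000_000))
print(ground_distance_km(1, 500_000))

# The fraction with the smaller denominator is the larger scale.
print(1 / 5_000 > 1 / 5_000_000)
```

One centimeter covers ten times more ground at 1:5,000,000 than at 1:500,000, which is why detail that matters at the larger scale disappears at the smaller one.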

The screenshot below illustrates differences in generalization and scale using vector data from the Natural Earth website, which provides free generalized vector data for creating professional medium to small scale maps. The map below is of the Delmarva peninsula and shows an overlay of 3 different shapefiles created at 3 different scales. Scales range from small to large and from most to least generalized: the beige area is a 1:110 mil scale, the red line is 1:50 mil scale, and the blue line is 1:10 mil scale. Obviously you wouldn't want to use the 1:110 mil scale layer depicted in beige to create a map of this area as it is far too generalized, but it would be well-suited for a national map.

Comparison of generalization and scale

One footnote regarding the Census Generalized Cartographic Boundary Files - when you download the shapefiles they will be undefined, i.e. they are missing .prj files. You must define the projection before you use them (see commentary in the previous section about defining shapefiles). Boundaries from 2000 should be defined as NAD 83 and boundaries from 1990 as NAD 27. Generalized Boundaries for 2010 have not been released yet.

CSV Files

CSV files (comma separated values) are an alternative, stand-alone data format to DBF files that you can use in QGIS to match data tables to shapefiles. In Part 3 of this tutorial we worked with a tab delimited text file to get coordinate data into QGIS. CSV files are essentially the same format; they are plain text files where fields are separated by commas (as opposed to other delimiters like tabs or pipes) and records are separated on different lines. Compared to DBF, CSV is a much more common format that you can create in any text editor or spreadsheet program, and when you download attribute data from a website or digital repository CSV is almost always an option.

Unlike DBF, CSV files do not contain any embedded information that specifies the type of data stored in each field. QGIS automatically imports all fields in CSV files as strings. This is problematic, as numbers imported as strings cannot be treated as numbers (grouped into graduated categories or operated on mathematically). CSVT files are used to overcome this. You must create these by hand in a text editor and provide a data type for every column in your csv file. The names of the data types are placed in quotes and separated by commas. The file must have the same name as the csv file, must be saved with the extension .csvt, and must be stored in the same directory as the csv. The following data types are supported:

You should be careful when opening csv files, or any delimited text files, in Microsoft Excel. Excel imports the CSV and automatically saves any value that looks like a number as a number. This has the unintended effect of rendering identifiers like FIPS codes and ZIP Codes useless, as leading zeros are dropped from the values. Even if you open a CSV file in Excel and don't save it, the file is still altered. In order to convert values back to their original form you would have to use the concatenate formula on any values that are less than their expected length and pad them with zeros. You can avoid these problems by working with CSV in text editors, or by using other spreadsheet programs. For example, when you open a csv file in OpenOffice Calc you are prompted to designate the data type for each field. Designate your identifiers as text / strings and they will be preserved.
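
If you're comfortable with a little scripting, you can create the CSVT file and check your identifiers without opening a spreadsheet at all. The Python sketch below writes a made-up sample CSV and its matching CSVT to a temporary folder (the file name and values are invented for illustration), reads the codes back as text, and shows how to pad codes that have already lost their leading zeros.

```python
import csv
import tempfile
from pathlib import Path

folder = Path(tempfile.mkdtemp())
csv_path = folder / "sample_emp.csv"        # hypothetical file name
csvt_path = csv_path.with_suffix(".csvt")   # same name, .csvt extension

# Made-up rows; CODE holds FIPS codes with leading zeros.
csv_path.write_text("CODE,EMP62,TOTAL_EMP\n01,100,1000\n02,30,200\n")

# One quoted type per column, on a single line, in column order.
csvt_path.write_text('"String","Integer","Integer"\n')

# Reading with the csv module keeps every value as text, so "01" survives.
with csv_path.open(newline="") as handle:
    rows = list(csv.DictReader(handle))

codes = [row["CODE"] for row in rows]
print(codes)

# Repairing codes that already lost their zeros: pad back to two characters.
damaged = ["1", "2", "36"]
repaired = [code.zfill(2) for code in damaged]
print(repaired)
```

The padding step does the same job as the concatenate formula in a spreadsheet, but it only works if you know the expected length of the identifier (two characters for state FIPS codes).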


Section III: Creating Calculated Fields

This section will show you how to add new calculated fields to a shapefile in QGIS. In many instances mapping numbers that represent whole values may not make sense and you'll want to create derived values. In this section you'll create a percent total to show the concentration of employment in the health care sector across different states.

Steps

  1. Enter the edit mode. Select the states_data layer in the ML, right click and open the attribute table. Hit the Edit button below the table to enter the edit mode. Since we are making changes to the actual shapefile we need to do that from an edit mode.
  2. Launch the field calculator. Hit the field calculator button that's a few buttons to the right of the edit button. This opens the Field Calculator window. Under New Field, type P_Total as the output field. Change the field type to a Decimal number (real). Keep the output field width at 10 (default width setting in the table window) but change the precision to 3 (number of places right of the decimal point). In the Fields box, click EMP_62 to add it to the expression field. Hit the divisor symbol under the operators. Then click EMP_TOTAL in the Fields box. Your field expression should read EMP_62 / EMP_TOTAL. Hit OK. Back on the attribute table screen, hit the Edit button to stop editing and save your edits. You'll see the new percent total field appended on the right.
    Creating a calculated field
  3. Examine your data. Click on the P_Total column to sort the data. When you're finished examining the data close the table. You can Save your project at this point, but the edits to your shapefile have already been saved, since we were working on the shapefile directly and saved it when we left the edit mode.
    New Fields
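If you want to check a few values outside of QGIS, the calculation is easy to reproduce. This is a minimal Python sketch, not what the field calculator runs internally; the function name percent_total is hypothetical, and the New York and New Jersey employment figures are the ones quoted in the Location Quotients commentary in this section.

```python
def percent_total(part, total, precision=3):
    """Share of a total as a decimal fraction, rounded to the given precision."""
    if total == 0:
        return None  # this sketch returns no value rather than dividing by zero
    return round(part / total, precision)

# Health care employment / total employment for two states
ny = percent_total(1384770, 8343862)
nj = percent_total(528613, 3771296)
print(ny, nj)
```

The results are decimal fractions rounded to three places, matching the precision chosen for the P_Total field.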

Commentary

Representing Values

In some circumstances it may make sense to map values as whole numbers - cities by number of crimes, states by total population, counties by number of renter-occupied housing units, etc. But in each of these examples a particular place could have a higher value simply because it has more people or is a larger place. In order to make more meaningful comparisons it's often necessary to do a little math:

Location Quotients

A location quotient is a common indicator used in economic base analysis. It compares a local economy to the greater economy in order to measure how specialized a local economy is for particular industries. The result is a ratio that measures the concentration of economic activity. For this exercise, if you wanted something more specialized than a basic percent total you could calculate a location quotient:

(employment in industry in local economy / total employment in local economy) / (employment in industry in national economy / total employment in national economy)

In our example above you would use this expression in the field calculator:

(EMP_62 / 17759870) / (EMP_TOTAL / 128607843)

The numbers represent the total number of people employed in health care in the US and the total number of people in the US labor force. Here's an example:

NY: (1384770 / 17759870) / (8343862 / 128607843)= 1.20

NJ: (528613 / 17759870) / (3771296 / 128607843)= 1.02

So in this example, the NY economy is more specialized in health care and social assistance relative to the nation, while NJ is average in this industry relative to the national economy.
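The arithmetic can be verified with a few lines of Python. This is only a sketch: the function and constant names are hypothetical, and the figures are the ones given above. It follows the definition (local share divided by national share), which is algebraically the same ratio as the rearranged field calculator expression.

```python
US_HEALTH_EMP = 17759870   # national health care employment, from the text
US_TOTAL_EMP = 128607843   # national total employment, from the text

def location_quotient(local_industry, local_total,
                      national_industry=US_HEALTH_EMP,
                      national_total=US_TOTAL_EMP):
    """Local industry share divided by national industry share."""
    local_share = local_industry / local_total
    national_share = national_industry / national_total
    return local_share / national_share

ny_lq = location_quotient(1384770, 8343862)
nj_lq = location_quotient(528613, 3771296)
print(round(ny_lq, 2), round(nj_lq, 2))
```

A value above 1 means the industry makes up a larger share of local employment than it does nationally, and a value near 1 means the local share is about the same as the national share.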

The data used in our example covers both public and private sector employment from 2009 and comes from the US Bureau of Labor Statistics Quarterly Census of Employment and Wages (QCEW) at http://www.bls.gov/data/.

Industrial Classification: NAICS

The North American Industry Classification System (NAICS) is a hierarchical system of codes used to classify businesses into industries in the US, Canada, and Mexico. It was created in the mid 1990s and replaced the older Standard Industrial Classification (SIC) system. The NAICS system consists of broad industrial sectors defined with two digits that can be broken down into more specific subsectors with additional digits.

In our example we are studying the labor force of NAICS 62, the Health Care and Social Assistance Sector. Establishments in NAICS 62 provide health care and social assistance for individuals. The services provided by establishments in this sector are delivered by trained professionals; health practitioners or social workers with the requisite expertise. NAICS 62 can be broken down into more specific subsectors that include:

Each of these 3 digit subsectors can be broken down into 4 digit groups. For example, subsector 621 Ambulatory Health Care Services can be broken down to:

4 digit groups can be broken down further to 5 digit industries (6214 Outpatient Care Centers breaks down to 62141 Family Planning Centers, 62142 Outpatient Mental Health and Substance Abuse Centers, 62149 Other Outpatient Care Centers), and 5 digit industries can be broken down to 6 digit national industries (62149 breaks down to 621491 HMO Medical Centers, 621492 Kidney Dialysis Centers, 621493 Freestanding Ambulatory Surgical and Emergency Centers, and 621498 All Other Outpatient Care Centers).

You can browse and download the codes at http://www.census.gov/eos/www/naics/. They are widely used by government agencies that produce data for industries (US Bureau of Labor Statistics, US Census Bureau, Statistics Canada, National Institute of Statistics Mexico) as well as private companies that produce databases or information retrieval systems that focus on industrial research. The NAICS system is largely compatible with the UN Statistics Division's International Standard Industrial Classification (ISIC) codes.
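Because each level of the hierarchy simply adds a digit, the parents of any code can be read off its prefixes. The Python sketch below illustrates this with the codes mentioned above; the function names are hypothetical, and it assumes codes are handled as strings of 2 to 6 digits.

```python
def naics_levels(code):
    """Chain of codes from the 2 digit sector down to the code itself."""
    code = str(code)
    if not code.isdigit() or not 2 <= len(code) <= 6:
        raise ValueError("expected a NAICS code of 2 to 6 digits")
    return [code[:n] for n in range(2, len(code) + 1)]

def in_sector(code, sector="62"):
    """True if the code belongs to the given 2 digit sector."""
    return str(code).startswith(sector)

# 621498 All Other Outpatient Care Centers
print(naics_levels("621498"))
print(in_sector("62142"))
```

This prefix test is handy for filtering an industry table down to one sector. Note that a few sectors span a range of two-digit codes (Manufacturing is 31-33), so a single prefix is not always enough.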


Section IV: Classifying and Symbolizing Data

In this section you'll learn about the different methods for classifying data and the best approach for choosing color schemes to symbolize your data. These are important concepts to grasp, as they have a direct impact on how successful your map will be in communicating your data.

QGIS has been providing two different symbology or style menus for several versions now - an old stable version and a newer experimental version. In this section we'll work exclusively with the "New Symbology", as it looks like the "Old Symbology" is going to be dropped in future versions of QGIS.

Steps

  1. Classify your data. Select states_data in the ML and double-click to open the properties menu. Go to the Style tab. If it's not currently active, click the New symbology button to apply the new symbols for this layer. In the New Symbology change the classification drop down from Single Symbol to Graduated. Change the Column (the field you're classifying) from AREA to P_Total. Change the number of classes from 5 to 4. Keep the mode as Equal Interval. Choose one of the default color ramps (like green or blue) and hit the Classify button at the bottom of the menu.
    Classified with initial color scheme
  2. Select a new color scheme. The default color ramps in QGIS go from dark to light for low to high values, which is the opposite of standard cartographic convention; you would expect that the colors go from light to dark for low to high values (in most cases). You shouldn't use these defaults as you'll probably confuse your map readers. So - add a new color ramp. Hit the color ramp dropdown and choose New color ramp at the bottom. On the color ramp type screen choose ColorBrewer and hit OK. On the Colorbrewer ramp choose a scheme - for quantitative data with only positive values you should choose a color scheme that uses a single color value from light to dark - DO NOT choose a multi-color or random scheme. Hit OK once you've made your choice, and then give your color ramp a name (like CB_greens or CB_oranges). Hit OK. Back at the Style menu, choose your newly added color scheme from the dropdown and hit Classify to reclass your data with the new colors. Hit OK to map your data.
    Colorbrewer Ramp
    Classified with new Colorbrewer scheme
  3. Examine the Equal Intervals map. In the styles tab we used the default classification scheme called Equal Intervals. This took our data and divided it into four classes so that each class has an equal range of values; with a min value of .089 and a max value of .18 our data has a range of .091 - divide by four and each class covers a range of .02275, sorted from lowest to highest. Remember that these are percentages in decimal format (.089 is 8.9% and .18 is 18%, etc). Right click on states_data in the ML and check the Show feature count option. You'll see the number of states in each class varies, but the range of values in each class is constant (2.275 percentage points in each class).
    Equal Intervals Map
  4. Map data using Quantiles. However, we could use an alternate classification method called Quantiles. Double click on the states_data layer to go back to the style tab under the properties menu. Change the classification mode to Quantiles and hit Classify. Hit OK to re-map your data in this scheme, and take a look at the result. Compared to the equal intervals map, quantiles show us a greater range of colors since each class has the same number of features. Quantiles divides our data into classes that have an equal number of data points. Since we have 51 data points (50 states plus DC), we have about 13 states in each class sorted from low to high, as you can see in the feature count.
    Quantiles Map
  5. Map data using natural breaks. We have another option. Double click on the states_data layer to go back to the style tab under the properties menu. Change the classification mode to Natural Breaks (Jenks) and hit Classify. Hit OK to re-map your data in this scheme. The natural breaks method classifies data based on the location of gaps or breaks in the data range, which is less arbitrary than equal intervals or quantiles. Notice how there are only two states in the lowest category. If you select states_data in the ML, open the attribute table, and sort by P_Total, you'll see Nevada and DC are in this class. After DC, there's a large gap of nearly 2 percentage points (.091 to .109) between DC and the state in the next class, California, large enough that the natural breaks formula created a class break here. An easier way to envision how natural breaks works is with a scatterplot, which illustrates where gaps in the data exist (you can create scatterplots in a spreadsheet or statistical package).
    Natural Breaks Map
    Scatter Plot of Data
  6. Save your project. At this point save your project. For our map we'll stick with the natural breaks method, but read the commentary below for an explanation of each method and its advantages and disadvantages.
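To see what the first two methods are doing to the numbers, here is a small Python sketch that reproduces the arithmetic on the 51 P_Total values listed in the comparison table in the commentary below. It is not the code QGIS runs: the function names are hypothetical, the quantile split is a simple split by rank (QGIS may place the extra features slightly differently), and largest_gap only finds the widest gap between neighboring values rather than implementing the Jenks algorithm.

```python
from bisect import bisect_left
from collections import Counter

# P_Total values for the 50 states plus DC, sorted from low to high
values = [
    .089, .091, .109, .110, .115, .116, .118, .119, .119, .124,
    .126, .128, .128, .128, .129, .130, .132, .135, .135, .136,
    .138, .138, .139, .140, .140, .140, .142, .143, .143, .144,
    .145, .146, .147, .151, .151, .152, .152, .153, .154, .158,
    .159, .161, .163, .163, .164, .166, .168, .168, .178, .179, .180,
]

def equal_interval_breaks(data, classes):
    """Interior break points that split the data range into equal widths."""
    low, high = min(data), max(data)
    width = (high - low) / classes
    return [low + width * i for i in range(1, classes)]

def classify(value, breaks):
    """Class index (0 = lowest); a value equal to a break stays in the lower class."""
    return bisect_left(breaks, value)

def quantile_counts(data, classes):
    """Features per class when the sorted data is split by rank."""
    n = len(data)
    ranks = Counter(min(i * classes // n, classes - 1) for i in range(n))
    return [ranks[c] for c in range(classes)]

def largest_gap(data):
    """Pair of neighboring sorted values separated by the widest gap."""
    ordered = sorted(data)
    return max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])

breaks = equal_interval_breaks(values, 4)
ei_tally = Counter(classify(v, breaks) for v in values)
ei_counts = [ei_tally[c] for c in range(4)]

print([round(b, 5) for b in breaks])  # equal interval break points
print(ei_counts)                      # features per equal interval class
print(quantile_counts(values, 4))     # features per quantile class
print(largest_gap(values))            # widest gap between neighbors
```

With equal intervals the class widths are constant and the counts vary; with quantiles the counts are nearly constant and the widths vary. The widest gap falls between DC and California, which is where the natural breaks method placed its first break.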

Commentary

Data Classification and Color Schemes

The purpose of a thematic map is to communicate a message about the data. If a map uses too few classes, then the data is too generalized and meaningful patterns can be hidden. If a map uses too many classes, then a pattern becomes difficult to detect because there is too much detail. It is difficult for the human eye to distinguish between too many colors or variations of color. Generally speaking, it is a good idea to use 3 to 6 classes, and ideally 4 or 5. When choosing the number of classes you should consider the number of data points, the range of the data, the purpose of the map, and the color choice based on the output. While a certain number and range of colors may look good on a color printed map, they may appear washed out if the map is shown on a projector or blurred together if photocopied in black and white. You should design with the final output in mind.

After ranking the data from lowest to highest values, there are a number of classification methods:

It's often necessary to make some common sense adjustments to any classification scheme, such as creating unique classes for values of zero or missing values, and adjusting classes so they don't contain a mix of negative and positive values. In QGIS you have the ability to adjust classes or create manual classes. To do this, you classify the data using one of the standard methods in the Style tab for the layer, then select the class that you want to change and double click on the range. You'll be able to type the values in by hand.

The natural breaks method tends to be preferred by geographers for classifying data for maps. You can use the natural breaks option to classify the data using the Jenks algorithm, or you can plot the data in a scatterplot and create classes manually based on where you think the largest gaps are. Take a look at the maps and data for this project below to compare how the different classification methods group the data; lines under values denote a break between one group and the next. In this particular case the equal intervals and natural breaks methods yield similar results; this is coincidental. The data we're examining is rather evenly spaced around the mean, and the split based on equal values and the location of gaps in the data is nearly identical. In other cases these classification methods will yield quite different results.

  equal intervals quantiles natural breaks
State  Equal Intervals  Quantiles  Natural Breaks
NV     .089             .089       .089
DC     .091             .091       .091
                                   ----
CA     .109             .109       .109
UT     .110             .110       .110
       ----
CO     .115             .115       .115
HI     .116             .116       .116
GA     .118             .118       .118
VA     .119             .119       .119
WY     .119             .119       .119
AZ     .124             .124       .124
AK     .126             .126       .126
IL     .128             .128       .128
SC     .128             .128       .128
                        ----
TX     .128             .128       .128
ID     .129             .129       .129
WA     .130             .130       .130
AL     .132             .132       .132
       ----                        ----
MD     .135             .135       .135
OK     .135             .135       .135
OR     .136             .136       .136
FL     .138             .138       .138
KS     .138             .138       .138
NE     .139             .139       .139
IA     .140             .140       .140
IN     .140             .140       .140
NJ     .140             .140       .140
                        ----
TN     .142             .142       .142
MT     .143             .143       .143
WI     .143             .143       .143
KY     .144             .144       .144
NC     .145             .145       .145
NH     .146             .146       .146
MO     .147             .147       .147
DE     .151             .151       .151
ND     .151             .151       .151
MS     .152             .152       .152
NM     .152             .152       .152
LA     .153             .153       .153
                        ----
MI     .154             .154       .154
       ----                        ----
AR     .158             .158       .158
SD     .159             .159       .159
OH     .161             .161       .161
MA     .163             .163       .163
MN     .163             .163       .163
CT     .164             .164       .164
NY     .166             .166       .166
PA     .168             .168       .168
VT     .168             .168       .168
ME     .178             .178       .178
RI     .179             .179       .179
WV     .180             .180       .180

Color schemes for displaying quantitative values on choropleth (shaded area) maps should show a logical progression of color values. The progression from light to dark helps convey the change in data values from low to high, and most map readers can infer this without even looking at the map legend. Creating a mixed, fruit salad of colors will defeat this natural inference and will confuse the map reader - so don't do it. When comparing qualitative values (categorical data instead of ranges of values), a map should use colors that reflect those values. For example, it makes sense to use reds and blues to show which political party a state voted for, as these colors have become associated with the US political process. Without even looking at a legend or description, the average American will instantly understand what this map is about. Depicting the same data with greens and yellows doesn't make much sense, and results in confusion.

Good color choice
Poor color choice

While we're not considering it for this exercise, the unit of geography used to map phenomena can profoundly affect the interpretation of a distribution or pattern and the ultimate message that your map sends. Mapping populations of US states or Canadian provinces is fine if you are interested in seeing which states / provinces have the most people. But these maps tell you very little about how the population is distributed across these countries, since there is considerable variation in the concentration of people in each state / province. Using a smaller unit of geography can give you a better idea of the distribution of the population. We can see in the first map below that Canada's population is highly concentrated in certain metropolitan areas. However, even the census divisions in the map are not evenly populated. Given that Canada has large unpopulated areas, Statistics Canada has created a layer called an ecumene to show where concentrated areas of population are - this is what you see in the 2nd map. (Source: Geography Division, Statistics Canada, Population Ecumene Census Division Cartographic Boundary File, 2006 Census 92-159-XWE/XWF):

Population by census division
Population by census division with ecumenes

Oftentimes you'll be limited to using certain geographic units based on the availability of the data. For example, it's relatively easy to get current US Census data at the county level, but is rather difficult to get it for ZIP codes, making it necessary to compromise.

Colorbrewer

Colorbrewer is an online tool for choosing good color schemes for thematic maps. Recent versions of QGIS have integrated many of the color schemes from this tool in the New Symbology tools. But it's still worth visiting the site at http://colorbrewer2.org/ to guide you in choosing good colors. The tool lets you choose the number of classes and class options like sequential (for quantitative data like we've used in our example), categorical (for nominal or qualitative data), and others. You also have the ability to filter color schemes based on desired output. In the lower-right hand corner of the map, you can click on a scorecard that shows whether your choice is ideal for the color blind, color printing, photocopying, and viewing on an LCD screen. You should always choose color schemes based on what your final output format will be.

Colorbrewer Tool

Colorbrewer gives you the option to export your color choices out as text, where the text is some notation for representing color such as RGB or hexadecimal (used in HTML for identifying colors). In older versions of QGIS and under the Old Symbology tab this was useful for defining and adding custom colors that were better than the default choices; you could type the RGB values in by hand to add additional colors.
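RGB and hexadecimal are two ways of writing the same color, and converting between them is simple. The Python sketch below shows the idea; the function names are hypothetical, and the four RGB triples are illustrative values for a light-to-dark green ramp, so take the actual values for your map from the Colorbrewer tool itself.

```python
def rgb_to_hex(red, green, blue):
    """Convert 0-255 RGB channel values to HTML-style hexadecimal notation."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)

def hex_to_rgb(hex_color):
    """Convert hexadecimal notation like #238b45 back to an RGB tuple."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

# Illustrative four-class ramp, lightest (lowest class) to darkest (highest)
ramp_rgb = [(237, 248, 233), (186, 228, 179), (116, 196, 118), (35, 139, 69)]
ramp_hex = [rgb_to_hex(*color) for color in ramp_rgb]
print(ramp_hex)
```

Each pair of hexadecimal digits encodes one channel, so lighter colors have higher values in all three pairs.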


Section V: Designing Maps

In this section you'll learn how to create a finished map that includes typical map elements: legend, title, and source information.

Steps

  1. Set the environment for the print layout. Hit the print new button to enter the print layout screen. On the General tab in the Paper and quality menu on the right-hand side change the paper size from A4 to ANSI A (letter 8 1/2 by 11). The general tab provides you with options for the map canvas as a whole. Once you add individual items (a map, label, legend, etc) an item tab will appear, and if you have the item selected in the canvas you'll be able to edit its attributes in the item tab. Each tab has nested menus for editing various elements.
    Print Composer
  2. Add your map and configure zoom. Hit the add map button in the toolbar. Then draw a box on the map canvas, leaving an even amount of space on each side so there is a gap between the map and the edge of the canvas. If you don't get it right on the first try, you can always hover over an edge of the map, hold down the left mouse button, and drag the edge to change the size. Or, to shift the entire map on the page, use the Select Move button. This button moves the entire map box. To shift the geography inside the map box, use the adjacent Move Item button. Move the map around so that the lower 48 states are roughly centered in the box. With the move item button selected, you can also change the zoom of the map by using the mouse wheel, or by clicking on the item tab on the right and experimenting with the scale in the map box. The regular zoom buttons on the toolbar will NOT affect the zoom of the geography; these zoom buttons just zoom you closer to and further from the map canvas, similar to taking a piece of paper and holding it closer to or further from your face. Experiment with them and see. When you're finished, with the map selected go to the Item tab, and under the general options increase the outline width of your map from .3 to .5.
    Map Composer with Lower 48
  3. Add additional maps for Alaska and Hawaii. Given the vast distances between the lower 48 states, Alaska, and Hawaii, it doesn't make sense to include them in the same map window at the same scale. On most maps of the US, Alaska and Hawaii appear in separate maps or boxes so that an optimal scale can be achieved for all three areas; we'll do the same with our map. Hit the add map button and draw a smaller box in the lower left hand corner. Use the Move Item button to shift the focus of the map to Alaska, and with this button selected use the mouse wheel to change the zoom. If you have trouble getting the zoom "right", open the map menu on the item tab on the right, watch how the scale changes as you zoom in and out with the mouse wheel, and type in an estimated scale that's somewhere in-between. Right below the scale in the menu is rotation, which is currently set to 0. You can type values here to rotate the items in the map from 0 to 359 degrees clockwise. Since Alaska looks a little skewed (since we're using a map projection for the whole continent and AK is on the edge) change the rotation to 330 to straighten Alaska out. Once you're finished, repeat the same step for Hawaii: add another map, zoom in to focus on the main eight islands, and rotate it by 320.
    Item tab and US map with AK and HI
  4. Add a legend. Hit the Add Vector Legend button and click on the lower right-hand corner of the map. The legend in 1.7 is a little buggy (depending on your operating system), but we can fix that once we finalize the items in our legend. Go to the Item tab and hit the Legend Item options. Select states_data in the list, hit the edit legend button, and change the name to Percent Total. You should also edit the label of each data range so that the decimal values read as percentages (for example, .089 becomes 8.9%). Open the Percent Total dropdown, select each range, hit the edit button, and type in the percentage values. Next, switch to the General menu on the Item tab and change the generic "Legend" title to NAICS 62 Employment. Hit the Title Font button and change the font size to 12. Now, to fix the strange drawing bug (occurs in Windows XP, caused by a strange default size for the box), on the Item tab hit Item Options, and under Item Options hit the Position and Size button. Change the width and height to something reasonable (in our example, a value of 45 works well), hit Set Position and Close. That should fix the problem. The final step is to move the legend to an ideal position in the corner of the map (which may require you to shift the map around a bit).
    Editing legend items
  5. Add a title. Hit the Add label button. Click on the top of the map, and a generic label is added. In the label Item tab, change the default Quantum GIS label to Employment in the Health Care Sector in 2009. Change the font to 18 using the font button. On the Item tab open the General options menu and uncheck the option that says Show Frame. This will turn off the label outline. Click on the label in the map, and using the select move button, move the label to the top center of the map, and expand the size of the label box so the title appears on one line.
  6. Add a label with source information. Hit the add label button. Click on the bottom of the map to add the generic label. In the label Item tab, change the label to read: Source: US Bureau of Labor Statistics 2009. Change the font to size 8. On the item tab open the General options menu and uncheck the option that says Show Frame. Click on the label in the map, and using the select move button, move the label to the bottom center of the map, and expand the size of the label box so the text appears on one line.
  7. Add a label with author information. Repeat the same step above to add a label with your information - Map created by [insert your name / organization] [insert date]. Move this label underneath the source label.
  8. Add a north arrow (new in 1.7). Hit the add image button. Click somewhere to the right of the USA in the map, above the legend. Scroll through the picture options in the preview and select a simple north arrow. In the item tab for the image, go to the general options and turn the frame for the arrow off. Move the arrow around on the map to get it centered, and resize it to make it a bit smaller.
  9. Balance your map elements. At this point you should have all of your map elements in place. You may need to resize and shift elements around in order for the map to appear balanced. If you want to ensure that boxes are lined up properly, you can hit the Select Move button and click on individual features while holding down the shift key to select multiple items. You can use the various align buttons to arrange elements in a certain way, and you can use the group button to bind several features together so you can move them in unison.
    Finished, balanced map
  10. Close the composer and save. Oddly, there is no save button within the composer (the one on the toolbar is for saving a template of your map, and not the map itself). Close the composer window, and back out at your map view save your project. This will save the map you just created. It's IMPORTANT that you save your map prior to printing or exporting it - this ensures that if the export or print goes wrong or crashes, you won't lose your map. Once you save, hit the Print button, select the first composer from the list and hit show, and you'll be back to your finished map. If your map looks grainy or out of focus, don't worry - it's really ok. To assuage any worries, you can hit the Refresh button.
  11. Print to PDF. PDFs are good for maps that stand alone as their own document. The export to PDF function has been improved in 1.7 to remove bugs present in earlier versions. Before you export, make sure you don't have any map elements selected and return to the general tab for the map. Under Paper and Quality if you check the Save As Raster checkbox your resulting PDF file will be smaller in size and will draw a little quicker when opened, as the vector elements in your map will be stored as simple images in the PDF. Save your map as a PDF file, hc_emp_2009.pdf, in your part 4 data folder. The screen will go blank and may hang for several seconds while the map is being exported. After a few moments you can click on the composer to reactivate it, or minimize and maximize QGIS to get back to the composer.
  12. Export as PNG. You can also save your map as an image file like a jpg or png. Normally we would want to design the map to be the size of the desired image, and we'd want to adjust the DPI quality (just above the Save as Raster checkbox) in the General tab to reduce its size. For now let's change the DPI to 150, just so the resulting image isn't humongous. Hit the Save As Image button. Browse to your data folder for part 4 and save the map there as hc_emp_2009.png - MAKE SURE you type out the extension .png after the filename in the filename box - otherwise QGIS could freeze completely and you'll be forced to bail out. After you hit save, your screen will flash as it exports - just wait for it to redraw and the export will be finished.
    Adjust DPI
  13. Take a look at your maps. Minimize QGIS and use your file browser to go to your part 4 data folder. Double click on the PDF file to open it in Adobe or your PDF viewing software. Double-click on your png file to open it in your default image viewing program (or open it with your web browser). Congratulations on creating a finished map!

Commentary

QGIS Map Composer: Some Details

In some GIS software packages the current view in the map window and the print layout are dynamically linked, and a change in one (such as adjusting the zoom) affects the other. This isn't the case with QGIS; the two are separate. If you do change something in the map view, such as reclassifying the data, you can update the map composer under the item tab for the map by hitting the Update Preview button. Changes in focus or zoom between the view and the composer are not connected at all, which relieves a lot of potential headaches.

The print composer allows you to customize minute details of the canvas, map, and legend, more so than other open source packages. The composer also gives you the ability to draw shapes or add portions of an attribute table directly to a map. One of the most recent developments is the ability to store more than one map in a single project. From the map view, you can use the Print New button to create new, individual maps, and the Print Composer button to manage your maps and choose a particular one to show or edit.

The composer still has a few quirks, as we've seen with sizing issues with the legend. A big weakness is the scalebar feature. While QGIS does allow you to insert a scalebar, the tool is difficult to use. The scalebar automatically takes the units of measurement used in the project, and there isn't a way to convert units on the fly. So you'll have a scalebar in meters, feet, or decimal degrees instead of kilometers or miles, which is of little practical use. This isn't a large issue if you're creating thematic maps, but if you're designing reference maps not having a scalebar is a problem.

General Map Design

When creating maps you need to design with the end use, format, and audience in mind. If you're designing a map that you're going to embed as an image in a document or web page, you should change the size of the canvas and design the map to the specifications for the document. Creating a full size 8 1/2 by 11 map and scaling or cropping the final image is a bad idea; you'll introduce distortion into the map and text will become illegible. You also need to think about page orientation; it's appropriate to map the United States using a landscape page layout, but if you were mapping an area that was taller rather than wider (South America) you'd want to flip the page to portrait.

Individual map elements (maps, title, arrow, legend, source text) should be balanced on the page to achieve some harmony; avoid lumping too many elements together or having large areas of white space. The title and legend should concisely and accurately describe what the map is about and what you are mapping. The amount of detail you provide and the terminology you use should vary with your audience; for example if we were going to circulate this map to the general public we may want to include a brief definition of NAICS 62 and what is included in it. You should always include the source of your data in the map. The fonts and north arrows should also be tailored to the map content; a title in calligraphy font and an ornate compass rose may look good if you're recreating one of Christopher Columbus' charts, but it would look rather stupid on our US health care employment map. This may seem like an obvious thing to point out, but the Internet is rife with bad maps where people have done just that.

Maps are a method of communication, designed to send a message. Like a book or article that is poorly written, maps that are poorly designed will fail because they do not effectively communicate their message to their audience. Some reasons why maps can flop:

Output Formats

PDFs are a good format for creating stand-alone documents. PDF is also a vector-based format, meaning that the geometry of every shape (points, lines, and polygons) is stored as a series of coordinates. If you're working with vector features to begin with, the output in the PDF should be fairly smooth, and if you zoom in to the document you should see all of the detail stored in the original file. If the PDF takes too long to open or draw or the PDF file is too large, you may want to consider the option to save the map as a raster within the PDF. The problem with PDFs is that they are stand-alone; SVG is emerging as a vector format that can be embedded in documents, but it is still not widely supported (the SVG export in QGIS is also a work in progress).

Image formats are raster-based, meaning that the image is composed of individual pixels or grid cells. Rasters are designed for a specific scale; zoom in too close and the image quality deteriorates as each individual cell becomes more distinct. Rasters can stand alone or can be embedded in documents. PNG is an open, compressed raster format. PNG files are a good alternative to jpgs; they have better image quality and are widely supported. Tif files are a lossless, uncompressed format - use these only if you need to preserve the image at its highest quality (these files get pretty big). When exporting to a raster, be sure to adjust the dpi (dots per inch) setting, which will adjust the resolution of the image (and affect its size and quality).
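The relationship between page size, dpi, and image dimensions is simple multiplication, and it's worth working out before you export. The Python sketch below assumes a letter-size page in landscape orientation (11 by 8 1/2 inches); the function name is hypothetical, and the 300 dpi figure is included only for comparison.

```python
def image_size_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page exported as a raster at the given dpi."""
    return (round(width_in * dpi), round(height_in * dpi))

# Letter-size page in landscape orientation
size_150 = image_size_pixels(11, 8.5, 150)
size_300 = image_size_pixels(11, 8.5, 300)
print(size_150)
print(size_300)
```

Doubling the dpi doubles both dimensions, so the image holds four times as many pixels, which is why a modest dpi setting keeps the exported file manageable.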

When printing hard copy maps, what you see on the screen is not exactly what you'll get on paper, so be prepared to print test copies and go back and revise. Because there are different screen resolutions and different printers (in terms of print method and quality) colors and outlines will vary. The current ink levels in the printer will also have an impact on how bright or dull the final output is.


Section VI: Adding Labels

In this section we'll go back and add some labels to our map. Like symbology, two different labeling systems (an old stable one and a new experimental one) have existed side-by-side in QGIS for several versions now, and it's likely that in the near future the new version will replace the old entirely. The old labeling system is available in the Labels tab under the Properties menu for a particular layer; we've used this briefly in Parts 2 and 3 of this tutorial. The new system is available via the labels button on the toolbar. We'll use the new system in this part.

Steps

  1. Turn labels on. Close the print composer and go back to your QGIS map view. Select states_data in the ML and hit the labels button. On Label Settings check the box to Label this layer. In the Fields with labels dropdown choose ST as the label field (this field has the two-letter postal code for each state). Change the size of the text from 8.5 to 8. Hit the Advanced tab, and on that tab click the Over Centroid radio button, and under the buttons move the Priority slider all the way over to High. Hit OK to apply the label settings.
    New labeling menu
  2. Inspect the labels. At first glance the label placement looks pretty good, and is a vast improvement over previous versions of QGIS. There are a few small issues; the labels for Florida and Louisiana look a little off center. And if you're zoomed out so the contiguous 48 states fill the screen, the labels for Washington DC and Rhode Island are omitted, as they would overlap with labels of neighboring states. With a little extra work we can fix that.
    FL off center
  3. Add new columns to the attribute table. The labels are automatically placed in the center of the state. In order to define and store a specific position for them, we have to add some new columns to the attribute table. And to do that, we'll have to enter an edit mode so we can actually modify our file. Open the attribute table for states_data. Hit the edit button at the bottom of the table. Hit the New column button. In the Add Column window name the new field label_x. Assign it a Decimal number type. Give it a width (number of characters) of 10 and a precision (number of decimal places) of 3. Hit OK, and the new column gets tacked on at the end of the table. The label_x column will hold the x (longitude) coordinates for our labels. But we need a second column to hold our y (latitude) coordinates. Repeat the previous step to add a second column called label_y. Finally, add a third column called rotation, and give it the same attributes. Once you've added it, hit the edit button to save the changes, and the columns become permanent. Close the attribute table.
    Adding a new column
  4. Update label menu settings. Before we can start moving labels we have to tell QGIS to store the positions for our labels in these new fields. Hit the labels button and on the labels menu go to the Data defined settings tab. Scroll down in this window to the Position box. In the dropdown for X Coordinate, select the label_x field. In the dropdown for Y Coordinate, select the label_y field. For Rotation, select the rotation field. Hit OK.
    Define the labels settings
  5. Move the label for Florida. With states_data selected in the ML, hit the edit button to enter an edit mode. You'll see each state outlined with little circles; these are the individual nodes that make up the points of each polygon, and this is your clue that you're in an editing mode. You'll also see that the move label button on the toolbar is now active. Hit the button, and you'll see a crosshairs as you move across the map. Adjust your map so that Florida (FL) is visible and centered. Move the crosshairs over the FL label, hold down the left mouse button, drag the label to the center of the state, and release.
    Adjusting labels
  6. Adjust additional labels. Do the same to move the label in nearby Louisiana (LA). Then, use the pan tool to move to the northeastern US, then reactivate the move label button. Move the label for Maryland (MD) to the north and the label for DC to the south so that both will be visible. Adjust the labels for Delaware (DE) and Rhode Island (RI) by moving them off the coast, so that the states themselves are more visible. The labels aren't going to look right at this scale, so zoom out back to the continental US to make sure the labels look OK at that scale. Once you're satisfied, hit the edit button to stop editing and save your edits. You may have to enter the edit mode, move labels, and exit a few times until you get the labels right (as it may be difficult to see their placement in the edit mode). When you're finished, you can open the attribute table for the layer, scroll to the right, and you'll see that coordinates are stored in the label_x and label_y fields for the labels you moved. Close the table.
    NE labels adjusted
  7. Adjust rotation for AK and HI labels. Even though they may look fine in our map view, our labels for Alaska and Hawaii are going to look askew when we re-open our Map Composer. This is because we rotated the maps of AK and HI so that they appeared "normal" in orientation relative to the rest of the country. So, we also have to alter the rotation for the labels to match. Enter an edit mode. Hit the change label button. Zoom in to Alaska and click on the AK label. At the bottom of the Labels properties box type 330 in rotation and hit OK. (330 is the number of degrees we rotated Alaska in the map composer - you could go back into the composer to find this info). Repeat the same step for Hawaii, but specify a rotation of 320. Exit the edit mode and save the changes.
  8. Update your map composer. Hit the print button and show Composer 1. You should see all your map labels - don't worry if they appear overlapped; they should turn out fine in the export. If you don't see the labels, select each map and, under the Item tab, hit Update Preview. If your legend appears funny (this may be a bug depending on the operating system you're using) just select the legend and under Item, Item Options, hit the Position and Size button, reset the length and width of the legend to something sensible (like 45), and apply the changes.
  9. Save and export. Close the map composer and back in the map view hit the save button. Then go back to the map composer and export your map (remember - we exit, save, and return just in case the export crashes). Print your map out as a PDF or save it as an image. Save it in your part 4 data folder as hc_emp_2009_labels.png (or .pdf). Minimize QGIS, go to your part 4 data folder, and take a look at your final map.
    Final map with labels
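
The idea behind the data-defined settings in the steps above can be sketched in plain Python: a label uses the stored label_x, label_y, and rotation values when they exist, and falls back to the automatic position otherwise. This is a conceptual illustration only, not how QGIS implements labeling; the function names and the coordinate values are hypothetical, while the field names and the 330 degree rotation for Alaska come from the steps.

```python
def label_position(feature, centroid):
    """Return the stored label position if both fields are set, else the centroid."""
    x = feature.get("label_x")
    y = feature.get("label_y")
    if x is None or y is None:
        return centroid  # label was never moved by hand
    return (x, y)


def label_rotation(feature):
    """Return the stored rotation in degrees (0 when the field is empty)."""
    rotation = feature.get("rotation")
    return 0.0 if rotation is None else float(rotation) % 360


# Hypothetical attribute rows; the coordinates are made up for illustration.
florida = {"ST": "FL", "label_x": -81.5, "label_y": 28.1, "rotation": None}
texas = {"ST": "TX", "label_x": None, "label_y": None, "rotation": None}
alaska = {"ST": "AK", "label_x": None, "label_y": None, "rotation": 330}

print(label_position(florida, centroid=(-82.5, 28.6)))  # stored position wins
print(label_position(texas, centroid=(-99.3, 31.5)))    # falls back to centroid
print(label_rotation(alaska))
```

This is also why only the states you moved have values in the new columns: every other row stays empty and keeps its automatic placement.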

Commentary

Labeling in QGIS

Features can be displayed and differentiated from each other using text. For example, the standard cartographic convention for labeling bodies of water is to use an italic font and, when possible, a dark blue color. The size of a label indicates the hierarchy of the feature - oceans have larger fonts than seas, which have larger fonts than rivers, which in turn have larger fonts than streams, etc. Land features are labeled in black, or anything that isn't blue, and are never written in italics. Larger features, land or water, may be written in all capital letters, while smaller features are in lower case.

ATLANTIC OCEAN GULF OF MEXICO Lake Ontario Hudson River

UNITED STATES NEW JERSEY Philadelphia Trenton

Automatic labeling placement in QGIS, and the ability to move labels and customize them, has vastly improved in the latest versions of QGIS. There are some other options at your disposal:

Thematic Maps and Symbols

In this tutorial we worked through an example for creating a shaded area or choropleth map. However, there are a number of other techniques that you can use to create a thematic map. QGIS also supports graduated symbols for point and line layers, where the relative size of the symbol (a circle, square, line, or image) represents a value (if you look at the style tab for a point layer, you can change the legend type to graduated symbols). If you have a polygon layer that you'd rather map as graduated circles (instead of shaded areas) you have to convert it to a point layer first (you can do this under Vector > Geometry Tools > Polygon Centroids).
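
To illustrate what converting a polygon to a point involves, here is a minimal Python sketch of the standard area-weighted centroid formula for a simple polygon. It is not necessarily the method the Polygon Centroids tool uses internally; the function name and the sample shapes are assumptions made for the example.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed

    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if area2 == 0:
        raise ValueError("polygon has zero area")
    return cx / (3 * area2), cy / (3 * area2)


rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]

print(polygon_centroid(rectangle))
print(polygon_centroid(l_shape))
```

For irregular shapes the centroid is pulled toward the bulk of the area and can even fall outside the polygon, which is one reason automatically placed labels and symbols sometimes look off center.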

Symbols are used to show qualitative data (name or feature type) or quantitative data (proportions or numbers) and are often divided into four types:

Symbols are often designed to mimic the features they represent, i.e. airplanes for airports, little buildings with flags to represent schools, etc (these are all examples of nominal symbols). In some cases, features may be represented with geometric shapes (circles, squares, triangles) that can be easily distinguished on small scale maps. Some features may be represented using a standard convention for classifying them, i.e. mining maps may label minerals based on their abbreviation in the periodic table - Sn for tin, Pb for lead, Cu for copper, etc.

A single symbol can be used to identify a feature. Varying the size or color of the symbol can indicate quantity. The width and color of roads on a map is highly standardized to show the type of road and volume - thick blue roads are interstate highways, thick green roads are toll highways, thinner red roads are US highways and thinner black roads are state or local roads (all ordinal symbols).
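
When symbol size is used to indicate quantity, a common approach is to scale the area of the symbol (rather than its diameter) to the value, so that a feature with four times the value gets a circle with four times the area. The Python sketch below shows that square-root scaling; the function name, the maximum diameter of 20, and the sample values are all hypothetical, and this is not a description of how QGIS sizes its graduated symbols.

```python
import math


def circle_diameter(value, max_value, max_diameter=20.0):
    """Diameter of a circle whose area is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_diameter * math.sqrt(value / max_value)


# Hypothetical values: the largest value gets the largest circle.
for value in (100, 50, 25):
    print(value, round(circle_diameter(value, max_value=100), 2))
```

Scaling the diameter directly would exaggerate differences, since readers tend to compare the areas of the circles.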

Considerations and Next Steps

Now that we have mapped this data - what does it mean? How would you interpret this map? Are there any spatial patterns to the data (clustering) or does it appear more or less random? Maps have the ability to answer questions but also raise new ones. In order to understand what's going on, we have to become familiar with the underlying dataset. What kinds of occupations are included in the Health Care Sector, and how might that explain the distribution across different states?

For more practice, something else to try:
