rummaging a data set: identifying the gaps

“Media archaeology rummages textual, visual, and auditory archives,” say Erkki Huhtamo and Jussi Parikka, “as well as collections of artifacts, emphasizing both the discursive and the material manifestations of culture” [1]. The archive that I have been rummaging is a data set extracted from the New York City Open Data web site [2]. However, I must qualify this equation of the data set with an archive. The data set certainly fits the OED’s definition of an archive as a “place in which public records are kept” [3]. More specifically, the place is the Internet, the record is each entry on graffiti, and the preservation of the record is secured by electronic storage and transmission.

The data set, however, is not quite an example of a textual, visual, or auditory archive. As my review of the data set below will show, the attributes point primarily to times, places, and jurisdictions. It is an archive, then, of references to a particular kind of artifact: walls modified by illegal inscription and restored by legal erasure. While the data set in this way explicitly archives the material manifestation of culture, the absence of any reference to the variety of bodies that read and authored the walls un-archives the cultural discourse that took place. These bodies include, but are not limited to, the original author of the graffiti, the city’s street investigator who scheduled the graffiti for removal, and the removal crew that operated on the location. Each one of these bodies performed a reading and writing of urban space, constituting the space as a site of discourse. While I would like to consider this further in a later post, my aim here is to describe the data set, to identify the omissions it produces, and to explain why the data set needs to be transformed in order to perform a media archaeology on the city. That is, to borrow Huhtamo and Parikka’s framing of archaeology, to uncover the buried paths in the near past which might help us to find our way into the future.

NYC OpenData - Graffiti Locations (Screenshot)

The following is an outline of the data set as found on NYC Open Data:

The data set is entitled “Graffiti Locations”
14,000 records in the data set
Data set last downloaded on October 29, 2011
The first recorded entry is dated September 13, 2010
The last recorded entry is dated August 18, 2011

According to the published metadata:

The last update of the data set on NYC Open Data occurred October 24, 2011
The data set is updated monthly
The data is provided by the Department of Sanitation (DSNY)

Data set attributes and formats, presented here in the order that they appear online:

Incident Address Display (Text, e.g. “1 AUDUBON AVENUE”)
Borough (Text, e.g. “MANHATTAN”)
Community Board (Text, e.g. “12 MANHATTAN”)
Police Precinct (Text, e.g. “Precinct 33”)
City Council District (Number, e.g. “10”)
Created Date (Date Format, i.e. “MM/DD/YYYY HH:MM:SS AM -0500/-0400”)
Status (Text, i.e. “Open” or “Closed”)
Resolution Action (Text, e.g. “Cleaning crew dispatched. Property cleaned.”)
Closed Date (Date Format, i.e. “MM/DD/YYYY HH:MM:SS AM -0500/-0400”)
X Coordinate (Number, e.g. “1,000,966”)
Y Coordinate (Number, e.g. “244,894”)
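
To make the attribute formats concrete, here is a minimal sketch of how one exported row might be read into typed values. It assumes the export is available as a dictionary keyed by the column names listed above, that dates follow the “MM/DD/YYYY HH:MM:SS AM -0500” pattern, and that an open record has an empty Closed Date. The names `GraffitiRecord`, `parse_row`, and the sample times are hypothetical and do not come from the City’s data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Matches "MM/DD/YYYY HH:MM:SS AM -0500"; %z reads the UTC offset.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p %z"


@dataclass
class GraffitiRecord:
    address: str
    borough: str
    community_board: str
    police_precinct: str
    council_district: int
    created: datetime
    status: str
    resolution_action: str
    closed: Optional[datetime]  # None while the record is still open (assumption)
    x: int
    y: int


def parse_date(text: str) -> Optional[datetime]:
    """Return a timezone-aware datetime, or None for an empty field."""
    text = text.strip()
    return datetime.strptime(text, DATE_FORMAT) if text else None


def parse_number(text: str) -> int:
    """Strip thousands separators such as "1,000,966" before converting."""
    return int(text.replace(",", ""))


def parse_row(row: dict) -> GraffitiRecord:
    return GraffitiRecord(
        address=row["Incident Address Display"],
        borough=row["Borough"],
        community_board=row["Community Board"],
        police_precinct=row["Police Precinct"],
        council_district=parse_number(row["City Council District"]),
        created=parse_date(row["Created Date"]),
        status=row["Status"],
        resolution_action=row["Resolution Action"],
        closed=parse_date(row["Closed Date"]),
        x=parse_number(row["X Coordinate"]),
        y=parse_number(row["Y Coordinate"]),
    )


# Sample row built from the example values above; the time of day is invented.
sample_row = {
    "Incident Address Display": "1 AUDUBON AVENUE",
    "Borough": "MANHATTAN",
    "Community Board": "12 MANHATTAN",
    "Police Precinct": "Precinct 33",
    "City Council District": "10",
    "Created Date": "09/13/2010 10:15:00 AM -0400",
    "Status": "Open",
    "Resolution Action": "",
    "Closed Date": "",
    "X Coordinate": "1,000,966",
    "Y Coordinate": "244,894",
}
record = parse_row(sample_row)
```

Note how little of the row survives as anything other than a time, a place, or a jurisdiction once it is typed this way.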

The metadata for the data set is accessible under a maroon “About” button. The data set is described as “Addresses, current status, and coordinates of requests to clean graffiti (other than bridges or highways) received from the public and SCOUT in the last 12 months.” SCOUT is New York City’s “Street Conditions Observation Unit”. It is part of the Mayor’s Community Affairs Unit [4]. SCOUT supports the City’s 311 system by inspecting the condition of every city street about once a month. From the metadata we gather that the two government bodies responsible for graffiti removal are the Community Affairs Unit in the Mayor’s Office and the Department of Sanitation.

Certain key subtractions are performed as the data set is updated over time. First, since the data set only presents the last 12 months of records, it truncates records regardless of whether the graffiti was successfully removed or not. Second, the data set only presents the most recent record on a graffiti location. This means that any prior activity related to a particular graffiti location is removed and replaced with only the most recent event. Third, each record is intended to identify one graffiti mark per location. A location with multiple graffiti tags may have multiple entries. As a consequence, it is not possible to distinguish one graffiti marking from another at a location. Fourth, while police and governing jurisdictions are identified for each location, the reporting bodies (SCOUT, general public) and removal teams are omitted along with their particular mechanism of reading or re-writing the space.
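
The first two subtractions can be sketched in a few lines. This is only my reading of the published behaviour, not the City’s actual update procedure: I assume “12 months” can be approximated as 365 days and that “most recent record” means the latest Created Date per address. The function name `published_snapshot`, the second address, and all of the dates are hypothetical.

```python
from datetime import datetime, timedelta


def published_snapshot(events, as_of, window_days=365):
    """Return what a monthly update would retain from a fuller history.

    events: iterable of (address, created, status) tuples.
    Drops anything older than the window, then keeps only the
    latest remaining event for each address.
    """
    cutoff = as_of - timedelta(days=window_days)
    latest = {}
    for address, created, status in events:
        if created < cutoff:
            continue  # truncated, whether or not the graffiti was removed
        current = latest.get(address)
        if current is None or created > current[1]:
            latest[address] = (address, created, status)  # overwrites prior activity
    return sorted(latest.values(), key=lambda event: event[1])


# Hypothetical history for two locations.
events = [
    ("1 AUDUBON AVENUE", datetime(2011, 1, 10), "Closed"),
    ("1 AUDUBON AVENUE", datetime(2011, 8, 18), "Open"),
    ("2 EXAMPLE STREET", datetime(2010, 6, 1), "Open"),
]
snapshot = published_snapshot(events, as_of=datetime(2011, 10, 24))
lost = len(events) - len(snapshot)
```

In this example only the August entry for the first address survives. The earlier cleaning at the same address and the still-open request at the second address both disappear from the record.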

These appear to be reasonable omissions in light of the purpose of the data set. The only records worth storing and transmitting are those that aid in coordinating the graffiti removal process, a process that is meant to effect measurable improvement in the quality of life in the city. As Mayor Michael Bloomberg put it: “Not only is [this] technology helping us to speed up the delivery of services, it’s also helping to make City government more accountable” [5]. But then, why identify the council district or police precinct in the data set? Neither is among the government agencies directly responsible for the process of graffiti removal. Also, why truncate the data to include only the past 12 months? Data generated since the opening of the project is needed to help determine whether the process has effected a faster and more accountable deployment of city resources.

To redress these perceived omissions, I am importing the data into a new, self-hosted database. This allows me to create a data model that will maintain a history for each graffiti location that is not truncated over time. It also permits me to add documentation to the city-defined data set, such as photos and field notes from the location. Furthermore, by transforming the data set, analysis and visualization can be performed free from the constraints of the City’s data model, a decidedly desk-bound model that erases the situated context within which the data was performed and traced. It is no good to rummage a data set of references without rummaging among those spaces, i.e. reading the walls. My aim in transforming the data is to restore the various reading and writing paths for which this data set represents a malformed trace.
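
One possible shape for such a database is sketched below, using SQLite only because it ships with Python. It is an illustration under my own assumptions, not a description of the database I am building: the table names (`location`, `event`, `field_note`), the function names, and the sample rows are all hypothetical. The point it shows is that each monthly download is appended to a history rather than replacing it, and that notes and photos attach to a location rather than to a City record.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS location (
    id INTEGER PRIMARY KEY,
    address TEXT NOT NULL,
    borough TEXT NOT NULL,
    UNIQUE (address, borough)
);
CREATE TABLE IF NOT EXISTS event (
    id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(id),
    created TEXT NOT NULL,
    status TEXT NOT NULL,
    closed TEXT,
    downloaded TEXT NOT NULL,
    UNIQUE (location_id, created, status)
);
CREATE TABLE IF NOT EXISTS field_note (
    id INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES location(id),
    noted TEXT NOT NULL,
    body TEXT,
    photo_path TEXT
);
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def import_snapshot(conn, rows, downloaded):
    """Append one download to the history; rows already stored are skipped."""
    for address, borough, created, status, closed in rows:
        conn.execute(
            "INSERT OR IGNORE INTO location (address, borough) VALUES (?, ?)",
            (address, borough),
        )
        (location_id,) = conn.execute(
            "SELECT id FROM location WHERE address = ? AND borough = ?",
            (address, borough),
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO event "
            "(location_id, created, status, closed, downloaded) "
            "VALUES (?, ?, ?, ?, ?)",
            (location_id, created, status, closed, downloaded),
        )
    conn.commit()


# Two hypothetical monthly downloads for the same address.
conn = open_archive()
import_snapshot(
    conn,
    [("1 AUDUBON AVENUE", "MANHATTAN", "2011-01-10", "Closed", "2011-01-20")],
    downloaded="2011-09-29",
)
import_snapshot(
    conn,
    [("1 AUDUBON AVENUE", "MANHATTAN", "2011-08-18", "Open", None)],
    downloaded="2011-10-29",
)
(history_length,) = conn.execute("SELECT COUNT(*) FROM event").fetchone()
(location_count,) = conn.execute("SELECT COUNT(*) FROM location").fetchone()
```

After both imports the single location carries two events, where the City’s data set would show only the later one.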

References
[1] Huhtamo, Erkki, and Jussi Parikka. Media Archaeology: Approaches, Applications, and Implications. University of California Press, 2011, p. 3.
[2] “Graffiti Locations.” NYC Platform, n.d. http://nycplatform.socrata.com/dataset/Graffiti-Locations/mcd4-i5wd.
[3] “archive, n.”. OED Online. September 2011. Oxford University Press. http://www.oed.com/view/Entry/10416?rskey=UQ5Llx&result=1&isAdvanced=false (accessed November 21, 2011).
[4] “NYC*scout.” Mayor’s Office of Operations, n.d. http://www.nyc.gov/html/ops/html/data/street.shtml.
[5] AT&T Wireless Solution Provides Fast Lane for Citizen Complaints. Case Study, 2009. http://www.wireless.att.com/businesscenter/en_US/pdf/NYCSCOUT-CaseStudy.pdf.
