How to interrogate the Wikipedia API for a city in the United States

Given a string of text like "San Francisco, California", I'm trying to use a jQuery GET request to fetch the Wikipedia page for that city. I found the Wikipedia web service API, but I'm having trouble getting it to work. My goal is to get only the overview section.

I tried pointing my browser at the following URL:

https://en.wikipedia.org/w/api.php?action=query&titles=San%20Francisco,%20California&prop=revisions&rvprop=content&format=jsonfm

but it came back as:

This is the HTML representation of the JSON format. HTML is good for debugging, but is unsuitable for application use.
Specify the format parameter to change the output format. To see the non-HTML representation of the JSON format, set format=json.
See the complete documentation, or the API help for more information.
{
    "batchcomplete": "",
    "query": {
        "pages": {
            "19946864": {
                "pageid": 19946864,
                "ns": 0,
                "title": "San Francisco, California",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "#REDIRECT [[San Francisco]]\n{{R from city and state}}"
                    }
                ]
            }
        }
    }
}

If I add &redirects, I get the expected response:

https://en.wikipedia.org/w/api.php?action=query&titles=San%20Francisco,%20California&prop=revisions&rvprop=content&format=jsonfm&redirects

{
    "batchcomplete": "",
    "query": {
        "redirects": [
            {
                "from": "San Francisco, California",
                "to": "San Francisco"
            }
        ],
        "pages": {
            "49728": {
                "pageid": 49728,
                "ns": 0,
                "title": "San Francisco",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "{{About|the city and county in California}}\n{{pp-move-indef}}\n\n{{Use mdy dates |date = July 2016}}\n\n{{Infobox settlement\n<!--See the table at Template:Infobox settlement for all fields and descriptions of their usage-->\n| name = San Francisco, California\n| settlement_type = [[Consolidated city-county]]\n| official_name = City and County of San Francisco\n| image_skyline = Golden Gate Bridge, SF (cropped).jpg\n| image_caption = San Francisco and the [[Golden Gate Bridge]] from [[Marin Headlands]]\n| image_flag = Flag of San Francisco.svg\n| flag_size = 100px\n| image_seal = Seal of San Francisco.png\n| seal_size = 100px\n| nickname = ''The City''; ''The City by the Bay''; ''Fog City''; ''San Fran'';{{efn|This name, like Frisco, has often been discouraged amongst Bay Area natives.}} ''Frisco'' (''locally disparaged'');<ref name=\"Frisco okay\" /><ref name=\"Don't Call It Frisco\" /><ref name=\"Frisco\" /><ref name=\"Friscophobia\" /> ''The City that Knows How'' (''past'');<ref name=\"The City that Knows How\" /> ''[[Baghdad]] by the Bay'' (''past'');<ref name=\"Baghdad by the Bay\" /> ''The Paris of the West''<ref name=\"The Paris of the West\"/>\n| motto = ''Oro en Paz, Fierro en Guerra'' (Spanish)<br />(English: \"Gold in Peace, Iron in War\")\n| image_map = California county map (San Francisco County enlarged).svg\n| ma

If I try another "city, state" query, it also works:

https://en.wikipedia.org/w/api.php?action=query&titles=Martinez,%20California&prop=revisions&rvprop=content&format=jsonfm&redirects

This page corresponds to Martinez, California:

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "107407": {
                "pageid": 107407,
                "ns": 0,
                "title": "Martinez, California",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "{{Use mdy dates|date=March 2016}}\n{{Use American English|date=March 2016}}\n\n{{Infobox settlement\n<!-- See the table at Template:Infobox settlement for all fields and descriptions of their usage. -->\n| official_name              = City of Martinez<ref>{{cite web |url=http://www.cityofmartinez.org/ |title=Homepage |publisher=City of Martinez |accessdate=November 20, 2014}}</ref>\n| settlement_type            = [[City (California)|City]]\n<!-- Images and maps ---- -->\n| image_skyline              = Aerial view of Martinez, California.jpg\n| image_caption              = Aerial view of Martinez\n| image_seal                 =\n| image_map                  = Contra Costa County California Incorporated and Unincorporated areas Martinez Highlighted.svg\n| mapsize                    = 250px\n| map_caption                = Location in [[Contra Costa County, California|Contra Costa County]] and the state of [[California]]\n<!-- Location ----------- -->\n| pushpin_map                = USA\n| pushpin_map_caption = Location in the United States\n| latd = 38 |latm = 01 |lats = 10 |latNS = N\n| longd = 122 |longm = 08 |longs = 03 |longEW = W\n| coordinates_display        = inline,title\n| coordinates_region         = US-CA\n| subdivision_type           = [[List of sovereign states|Country]]\n| subdivision_name           = {{USA}}\n| subdivision_type1          = [[U.S. 
state|State]]\n| subdivision_name1          = {{flag|California}}\n| subdivision_type2          = [[List of counties in California|County]]\n| subdivision_name2          = [[Contra Costa County, California|Contra Costa]]\n<!-- History ------------ -->\n| established_title          = [[Municipal corporation|Incorporated]]\n| established_date           = April 1, 1876<ref>{{cite web\n | url = http://www.calafco.org/docs/Cities_by_incorp_date.doc\n | title = California Cities by Incorporation Date\n | format = Word\n | publisher = California Association of [[Local Agency Formation Commission]]s\n | accessdate = March 27, 2013}}</ref>\n<!-- Government --------- -->\n| government_type            =\n| leader_title               = [[Mayor]]\n| leader_name                = Rob Schroder<ref>{{cite web\n |url=http://www.cityofmartinez.o

Now the problem is how to get at the overview and extract it from this wikitext. It seems like it should be possible to query for only the overview or summary.


To get the response as JSON instead of HTML, you should use

format=json

instead of

format=jsonfm

The jsonfm format returns a "pretty-printed" HTML rendering of the result, which is good for debugging but not for application use.

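Here is the request you should use (keep &redirects, as in the question, so that the redirect from "San Francisco, California" to "San Francisco" is followed):

https://en.wikipedia.org/w/api.php?action=query&titles=San%20Francisco,%20California&prop=revisions&rvprop=content&format=json&redirects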
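As a rough illustration of the format=json request, here is a minimal Python sketch that builds the same query and pulls the wikitext out of a response shaped like the ones in the question. The function names and the User-Agent value are made up for this example, and the parsing assumes the "pages" / "revisions" / "*" layout shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(title):
    """Build the revisions query with format=json and redirects enabled."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",   # plain JSON rather than the jsonfm HTML page
        "redirects": "",    # a present-but-empty parameter switches the flag on
    }
    return API_URL + "?" + urlencode(params)


def first_page_wikitext(response):
    """Return (title, wikitext) for the first page in a parsed API response."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # pages are keyed by page id
    return page["title"], page["revisions"][0]["*"]


def fetch_wikitext(title):
    """Send the request and parse the reply (needs network access)."""
    # Placeholder User-Agent; replace it with one that identifies your app.
    request = Request(build_query_url(title),
                      headers={"User-Agent": "city-lookup-sketch/0.1"})
    with urlopen(request) as reply:
        return first_page_wikitext(json.load(reply))
```

Calling fetch_wikitext("San Francisco, California") should then return the title of the redirect target together with its raw wikitext. From jQuery, the same parameters can be passed as the data of a GET request; a cross-origin browser request will most likely also need JSONP or the API's origin=* parameter.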

You can also use format=xml for XML output or format=php for a PHP-serialized value.
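For the original goal of fetching only the overview, one option that is not part of the answer above is the prop=extracts module, which English Wikipedia provides through the TextExtracts extension. The sketch below is an assumption-laden example rather than the definitive approach: it asks for the plain-text introduction (exintro, explaintext) and reads the "extract" field from the reply. The function names are hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title):
    """Build a query for the plain-text lead section of a page."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "extracts",
        "exintro": "",      # only the text before the first section heading
        "explaintext": "",  # plain text instead of HTML
        "redirects": "",    # follow "City, State" redirects
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def intro_from_response(response):
    """Return the extract of the first page, or "" if none is present."""
    pages = response["query"]["pages"]
    page = next(iter(pages.values()))  # pages are keyed by page id
    return page.get("extract", "")
```

The URL from build_intro_url("San Francisco, California") can be opened in a browser or requested from jQuery in the same way as the revisions query; the reply carries the introduction as a single string instead of the full wikitext.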