Elasticsearch Python Client indexing JSON

I ran into a problem while playing around with the Elasticsearch Python client. I have (valid!) JSON in a file called test.json, which I now want to index in Elasticsearch. I followed this little tutorial to check that I can connect to my local Elasticsearch instance, and it worked, so I believe the problem is not in my connection to Elasticsearch.

When I run my little code here:

from elasticsearch import Elasticsearch
import json

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

with open('test.json') as json_data:
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data))

I get this exception (mapper_parsing_exception?) on my command line:

Traceback (most recent call last):
  File "app.py", line 13, in <module>
    es.index(index='testdata', doc_type='generated', id=1, body=json.load(json_data))
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

Could you point me in the right direction? What might be the problem?

Oh, and I printed the result of json.load(json_data) and that worked perfectly, so loading the JSON from the file is not the problem.

Appreciate your help! Greez

Update:

with open('test.json') as json_data:
    #d = json.load(json_data)
    print(json_data)
    es.index(index='testdata', doc_type='generated', id=1, body=json_data)

This code doesn't work either; I can't even print the JSON to the command line.

Error now:

<open file 'test.json', mode 'r' at 0x7f8329340c00>
Traceback (most recent call last):
  File "app.py", line 14, in <module>
    es.index(index='testdata', doc_type='generated', id=1, body=json_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 284, in perform_request
    body = self.serializer.dumps(body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/serializer.py", line 50, in dumps
    raise SerializationError(data, e)
elasticsearch.exceptions.SerializationError: (<closed file 'test.json', mode 'r' at 0x7f8329340c00>, TypeError("Unable to serialize <open file 'test.json', mode 'r' at 0x7f8329340c00> (type: <type 'file'>)",))

This is the content of the test.json file (just some randomly generated JSON):

[
     {
        "_id": "58ee19e75ffc814d4dff17da",
        "index": 0,
        "guid": "45476739-80b3-49de-8f00-9923f84f56ce",
        "isActive": true,
        "balance": "$2,882.08",
        "picture": "http://placehold.it/32x32",
        "age": 31,
        "eyeColor": "blue",
        "name": "Liliana Odom",
        "gender": "female",
        "company": "PLASTO",
        "email": "[email protected]",
        "phone": "+1 (983) 474-3785",
        "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593",
        "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.",
        "registered": "2015-05-07T05:40:28 -02:00",
        "latitude": -46.141522,
        "longitude": -157.943368,
        "tags": [
          "labore",
          "quis"
        ],
        "friends": [
          {
            "id": 0,
            "name": "Earline Bass"
          }
        ],
        "greeting": "Hello, Liliana Odom! You have 5 unread messages.",
        "favoriteFruit": "apple"
      }
    ]

Update 2:

I tried this now:

id = 1
with open('test.json') as json_data:
    data = json.load(json_data)
    for dat in data:
        print(json.dumps(dat))
        es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat))
        id += 1

print(json.dumps(dat)) works, but I now get an illegal_argument_exception:

Traceback (most recent call last):
  File "app.py", line 15, in <module>
    es.index(index='testdata', doc_type='generated', id=id, body=json.dumps(dat))
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 128, in perform_request
    self._raise_error(response.status, raw_data)
  File "/home/elk/Documents/pythonelastic/venv/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 124, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Bloodstorm][127.0.0.1:9300][indices:data/write/index[p]]')

Update 3: Here is the ES log. It looks like the _id field is defined twice in this index.

[2017-04-12 17:43:07,847][DEBUG][action.index             ] [Bloodstorm] failed to execute [index {[testdata][generated][AVti1SY7fn4azWzi8gyQ], source[{"guid": "45476739-80b3-49de-8f00-9923f84f56ce", "index": 0, "favoriteFruit": "apple", "latitude": -46.141522, "company": "PLASTO", "email": "[email protected]", "picture": "http://placehold.it/32x32", "tags": ["labore", "quis"], "registered": "2015-05-07T05:40:28 -02:00", "eyeColor": "blue", "phone": "+1 (983) 474-3785", "address": "121 Sedgwick Place, Farmington, Marshall Islands, 2593", "friends": [{"id": 0, "name": "Earline Bass"}], "isActive": true, "about": "Adipisicing veniam ex nulla irure minim incididunt et irure est nostrud ex ut. Occaecat eu proident eu pariatur deserunt aliquip. Commodo ullamco incididunt consequat quis commodo irure elit quis. Aute et reprehenderit ad ipsum magna cupidatat magna minim sunt labore mollit occaecat. Dolore sint veniam deserunt excepteur.", "balance": "$2,882.08", "name": "Liliana Odom", "gender": "female", "age": 31, "greeting": "Hello, Liliana Odom! You have 5 unread messages.", "longitude": -157.943368, "_id": "58ee19e75ffc814d4dff17da"}]}] on [[testdata][3]]
java.lang.IllegalArgumentException: Field [_id] is defined twice in [generated]
        at org.elasticsearch.index.mapper.MapperService.checkFieldUniqueness(MapperService.java:496)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:376)
        at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:320)
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:306)
        at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:230)
        at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:480)
        at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:784)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)


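Your test.json file contains a JSON array at the top level, while each Elasticsearch document has to be a single JSON object. You therefore need to parse the file and then index each element of the array separately. Each element also carries its own _id key; since _id is a metadata field, it cannot also appear inside the document body (that is what "Field [_id] is defined twice" in your log is complaining about), so remove it from the body and pass it as the document id instead:

with open('test.json') as raw_data:
    # json.load() reads from a file object; json.loads() expects a string
    json_docs = json.load(raw_data)
    for json_doc in json_docs:
        # take _id out of the body and use it as the document id
        my_id = json_doc.pop('_id', None)
        # the client serializes the dict itself, so json.dumps() is not needed
        es.index(index='testdata', doc_type='generated', id=my_id, body=json_doc)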
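
If you want to check the splitting step without a running cluster, here is a minimal sketch of just that part. The helper name split_documents and the shortened sample string are made up for illustration and are not part of the Elasticsearch client; the es.index call is left as a comment because it needs a live instance.

```python
import json


def split_documents(raw_json):
    """Parse a JSON string and return a list of (doc_id, body) pairs.

    A top-level array yields one pair per element; a single object
    yields one pair. The '_id' key is removed from each body.
    """
    docs = json.loads(raw_json)
    if isinstance(docs, dict):
        docs = [docs]  # treat a lone object as a one-element list

    pairs = []
    for doc in docs:
        body = dict(doc)                 # shallow copy, input stays untouched
        doc_id = body.pop('_id', None)   # None lets Elasticsearch pick an id
        pairs.append((doc_id, body))
    return pairs


# Shortened, hypothetical sample in the same shape as test.json
sample = '[{"_id": "58ee19e75ffc814d4dff17da", "name": "Liliana Odom", "age": 31}]'
pairs = split_documents(sample)

for doc_id, body in pairs:
    print("id=%s fields=%s" % (doc_id, sorted(body)))
    # With a live cluster you would index each pair like this:
    # es.index(index='testdata', doc_type='generated', id=doc_id, body=body)
```

Keeping the parsing separate from the indexing call makes it easy to confirm that no body still contains _id before anything is sent to Elasticsearch.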