ES queries

Environment variables

ES_CLUSTER_DNS=$(aws es describe-elasticsearch-domain \
                     --domain-name "$ESTEST_ELASTICSEARCH_DOMAIN" \
                 | jq --raw-output .DomainStatus.Endpoint)

ES_CLUSTER=http://$ES_CLUSTER_DNS

Under the covers, ES uses Apache Lucene to create and manage the inverted search indices, but it exposes a friendly REST API. I'm going to interact with the REST API using curl, but normally you'd use the client library for your programming language. A client library has the added benefit of knowing which nodes it should talk to. ES clients exist for most popular languages, including Python, Java, Ruby, and Go.

From https://github.com/mspnp/azure-guidance/blob/master/Elasticsearch-general.md:

Elasticsearch is a document database highly optimized to act as a search engine. Documents are serialized in JSON format. Data is held in indexes, implemented by using Apache Lucene, although the details are abstracted from view and it is not necessary to fully understand Lucene in order to use Elasticsearch.

About ES clients: ES provides clients that handle discovery, so you don't have to track the nodes yourself. You give the client a list of IP addresses or node names. The Java client goes even further and joins the cluster as a client node, so it is aware of cluster state (e.g. a replica being promoted to primary). You don't need to list all the nodes, just a few seed nodes; the client uses them to find the rest of the nodes in the cluster.
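To make the seed-node idea concrete, here is a minimal sketch of a client-side node pool. The NodePool class, its methods, and the IP addresses are all hypothetical names made up for illustration; this is not how any official ES client is implemented, only the general shape of "start with seeds, learn the full list, then spread requests across it".

```python
from itertools import cycle


class NodePool:
    """Hypothetical round-robin pool of node URLs, seeded with a few known nodes."""

    def __init__(self, seed_nodes):
        self.nodes = list(seed_nodes)
        self._rotation = cycle(self.nodes)

    def update(self, discovered_nodes):
        # Replace the seed list with the full node list reported by the cluster.
        self.nodes = sorted(set(discovered_nodes))
        self._rotation = cycle(self.nodes)

    def next_node(self):
        # Spread requests across every node currently known.
        return next(self._rotation)


# Start with a single seed node (made-up address).
pool = NodePool(["http://10.0.0.1:9200"])

# In a real client this list would come from asking a seed node about its peers.
pool.update([
    "http://10.0.0.1:9200",
    "http://10.0.0.2:9200",
    "http://10.0.0.3:9200",
])

picked = [pool.next_node() for _ in range(4)]
print(picked)
```

After the update, requests rotate across all three nodes even though only one was supplied up front.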

Add a record to an index

curl -s -XPOST "$ES_CLUSTER/megacorp/employee/1" -H 'Content-Type: application/json' -d '
{
    "first_name": "John",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to build cabinets",
    "interests": ["sports", "music"]
}' | jq .

[Highlight "megacorp"] 'megacorp' is the index. You can think of the index as analogous to a database.

[Highlight "employee"] Within the index, you have types, which are analogous to database tables. It's basically a logical categorization within your index.

[Highlight '1'] Within an index, you can store as many documents as you'd like. Here, I'll add two new documents, with ID=1 and ID=2.

TODO: Create slide

index    <-->  database
type     <-->  table
document <-->  row
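The index / type / document mapping shows up directly in the request path. As a rough sketch, here is how the same request could be assembled in Python with only the standard library. The build_index_request helper and the localhost endpoint are assumptions for illustration; the request is built but not sent, so you can inspect it without a running cluster.

```python
import json
import urllib.request


def build_index_request(cluster, index, doc_type, doc_id, document):
    """Build (but do not send) the request that stores one document."""
    # Path layout mirrors the curl examples: /<index>/<type>/<id>
    url = "{}/{}/{}/{}".format(cluster.rstrip("/"), index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",  # placeholder endpoint, not the real cluster
    "megacorp",               # index    (database)
    "employee",               # type     (table)
    1,                        # document (row) ID
    {"first_name": "John", "last_name": "Smith", "age": 32},
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```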

Add a second record

curl -s -XPOST "$ES_CLUSTER/megacorp/employee/2" -H 'Content-Type: application/json' -d '
{
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 23,
    "about": "I like to collect rock albums",
    "interests": ["music"]
}' | jq .

You can also manage the cluster via the REST API.

Show basic cluster information

curl -s $ES_CLUSTER | jq .
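The root endpoint returns general information such as the cluster name and version. Health is reported separately by the _cluster/health endpoint, whose "status" field is green, yellow, or red. Here is a small sketch of turning that response into a one-line summary. The summarize_health helper is a made-up name, and the sample response is illustrative only (the values are invented, not captured from a real cluster).

```python
def summarize_health(health):
    """Turn a _cluster/health response (already parsed from JSON) into one line."""
    status = health["status"]  # "green", "yellow" or "red"
    return "{}: {} ({} nodes, {} unassigned shards)".format(
        health["cluster_name"],
        status,
        health["number_of_nodes"],
        health["unassigned_shards"],
    )


# Illustrative response only; the values are made up.
sample = {
    "cluster_name": "example-cluster",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 5,
}

summary = summarize_health(sample)
print(summary)
```

Against a live cluster, the dict would come from parsing the JSON body of a GET to $ES_CLUSTER/_cluster/health.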

Note: In these examples, my cluster is wide open, so I didn't have to sign my requests. If you've locked it down to specific IAM principals, you need to sign your requests with AWS Signature Version 4. So you would use the AWS SDKs, rather than the standard ES clients.
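To give a feel for what signing involves, here is a sketch of just one step of Signature Version 4: deriving the signing key from a secret key, date, region, and service name ("es" for this service). The function names and the placeholder secret are assumptions for illustration. A complete signature also needs a canonical request and a string to sign, which this sketch leaves out; in practice you would let an SDK or signing library do all of it.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    # One HMAC-SHA256 round; key is bytes, message is text.
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="es"):
    """Derive the SigV4 signing key for one day, region and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder secret for illustration; never hard-code real credentials.
signing_key = derive_signing_key("EXAMPLE-SECRET-KEY", "20160101", "us-west-2")
print(signing_key.hex())
```

The key is scoped to a single date, region, and service, so changing any of them produces a different key.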

Appendix

Code in Python using the requests HTTP library

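import requests

# Endpoint of the Elasticsearch domain
url = 'http://search-estest-domain-xnzukkovs6px3wt2zhsy7t2si4.us-west-2.es.amazonaws.com'

# Document to index, as a dict; requests serializes it to JSON
employee = {
    "first_name": "John",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to build cabinets",
    "interests": ["sports", "music"],
}

# PUT stores the document under index "megacorp", type "employee", ID 1
r = requests.put(url + "/megacorp/employee/1", json=employee)
r.raise_for_status()  # fail loudly on an HTTP error status
print(r.text)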