Performance tips

Tradeoffs of new index every day, or single index

Most will create a single index per day. You can drop an entire index, which is far faster than deleting a lot of individual records (see the sketch below).

  • TTLs were deprecated in 2.0; they were so inefficient that we try to stay away from TTLs in general.
  • High rates of ingest with near-time analytics: 1B records, near-time analytics within 30 minutes.
  • We saw little return on creating our own index.
  • If you're doing unique counts on high-cardinality fields, you can hash the field so unique counts run more efficiently.
  • There's a lot of tuning available so you can throttle writes.
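
A minimal sketch of the daily-index drop pattern (assuming a cluster on localhost:9200 and hypothetical daily indices named logs-YYYY.MM.DD; dropping the whole index is one cheap operation, while delete-by-query has to find and remove every matching document):

    import datetime
    import requests

    ES = "http://localhost:9200"
    RETENTION_DAYS = 30

    # Compute the daily index that just aged out of retention.
    cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)
    old_index = "logs-{:%Y.%m.%d}".format(cutoff)

    # Dropping the whole index is far cheaper than deleting its records one by one.
    requests.delete("{}/{}".format(ES, old_index))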

They would index everything into an inverted index (sketched below).
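
For context, a toy sketch of what an inverted index is (heavily simplified; real Lucene indexes also store term frequencies, positions, and more): a map from each term to the documents containing it, which is what makes term lookups fast.

    from collections import defaultdict

    docs = {1: "new video games", 2: "video streaming", 3: "new games"}

    # term -> list of ids of documents containing that term
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index[term].append(doc_id)

    print(sorted(index["video"]))  # [1, 2]: lookup by term is a single dict hit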

Nuxeo does a content management system (the data of record is in Postgres) and also writes it to ES. They force a refresh so new writes are immediately searchable (see below).
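
A sketch of forcing the refresh (hypothetical index and type names; assumes the REST API on localhost:9200):

    import requests

    ES = "http://localhost:9200"

    # refresh=true makes this one document searchable immediately.
    requests.put(
        "{}/documents/doc/42?refresh=true".format(ES),
        json={"title": "contract.pdf", "status": "published"},
    )

    # Or refresh the whole index after a batch of writes.
    requests.post("{}/documents/_refresh".format(ES))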

Should ES be used as a datastore? In 2.0, they have focused on resiliency, durability...

  • They are trying to move to a 6-week release cycle.

When you bump up to two replicas, ingestion rates drop significantly (every document must also be indexed on each replica). Replicas give you better performance on the query side.

Ingest, then hit the API to refresh and commit, and then add replicas. Every index has a fixed number of shards; routing is a simple hash modulo the shard count, not a consistent hash, which is why the primary shard count is fixed at creation.
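
A minimal sketch of that load pattern (hypothetical index name "events"; assumes the REST API on localhost:9200):

    import requests

    ES = "http://localhost:9200"
    INDEX = "events"

    # 1. Create the index tuned for raw ingest: no replicas, refresh disabled.
    requests.put("{}/{}".format(ES, INDEX), json={
        "settings": {"number_of_replicas": 0, "refresh_interval": "-1"}
    })

    # 2. Bulk-load as fast as the cluster will take it (bulk bodies elided).
    #    ... POST {ES}/_bulk ...

    # 3. Refresh so everything is searchable; flush to commit to disk.
    requests.post("{}/{}/_refresh".format(ES, INDEX))
    requests.post("{}/{}/_flush".format(ES, INDEX))

    # 4. Only now add replicas and restore a normal refresh interval.
    requests.put("{}/{}/_settings".format(ES, INDEX), json={
        "index": {"number_of_replicas": 2, "refresh_interval": "1s"}
    })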

Create a single index, and ingest as fast as we can. Take your nodes and multiply by 2 to 4 (around 3) to size the shard count: if you have a 5-node cluster, create a 20-shard index (based on load testing). Another rule of thumb: double the number of nodes, plus 1. The routing sketch below shows why the count has to be right up front.
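
A toy illustration of the hash-mod routing mentioned above (Python's built-in hash stands in for the murmur3 hash Elasticsearch actually applies to the routing key):

    def shard_for(doc_id, num_primaries):
        # Elasticsearch routes with hash(routing_key) % number_of_primary_shards.
        return hash(doc_id) % num_primaries

    # With 20 primaries, a given id always lands on the same shard...
    print(shard_for("abc", 20))
    # ...but change the primary count and almost every document maps elsewhere,
    # which is why you resize by reindexing rather than by editing the setting.
    print(shard_for("abc", 21))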

Found runs on AWS -- "would you recommend Found? Yes, I would. I would want to put my compute next to my data."

Sony is using ES for security monitoring (people trying to figure out what new video games are coming out for PS4). Netflix is running a 2000-node cluster.

One of the VPs is over at Elastic now. If you can afford it, Splunk is fantastic - they give you the whole solution.

Consistency concerns - Trent stores the catalog records in ES, but currently stores transactional records elsewhere.

Maintenance - Elasticsearch is fantastic as far as maintenance goes, but it does require some additional knowledge or skill sets.

Concur has Kibana on every desktop. They use it for logging and analytics, and are putting customer analytics in there too.


http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/what-is-amazon-elasticsearch-service.html

Domain Change Scaling Guidelines

Increase in data quantity or data size - use the following guidelines to scale:

  • Choose a larger instance type or add additional instances
  • Increase the size of the EBS volume

Increase in traffic due to Amazon ES request volume and complexity - use the following guidelines to scale:

  • Choose a larger instance type
  • Add additional instances
  • Add replica shards

Replica shards provide failover; a replica shard is promoted to a primary shard if a cluster node containing a primary shard fails. For more information about replica shards, see Replica Shards in the Elasticsearch documentation.
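
A sketch of applying those guidelines to an Amazon ES domain via boto3's "es" client (hypothetical domain name and sizes; update_elasticsearch_domain_config changes the cluster and EBS configuration in place):

    import boto3

    es = boto3.client("es")

    es.update_elasticsearch_domain_config(
        DomainName="my-domain",  # hypothetical domain name
        # More data or more traffic: bigger instances, or more of them.
        ElasticsearchClusterConfig={
            "InstanceType": "m4.large.elasticsearch",
            "InstanceCount": 4,
        },
        # More data on disk: grow the EBS volume (size in GiB).
        EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 100},
    )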

Customer anti-patterns

Near-time streaming analytics (one of Trent's customers has a 24-node ES cluster)