Graves & Pitarka HPC Workflow Porting from BlueGene/P to POWER6

Sung Bae posted on Oct 04, 2016

Moved to Verification of workflow porting from Foster to Fitzroy

Paraview quick tour

Daniel Lagrava posted on Oct 02, 2016

What can Paraview do for me?

Paraview is visualization software that can read almost any kind of input file and perform 2D and 3D rendering of scientific data. One can go from a basic CSV file to more complex formats like the VTK-based ones (http://www.vtk.org/wp-content/uploads/2015/04/file-formats.pdf).

It lets you improve the presentation of your results for both scientific and general audiences.

(I cannot upload files so I have some images here: https://drive.google.com/open?id=0B3o2OfQ3KYoqWS1kOTJ1RTZoZzQ)

Some useful ideas:

Add tracers to vector quantities
Slice a 3D domain into planes to show what is happening with some of the quantities
Do calculations with the vectors in your results. For example, after a very long simulation I needed a quantity that I had not simulated, the enstrophy. However, I had the vorticity, and integrating the square of its magnitude over the domain gave me the result I wanted.
Add streamlines
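The enstrophy calculation mentioned above can also be sketched outside Paraview. This is a minimal sketch, assuming cell-centred vorticity vectors and the matching cell volumes have already been exported (e.g. to CSV); the function name is hypothetical, and the 1/2 prefactor is a convention (some texts omit it), so it is a switch here. Inside Paraview the equivalent is typically a Calculator filter on the squared vorticity magnitude followed by Integrate Variables.

```python
def enstrophy(vorticity, volumes, half=True):
    """Approximate E = (1/2) * integral of |omega|^2 dV as a sum over cells.

    vorticity: one (wx, wy, wz) vector per cell (assumed cell-centred)
    volumes:   the matching cell volumes
    half:      include the conventional 1/2 prefactor
    """
    vorticity = list(vorticity)
    volumes = list(volumes)
    if len(vorticity) != len(volumes):
        raise ValueError("need one cell volume per vorticity vector")
    # Sum |omega|^2 * dV over all cells
    total = sum((wx * wx + wy * wy + wz * wz) * dv
                for (wx, wy, wz), dv in zip(vorticity, volumes))
    return 0.5 * total if half else total

# Usage: uniform vorticity (1, 0, 0) in a unit cube split into 8 equal cells
cells = [(1.0, 0.0, 0.0)] * 8
vols = [0.125] * 8
print(enstrophy(cells, vols))
print(enstrophy(cells, vols, half=False))
```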

Animations, oh yeah!

Perhaps the coolest, most awesome feature in Paraview is how easily it lets you make animations.

In order to animate something you can either provide:

A set of input files that are numbered sequentially. In that case we get a temporal animation.
A single file. In this case we can animate the camera to visit certain regions of the rendered results and show something interesting that is happening.
- In this example I also used a feature called "Save State...", which writes the current session's pipeline and view settings to disk (a .pvsm state file). It is very useful for reloading a particular setup later and improving it.
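For the first option (sequentially numbered files), one way to make the time ordering explicit is a .pvd collection file that lists each file with its time value, so Paraview opens the whole set as one time-varying dataset. This is a minimal sketch using only the Python standard library; the file naming pattern (result_0000.vtu, ...) and the 0.01 s step are hypothetical choices for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def write_pvd(pvd_path, files, times):
    """Write a ParaView collection (.pvd) file listing one dataset per time value."""
    if len(files) != len(times):
        raise ValueError("need one time value per file")
    root = ET.Element("VTKFile", type="Collection", version="0.1")
    collection = ET.SubElement(root, "Collection")
    for name, t in zip(files, times):
        # part="0": single-piece datasets; file names are written as given
        ET.SubElement(collection, "DataSet", timestep=str(t), part="0", file=name)
    ET.ElementTree(root).write(pvd_path, encoding="utf-8", xml_declaration=True)

# Usage: five hypothetical snapshots, one every 0.01 s
files = [f"result_{i:04d}.vtu" for i in range(5)]
times = [round(i * 0.01, 6) for i in range(5)]
out = os.path.join(tempfile.mkdtemp(), "result.pvd")
write_pvd(out, files, times)
```

Keeping the time values in the collection file, rather than relying only on the numbering, means irregular output intervals still animate at the right pace.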

Paraview and OpenSees

The PVD recorder works, yay!
The PVD recorder does not work in parallel, duh!
However, a very simple Python script can translate the model input and the recorder outputs into a suitable format. I was almost there last week, but while rewriting the code to make it nicer I erased most of it. Anyway, it can be done, and it can be an alternative to the recorder.
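A minimal sketch of that translation step, assuming the node coordinates, two-node element connectivity and one displacement vector per node have already been parsed from the OpenSees input and recorder files. The parsing is omitted, and the dictionary/list layout and the function name are hypothetical, not an OpenSees format. It writes a legacy ASCII VTK unstructured grid that Paraview can open; writing one file per recorded step and numbering them sequentially gives an animation.

```python
import os
import tempfile

def write_vtk_lines(path, nodes, elements, disp, title="OpenSees model"):
    """Write 2-node elements as a legacy ASCII VTK unstructured grid.

    nodes:    {node_tag: (x, y, z)}          (hypothetical layout)
    elements: [(node_tag_i, node_tag_j), ...]
    disp:     {node_tag: (ux, uy, uz)} for one recorded step; missing nodes get zeros
    """
    tags = sorted(nodes)                              # fix a point order
    index = {tag: i for i, tag in enumerate(tags)}    # OpenSees tag -> VTK point id
    lines = ["# vtk DataFile Version 3.0", title, "ASCII",
             "DATASET UNSTRUCTURED_GRID", f"POINTS {len(tags)} float"]
    lines += [" ".join(str(c) for c in nodes[t]) for t in tags]
    # Each cell record is "2 i j", i.e. 3 integers per line element
    lines.append(f"CELLS {len(elements)} {3 * len(elements)}")
    lines += [f"2 {index[i]} {index[j]}" for i, j in elements]
    lines.append(f"CELL_TYPES {len(elements)}")
    lines += ["3"] * len(elements)                    # 3 = VTK_LINE
    lines.append(f"POINT_DATA {len(tags)}")
    lines.append("VECTORS displacement float")
    lines += [" ".join(str(c) for c in disp.get(t, (0.0, 0.0, 0.0))) for t in tags]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Usage: a hypothetical 2-element column with lateral displacements at nodes 2 and 3
nodes = {1: (0.0, 0.0, 0.0), 2: (0.0, 0.0, 3.0), 3: (0.0, 0.0, 6.0)}
elements = [(1, 2), (2, 3)]
disp = {2: (0.01, 0.0, 0.0), 3: (0.03, 0.0, 0.0)}
vtk_path = os.path.join(tempfile.mkdtemp(), "step_0000.vtk")
write_vtk_lines(vtk_path, nodes, elements, disp)
```

In Paraview, a Warp By Vector filter on the "displacement" array then shows the deformed shape.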

Some last thoughts

Paraview works well and is free, but the learning curve can be steep
Sung and I are always happy to help you get some nice animations
"Paraview works in parallel" is kind of a lie, at least in my past experience.
A lot of resources around the internets!

QuakeCoRE-SCEC Database meeting

Sung Bae posted on Sep 27, 2016

Meeting held on the 30 Sep 2016 at 9:30 NZDT

Proposed Agenda

Introduction
Background: QuakeCoRE SeisFinder
SCEC experience
- Overview
- Size
- Number of visitors and who they are expected to be (general public?)
- HW, SW, network?
- Issues around using SQL for this particular type of project. Advantages?
- Have they considered any other alternative that does not use SQL? For example Hadoop (large storage, map-reduce operations) https://github.com/Esri/gis-tools-for-hadoop
- Maintenance: how many people need to be working on the project in order for it to remain usable.
QuakeCoRE requirement
- Support for large size data
- Quick response
- Support for many concurrent queries: http://stackoverflow.com/questions/16628329/hdf5-concurrency-compression-i-o-performance
- Flexible/scalable to add more fields for retrieval
- Support for geographic data type: need to query "nearest" points.
- hosting (external/internal)
Others
- QuakeCoRE workflow: currently a LoadLeveler multi-step job; will be incorporating Cylc. https://cylc.github.io/cylc/
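The "nearest points" requirement can be illustrated with a brute-force great-circle search. This is a minimal sketch, assuming station coordinates are held in memory as (name, lat, lon) tuples with hypothetical names; a real DB would answer the same query with a spatial index (e.g. database spatial types), so read this as a statement of the query semantics rather than a proposed implementation.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the argument just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def nearest_stations(stations, lat, lon, k=1):
    """Return the k stations closest to (lat, lon) as (distance_km, name) pairs."""
    return heapq.nsmallest(
        k, ((haversine_km(lat, lon, s_lat, s_lon), name) for name, s_lat, s_lon in stations)
    )

# Usage: hypothetical grid points around Christchurch
stations = [("A", -43.53, 172.63), ("B", -43.60, 172.70), ("C", -44.00, 171.00)]
print(nearest_stations(stations, -43.54, 172.64, k=2))
```

The brute-force scan is O(N) per query, which is exactly why the requirement calls for DB-level geographic support once N reaches millions of grid points.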

Notes after the meeting

SCEC database (DB) holds a large amount of seismic information, on the order of 22 billion seismic entries, and they have started seeing increased query times. They use MySQL as the backend. The DB runs on a grunty machine with 128 GB of RAM and 24 cores under Fedora 24. They also have another machine, not public facing, with smaller specs. Other points here:
- Query times have increased as the DB has grown (~4 TB)
- They will try to reduce the size of the most problematic tables to improve performance.
- They want to split the current huge DB into two: a production DB (containing only the latest analyses) and a read-only SQLite-based DB with less frequently used data.
Their database is not for the public at this point; it is mostly used by researchers. The usage pattern is bursty rather than constant, depending mainly on which researchers require the data.
Besides the performance issues they have noticed with MySQL, other issues to note are:
- Backup effort grows as the DB size increases
- Updating MySQL is a complex operation. Nevertheless, it is always a good idea to update for bug fixes/performance improvements/new features.
- Note that they have dedicated staff to administer the machine and the DB
Moving to a solution that does not use MySQL is not feasible for them, as they have built a stack on top of it that relies on the DB backend to provide certain features.
- We should be cautious in analyzing our requirements so that we choose a suitable DB for our needs as well.
When discussing Hadoop, Scott's take is that it does not suit a query-serving problem like the one they face. On top of that, it requires a custom filesystem, and they do not think that their... Basically, they have not found a solution appealing enough to make them move from the current one.
They seem interested in using HDF5 as a standard format for some of the output produced by the codes they use, as they currently have several custom binary formats.
Their workflow uses Pegasus (https://pegasus.isi.edu/)

GM Simulation Working group meeting minutes and discussion

Sung Bae posted on Sep 22, 2016

Sep 21, 2016

Brendon Bradley, Sung Bae, Hoby Razafindrakoto, Seokho Jeong, Viktor Polak, Sharmila Savarimuthu, Kevin Foster, Chris De la Torre, Ethan Thomson, Chris McGann

I. GM Simulation

Chris De la Torre : Python code development
Hoby: Lessons from SCEC

II. Workflow

1. Current progress

SeisFinder (Viktor/Sharmila): Demo, Google Map API. Viktor to investigate KML export.
StatGrid validation (Sung) : Issues, solutions
- Current issue: 400 m StatGrid output is too large. Not enough disk space for DB conversion (P7 has only 2 x 135 GB)
- Plan:
  - Run Canterbury instead of SI (same resolution as StatGrid) and compare, just to complete the validation.
  - Talk to UC HPC for disk space.
  - (To do) BB suggests that Sung consult Scott Callaghan (SCEC) re. DB tools and develop a solution compatible with SCEC.
  - Q: a method to compare two seismograms? (To do) Discuss with Hoby offline.
OpenSees (Daniel): POWER7 SW stack for OpenSees development. Will start collaboration with Alex Pletzer from NeSI. Seokho to point to a paper re. OpenSees scalability issue.
HW purchase (Viktor): quote from Cyclone, specs.
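On the open question of how to compare two seismograms (e.g. the same station from two runs of the same simulation during StatGrid validation), one simple starting point is a peak-normalised and an RMS-normalised misfit. This is a minimal sketch, assuming both records share the same time step and length; the metrics, the function names and the 1e-3 tolerance are illustrative assumptions, not an agreed validation criterion, and goodness-of-fit measures used in ground-motion validation are considerably richer.

```python
import math

def seismogram_misfit(a, b):
    """Compare two equally sampled seismograms, using a as the reference.

    Returns (max|a - b| / max|a|, rms(a - b) / rms(a)); both are 0.0 for identical records.
    """
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and equally long")
    peak = max(abs(x) for x in a)
    if peak == 0.0:
        raise ValueError("reference record is all zeros")
    diff = [x - y for x, y in zip(a, b)]
    rms_a = math.sqrt(sum(x * x for x in a) / len(a))
    rms_d = math.sqrt(sum(d * d for d in diff) / len(diff))
    return max(abs(d) for d in diff) / peak, rms_d / rms_a

def same_within(a, b, rel_tol=1e-3):
    """Hypothetical acceptance rule: both misfit ratios at or below rel_tol."""
    peak_ratio, rms_ratio = seismogram_misfit(a, b)
    return peak_ratio <= rel_tol and rms_ratio <= rel_tol

# Usage: a reference record and a slightly different one
a = [0.0, 1.0, -2.0, 0.5]
b = [0.0, 1.0, -2.0, 0.6]
print(seismogram_misfit(a, b), same_within(a, a), same_within(a, b))
```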

2. New Ideas

Visualisation (Sung) - http://scecvdo.usc.edu/ : SCEC-VDO. ShakeMap animation is not currently possible, and it is perhaps not ideal for automated 3D animation production. Yet it is a nice platform for data visualisation, potentially beneficial for TB3.
Non-uniform grid of stations over SI.
(To do) Sung to test SCEC-VDO and explore possibility of extending it, and talk to Matthew Hughes and exchange ideas.

3. New tasks

Task	Question	Answer
Add feature for comments + validation matrix for each simulation scenario	(Sung) What comments and validation criteria would you like? Will you hand-draw an example output page?	Let's prioritise this to be considered in Nov 2016. I would like us to finish the validation framework project (Pettinga et al. #16035), as that will provide the final framework. We can then determine for each of the simulations how they meet the matrix, and finally decide how to display the results on SeisFinder.
Add feature to show slip model of fault as image	(Sung) Image that looks like the output from rupt_rise+rake_mod_7pt9.csh?	Correct. We may want to reconsider this in the future (esp. for multi-fault ruptures), but for now (till say Q2 of 2017) this will be sufficient.
Add feature to provide multiple realizations of scenario due to different slip	(Sung) Is this related to the non-uniform weight of different rupture model (for historical event)?	This is loosely related to this. The idea is that both for historical Eqs, and also for future Eqs - there is uncertainty in our modelling assumptions. In order to represent this uncertainty, we provide multiple ground motion simulation results (the uncertainties can be due to different slip, but actually there are many more uncertainties. In general, the uncertainties relate to the earthquake source, velocity model, and local site effects assumptions); so the idea here is for SeisFinder to enable users to obtain either a single result, or a suite of results (i.e. if we have N simulations to represent the uncertainties, then SeisFinder would allow extraction of N ground motion time series at a given location(s) of interest). Clearly, this means an increase in data storage demands, which increases the need for our non-uniform grid (+ SQL) ideas even more (Q: Did you touch base with Scott Callaghan?)
Add feature to allow user-specified Vs30	(Sung) User to supply a new Vs30 file and web service applies new amplification and makes a new set of seismogram files on the fly?	Since we already ask the user to provide the Lat and Lon values for one or more locations, if the user 'turns on' the option to provide their own Vs30 value, then my idea was that it would simply make available a third column for input data (or read a third column of a CSV file). Computationally what would happen is that we would use the site amplification scripts Viktor has written to 'remove' the initial site amplification based on the previously assumed Vs30 value, and then 'add' the new site amplification based on the user-specified Vs30 value.
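The 'remove then add' step in the Vs30 answer amounts to scaling the simulated spectrum by the ratio of the new to the old amplification factor at each frequency. This is a minimal sketch, assuming the amplification model is supplied as a function amp(vs30, f); the function names are hypothetical, and the toy model in the usage lines is a made-up placeholder, not Viktor's site amplification scripts or any published model.

```python
def reamplify(spectrum, freqs, amp, vs30_old, vs30_new):
    """Swap site amplification for vs30_old with that for vs30_new.

    spectrum: Fourier coefficients (real or complex) at the frequencies in freqs
    amp:      amp(vs30, f) -> positive amplification factor (assumed model)
    """
    if len(spectrum) != len(freqs):
        raise ValueError("need one frequency per spectral value")
    # 'Remove' the old amplification, then 'add' the new one
    return [s / amp(vs30_old, f) * amp(vs30_new, f) for s, f in zip(spectrum, freqs)]

def toy_amp(vs30, f):
    """Placeholder only: frequency-independent factor (760 / vs30) ** 0.5."""
    return (760.0 / vs30) ** 0.5

# Usage: moving from Vs30 = 400 m/s to 200 m/s with the toy model scales by sqrt(2)
print(reamplify([1.0, 2.0], [1.0, 5.0], toy_amp, vs30_old=400.0, vs30_new=200.0))
```

Because the factor is real, applying it to complex FFT coefficients and inverse-transforming would yield the modified seismogram; the real amplification model would normally vary with frequency.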

test

Seokho Jeong posted on Nov 27, 2015