Hydro GeoAnalyst Insider
WINTER EDITION
Vol.1: No.1, 2011

Welcome to the HGA Insider!

It’s a new year, and everyone in SWS is excited.  We have always felt that the HGA user community is a group of very brilliant, dedicated, creative, and supportive people.  This year, we felt it was time to give more back to you.  This newsletter is a part of that effort.  Once each quarter, we will send you a new edition.  Each edition will be packed with articles on how to get the most out of HGA – whether you are new to the program, or a veteran of many years.  We know you work hard.  We’d like to help you work smarter. 

Is there something you need help with in HGA?  Let me know – I would be happy to feature it in the next edition.

Sincerely,

Jennifer Hurley
Hydrogeologist & Data Management Specialist
Schlumberger Water Services

IN THIS ISSUE:

Announcements

HGA 2011 Sneak Peek

Best Practice

Designing your data modeling

Experts Exchange

Freestone - "Exceptional value from environmental data management"

Quick Helps

Real-Time QA for Manual Data Entry

Speeding Up Large Imports

Making Import Templates

5 Second Survey

Take the survey!

HGA 2011.1 Sneak Peek

The HGA 2011.1 release is all about data quality.  Specifically, lab data quality.  Even laboratories can make mistakes.  It’s important to be able to confirm that your lab results and methods adhere to the level of quality that you need to meet your project goals, and that any deviations are caught early.  In HGA 2011.1, you can assess your chemical data against these six different quality metrics:

  1. Duplicates – Is the percent difference or coefficient of variation within proper project control limits for each pair of duplicate samples?
  2. Spikes – Is the percent recovery on spiked samples above your specified minimum value?
  3. Blanks – Were there detections in any of your blanks?
  4. Dilution Factors – How many times did the laboratory dilute your samples?  Were any samples diluted too many times?
  5. Maximum Detection Limits – Are any of the analyte detection limits so high (for example, because of high dilution factors) that your non-detects are meaningless in comparison to regulatory guidelines?   
  6. Holding Times – Did any of the samples wait too long before the lab analyzed them, making analyte concentration meaningless?

Even better, HGA 2011.1 will automatically flag any data it finds that are outside the quality standards you define, so that you can easily exclude those questionable samples from further analyses.

Tired of importing chemical data in two steps?  In HGA 2011.1, you can import chemical data in one easy step, and HGA will populate all related tables all at once.  No longer will you need to populate the sample, and then the results table.  You can use the extra time to watch the snow fall outside your window, or go get that second cup of coffee.

We know that most of you begin projects using the Environmental Template.  Did you ever wonder what some of those fields were for, or why you would ever want to use them?  We thought it was time to explain.  HGA 2011.1 includes a data dictionary that explains what each field should (or at least, could) be used for, and provides examples.  Don’t like our suggestions?  No problem.  We still always allow you to change the data model to be exactly the way you want – even in the middle of a project you are already working on.

Top

Designing Your Data Model

The Template Manager allows you to edit your data model in HGA. 

The easiest way to get started is to begin with the standard Environmental Template that comes with HGA, (available in either imperial or metric units).  You can then adjust it to meet your specific project needs.  There are only a very limited set of fields that should remain untouched, or which require special configuration in the Profile Settings.  A summary of these fields is available here.

One of the most common modifications is to rename fields and tables to match project terminology.  For example, you may wish to rename the Hydrogeology table to be Hydrostratigraphy

Some examples of tables that clients have added include:

  • Top of Casing (TOC) Changes – to store historical data showing changes in TOC over time
  • Free Product Thickness - to store measurements of thickness of free product found in wells
  • Stream Flow – to store date, average daily flow rate, peak flow, gage height, reservoir storage, and lake elevations
  • Meteorology – to store daily precipitation, temperature,  and evapotranspiration rates
  • Well Production – to store production data (for example, for pump and treat systems)

Primary keys

It is good practice to set up your tables with primary keys.  Why would you want one?  For two reasons.  First, having a primary key means you can’t end up with duplicate entries (for example, if you accidentally try to import the same data twice).  Second, once the data are in your project, if you want to update a large number of records quickly, having a primary key on your table allows you use it and the Data Transfer System (DTS) in HGA to do a bulk update of all affected records.  Examples of when you might want to do a bulk update might be because you have new data for several records, or because you realized a part of the data is wrong (hey – it does happen!).

A primary key can be a single field (such as StationID in the Station table), or a combination of fields (such as From and To for a table like Casing).  The important point is that whatever you choose, it should create a UNIQUE combination for every record in that table. 

Relationships

Like everything in life, a good data model is built on good relationships.  The most common form of relationship (in HGA at least) is the Parent-Child relationship.  This relationship describes a hierarchy, where the parent table contains data that is, in some ways, at a higher level than the child table.  There is usually a one-to-many relationship between the parent and child.  Meaning that, just as in real life, one parent can have many children.  And (in the world of HGA), each child table can have only one parent table.  As a result, the parent table must contain data before the child table can. 

A good example that you would be familiar with from your current projects are the Parameter_Sample and Parameter_Result tables.  The Parameter_Sample table contains information about the physical sample (this generally corresponds to the types of data you would collect in the field), while the Parameter_Result table contains information about the analytical results (this generally corresponds to information created by the laboratory).  Parameter_Sample is parent to the Parameter_Result table.  One sample can have many different results (one for each analyte).

Understanding these types of relationships and having the ability to identify situations where a parent-child relationship is helpful (or necessary) can reduce duplication in your data, and make your data easier to maintain.

For example, I recently worked with an HGA client who has a lot of geophysical data.  They often ran the same tool set down the same wells, to see how values were changing over time.  The existing geophysical data structure in the classic Environmental Template can’t handle this, because each geophysical data table has only depth and value fields and assumes each log is run only once for each station.  There are two different ways this situation could be handled.

Basic approach – no parent child relationship

The client could have modified each relevant table to include a date field, and a run number.  They could then use the run number to track the log – for example, a gamma log, over time in a single well.  Such a table would look like the one shown below.


You can see that the Run and the Date fields repeat many times, each time with the same content.  In fact, they repeat for every Gamma value collected for each specific date.  Imagine if you found a mistake in the date for one of the runs.  In the example shown, you would have to correct that date 1311 times!  Want to import new data?  You would be importing 2620 more values than you need to, because of the duplication.

Smart approach – nurture a parent child relationship

The alternative to this approach is to create two tables.

The first we will call Geophysical_Run.  This table contains information about each run – a unique run_id field (the primary key), the date the survey was made, the type of geophysical data collected, the measurement unit of the data, and the station in which it was performed.  This will be the parent table.

The second table, we will call Geophysical_Result.  This table is simple; it contains three fields: run_id, depth, and value.  This is the child table.  The two tables are linked through the run_id.  The primary key is a combination of the run_id and depth fields.
The same data are stored, but now there are no duplicated pieces of data.  If you discover an error in a survey date – you can correct it once, and it’s updated everywhere.  Plus, you can see at a glance, from the Geophysical_Run table, how many times a particular well has been surveyed.  And if that client decides to run a new type of geophysical survey at the site, he doesn’t have to change his data model to include a new table for that survey data.  Instead, he just adds a new value to the survey_type pick list and he’s ready to import.


We’d love to hear about the tweaks you’ve made to your data models – let us know

Top

Freestone - "Exceptional value from environmental data management"


Freestone Environmental Services arranged and hosted at their offices a one-day data management workshop consisting of presentations given by experts from Schlumberger Water Services and their own internal staff.  Key clients were invited, and lunch and snacks were included.  An agenda was circulated ahead of time so that busy attendees could choose to drop in for only specific sessions, if they desired.  Morning talks presented broad approaches and discussed the overall challenges of data management in our industry, with comments on some approaches to resolve common issues.  Afternoon talks were more technical, and focused on application of specific technologies, and in what ways Freestone was using them to improve the situation (and their deliverables!) at the Hanford Site.  The turnout was excellent.  In total, seventeen key clients attended at least one of the sessions, and the feedback Freestone received was very positive. 

Two of the Prime contractors to DOE (Hanford site) have subsequently made the decision to standardize their technical working groups on HGA. They also use HydroManager, the SWS enterprise level solution, to expose the data real-time on a secure website for review by the multiple management-level stakeholders who are involved in the site review and approval process.  With HydroManager, data tables, complex queries, cross-sections, and borehole logs become instantly accessible via web browser as they are created in HGA. As a result of the workshop and Freestone’s subsequent implementation of these tools, Freestone is now established as experts of both HGA and environmental data management practices at the Hanford Site and have cemented their presence as leaders in the field.

     

Freestone’s involvement has increased; their data model was used as a reference in designing the Hanford Site model, and together with SWS staff they trained Hanford staff on how to employ the Freestone HGA data model and to improve project communication and standardization.

Top

Quick Helps

Real-Time QA for Manual Data Entry (turn on BHLP while entering well data)

Next time you are entering well data in HGA, try this trick to help.  Turn on the BHLP Profile Viewer.


That way, you can see the borehole log as you create it, which can help you notice problems in the data sooner (like a well screen that’s too long and doesn’t match the lithology data).


 Top

Speeding Up Large Imports (Station Name versus Station ID)

Got a LOT of data to import using the DTS?  Want it done faster?  Try importing using StationIDs rather than Station Names.  This way, HGA won’t have to take the time to look up the station IDs for each well name in your import file. 

Don’t know the IDs for your stations?  Here’s how to get them:

Use the DTS Export (as described in the Making Import Templates quick help) to export the ID and Name fields from the station table.  This will give you a complete list of all the stations and their corresponding IDs in HGA.

You’re probably now thinking “Ok – I have these IDs – but I don’t want to match them by hand to the records in my Excel sheet – that’s WAY too much work”.  Don’t worry.  You don’t have to.  Excel has a neat little trick to help you with this.  Here’s how:
Open the excel spreadsheet containing all the data that you want to import (your source file).  Also open the Station ID excel sheet you just exported from HGA, and re-order the columns so that the Station Name is the first column in the table, and the Station ID is the second column.

Add a new column to your source file and call it Station_ID.  Select the first cell in the new Station_ID column.  In Excel 2007, from the Formulas ribbon, select VLOOKUP from the Lookup and Reference library.

In the dialogue that appears, you will need to map four fields.  The lookup_value is the value you would like to have an ID for: select the field that contains the name of the Station in the current data row.  The Table_array is the entire range of cells that contains your reference table: click on the little icon, navigate to the other Excel spreadsheet that contains your station names and station ids, and select all the data in the table (both columns, all rows).  The Col_index_num is an integer that tells Excel which of the reference table columns contains the value you are trying to look up:  write 2 (because the second column contains the ID that you want to use instead of the station name).  The Range_lookup lets you control whether Excel looks for a similar value (true), or an exact match (false): type false.  In the end, your dialogue should look something like this:


If you hit OK, Excel should populate the cell with the Station ID that matches the station name for the current data row.  It’s a good idea to check this value to make sure it matches.  In this way, you know if your formula is correct.  If you are happy with it, edit the formula and use the dollar sign ($) to lock the table_array values (so the table references don’t change when you copy the cell).  In my case, my formula ends up looking like this:


Now copy and paste the formula for every record in your import data set.  Excel should correctly populate the entire Station_ID column in your import data set with appropriate Station IDs by looking them up in your reference table.
You are now ready to import your data set using Station IDs instead of station names!

Top

Making Import Templates

Use the DTS Export function to quickly make an import template for a table in HGA.  The DTS export works almost the same as the Import process – only in reverse!   Through the DTS Export, pick the table and specific fields to include in the export, and then create the file.


If you already have data in the table you are exporting, you’ll want to open the exported file and delete all the records you exported.  The result should be a clean template that has column headers that exactly match the fields for the table you exported.  And that’s nice, because during import, HGA will automatically map field names from your source file if they match those in the database tables.  This is particularly useful for tables that have a lot of fields. 

You can take this idea one step further.  Chemistry data in particular is collected several times each year, and usually delivered by the laboratories in a standard digital format.  For faster importing via the DTS, it may be worth discussing with your laboratory to see if they can standardize their data delivery format to be consistent with your HGA needs.  Some labs already do this for their clients.

Top

5 Second Survey


Are you clean water or contaminant focused?

Click here to answer.

 Top

©2011 Schlumberger. All Rights Reserved. | 519-342-1141 | Fax: 519-885-5262 | sws-info@slb.com | swstechnology.com