When working with well log data in a pandas DataFrame, it is very likely that you’ll want to explore your data in the context of geologic zones. By adding zone labels to each row of your DataFrame, it is possible to use some of the fun and powerful features of pandas, like groupby() for stats aggregations. And while the process for adding tops to a DataFrame is not obvious, it is simple.
In this approach, I add a zone label to each row in the DataFrame by using the cut() function from the pandas library.
First we’ll load our log data using lasio and subsequently create a pandas DataFrame object from the LAS data section.
import lasio
import pandas as pd
import numpy as np
las = lasio.read('example.LAS')
This LAS file happens to have an extra header section, ~T, that lists tops. This is not a defined section in the LAS 2.0 standard, but it is nonetheless common to find tops information stored in a header section like this.
lasio does not parse the ~T section automatically, but it does load that section into the LASFile object and store it as a newline-delimited string in the sections dictionary.
print(repr(las.sections['Tops']))  # repr() shows the newline delimiters explicitly
Out:
'STE_GEN 2879.3700\nST_LOUIS 3027.6700\nSALEM 3262.4400'
For this technique of adding tops to a DataFrame, we need a list of the top names and a list of the depths of each top. We’ll just use some simple string split() calls to get the lists we need. Note that the depth values for the tops are stored in the LAS file as strings, so we’ll have to convert them to floats.
The depths list will need to include an arbitrarily large value as well so that the base of the deepest zone is defined. In this case we’ll just use 99,999 ft.
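raw_tops_section = las.sections['Tops']
# Split each non-empty line into [name, depth]; skipping blanks tolerates a trailing newline
tops = [line.split() for line in raw_tops_section.splitlines() if line.strip()]
tops_names = [top[0] for top in tops]
# Depths are stored as strings, so convert to float; 99999 closes the deepest zone
tops_depths = [float(top[1]) for top in tops] + [99999]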
print(tops_names, tops_depths)
Out:
['STE_GEN', 'ST_LOUIS', 'SALEM'] [2879.37, 3027.67, 3262.44, 99999]
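The parsing step above can also be wrapped in a small helper. This is only a minimal sketch and not part of lasio: the function name parse_tops and its base_depth parameter are hypothetical, and it assumes every non-empty line holds a single-word top name followed by a depth. It also checks the ordering of the depths, because pd.cut() requires bin edges that increase monotonically.

```python
def parse_tops(raw_section, base_depth=99999.0):
    """Split a newline-delimited tops string into names and bin edges.

    Hypothetical helper: assumes 'NAME DEPTH' on each non-empty line.
    """
    names, depths = [], []
    for line in raw_section.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        names.append(parts[0])
        depths.append(float(parts[1]))
    # Append the arbitrary base depth so the deepest zone has a lower edge
    edges = depths + [base_depth]
    # pd.cut() needs bin edges that increase monotonically
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("tops must be listed in order of increasing depth")
    return names, edges


names, edges = parse_tops('STE_GEN 2879.3700\nST_LOUIS 3027.6700\nSALEM 3262.4400\n')
print(names, edges)
```

Raising an error on out-of-order tops is a design choice for this sketch; sorting the tops by depth before building the lists would be an equally reasonable alternative.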
The lasio library has a convenient built-in method for creating a DataFrame object.
df = las.df()
By default that DataFrame uses the log depth as the index, which is fine for this exercise.
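Now we create a label for each DataFrame row based on our tops and store it as a new column called ZONE. We can use the pandas cut() function to do so by passing our top depths to the bins keyword argument and our top names to the labels keyword argument. Each consecutive pair of depths defines one zone, which is why the depths list has one more entry than the names list.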
df['ZONE'] = pd.cut(df.index, bins=tops_depths, labels=tops_names)
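One detail worth checking is how samples that fall exactly on a top are labeled. By default pd.cut() builds bins that are closed on the right, so a sample at a top depth is assigned to the zone above it, and any row shallower than the first top gets no label (NaN). The sketch below uses a handful of made-up depths rather than the real log index to compare the default with right=False, which closes each bin at its top instead.

```python
import pandas as pd

# Made-up depths (ft) standing in for the log index; tops are from the example above
depths = pd.Index([2800.0, 2879.37, 2900.0, 3027.67, 3100.0, 3300.0])
tops_names = ['STE_GEN', 'ST_LOUIS', 'SALEM']
tops_depths = [2879.37, 3027.67, 3262.44, 99999]

# Default: bins are (top, base], so a sample exactly at a top is excluded from that zone
default_zones = pd.cut(depths, bins=tops_depths, labels=tops_names)

# right=False: bins are [top, base), so a sample at a top belongs to the zone it starts
top_inclusive_zones = pd.cut(depths, bins=tops_depths, labels=tops_names, right=False)

comparison = pd.DataFrame(
    {'default': default_zones, 'right_false': top_inclusive_zones}, index=depths
)
print(comparison)
```

Which convention is appropriate depends on how the tops were picked; for most sample spacings the difference is at most one row per zone.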
Now we can use pandas groupby() to do powerful magic. In this example I’ll group by ZONE and aggregate the statistics.
df.groupby('ZONE')[['GR', 'DPHI', 'ILD']].agg(['mean', 'std']).round(2)
Out:
ZONE | GR mean | GR std | DPHI mean | DPHI std | ILD mean | ILD std |
---|---|---|---|---|---|---|
STE_GEN | 31.11 | 10.23 | 0.02 | 0.01 | 109.27 | 181.24 |
ST_LOUIS | 39.96 | 10.55 | 0.02 | 0.01 | 167.73 | 169.24 |
SALEM | 25.81 | 8.47 | 0.03 | 0.01 | 128.91 | 136.58 |
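Before trusting per-zone statistics, it can help to confirm that the labels cover the depth ranges you expect. The following is a rough sketch of such a check on synthetic data, so it runs without a LAS file: the GR values are random, and the index name DEPT is an assumption about what the depth index is called.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for las.df(): a depth index (ft) and a made-up GR curve
rng = np.random.default_rng(0)
depth = np.arange(2850.0, 3350.0, 0.5)
df = pd.DataFrame(
    {'GR': rng.normal(35, 10, depth.size)},
    index=pd.Index(depth, name='DEPT'),  # 'DEPT' is an assumed index name
)

tops_names = ['STE_GEN', 'ST_LOUIS', 'SALEM']
tops_depths = [2879.37, 3027.67, 3262.44, 99999]
df['ZONE'] = pd.cut(df.index, bins=tops_depths, labels=tops_names)

# Sanity check: shallowest depth, deepest depth, and sample count for each zone label
summary = (
    df.reset_index()
      .groupby('ZONE', observed=True)['DEPT']
      .agg(['min', 'max', 'count'])
)
print(summary)
```

Rows above the first top have no zone label and are left out of the groups. Passing observed=True simply limits the result to zones that actually contain samples.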
That’s one way to get tops into a DataFrame. I’d love to hear from you if you have other tricks for working with tops in a pandas DataFrame!