Example Flow: Using Train and Predict for ML models
In Danomics you can use machine learning algorithms such as RandomForests, Gradient Boosting, and Multi-Linear Regression to make predictions. This could be for infilling or creating missing log curves, predicting properties, or even predicting values like EUR or IP90.
The Train and Predict tools need to be run in separate Flows. The training Flow creates the model, and the prediction Flow applies it.
The Train Flow will be as follows:
- LogInput >> Brings the log data into the Flow
- CpiLogCalc >> Calculates the curves to use in the model training
- Train >> Trains a machine learning model
Notice that in the above example we have two Train tools. This is because we are training one model for DTC prediction and another for PEF prediction.
The Predict Flow will be as follows:
- LogInput >> Brings the log data into the Flow
- CpiLogCalc >> Calculates the curves to use in the model prediction
- Predict >> Makes predictions using the model from the Train Flow
- DeleteComputedCurves >> Does some basic cleanup
- LogOutput >> Writes the curves to a new log database.
Notice that in the above example we have two Predict tools. This is because we are applying one model for DTC prediction and another for PEF prediction.
Training Flow
In the training Flow we need to create the curves we are going to use for training and the curve we want to predict. E.g., if we want to use GR_FINAL, RESD_FINAL, RHOB_FINAL, and NPHI_FINAL to make predictions for DT_FINAL, we need to have all of those curves available. We do this in the CpiLogCalc tool. Note that we are using the _FINAL curves as we want them to already be normalized and to have undergone washout repair. We also need to select the CPI and database information.
For the Train tool we will need to choose the Training Features, Label, and ML Model. An example for predicting DT_FINAL is shown below:
Note that the models are saved as ".pickle" files. That means we don't end with a LogOutput as we aren't saving the results to a log database at this step.
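For intuition, the snippet below is a minimal sketch of what a training step like this amounts to, written with scikit-learn and Python's pickle module. The curve names, the synthetic DataFrame, the model settings, and the file path are all assumptions for illustration; the Train tool handles the actual training and saving for you.

```python
# Minimal sketch of a Train step (illustrative only; Danomics does this inside the Train tool).
import pickle
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

features = ["GR_FINAL", "RESD_FINAL", "RHOB_FINAL", "NPHI_FINAL"]  # Training Features
label = "DT_FINAL"                                                 # Label to predict

# Stand-in for the curves prepared by CpiLogCalc: one row per depth sample.
rng = np.random.default_rng(0)
logs = pd.DataFrame(rng.random((500, 5)), columns=features + [label])

# ML needs finite data, so drop samples where any training curve is null.
train = logs[features + [label]].dropna()

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train[label])

# The job log reports an MSE for each trained model; this is the same idea.
mse = mean_squared_error(train[label], model.predict(train[features]))
print(f"{label} MSE: {mse:.4f}")

# The model is saved as a ".pickle" file rather than to a log database.
with open("dt_model.pickle", "wb") as f:
    pickle.dump(model, f)
```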
Predict Flow
The prediction Flow has a similar architecture. In the CpiLogCalc tool we only need to enter the curves we used as the Training Features in the Train Flow. Following our example above:
In the Predict tool we select the model and give it an output curve name. Note that the Label should not be an existing curve name (e.g., not DT_FINAL), as this would cause confusion later. Instead we'll give it a name like DT_ML, which will then alias to DT and be added into DT_FINAL by the CPI Config.
DeleteComputedCurves is a useful addition in this Flow to remove the curves we created for making the prediction (e.g., GR_FINAL). LogOutput ends by writing out a new log database.
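Again for intuition only, here is a matching sketch of the Predict step: load the pickle written by the training sketch above, evaluate it on the same feature curves, and write the result to a new curve name such as DT_ML. The file path, curve names, and synthetic data are illustrative assumptions; the Predict tool does this internally.

```python
# Minimal sketch of a Predict step (illustrative only; Danomics does this inside the Predict tool).
import pickle
import numpy as np
import pandas as pd

features = ["GR_FINAL", "RESD_FINAL", "RHOB_FINAL", "NPHI_FINAL"]  # same Training Features as before

# Model created by the Train Flow (written by the training sketch above).
with open("dt_model.pickle", "rb") as f:
    model = pickle.load(f)

# Stand-in for the curves prepared by CpiLogCalc in the Predict Flow.
rng = np.random.default_rng(1)
logs = pd.DataFrame(rng.random((500, 4)), columns=features)
logs.loc[::50, "GR_FINAL"] = np.nan  # pretend some samples have a null feature

# Predictions are only possible where every feature is finite; other samples stay null.
mask = logs[features].notna().all(axis=1)
logs["DT_ML"] = np.nan
logs.loc[mask, "DT_ML"] = model.predict(logs.loc[mask, features])
```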
Tips and Tricks
- Remember that we Train first, then Predict, in separate Flows.
- You can have multiple Train / Predict tools in one Flow (see the sketch after this list).
- Remember that ML algorithms require finite data, so if a training curve is null a prediction can't be made for that sample.
- Train and Predict can be used on both logs and points data.
- Check the job logs after the job has run, inspect for errors, and look at the MSE reported for each model trained.
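To illustrate the "multiple Train / Predict tools in one Flow" tip, here is a small sketch that trains one model per Label, e.g. one for DT and one for PEF. The labels, file paths, and data are hypothetical; in Danomics you would simply add one Train tool per model.

```python
# Minimal sketch of two Train tools in one Flow (illustrative only).
import pickle
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

features = ["GR_FINAL", "RESD_FINAL", "RHOB_FINAL", "NPHI_FINAL"]
labels = {"DT_FINAL": "dt_model.pickle", "PEF_FINAL": "pef_model.pickle"}

# Stand-in curves: one row per depth sample, one column per curve.
rng = np.random.default_rng(2)
logs = pd.DataFrame(rng.random((500, 6)), columns=features + list(labels))

for label, path in labels.items():
    train = logs[features + [label]].dropna()
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(train[features], train[label])
    with open(path, "wb") as f:          # one .pickle file per model
        pickle.dump(model, f)
```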