We are aware that experimental data comes in many forms and many formats, and in an ideal world, the BRAID analysis web service would accept data to be analyzed in any form imaginable. However, data cleaning – the process of arranging and formatting data so that it can be read by software – is a large, complex, and often intractable problem that is well beyond the scope of the service we offer here. We therefore require that data submitted to the web service is formatted within certain constraints. It is our hope that these constraints are no more burdensome than they would be for any other analytical software, but we understand that fulfilling them can still be confusing, so we have put together the following set of tips and guidelines, which we hope will make using the BRAID web service easier.
First, the file format. Comma-separated value (CSV) is a standard format for storing tabular data. A CSV file contains a single table of data, with rows of the table represented by lines in the file, and cells in a row separated by commas. CSV files can be created in a text editor or in a spreadsheet editor such as Microsoft Excel; we recommend using a spreadsheet editor, as it will ensure that the file is correctly formatted (i.e., that every row contains the same number of comma-separated cells). To save a data table as a CSV file in Microsoft Excel, select "Save as…", and under file type select "CSV (comma delimited) (.csv)".
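If you create your file in a text editor or with a script, a quick check that every row has the same number of cells can catch formatting mistakes early. The following is a minimal Python sketch using only the standard library; the function name `row_lengths` and the two sample tables are illustrative and are not part of the BRAID service.

```python
import csv
import io


def row_lengths(csv_text):
    """Return the set of distinct cell counts found across all rows."""
    return {len(row) for row in csv.reader(io.StringIO(csv_text))}


# Illustrative tables only: the second one is missing a cell in its data row.
good = "Sample1,Sample2,Conc1,Conc2,Act\nA,B,0,0,1.0\n"
bad = "Sample1,Sample2,Conc1,Conc2,Act\nA,B,0,0\n"

print(len(row_lengths(good)) == 1)  # True: every row has five cells
print(len(row_lengths(bad)) == 1)   # False: rows have five and four cells
```

A well-formed table yields a single distinct row length; more than one length means at least one row has missing or extra commas.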
In addition to using the CSV format, input files for the BRAID web service must have a very specific structure to be interpreted correctly. The first row of the table should contain column headers, or names; furthermore, the file must contain columns with the following five headers (and yes, lower and upper case must match): "Sample1", "Sample2", "Conc1", "Conc2", and "Act".
Every measurement that should go into the analysis should be on its own row. Columns "Sample1" and "Sample2" contain strings or unique identifiers specifying which two drugs were combined in that row’s measurement. (More details on how to represent single-agent and negative control data are given below.) Columns "Conc1" and "Conc2" contain the concentrations (in molar) of the two drugs combined in that measurement. Finally, the column "Act" contains the numerical effect variable to which you wish to fit the BRAID model. Note that this need not be the raw measured value: it can be a measurement that has already been normalized to controls, or a base-10 logarithmic transform of a measure such as fluorescence. The example file below has six columns: the five required columns, as well as a column titled "Measure", which contains the raw fluorescence measured in the original experiment. The column "Act" contains the fluorescence data normalized to the average value of the negative controls from the experiment. The columns in the file can be in any order, as long as all five required columns are present and labelled correctly. You can also include an optional column titled "Experiment", which marks blocks of data in your file that should be analyzed separately.
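Before uploading, you may wish to confirm that the required headers are present with the exact capitalization. The sketch below is one way to do this in Python; the helper `missing_columns` is hypothetical, and the numbers in the sample table are made up purely for illustration.

```python
import csv
import io

# The five required headers named above; the comparison is case-sensitive.
REQUIRED_COLUMNS = ["Sample1", "Sample2", "Conc1", "Conc2", "Act"]


def missing_columns(csv_text):
    """Return the required headers that are absent from the first row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Made-up values; "Measure" is an extra column, which is allowed.
example = (
    "Sample1,Sample2,Conc1,Conc2,Measure,Act\n"
    "Caffeine,Sucrose,0,0,1520,1.01\n"
    "Caffeine,Sucrose,1e-06,0,1310,0.87\n"
)

print(missing_columns(example))                        # []
print(missing_columns(example.replace("Act", "act")))  # ['Act']
```

Because the check looks only at header names, column order and additional columns such as "Measure" or "Experiment" do not affect the result.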
Of course, most combination experiments include data in which the behavior of each combined drug is tested in isolation; how should such data be included? To explain this, we must first examine how data in the input file is interpreted. The image below depicts the top of a BRAID input file, with four blocks of data highlighted. What separates these blocks, and what determines how the BRAID analysis script will interpret them, is the values of columns "Conc1" and "Conc2"; specifically, whether either or both of these values are equal to 0. In a BRAID input file, if one of the values "Conc1" or "Conc2" is greater than 0 but the other is equal to 0, that measurement will be treated as a single-agent measurement for the drug with the nonzero concentration, and the name of the other drug will be ignored. For example, the second, red block of measurements in the image below will be interpreted as single-agent dose-response measurements for caffeine; the value of "Sample2" for these measurements, "Sucrose", will have no effect on the analysis.
Similarly, if both values "Conc1" and "Conc2" are equal to 0 – as in the first, green block in the image above – the measurement will be treated as a negative control measurement, and the values in columns "Sample1" and "Sample2" will be ignored. For your own files, it may be advisable to use a specific string to specify the absence of a drug, as in the image below. Note, however, that the data in the image above and the data in the image below will be treated identically by the BRAID analysis script.
So, to add single-agent dose-response data to your BRAID input file, place the concentrations of the drug tested in one concentration column, and the name of the drug tested in the corresponding sample column. Set the value in the other concentration column to 0, and place an arbitrary drug identifier in the other sample column (which will be ignored). To add negative control measurements, put 0 in both concentration columns, and an arbitrary identifier in both sample columns.
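The interpretation rules described above can be summarized in a few lines of Python. This is only a sketch of the rules as stated in this guide, not the code of the BRAID analysis script itself; the function name `classify_row` and its return strings are our own labels.

```python
def classify_row(conc1, conc2):
    """Classify one measurement from its two concentration values."""
    if conc1 < 0 or conc2 < 0:
        raise ValueError("concentrations must be 0 or greater")
    if conc1 == 0 and conc2 == 0:
        return "negative control"        # both sample names are ignored
    if conc2 == 0:
        return "single agent: Sample1"   # the Sample2 name is ignored
    if conc1 == 0:
        return "single agent: Sample2"   # the Sample1 name is ignored
    return "combination"


print(classify_row(0, 0))         # negative control
print(classify_row(1e-6, 0))      # single agent: Sample1
print(classify_row(0, 5e-3))      # single agent: Sample2
print(classify_row(1e-6, 5e-3))   # combination
```

Note that the classification depends only on the concentration columns, which is why the sample names in zero-concentration columns can be arbitrary.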
If a column titled "Experiment" is included in the data file, data with different values in this column will be analyzed separately, regardless of the values in columns "Sample1" and "Sample2". One use for this column is when the same combination of drugs is tested at different time points or under differing conditions; marking these conditions in the "Experiment" column will allow them to be analyzed as separate combinations.
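To preview how an "Experiment" column partitions a file, you can group the rows yourself. The Python sketch below assumes the file contains an "Experiment" column; the helper `split_by_experiment`, the labels "24h" and "48h", and all numeric values are invented for illustration and do not reflect how the service is implemented.

```python
import csv
import io
from collections import defaultdict


def split_by_experiment(csv_text):
    """Group rows by the value in their "Experiment" column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Experiment"]].append(row)
    return dict(groups)


# Made-up data: the same drug pair measured at two time points.
data = (
    "Sample1,Sample2,Conc1,Conc2,Act,Experiment\n"
    "Caffeine,Sucrose,0,0,1.00,24h\n"
    "Caffeine,Sucrose,1e-06,1e-03,0.62,24h\n"
    "Caffeine,Sucrose,0,0,0.98,48h\n"
)

groups = split_by_experiment(data)
for name, rows in groups.items():
    print(name, len(rows))
```

Each group keeps its own negative control and single-agent rows, so every block you mark should include the controls it needs.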
One additional caveat when using a spreadsheet editor such as Excel: a CSV file is stored as text separated by commas, meaning that for any number to be saved to the file, the program must make a choice about how that number will be represented. The two most common approaches are as a decimal number to a set number of digits after the decimal point, or in scientific notation with a certain number of decimal places. The point to keep in mind is that however the data is displayed when you are viewing the spreadsheet, that is how it will be saved. So if a cell contains the number 0.00030544076, and Excel displays the number as 3.05E-4, it will be saved as the string "3.05E-4", meaning that all precision after the digit "5" will be discarded. If you want to avoid this, you must tell the program explicitly how to format the data. In Excel, this can be done by selecting the cells you want to format (usually concentration cells as they are often much smaller than 1), locating the "Format Cells" option (where it is located will depend on the version of Excel) and selecting the desired format (usually "Number" or "Scientific") with the appropriate number of decimal places. This can make your file slightly larger, but will ensure that your data is saved as precisely as you need.
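The precision loss described above is easy to reproduce outside of a spreadsheet. The short Python sketch below formats the same number with two decimal places in scientific notation and then reads it back; the variable names are illustrative.

```python
value = 0.00030544076

# Formatting with two decimal places mimics a cell displayed in short
# scientific notation; the extra digits are not written to the file.
displayed = f"{value:.2E}"
print(displayed)                     # 3.05E-04
print(float(displayed) == value)     # False: precision was discarded

# Python's repr() emits enough digits to recover the float exactly.
full = repr(value)
print(float(full) == value)          # True
```

If you generate your CSV files with a script rather than a spreadsheet editor, writing numbers at full precision in this way avoids the problem entirely.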
For those who wish to examine input files in more detail, we have provided two example files containing data from our original BRAID analysis publication. The first file, found here, contains the log-10 reduction in cell survival for combinations of three PARP inhibitors with two standard-of-care chemotherapy agents in the Ewing's sarcoma cell line ES1. The associated compound file, containing synonyms, abbreviations, and concentration limits for the five tested compounds, can be downloaded here. A second data file, found here, contains data originally from work by Cokol et al.; specifically, the combined antifungal activity for all ten pairwise combinations of five antifungal agents. These combinations have been marked as distinct experiments so that each set of measurements (including dose-response and negative control measurements) is analyzed separately.