# User Documentation (Getting Started)
## Installation
From PyPI
```bash
pip install space_packet_parser
```
From Anaconda
```bash
conda install -c lasp space_packet_parser
```
## Basic Usage
The typical workflow for parsing packets is to:
1. Load a packet definition
Packet definitions are the XTCE configuration documents that describe how to parse
and extract binary chunks of data into Python variables.
```python
import space_packet_parser as spp
definition = spp.load_xtce("/path/to/xtce_definition.xml")
```
2. Iterate over binary data
You can load binary data from a file all at once, or continually read from a socket stream.
To parse individual packets, you can iterate over that binary data to yield individual
binary packet chunks one at a time. There is a built-in generator for CCSDS packets.
Other binary packet generators can be used as well if your packets follow a different
protocol from CCSDS.
```python
for binary_packet in spp.ccsds_generator("/path/to/packet_file.ccsds"):
    # Print out each packet's header
    print(binary_packet)
```
3. Parse the binary packet data into a dictionary of parsed items
With a definition (1) and a stream of individual packets (2), you can
then parse the contents of that binary data into Python objects. Parsed values
are keyed by `Parameter` name and returned as a Python dictionary of
`{ParameterName: value}` items.
```python
packet = definition.parse_bytes(binary_packet)
# All items within the packet
print(packet)
# An individual item
print(packet["my_uint3_param"])
```
Putting this all together in an example script:
```python
from pathlib import Path
import space_packet_parser as spp
from space_packet_parser import ccsds
packet_file = Path('my_packets.pkts')
xtce_document = Path('my_xtce_document.xml')
# 1) load the XTCE
packet_definition = spp.load_xtce(xtce_document)
# 2) create a binary generator to yield packet binary data
generator = spp.ccsds_generator(packet_file.open('rb'))
# 3) parse individual packets with the definition
packets = [packet_definition.parse_bytes(ccsds_packet) for ccsds_packet in generator]
# You can introspect the packet definition to learn about what was parsed
# Look up a type (includes unit and encoding info)
pt = packet_definition.parameter_types["MY_PARAM_Type"]
# Look up a parameter (includes short and long descriptions)
p = packet_definition.parameters['MY_PARAM']
# Look up a sequence container (includes inheritance)
sc = packet_definition.containers['SecondaryHeaderContainer']
# See the API docs for more information about the ParameterType, Parameter, and SequenceContainer classes
with packet_file.open("rb") as binary_data:
    ccsds_generator = ccsds.ccsds_generator(binary_data)
    for packet_bytes in ccsds_generator:
        packet = packet_definition.parse_bytes(packet_bytes)
        # Do something with the packet data, which behaves like a dict
        print(packet['PKT_APID'])
        print(packet.header)  # subset of packet
        print(packet.user_data)  # subset of packet
```
We aim to provide examples of usage patterns. Please see the `examples` directory in the GitHub repo. If there is
a specific example you want to see demonstrated, please open a GitHub Issue or Discussion for support.
## Parsing Packets to Xarray Datasets
For analysis and visualization workflows, Space Packet Parser can parse packets directly into Xarray Datasets using the `create_dataset` function. This is particularly useful when working with timeseries telemetry data. Note that this requires installing the optional `xarray` dependencies:
```bash
pip install "space_packet_parser[xarray]"
```
The `create_dataset` function returns a dictionary of Datasets keyed by APID, where each Dataset contains all parameters from packets with that APID:
```python
from pathlib import Path
import space_packet_parser as spp
from space_packet_parser.xarr import create_dataset
packet_file = Path('my_packets.pkts')
xtce_definition_file = Path('my_xtce_document.xml')
# Parse packets directly to Xarray Datasets (one per APID)
datasets = create_dataset(
    packet_files=[packet_file],
    xtce_packet_definition=xtce_definition_file,
)
# Access dataset for a specific APID
apid_1_data = datasets[1]
print(apid_1_data)
# Work with the data
print(apid_1_data['MY_PARAMETER'].values)
```
You can filter packets by APID or other criteria by passing a `packet_filter` function. This is useful when working with multiplexed packet streams:
```python
# Filter to only parse packets with APID 41
datasets = create_dataset(
    packet_files=[packet_file],
    xtce_packet_definition=xtce_definition_file,
    packet_filter=lambda pkt: pkt.apid == 41,
)
```
**Limitations**: The `create_dataset` function only supports packet definitions with consistent field structure across all packets with the same APID. It cannot handle polymorphic packets where the structure changes based on previously parsed values. For such cases, use the low-level parsing API by calling the `parse_bytes()` method directly.
## Packet Bytes Generators
Packet bytes generators are functions that yield individual packets as `bytes` objects (or subclasses of `bytes`) from a binary data source. Space Packet Parser provides built-in generators like `ccsds_generator`, `fixed_length_generator`, and `udp_generator`, but users can write custom generators to parse any packet format they need.
A generator function should accept a binary data source (file-like object, socket, or bytes) and yield packet bytes one at a time. The built-in generator implementations in `space_packet_parser/generators/` provide complete examples of how to implement packet bytes generators. Custom generators allow you to adapt Space Packet Parser to work with any binary packet format.
While XTCE is commonly used with CCSDS packets, the XTCE standard is not limited to representing CCSDS packet structures. The CCSDS header information (VERSION, TYPE, APID, etc.) is not required by XTCE. You can define XTCE packet structures for any binary format and use a custom or built-in generator to yield those packets for parsing.
### Built-in Generators
#### CCSDS Generator
The `ccsds_generator` parses CCSDS Space Packets according to the CCSDS standard. It uses the packet length field in the CCSDS header to determine packet boundaries and supports features like segmented packet reassembly.
```python
from space_packet_parser import ccsds_generator, load_xtce
packet_definition = load_xtce("my_ccsds_packets.xml")
# binary_data is a binary file-like object, socket, or bytes
for packet_bytes in ccsds_generator(binary_data):
    parsed = packet_definition.parse_bytes(packet_bytes)
    print(parsed)
```
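To illustrate how packet boundaries follow from the header length field, here is a minimal standard-library sketch. It is not the library's implementation. It assumes the standard 6-byte CCSDS primary header, with the length field stored in the last two header bytes and counting from zero. The helper names `make_packet` and `split_ccsds_packets` are hypothetical and exist only for this illustration.

```python
import struct

HEADER_BYTES = 6  # CCSDS primary header length


def make_packet(apid: int, data: bytes) -> bytes:
    """Build a toy CCSDS packet for the demo (data must be at least 1 byte)."""
    word1 = apid & 0x7FF   # version=0, type=0, secondary header flag=0, 11-bit APID
    word2 = 0b11 << 14     # sequence flags = unsegmented, sequence counter = 0
    return struct.pack(">HHH", word1, word2, len(data) - 1) + data


def split_ccsds_packets(stream: bytes):
    """Yield one packet at a time using the length field in each header."""
    pos = 0
    while pos + HEADER_BYTES <= len(stream):
        (pkt_len,) = struct.unpack_from(">H", stream, pos + 4)
        total = HEADER_BYTES + pkt_len + 1  # the length field counts from zero
        if pos + total > len(stream):
            break  # incomplete trailing packet
        yield stream[pos:pos + total]
        pos += total


stream = make_packet(11, b"\x01\x02\x03") + make_packet(41, b"\xff")
packets = list(split_ccsds_packets(stream))
print([len(p) for p in packets])
```

This sketch ignores segmented packet reassembly, which the built-in generator supports.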
#### Fixed Length Generator
The `fixed_length_generator` yields fixed-size chunks from binary data. This is useful for packet formats where all packets have a known, constant length.
```python
from space_packet_parser import load_xtce
from space_packet_parser.generators import fixed_length_generator
packet_definition = load_xtce("my_fixed_length_packets.xml")
for packet_bytes in fixed_length_generator(binary_data, packet_length_bytes=64):
    parsed = packet_definition.parse_bytes(packet_bytes)
    print(parsed)
```
#### UDP Generator
The `udp_generator` parses UDP (User Datagram Protocol) packets from binary data. It reads the UDP length field from each packet header to determine packet boundaries. The generator yields `UDPPacketBytes` objects that expose UDP header fields (source port, destination port, length, checksum) as properties.
```python
from space_packet_parser import udp_generator, load_xtce
packet_definition = load_xtce("my_udp_packets.xml")
for udp_packet in udp_generator(binary_data):
    # Access UDP header fields directly
    print(f"From port {udp_packet.source_port} to port {udp_packet.dest_port}")
    # Parse the packet using XTCE
    parsed = packet_definition.parse_bytes(udp_packet)
    print(parsed)
```
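For reference, the four UDP header fields mentioned above occupy the first 8 bytes of each datagram as big-endian 16-bit integers. The following standard-library sketch decodes them by hand. The function name `parse_udp_header` and the dictionary keys are hypothetical and are not part of Space Packet Parser.

```python
import struct


def parse_udp_header(datagram: bytes) -> dict:
    """Decode the 8-byte UDP header: four big-endian unsigned 16-bit fields."""
    source_port, dest_port, length, checksum = struct.unpack_from(">HHHH", datagram, 0)
    return {
        "source_port": source_port,
        "dest_port": dest_port,
        "length": length,  # header plus payload, in bytes
        "checksum": checksum,
    }


payload = b"hello"
# Build a toy datagram; a checksum of 0 means "no checksum" in UDP over IPv4
datagram = struct.pack(">HHHH", 5000, 6000, 8 + len(payload), 0) + payload
header = parse_udp_header(datagram)
print(header)
```

Because the length field covers the header and the payload, the next datagram in a contiguous buffer starts `length` bytes after the current one.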
### Writing Custom Generators
A minimal custom generator follows this pattern:
```python
def custom_generator(binary_data, packet_length):
    """Yields fixed-length packets from a binary file-like object."""
    while True:
        packet_bytes = binary_data.read(packet_length)
        if not packet_bytes:
            break
        yield packet_bytes
```
For more sophisticated generators that handle multiple input types (files, sockets, bytes) and provide progress tracking, see the implementations of the built-in generators in `space_packet_parser/generators/`. These demonstrate best practices such as using the `_setup_binary_reader` utility to handle different data sources and optional progress bars.
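As a slightly richer illustration of the same pattern, the sketch below yields variable-length packets from a hypothetical format in which every packet starts with a 2-byte big-endian payload length. The format and the name `length_prefixed_generator` are assumptions made for this example only.

```python
import io
import struct


def length_prefixed_generator(binary_data):
    """Yield packets from a stream where each packet is [2-byte length | payload]."""
    while True:
        prefix = binary_data.read(2)
        if len(prefix) < 2:
            break  # end of stream
        (payload_len,) = struct.unpack(">H", prefix)
        payload = binary_data.read(payload_len)
        if len(payload) < payload_len:
            break  # truncated final packet, stop without yielding it
        yield prefix + payload


stream = io.BytesIO(b"\x00\x03abc" + b"\x00\x01z")
packets = list(length_prefixed_generator(stream))
print(packets)
```

Each yielded `bytes` object includes the length prefix, so an XTCE definition for this format would describe the prefix as the first parameter.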
### Filtering Packets
For generators that expose packet metadata (like `CCSDSPacketBytes` with its `apid` property), you can filter packets before parsing to improve performance. A code
example of this is linked in [examples](examples.md).
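The idea can be sketched without the library by using a stand-in `bytes` subclass. The class `ApidBytes` below is hypothetical. It only mimics the notion of a packet bytes object that exposes an `apid` property, assuming the APID sits in the low 11 bits of the first two header bytes as in a CCSDS primary header.

```python
class ApidBytes(bytes):
    """Hypothetical stand-in for a packet bytes object with an `apid` property."""

    @property
    def apid(self) -> int:
        # The APID is the low 11 bits of the first two header bytes
        return int.from_bytes(self[:2], "big") & 0x7FF


raw_packets = [
    ApidBytes(bytes([0x00, 0x0B, 0xC0, 0x00, 0x00, 0x00, 0xAA])),  # APID 11
    ApidBytes(bytes([0x00, 0x29, 0xC0, 0x00, 0x00, 0x00, 0xBB])),  # APID 41
]

# Filter on header metadata first, so only the packets of interest
# would be handed to the (more expensive) XTCE parsing step.
wanted = [p for p in raw_packets if p.apid == 41]
print(len(wanted), wanted[0].apid)
```

Filtering before parsing avoids evaluating the packet definition for packets you intend to discard.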
## Error Handling and Debugging
When parsing packets, you may encounter situations where packets cannot be parsed successfully. The low-level API provides direct control over how to handle these cases.
### Handling UnrecognizedPacketTypeError
If a packet doesn't match any of the defined packet structures in your XTCE definition, an `UnrecognizedPacketTypeError` will be raised. You can catch this error to examine the partially parsed packet data for debugging:
```python
from space_packet_parser import ccsds
from space_packet_parser.exceptions import UnrecognizedPacketTypeError
with packet_file.open("rb") as binary_data:
    ccsds_generator = ccsds.ccsds_generator(binary_data)
    for packet_bytes in ccsds_generator:
        try:
            packet = packet_definition.parse_bytes(packet_bytes)
            # Process successful packet
            print(f"Successfully parsed packet with APID: {packet.binary_data.apid}")
        except UnrecognizedPacketTypeError as e:
            # Handle unrecognized packet
            print("Unrecognized packet type")
            print(f"Partial data: {e.partial_data}")  # Contains any successfully parsed fields
            # Continue processing other packets or handle the error as needed
```
## Packet Objects
The object returned from `parse_bytes()` is a `SpacePacket`. This class is a subclass of the Python `dict` and behaves like a dictionary. To retrieve
a parameter value from the parsed packet, you can iterate over its `items()` or access individual parameters
by name.
```python
from space_packet_parser import ccsds
ccsds_generator = ccsds.ccsds_generator(data)
packet_bytes = next(ccsds_generator)
packet = packet_definition.parse_bytes(packet_bytes)
my_param = packet["MY_PARAM_NAME"]
all_param_names = list(packet.keys())
```
## Parameter Objects
The parameter values within the packet are subclasses of normal Python data types:
`int`, `float`, `str`, `bool`, and `bytes`. The objects behave exactly like the Python data types except that they all
carry a `raw_value` attribute, which holds the value generated by the data encoding parser before it is passed
through any calibrators, enum lookups, string parsing, or boolean evaluation.
```python
print(my_param) # prints the most derived value available - str, int, float, bytes, or bool
print(my_param.raw_value) # prints the "raw" encoded value parsed by the low level data encoding
```
Space Packet Parser returns the following types for parameters within a packet. They subclass `int`, `float`, `str`,
`bytes`, and `bool` respectively. The primary value of each parameter type is the fully parsed (calibrated,
enumerated, string-parsed, etc.) value, and `raw_value` holds the encoded value before any such derived processing.
- `IntParameter`
- `FloatParameter`
- `StrParameter`
- `BinaryParameter`
- `BoolParameter`
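The pattern of a built-in type that carries an extra `raw_value` attribute can be sketched in a few lines. The class `CalibratedFloat` below is a hypothetical illustration of the idea and is not the library's `FloatParameter` implementation.

```python
class CalibratedFloat(float):
    """Hypothetical float subclass that remembers the raw encoded value."""

    def __new__(cls, value, raw_value=None):
        obj = super().__new__(cls, value)
        # Fall back to the value itself when no separate raw value exists
        obj.raw_value = value if raw_value is None else raw_value
        return obj


temperature = CalibratedFloat(3.7555, raw_value=100)
print(temperature)            # behaves like a normal float
print(temperature.raw_value)  # the value before calibration
print(temperature + 1.0)      # arithmetic returns a plain float
```

Because the object is a real `float`, it works anywhere a float is expected, while the raw value stays available for debugging.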
### Numeric Calibration
Int and float parameters can be calibrated on the fly during decoding. These calibrators are defined on the data
encoding XTCE element and can transform the raw encoded value to a calibrated value, e.g. via a polynomial. Calibrated
values are always floats, even if the raw encoded value is an integer.
For example,
```xml
```
in this encoding definition, the raw encoded value is a 16bit unsigned integer that is calibrated by a polynomial
to produce a calibrated value, which is always a float. In this case `value = .012155 * raw_value + 2.54`.
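The arithmetic of a polynomial calibration can be checked with a short sketch. The function `polynomial_calibrate` and the exponent-to-coefficient mapping are hypothetical names chosen for this example. Only the coefficients come from the text above.

```python
def polynomial_calibrate(raw_value, coefficients):
    """Evaluate sum(coefficient * raw_value ** exponent) and return a float."""
    return float(sum(c * raw_value ** e for e, c in coefficients.items()))


# value = .012155 * raw_value + 2.54, expressed as {exponent: coefficient}
coefficients = {0: 2.54, 1: 0.012155}

raw_value = 100  # an example 16-bit unsigned raw value
calibrated = polynomial_calibrate(raw_value, coefficients)
print(raw_value, "->", calibrated)
```

The result is a float even though the raw input is an integer, matching the behavior described above.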
### String Parsing
Strings are encoded as a buffer of determined size (either fixed length or dynamic, based on a previous parameter). The
raw buffer includes any additional string data such as a leading size integer or a termination character. If a
leading size or termination character is specified in the XTCE definition, the parsed string value is returned as
the value of the parameter and the buffer is returned as the `raw_value`. If no termination character or leading size
is specified, the value and `raw_value` are the same and both refer to the raw string buffer.
For example,
```xml
```
in this encoding definition, the size of the raw string buffer (number of bytes in the packet) is defined by a
parameter named `STR_SIZE`. The value stored in `STR_SIZE` is given in number of bytes, so it is multiplied by 8 and a
constant base length of 27 bits is added to get the final buffer size. So if `STR_SIZE` encodes the value 4, the raw string
buffer width in the packet is 59 bits. This is an odd size for a string because it is not an integer number of bytes,
but that's because it includes a 3-bit unsigned int in front of the string data that specifies the size of the string,
in bits, making the raw string `[3 bit uint | 7 bytes]`.
In this case, the `raw_value` of the parameter will contain the full string buffer as an 8 byte string,
padded on the RHS with 5 zero bits. We have to pad it because you cannot create a byte string from a non-integer
number of bytes (59 bits). The `value` of the parameter will contain the fully parsed `str` object based on the value
of the leading size. If the leading size uint3 represents the integer 4, the `value` of the parameter will be a `str`
that is made of the first 4 bytes of data in the raw buffer following the leading size.
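The buffer-size arithmetic in this example can be verified with a small sketch. The function names are hypothetical. The multiplier of 8 and the base length of 27 bits are taken from the description above.

```python
def raw_string_buffer_bits(str_size_bytes, base_bits=27):
    """Buffer width in bits: STR_SIZE (in bytes) times 8, plus a constant base length."""
    return 8 * str_size_bytes + base_bits


def padded_byte_length(n_bits):
    """Return (whole bytes needed, zero bits of padding) for a buffer of n_bits."""
    n_bytes = -(-n_bits // 8)  # ceiling division
    return n_bytes, n_bytes * 8 - n_bits


bits = raw_string_buffer_bits(4)
n_bytes, pad_bits = padded_byte_length(bits)
print(bits, n_bytes, pad_bits)
```

For `STR_SIZE = 4` this reproduces the numbers in the text: a 59-bit buffer stored in 8 bytes with 5 bits of padding.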
Termination characters work similarly.
```xml
32
0058
```
In this case, the raw buffer has a fixed length (32 bits).
The parsed `StrParameter.raw_value` will be the full string buffer, including the termination
character and any additional following bytes. The `value` of the parameter will be a `str` based on all the encoded
bytes preceding the termination character. In this case, the raw string buffer _will_ always be an integer number of
bytes since a termination character is always an integer number of bytes, so no padding of the raw value is required.
### Enumerated Lookups
Enums are defined by lookup tables in the XTCE, which are converted to dictionaries internally. Once the raw value
from the data encoding is parsed, a lookup is made to the lookup table and the final string label is returned.
Note that the final label from enumerated lookups is always a string. The raw value used in the lookup table is
interpreted based on the data encoding for the parameter. Integer encoded enum values are ints, float encoded values
are floats, and string encoded values are used as the raw string buffer from the encoding.
Only raw values may be used for enum lookups.
Calibrated numeric values cannot be used for enum lookups from numeric encodings. For string encoded parameters,
only raw string buffers may be used for lookups (not fully parsed strings).
For example,
```xml
```
the encoded value (`raw_value`) is a uint8 integer but the value returned for an enumerated parameter type will
be a `StrParameter` containing the label string associated with the integer value.
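Conceptually, an enumerated lookup is a dictionary access keyed on the raw value. The sketch below uses a hypothetical lookup table with made-up labels. It shows only the lookup step, not how the library builds the table from XTCE.

```python
# Hypothetical lookup table: raw uint8 value -> string label
lookup = {0: "OFF", 1: "STANDBY", 2: "ON"}


def enum_label(raw_value: int) -> str:
    """Return the label for a raw encoded value (never a calibrated value)."""
    return lookup[raw_value]


raw_value = 2
label = enum_label(raw_value)
print(raw_value, "->", label)
```

A raw value that is missing from the table raises `KeyError` in this sketch. How the library reports such a case is not covered here.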
### Boolean Evaluation
Booleans behave nicely for integers and floats where zero is False and everything else is True. For string and binary
encoded values, the only falsy value is an empty string, which is kind of silly to encode. XTCE is not specific on the
interpretation of string and binary encoded values for boolean parameters and there is no generally accepted
interpretation, so we default to Python's `bool`, which interprets any non-empty string as True.
Only raw values may be used for boolean evaluation. Calibrated values are not considered.
For example,
```xml
m/s
```
the encoded value (`raw_value`) is a single bit interpreted as an integer, but the value returned for a boolean
parameter type will be a `BoolParameter`, evaluated over the encoded integer value. The result is `False` if the
integer is 0 and `True` otherwise.
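Since the evaluation defers to Python's `bool`, the rules above can be checked directly in plain Python. The raw values below are arbitrary examples.

```python
# Raw values as they might come out of integer, float, string, and binary encodings
raw_values = [0, 7, 0.0, 2.5, "", "False", b"", b"\x00"]

evaluated = [(raw, bool(raw)) for raw in raw_values]
for raw, value in evaluated:
    print(f"{raw!r:>8} -> {value}")
```

Note that the string `"False"` and the single zero byte `b"\x00"` both evaluate to `True` because they are non-empty.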
## Parsing from a Socket
The input data object to `XtcePacketDefinition.packet_generator` need only be a binary file-like object from which
bytes can be read. This means the packet generator is not limited to parsing data from files. To support
development of quicklook-type tools, we provide an example of parsing data streaming through a socket in
`parsing_and_plotting_idex_waveforms_from_socket.py`.
The example mocks the behavior of an instrument sending packet data asynchronously
through a socket in chunks of inconsistent size. The packet parser reads bytes from the receiver side of the socket
and will read data repeatedly until there is sufficient data for the full packet. Once it has a full packet
(as determined by the packet length in the CCSDS header), it cranks the generator and yields a parsed packet.
You'll notice that the example ends with a timeout error. This timeout can be controlled when creating the socket
connection with `receiver.settimeout(timeout_seconds)`.
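The buffering behavior described above can be sketched with the standard library alone. This is not the code from the example script. It assumes a 6-byte CCSDS header whose last two bytes hold the zero-based length field, and the function name `packets_from_socket` is hypothetical. Unlike the example script, this sketch stops quietly on a timeout instead of raising.

```python
import socket
import struct

HEADER_BYTES = 6


def packets_from_socket(receiver, chunk_size=4):
    """Accumulate bytes from a socket and yield each complete packet."""
    buffer = b""
    while True:
        # Yield every complete packet currently in the buffer
        while len(buffer) >= HEADER_BYTES:
            (pkt_len,) = struct.unpack_from(">H", buffer, 4)
            total = HEADER_BYTES + pkt_len + 1
            if len(buffer) < total:
                break  # need more data for this packet
            yield buffer[:total]
            buffer = buffer[total:]
        try:
            chunk = receiver.recv(chunk_size)
        except socket.timeout:
            return  # no data arrived within the timeout
        if not chunk:
            return  # sender closed the connection
        buffer += chunk


sender, receiver = socket.socketpair()
receiver.settimeout(1.0)
packet = struct.pack(">HHH", 0x0029, 0xC000, 2) + b"\x01\x02\x03"
# Send two packets in chunks that do not align with packet boundaries
sender.sendall(packet[:5])
sender.sendall(packet[5:] + packet)
sender.close()
received = list(packets_from_socket(receiver))
receiver.close()
print(len(received))
```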
## Variable Length Packet Fields of Explicit Length
Flight software engineers often need to downlink data (usually binary blobs) of variable length. The length of these
fields is often specified in a _previous_ telemetry point in the same packet, and you have to fetch
the length by referencing that previous field.
### Explicit Variable Length Example
Suppose the variable length field is called `SCI_DATA` and is a binary blob (e.g. of compressed data).
The length of this field is specified earlier in the packet in a field called `SCI_DATA_BYTELEN`, specified in
number of bytes. To define the type for `SCI_DATA` in XTCE, you could use the following (snippet):
```xml
```
This tells the parser that the size in bits of data type `SCI_DATA_Type` (the type of `SCI_DATA`) is the raw value
encoded in the parameter `SCI_DATA_BYTELEN`, multiplied by 8 (to convert number of bytes to number of bits).
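The size calculation can be illustrated with a toy layout. The sketch assumes, purely for illustration, that `SCI_DATA_BYTELEN` is an 8-bit unsigned integer in the first byte and that `SCI_DATA` follows immediately. Real packets will usually have other fields in between, and the function name `read_sci_data` is hypothetical.

```python
def read_sci_data(packet_body: bytes) -> bytes:
    """Extract SCI_DATA using the length stored in the preceding SCI_DATA_BYTELEN field."""
    sci_data_bytelen = packet_body[0]      # raw value of the length field
    size_in_bits = 8 * sci_data_bytelen    # bytes -> bits, as in the XTCE definition
    start = 1                              # SCI_DATA starts right after the length byte
    return packet_body[start:start + size_in_bits // 8]


body = b"\x03abcXYZ"  # length 3, then 3 bytes of SCI_DATA, then unrelated bytes
sci_data = read_sci_data(body)
print(sci_data)
```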
## Variable Length Packet Fields of Implicit Length
In some circumstances, flight software teams define a packet field that simply fills up the "remaining space" in the
packet. The length of this field is usually implicit but can be computed by subtracting the combined length of all
fixed length fields in the packet from the total length of the packet specified in the CCSDS header.
The `PKT_LEN` field is the length of the packet user data, in bytes. This field:
- counts from zero (a value of 0 means one byte of user data)
- does not include the header data (always 6 bytes)
Thus, you can determine the length of your field dynamically from the packet length in the CCSDS header:
$$len_{var} = 8 \times (len_{packet} + 1) - \sum_n len_{fixed,n}$$
where
- $len_{var}$ is the length, in bits, of the variable length field
- $len_{packet}$ is the packet user data length in bytes (the `PKT_LEN` value from the CCSDS header)
- $\sum_n len_{fixed,n}$ is the combined length, in bits, of all fixed length fields in the packet user data
There are some limitations to this. If your FSW team is violating these limitations, they are making your life
extremely difficult, and you have my condolences.
- You can only have a _single_ "remaining packet length" field in a given packet definition. Encoding more than one
such field makes it impossible to determine the length of the fields.
- All other fields in the packet _must_ be fixed length. There is no way that I know of in XTCE to calculate a
dynamic length that is an arbitrary function of multiple previous length specifier fields.
### Implicit Variable Length Example
Packet Definition:
```text
"VERSION" : 3 bits
"TYPE" : 1 bits
"SEC_HDR_FLG" : 1 bits
"PKT_APID" : 11 bits
"SEQ_FLGS" : 2 bits
"SRC_SEQ_CTR" : 14 bits
"PKT_LEN" : 16 bits
"SHCOARSE" : 32 bits
"SID" : 8 bits
"SPIN" : 8 bits
"ABORTFLAG" : 1 bits
"STARTDELAY" : 15 bits
"COUNT" : 8 bits
"EVENTDATA": variable length
```
To calculate the length of `EVENTDATA`:
```{math}
len_{var} &= 8 \times (len_{packet} + 1) - (&&len_{SHCOARSE} + len_{SID} + len_{SPIN} + \\
& &&len_{ABORTFLAG} + len_{STARTDELAY} + len_{COUNT})\\
&= 8 \times (len_{packet} + 1) - (&&32 + 8 + 8 + 1 + 15 + 8)\\
&= 8 \times len_{packet} - 64 &&
```
This equation can be implemented in XTCE by referencing the packet length field as follows:
```xml
```
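The same calculation for `EVENTDATA` can be written as a short function. The field widths are taken from the packet definition above. The names `FIXED_FIELD_BITS` and `eventdata_bits` are hypothetical.

```python
# Bit widths of the fixed length fields in the packet user data
FIXED_FIELD_BITS = {
    "SHCOARSE": 32,
    "SID": 8,
    "SPIN": 8,
    "ABORTFLAG": 1,
    "STARTDELAY": 15,
    "COUNT": 8,
}


def eventdata_bits(pkt_len: int) -> int:
    """Length of EVENTDATA in bits, given the PKT_LEN value from the CCSDS header."""
    return 8 * (pkt_len + 1) - sum(FIXED_FIELD_BITS.values())


for pkt_len in (8, 20):
    print(pkt_len, "->", eventdata_bits(pkt_len), "bits")
```

The fixed fields total 72 bits, so the result always equals `8 * pkt_len - 64`, in agreement with the derivation above.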
## XTCE Document Validation
Space Packet Parser provides comprehensive validation capabilities for XTCE documents to help ensure they are correct and will work properly for parsing packets. The validation system operates in three modes: "schema", "structure", and a default mode of "all" (both schema and structure validation).
- **Schema Validation**: Validates the XML document against the in-document referenced XTCE XSD schema
- **Structural Validation**: Validates XTCE-specific structure and reference integrity
Schema validation requires correct namespace declarations at the top of your XTCE document, for example:
```xml
```
### CLI Validation
```shell
spp --log-level=DEBUG validate my_xtce.xml --local-schema my_xsd.xml --level all
```
### Programmatic Validation
```python
from space_packet_parser import validate_xtce
# Validate an XTCE file against the referenced schema
result = validate_xtce("my_xtce.xml", level="schema")
if result.errors:
    for error in result.errors:
        print(f"Error: {error}")
else:
    print("Document is valid")
# Validate an XTCE document structure to check for
# unused Parameters and ParameterTypes, and nonexistent references
result = validate_xtce("my_xtce.xml", level="structure")
if result.errors:
    for error in result.errors:
        print(f"Error: {error}")
else:
    print("Document is valid")
# Comprehensive validation (both schema and structure)
result = validate_xtce("my_xtce.xml", level="all")
print(f"Validation completed in {result.validation_time_ms:.1f}ms")
if result.errors:
    for error in result.errors:
        print(f"Error: {error}")
else:
    print("Document is valid")
```
## Troubleshooting Packet Parsing
Parsing binary packets is error-prone and getting the XTCE definition correct can be a challenge at first.
Most flight software teams can export XTCE from their command and telemetry database but these exports usually require
some fine-tuning.
An `UnrecognizedPacketTypeError` is raised during parsing of an individual packet when either:
- a) multiple child containers are valid inheritors of the current sequence container based on
restriction criteria evaluated against the data parsed so far
- b) no child containers are valid inheritors of the current sequence container based on
restriction criteria evaluated against the data parsed so far
and the current container is abstract
To aid you during development, `UnrecognizedPacketTypeError` exceptions generated during parsing can be returned
alongside any valid packet objects by setting `yield_unrecognized_packet_errors=True`.
These exception objects are not raised, so that the generator may keep parsing. Instead, they
are yielded from the generator with a `partial_data` attribute for user examination. This partial data allows you to
see how far the parser got through a container inheritance structure before failing to determine the rest of the structure.
## Common Issues and Solutions
### Parser Generator Completes without Yielding a Packet
This can occur if your data file contains only packets that do not match any packet definitions in your XTCE document
and `yield_unrecognized_packet_errors=False` (the default). This could mean that your data file actually contains only
APIDs that are not covered in your packet definition, but usually it means you have incorrectly defined
restriction criteria for SequenceContainer inheritance.
For example, a restriction criteria element may require an APID that does not exist in the data:
```xml
```
### Only Packet Headers are Parsed
If you observe that only packet headers are being parsed but no exceptions are being raised (you may be seeing a
lot of length mismatch warnings if you have logging set up), it likely means that
you have forgotten to set `abstract="true"` on your non-concrete sequence container elements.
For example
```xml
Super-container for telemetry and command packets
```
will parse as a complete packet, containing only `VERSION` and `TYPE` instead of searching for inheriting sequence
containers. To define the container as abstract, change the first element opening tag to
```xml
...contents
```
## Optimizing for Performance
The logic evaluated during packet parsing is largely reflective of the XTCE configuration being used
to define packet structures. The more logic in the XTCE, the more logic must be evaluated during
parsing. Below are some common ways to reduce complexity and speed up parsing:
1. **Remove `RestrictionCriteria` Elements:** If your packet stream is a single packet structure,
there is no reason to require the evaluation of a restriction criteria for each packet.
2. **Remove Unnecessary Packet Definitions:** Even in a packet stream with multiple packet formats, if you only
care about one packet type, you can remove the definitions for the other. By default, the packet `Parser` will
catch unrecognized packet errors and skip to the next packet. This skips the parsing of packets
for which a valid definition cannot be determined.
3. **Reduce Container Inheritance:** A flat container definition structure will evaluate restriction criteria
faster than a nested structure. Each instance of nesting requires an additional `MatchCriteria.evaluate()`
call for each packet being parsed.
4. **Reduce Complex Items:** Parameter type definitions that contain calibrators or complex string parsing
(especially variable length termination character defined strings) add significant evaluation logic to
the parsing of each parameter, as does any parameter type that is variable length.
Removing them can speed up parsing.