Maintaining product quality while rapidly adding new features and scaling is a challenge. At Fiddler, our mission is to build trust into AI through our AI Observability platform for LLMOps and MLOps, and to keep quality high while shipping new features at scale, we rely heavily on Pytest fixtures.
In this post, we’ll explore how Pytest fixtures are integral to our testing strategy, serving not only as the backbone of our feature delivery but also empowering our engineers to iterate quickly.
What are Pytest Fixtures?
Pytest is widely regarded as the most flexible Python testing framework, making it popular for handling everything from simple unit tests to complex end-to-end test scenarios. Its power lies in streamlining test creation and minimizing repetition, primarily through the use of fixtures.
Fixtures are crucial for setting up and tearing down resources (or states) needed for tests, enabling developers to create reusable, modular setups across multiple test cases. This eliminates the need to duplicate setup code and allows developers to focus on the core test logic while ensuring a consistent, correctly configured environment.
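As a quick illustration, here is a minimal, hypothetical fixture that sets up a temporary workspace before a test and cleans it up afterwards; the names are illustrative rather than taken from our codebase:
import shutil
import tempfile
from pathlib import Path

import pytest


@pytest.fixture
def workspace() -> Path:
    # Setup: create an isolated temporary directory for the test
    path = Path(tempfile.mkdtemp())
    yield path
    # Teardown: runs after the test, even if the test fails
    shutil.rmtree(path)


def test_writes_report(workspace: Path) -> None:
    report = workspace / 'report.txt'
    report.write_text('ok')
    assert report.read_text() == 'ok'
The yield statement marks the boundary: everything before it is setup, everything after it is teardown.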
Why Do Pytest Fixtures Matter for Product Quality?
Fiddler supports a variety of data modalities, tooling integrations, and LLM/ML use cases. Ensuring that our features are thoroughly tested in various scenarios (e.g., different models, environments, datasets) is essential to product quality. Pytest fixtures help us:
- Encapsulate Setup Logic: Every model, dashboard, or observability feature often needs specific setup logic, such as connecting to external databases, initializing LLM/ML models, or creating test data. Pytest fixtures let us encapsulate this logic and reuse it across hundreds of tests
- Enable Modular, Reusable Tests: The Fiddler AI Observability platform runs across different platforms and configurations. Using scoped fixtures, we tailor test environments for each configuration without rewriting or duplicating tests (see the sketch after this list). This modularity speeds up the test process, as Pytest only sets up what's necessary
- Isolate Test Cases for Reproducibility: Fixtures isolate test runs by managing environment setup and teardown, which makes shared state easier to detect, lets us reason explicitly about state management, and allows state to be shared deliberately only when necessary
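To make the scoped-fixture point concrete, here is a hedged sketch: a session-scoped fixture builds one expensive resource and shares it across every test, while a function-scoped fixture layered on top keeps per-test state isolated. The resource names are illustrative, not from our codebase:
import pytest


@pytest.fixture(scope='session')
def inference_service():
    # Expensive setup runs once for the whole test session
    # (an illustrative stand-in for loading a model or starting a container)
    service = {'status': 'ready'}
    yield service
    # Session-level teardown runs once, after the last test
    service['status'] = 'stopped'


@pytest.fixture
def request_context(inference_service):
    # Function-scoped by default: a fresh, isolated context for every test,
    # layered on top of the shared session-scoped resource
    return {'service': inference_service, 'calls': []}


def test_service_is_ready(request_context) -> None:
    assert request_context['service']['status'] == 'ready'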
Sample Fixture in Action
Here’s an example of how we use Pytest fixtures to manage function caches in our tests:
# filename - conftest.py
import pytest

from fiddler2.modules.authorization.handlers import get_token


@pytest.fixture(autouse=True)
def clear_fn_cache() -> None:
    """Clear function cache after execution"""
    yield
    get_token.cache_clear()
This clear_fn_cache fixture, defined in the conftest.py file, clears the cache of the get_token function after each test run. The autouse=True argument makes the fixture apply to every test automatically, so it runs without needing to be explicitly referenced in each test case.
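A test never has to request clear_fn_cache explicitly; it simply benefits from the clean state. A minimal sketch of what that buys us, assuming get_token is wrapped with functools.lru_cache (which also exposes cache_info):
from fiddler2.modules.authorization.handlers import get_token


def test_cache_is_cold_at_start() -> None:
    # clear_fn_cache is not an argument of this test, yet autouse=True
    # guarantees it ran after every previous test, so each test starts
    # with an empty token cache
    assert get_token.cache_info().currsize == 0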
Unit Tests: Choosing Functionality Over Mocks
Unit tests often start with mocking, which limits how much confidence they give us in the actual functionality. While unit tests with mocked code can verify interface contracts, they don't execute the real code paths. Functional tests, on the other hand, ensure that those paths are fully executed and validated. At Fiddler, we prioritize functional tests over heavily mocked unit tests to provide more realistic and reliable validation.
Here’s an example of testing a database update using a unit test:
# file name - alerts.py
from database import Connection
from orm_models.alert import Alert


def pause_alert(alert_id: int) -> None:
    alert = Alert.get(alert_id)
    alert.update(pause=True)
Unit tests usually do not have access to a database, so any database calls are patched to keep them fast. Sample code:
from pytest_mock import MockerFixture

from alerts import pause_alert


def test_pause_alert(mocker: MockerFixture) -> None:
    alert_id = 1
    alert = mocker.MagicMock()
    mock_get_fn = mocker.patch('orm_models.alert.Alert.get')
    mock_get_fn.return_value = alert

    pause_alert(alert_id=alert_id)

    mock_get_fn.assert_called_once_with(alert_id)
    alert.update.assert_called_once_with(pause=True)
While this unit test achieves full code coverage, it may give a false sense of security because actual database operations are mocked. Functional tests can elevate our confidence via Pytest fixtures.
Functional Tests: Real-world Functionality Validation
Functional tests focus on executing functions without relying on mocks, so the actual functionality is tested. Although the initial setup can require effort, Pytest fixtures significantly simplify this process. Fixtures can be defined to set up and tear down databases, tables, and other necessary resources, with mocks reserved for handling error scenarios beyond the scope of the function being tested.
At Fiddler, we build functional fixtures for every core service — such as Postgres, Redis, ClickHouse, and Blob storage — so that all backend services run high-fidelity, real-world functional tests.
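The functional test below also assumes a session fixture that hands each test a database session. What that fixture looks like depends on the ORM in use; here is a hedged sketch assuming SQLAlchemy, where each test runs inside a transaction that is rolled back afterwards (the engine URL and the Base import path are illustrative):
# filename - conftest.py (sketch)
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

from orm_models.base import Base  # illustrative import path for the declarative base


@pytest.fixture(scope='session')
def engine():
    # One engine and schema for the whole test session (URL is illustrative)
    _engine = create_engine('postgresql://localhost/test_db')
    Base.metadata.create_all(_engine)
    yield _engine
    Base.metadata.drop_all(_engine)


@pytest.fixture
def session(engine) -> Session:
    # Wrap every test in a transaction and roll it back afterwards,
    # so tests never see each other's data
    connection = engine.connect()
    transaction = connection.begin()
    _session = sessionmaker(bind=connection)()
    yield _session
    _session.close()
    transaction.rollback()
    connection.close()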
Here’s an example of a functional test:
import pytest

from alerts import pause_alert
from orm_models.alert import Alert
from orm_models.model import Model


@pytest.fixture
def model(session) -> Model:
    # `session` is the database session fixture (see the conftest sketch above)
    _model = Model(name='test_model')
    session.add(_model)
    session.flush()
    yield _model


@pytest.fixture
def alert(session, model: Model) -> Alert:
    _alert = Alert(
        name='low traffic alert',
        metric='traffic',
        model_id=model.id,
        compare='<',
        threshold=100,
        pause=False,
    )
    session.add(_alert)
    session.flush()
    yield _alert


def test_pause_alert(alert: Alert) -> None:
    assert alert.pause is False
    pause_alert(alert_id=alert.id)
    assert alert.pause is True
With basic fixtures in place, writing test functions becomes significantly simpler. You can craft various validations and conditional flow tests, catching potential issues early and reducing complexity during integration and end-to-end testing.
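For instance, the same fixtures make it cheap to add edge-case checks, such as verifying that pausing an already-paused alert leaves it paused, which follows directly from the implementation of pause_alert above:
def test_pause_alert_is_idempotent(alert: Alert) -> None:
    pause_alert(alert_id=alert.id)
    pause_alert(alert_id=alert.id)  # pausing twice should not flip the state back
    assert alert.pause is True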
Integration Tests: Covering External Service Integration Points
Certain functionalities, such as interaction with third-party services like PagerDuty or Slack, require integration tests. These tests use mocked responses to simulate scenarios without relying on live services.
Here’s an example of a PagerDuty notification function and how to test it:
# file name - pagerduty.py
import logging

import requests
from requests.exceptions import ConnectionError, Timeout

from exceptions import PagerdutyException  # project-specific exception; import path assumed


def trigger_pagerduty_event(
    payload: dict,
    event_action: str = 'trigger',
) -> None:
    try:
        response = requests.post(
            'https://events.pagerduty.com/v2/enqueue',
            json={...},
        )
        if response.status_code == 202:
            logging.info(
                f"Incident created with dedup key - {response.json().get('dedup_key')}"
            )
        else:
            raise PagerdutyException('Triggering the incident failed')
    except ConnectionError as ce:
        logging.exception('Failed to establish connection')
        raise PagerdutyException from ce
    except Timeout as toe:
        logging.exception('Request timed out')
        raise PagerdutyException from toe
Here’s how you can write an integration test for the PagerDuty notification:
import pytest
import responses

from pagerduty import PagerdutyException, trigger_pagerduty_event


@responses.activate
def test_pagerduty_success() -> None:
    mock_200_response = {
        'status': 'success',
        'message': 'Event processed',
        'dedup_key': 'srv01/HTTP',
    }
    responses.post(
        url='https://events.pagerduty.com/v2/enqueue',
        json=mock_200_response,
        status=202,
    )
    trigger_pagerduty_event(...)


@responses.activate
def test_pagerduty_failure() -> None:
    mock_error_response = {
        'status': 'Unrecognized object',
        'message': 'Event object format is unrecognized',
        'errors': ['JSON parse error'],
    }
    responses.post(
        url='https://events.pagerduty.com/v2/enqueue',
        json=mock_error_response,
        status=500,
    )
    with pytest.raises(PagerdutyException):
        trigger_pagerduty_event(...)
Integration tests using the responses library can mock different API responses, making it easier to test various scenarios.
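The responses library can also raise an exception instead of returning a body, which lets us cover the connection-error branch of trigger_pagerduty_event without touching the network. A hedged sketch (the payload is illustrative):
import pytest
import requests
import responses

from pagerduty import PagerdutyException, trigger_pagerduty_event


@responses.activate
def test_pagerduty_connection_error() -> None:
    # Registering an exception as the body makes the mocked request raise it
    responses.post(
        url='https://events.pagerduty.com/v2/enqueue',
        body=requests.exceptions.ConnectionError('connection refused'),
    )
    with pytest.raises(PagerdutyException):
        trigger_pagerduty_event(payload={'summary': 'test incident'})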
End-to-End Tests: Ensuring Comprehensive System Behavior
End-to-end (E2E) tests validate the entire user flow, ensuring that all external-facing functionality works together when integrated.
Here’s a sample code snippet that demonstrates how to test an entire API endpoint:
def test_get_server_version(flask_client: FlaskTestClient, token: str) -> None:
    response = flask_client.get(
        '/v3/server-version',
        headers={'Authorization': f'Bearer {token}'},
    )

    assert response.json == {
        'data': {
            # expected version string, e.g. imported from the application package
            'server_version': server_version,
        },
        'api_version': '3.0',
        'kind': 'NORMAL',
    }
Utilizing a Pytest fixture such as flask_client provides a convenient Flask test client for making API calls directly, ensuring comprehensive coverage of API functionality while maintaining an isolated test environment.
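How flask_client is built is project-specific, but a hedged sketch of such a fixture, assuming a Flask application factory named create_app, could look like this (the factory name and its testing flag are illustrative):
# filename - conftest.py (sketch)
import pytest
from flask.testing import FlaskClient

from app import create_app  # illustrative application-factory import


@pytest.fixture
def flask_client() -> FlaskClient:
    app = create_app(testing=True)  # hypothetical factory and flag
    # Flask's built-in test client issues requests without running a server
    with app.test_client() as client:
        yield client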
Key Takeaways for Improving the Quality of Product Features
At Fiddler, our mission is to deliver an AI Observability platform that standardizes LLMOps and MLOps processes for our customers. As engineers committed to excellence, we recognize that robust feature and capability development is only as successful as the testing strategies that support it. To empower our customers and ensure the reliability of our platform, we emphasize the following key insights:
- Prioritize Functional Tests: Functional tests should take precedence over heavily mocked unit tests to validate real-world functionality
- Be Wary of Mocked Unit Tests: Mocked unit tests can create a false sense of test coverage. Use them cautiously for core functionality
- Utilize Response Mocks for Integration Tests: Test interactions with third-party services by leveraging response mocks
- Reserve End-to-End Tests for Full System Validation: End-to-end tests should focus on validating customer interactions and overall system behavior
- Harness the Power of Pytest Fixtures: Pytest fixtures are a crucial tool for building scalable, reusable test setups that improve developer efficiency and reliability
Interested in defining the industry AI Observability platform? Join our team at Fiddler!